[jira] [Created] (FLINK-23868) JobExecutionResult printed even if suppressSysout is on
Paul Lin created FLINK-23868:

Summary: JobExecutionResult printed even if suppressSysout is on
Key: FLINK-23868
URL: https://issues.apache.org/jira/browse/FLINK-23868
Project: Flink
Issue Type: Bug
Components: Client / Job Submission
Affects Versions: 1.13.2, 1.14.0
Reporter: Paul Lin

Environments print job execution results to stdout by default and provide a flag `suppressSysout` to disable that behavior. This flag is useful when submitting jobs through the REST API or programmatically. However, JobExecutionResult is still printed when this flag is on, which looks like a bug to me.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
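The behavior the report implies can be pictured with a minimal, Flink-independent sketch (the `ResultPrinter` class and `resultLine` method below are hypothetical, purely illustrative, not Flink's actual code): the job-execution-result printout should be guarded by the same `suppressSysout` flag that silences the other status output.

```java
// Hypothetical sketch of the guard the bug report implies is missing.
// This is NOT Flink's implementation, just the expected flag semantics.
class ResultPrinter {
    // Returns what would be written to stdout for a finished job.
    static String resultLine(boolean suppressSysout, String jobResult) {
        if (suppressSysout) {
            return ""; // flag on: nothing should reach stdout
        }
        return "Job execution result: " + jobResult;
    }
}
```

The reported bug corresponds to the result line being emitted even when the flag is `true`.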
Re: Configuring flink-python in IntelliJ
Hi Dian, thank you for the explanation. I guess this is, at least as far as we know, as good as it gets for an IntelliJ setup for all of Flink. I'll put up a pull request to update the IDE setup guide, but also point out that for specific work on PyFlink, a separate setup with PyCharm is recommended. Best Ingo On Wed, Aug 18, 2021 at 4:20 PM Dian Fu wrote: > Hi Ingo, > > There are both Java/Python source codes in PyFlink and I use two IDEs at > the same time: IntelliJ IDEA & PyCharm. > > Regarding ONLY using IntelliJ IDEA for both Python & Java development, I > have tried it and it works just as you said. I have done the following: > - Install Python Plugin > - Mark the module flink-python as a Python module: right click > "flink-python" -> "Open Module Settings", change the Module SDK to Python > - Click file setup.py under flink-python, then it will promote to install a > few Python libraries, just install them > - Run the Python tests under sub-directories of flink-python/pyflink > > Regarding “a) Step (3) will be undone on every Maven import since Maven > sets the SDK for the module”, this has not occurred in my environment for > now. Are there any differences for you regarding the above steps? > > PS: For me, I will still use both IDEs for PyFlink development as it > contains both Java & Python source code under flink-python directory and it > doesn't support configuring both Python & Java SDK for it. However, if > one works > with flink-python occasionally, using only IntelliJ IDEA is a good approach > and should be enough. > > Regards, > Dian > > On Wed, Aug 18, 2021 at 7:33 PM Ingo Bürk wrote: > > > Hi Dian, > > > > thanks for responding! No, I haven't tried, but I'm aware of it. I don't > > work exclusively on PyFlink, but rather just occasionally. So instead of > > having a whole separate IDE and project for one module I'm trying to get > > the whole Flink project to work as one in IntelliJ. 
Do PyFlink developers > > generally work solely on PyFlink and nothing else / do you switch IDEs > for > > everything? > > > > > > Best > > Ingo > > > > On Wed, Aug 18, 2021 at 1:20 PM Dian Fu wrote: > > > > > Hi Ingo, > > > > > > Thanks a lot for starting up this discussion. I have not tried IntelliJ > > and > > > I’m using PyCharm for PyFlink development. Have you tried this > guideline? > > > [1] > > > It’s written in 1.9 and I think it should still work. > > > > > > Regards, > > > Dian > > > > > > [1] > > > > > > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/flinkdev/ide_setup/#pycharm > > > > > > On Wed, Aug 18, 2021 at 5:38 PM Till Rohrmann > > > wrote: > > > > > > > Thanks for starting this discussion Ingo. I guess that Dian can > > probably > > > > answer this question. I am big +1 for updating our documentation for > > how > > > to > > > > set up the IDE since this problem will probably be encountered a > couple > > > of > > > > times. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Wed, Aug 18, 2021 at 11:03 AM Ingo Bürk > wrote: > > > > > > > > > Hello @dev, > > > > > > > > > > like probably most of you, I am using IntelliJ to work on Flink. > > > Lately I > > > > > needed to also work with flink-python, and thus was wondering about > > how > > > > to > > > > > properly set it up to work in IntelliJ. So far, I have done the > > > > following: > > > > > > > > > > 1. Install the Python plugin > > > > > 2. Set up a custom Python SDK using a virtualenv > > > > > 3. Configure the flink-python module to use this Python SDK (rather > > > than > > > > a > > > > > Java SDK) > > > > > 4. Install as many of the Python dependencies as possible when > > IntelliJ > > > > > prompted me to do so > > > > > > > > > > This got me into a mostly working state, for example I can run > tests > > > from > > > > > the IDE, at least the ones I tried so far. 
However, there are two > > > > concerns: > > > > > > > > > > a) Step (3) will be undone on every Maven import since Maven sets > the > > > SDK > > > > > for the module > > > > > b) Step (4) installed most, but a couple of Python dependencies > could > > > not > > > > > be installed, though so far that didn't cause noticeable problems > > > > > > > > > > I'm wondering if there are additional / different steps to do, > > > > specifically > > > > > for (a), and maybe how the PyFlink developers are configuring this > in > > > > their > > > > > IDE. The IDE setup guide didn't seem to contain information about > > that > > > > > (only about separately setting up this module in PyCharm). Ideally, > > we > > > > > could even update the IDE setup guide. > > > > > > > > > > > > > > > Best > > > > > Ingo > > > > > > > > > > > > > > >
Re: [DISCUSS] Merge FLINK-23757 after feature freeze
Sounds good to me. Thank you both! Thank you~ Xintong Song On Thu, Aug 19, 2021 at 12:34 PM Ingo Bürk wrote: > Thanks everyone, and especially Dian, > > the PR was a draft because originally the task was for all JSON methods. > I've now split it to only refer to those which are merged for 1.14 already, > and converted the PR to a normal one. Dian kindly offered to review and > merge it. > > > Best > Ingo > > On Thu, Aug 19, 2021, 04:24 Dian Fu wrote: > >> Hi Xintong, >> >> I can help review the PR. >> >> Regards, >> Dian >> >> > 2021年8月19日 上午9:48,Xintong Song 写道: >> > >> > Thanks all for the discussion. >> > >> > Quick question for @Ingo: >> > When do you think the PR will be ready (given that it's still a draft >> now), >> > and who would review it? >> > >> > Thank you~ >> > >> > Xintong Song >> > >> > >> > >> > On Wed, Aug 18, 2021 at 10:27 PM Dian Fu wrote: >> > >> >> The risk should be very limited and it should not affect other parts >> of the >> >> functionality. So I'm also in favour of merging it. >> >> >> >> Regards, >> >> Dian >> >> >> >> On Wed, Aug 18, 2021 at 8:07 PM Till Rohrmann >> >> wrote: >> >> >> >>> @Dian Fu could you assess how involved this >> >>> change is? If the change is not very involved and the risk is limited, >> >> then >> >>> I'd be in favour of merging it because feature parity of APIs is quite >> >>> important for our users. >> >>> >> >>> Cheers, >> >>> Till >> >>> >> >>> On Wed, Aug 18, 2021 at 1:46 PM Ingo Bürk wrote: >> >>> >> Hello dev, >> >> I was wondering whether we could also consider merging >> FLINK-23757[1][2] >> after the freeze. This is about exposing two built-in functions >> which we >> added to Table API & SQL prior to the freeze also for PyFlink. >> Meaning >> that >> the feature itself isn't new, we only expose it on the Python API, >> and >> >> as >> such it's also entirely isolated from the rest of PyFlink and Flink >> itself. >> As such I'm not sure this is considered a new feature, but I'd rather >> >> ask. 
>> The main motivation for this would be to retain parity on the APIs. >> Thanks! >> >> [1] https://issues.apache.org/jira/browse/FLINK-23757 >> [2] https://github.com/apache/flink/pull/16874 >> >> >> Best >> Ingo >> >> >>> >> >> >> >>
[jira] [Created] (FLINK-23867) FlinkKafkaInternalProducerITCase.testCommitTransactionAfterClosed fails with IllegalStateException
Xintong Song created FLINK-23867:

Summary: FlinkKafkaInternalProducerITCase.testCommitTransactionAfterClosed fails with IllegalStateException
Key: FLINK-23867
URL: https://issues.apache.org/jira/browse/FLINK-23867
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka
Affects Versions: 1.13.2
Reporter: Xintong Song
Fix For: 1.13.3

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=22465&view=logs&j=72d4811f-9f0d-5fd0-014a-0bc26b72b642&t=c1d93a6a-ba91-515d-3196-2ee8019fbda7&l=6862

{code}
Aug 18 23:20:14 [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.905 s <<< FAILURE! - in org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase
Aug 18 23:20:14 [ERROR] testCommitTransactionAfterClosed(org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase) Time elapsed: 7.848 s <<< ERROR!
Aug 18 23:20:14 java.lang.Exception: Unexpected exception, expected but was
Aug 18 23:20:14     at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:28)
Aug 18 23:20:14     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
Aug 18 23:20:14     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
Aug 18 23:20:14     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
Aug 18 23:20:14     at java.lang.Thread.run(Thread.java:748)
Aug 18 23:20:14 Caused by: java.lang.AssertionError: The message should have been successfully sent expected null, but was:
Aug 18 23:20:14     at org.junit.Assert.fail(Assert.java:88)
Aug 18 23:20:14     at org.junit.Assert.failNotNull(Assert.java:755)
Aug 18 23:20:14     at org.junit.Assert.assertNull(Assert.java:737)
Aug 18 23:20:14     at org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.getClosedProducer(FlinkKafkaInternalProducerITCase.java:228)
Aug 18 23:20:14     at org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.testCommitTransactionAfterClosed(FlinkKafkaInternalProducerITCase.java:177)
Aug 18 23:20:14     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 18 23:20:14     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 18 23:20:14     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 18 23:20:14     at java.lang.reflect.Method.invoke(Method.java:498)
Aug 18 23:20:14     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
Aug 18 23:20:14     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Aug 18 23:20:14     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
Aug 18 23:20:14     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Aug 18 23:20:14     at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:19)
{code}
Re: [DISCUSS] Merge FLINK-23757 after feature freeze
Thanks everyone, and especially Dian, the PR was a draft because originally the task was for all JSON methods. I've now split it to only refer to those which are merged for 1.14 already, and converted the PR to a normal one. Dian kindly offered to review and merge it. Best Ingo On Thu, Aug 19, 2021, 04:24 Dian Fu wrote: > Hi Xintong, > > I can help review the PR. > > Regards, > Dian > > > 2021年8月19日 上午9:48,Xintong Song 写道: > > > > Thanks all for the discussion. > > > > Quick question for @Ingo: > > When do you think the PR will be ready (given that it's still a draft > now), > > and who would review it? > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Wed, Aug 18, 2021 at 10:27 PM Dian Fu wrote: > > > >> The risk should be very limited and it should not affect other parts of > the > >> functionality. So I'm also in favour of merging it. > >> > >> Regards, > >> Dian > >> > >> On Wed, Aug 18, 2021 at 8:07 PM Till Rohrmann > >> wrote: > >> > >>> @Dian Fu could you assess how involved this > >>> change is? If the change is not very involved and the risk is limited, > >> then > >>> I'd be in favour of merging it because feature parity of APIs is quite > >>> important for our users. > >>> > >>> Cheers, > >>> Till > >>> > >>> On Wed, Aug 18, 2021 at 1:46 PM Ingo Bürk wrote: > >>> > Hello dev, > > I was wondering whether we could also consider merging > FLINK-23757[1][2] > after the freeze. This is about exposing two built-in functions which > we > added to Table API & SQL prior to the freeze also for PyFlink. Meaning > that > the feature itself isn't new, we only expose it on the Python API, and > >> as > such it's also entirely isolated from the rest of PyFlink and Flink > itself. > As such I'm not sure this is considered a new feature, but I'd rather > >> ask. > The main motivation for this would be to retain parity on the APIs. > Thanks! 
> > [1] https://issues.apache.org/jira/browse/FLINK-23757 > [2] https://github.com/apache/flink/pull/16874 > > > Best > Ingo > > >>> > >> > >
[jira] [Created] (FLINK-23866) KafkaTransactionLogITCase.testGetTransactionsToAbort fails with IllegalStateException
Xintong Song created FLINK-23866:

Summary: KafkaTransactionLogITCase.testGetTransactionsToAbort fails with IllegalStateException
Key: FLINK-23866
URL: https://issues.apache.org/jira/browse/FLINK-23866
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka
Affects Versions: 1.14.0
Reporter: Xintong Song
Fix For: 1.14.0

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=22463&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=15a22db7-8faa-5b34-3920-d33c9f0ca23c&l=7098

{code}
Aug 18 23:14:24 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 67.32 s <<< FAILURE! - in org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase
Aug 18 23:14:24 [ERROR] testGetTransactionsToAbort Time elapsed: 22.35 s <<< ERROR!
Aug 18 23:14:24 java.lang.IllegalStateException: You can only check the position for partitions assigned to this consumer.
Aug 18 23:14:24     at org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1717)
Aug 18 23:14:24     at org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1684)
Aug 18 23:14:24     at org.apache.flink.connector.kafka.sink.KafkaTransactionLog.hasReadAllRecords(KafkaTransactionLog.java:144)
Aug 18 23:14:24     at org.apache.flink.connector.kafka.sink.KafkaTransactionLog.getTransactionsToAbort(KafkaTransactionLog.java:133)
Aug 18 23:14:24     at org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase.testGetTransactionsToAbort(KafkaTransactionLogITCase.java:110)
Aug 18 23:14:24     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 18 23:14:24     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 18 23:14:24     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 18 23:14:24     at java.lang.reflect.Method.invoke(Method.java:498)
Aug 18 23:14:24     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Aug 18 23:14:24     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Aug 18 23:14:24     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Aug 18 23:14:24     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Aug 18 23:14:24     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Aug 18 23:14:24     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Aug 18 23:14:24     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Aug 18 23:14:24     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Aug 18 23:14:24     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Aug 18 23:14:24     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Aug 18 23:14:24     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Aug 18 23:14:24     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Aug 18 23:14:24     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Aug 18 23:14:24     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Aug 18 23:14:24     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Aug 18 23:14:24     at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:30)
Aug 18 23:14:24     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Aug 18 23:14:24     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Aug 18 23:14:24     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Aug 18 23:14:24     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Aug 18 23:14:24     at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Aug 18 23:14:24     at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
Aug 18 23:14:24     at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
Aug 18 23:14:24     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
Aug 18 23:14:24     at java.util.Iterator.forEachRemaining(Iterator.java:116)
Aug 18 23:14:24     at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
Aug 18 23:14:24     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
Aug 18 23:14:24     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
Aug 18 23:14:24     at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
Aug 18 23:14:24
[jira] [Created] (FLINK-23865) Class cast error caused by nested Pojo in legacy outputConversion
zoucao created FLINK-23865:

Summary: Class cast error caused by nested Pojo in legacy outputConversion
Key: FLINK-23865
URL: https://issues.apache.org/jira/browse/FLINK-23865
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 1.13.2
Reporter: zoucao

code:
{code:java}
Table table = tbEnv.fromValues(
    DataTypes.ROW(
        DataTypes.FIELD("innerPojo", DataTypes.ROW(DataTypes.FIELD("c", STRING()))),
        DataTypes.FIELD("b", STRING()),
        DataTypes.FIELD("a", INT())),
    Row.of(Row.of("str-c"), "str-b", 1));
DataStream<Pojo> pojoDataStream = tbEnv.toAppendStream(table, Pojo.class);

public static class Pojo {
    public InnerPojo innerPojo;
    public String b;
    public int a;

    public Pojo() {}
}

public static class InnerPojo {
    public String c;

    public InnerPojo() {}
}
{code}
error:
{code:java}
java.lang.ClassCastException: org.apache.flink.table.types.logical.IntType cannot be cast to org.apache.flink.table.types.logical.RowType
    at org.apache.flink.table.planner.sinks.TableSinkUtils$$anonfun$1.apply(TableSinkUtils.scala:163)
    at org.apache.flink.table.planner.sinks.TableSinkUtils$$anonfun$1.apply(TableSinkUtils.scala:155)
{code}
The fields of PojoTypeInfo are in alphabetical order, so in `expandPojoTypeToSchema` the fields of 'pojoType' and 'queryLogicalType' each have their own index. Currently we use the POJO field index to look up 'queryLogicalType', which causes the field types to mismatch. It should be fixed like:
{code:java}
val queryIndex = queryLogicalType.getFieldIndex(name)
val nestedLogicalType = queryLogicalType.getFields()(queryIndex).getType.asInstanceOf[RowType]{code}
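The alphabetical-ordering pitfall described above can be made concrete with a small, self-contained sketch (hypothetical field lists, no Flink dependency): resolving a query-schema type by the POJO's positional index pairs `innerPojo` with `INT`, while resolving by field name, as the suggested fix does, finds the correct `ROW` type.

```java
// Hypothetical illustration of the FLINK-23865 index mismatch (no Flink
// dependency): the query schema keeps declaration order, but PojoTypeInfo
// exposes fields sorted alphabetically, so positional lookups pair fields
// with the wrong types.
class FieldIndexMismatch {
    // Query schema in declaration order: innerPojo ROW, b STRING, a INT.
    static final String[] QUERY_NAMES = {"innerPojo", "b", "a"};
    static final String[] QUERY_TYPES = {"ROW", "STRING", "INT"};
    // POJO fields as PojoTypeInfo exposes them: alphabetical order.
    static final String[] POJO_NAMES = {"a", "b", "innerPojo"};

    static int indexOf(String[] names, String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                return i;
            }
        }
        return -1;
    }

    // Buggy variant: reuse the POJO field index inside the query schema.
    static String typeByPojoIndex(String fieldName) {
        return QUERY_TYPES[indexOf(POJO_NAMES, fieldName)];
    }

    // Fixed variant (what the suggested patch does): look the field up by name.
    static String typeByName(String fieldName) {
        return QUERY_TYPES[indexOf(QUERY_NAMES, fieldName)];
    }
}
```

The buggy variant resolves `innerPojo` to `INT`, which mirrors the reported `IntType cannot be cast to RowType` failure.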
Re: [DISCUSS] Merge FLINK-23757 after feature freeze
Hi Xintong, I can help review the PR. Regards, Dian > 2021年8月19日 上午9:48,Xintong Song 写道: > > Thanks all for the discussion. > > Quick question for @Ingo: > When do you think the PR will be ready (given that it's still a draft now), > and who would review it? > > Thank you~ > > Xintong Song > > > > On Wed, Aug 18, 2021 at 10:27 PM Dian Fu wrote: > >> The risk should be very limited and it should not affect other parts of the >> functionality. So I'm also in favour of merging it. >> >> Regards, >> Dian >> >> On Wed, Aug 18, 2021 at 8:07 PM Till Rohrmann >> wrote: >> >>> @Dian Fu could you assess how involved this >>> change is? If the change is not very involved and the risk is limited, >> then >>> I'd be in favour of merging it because feature parity of APIs is quite >>> important for our users. >>> >>> Cheers, >>> Till >>> >>> On Wed, Aug 18, 2021 at 1:46 PM Ingo Bürk wrote: >>> Hello dev, I was wondering whether we could also consider merging FLINK-23757[1][2] after the freeze. This is about exposing two built-in functions which we added to Table API & SQL prior to the freeze also for PyFlink. Meaning that the feature itself isn't new, we only expose it on the Python API, and >> as such it's also entirely isolated from the rest of PyFlink and Flink itself. As such I'm not sure this is considered a new feature, but I'd rather >> ask. The main motivation for this would be to retain parity on the APIs. Thanks! [1] https://issues.apache.org/jira/browse/FLINK-23757 [2] https://github.com/apache/flink/pull/16874 Best Ingo >>> >>
Re: [DISCUSS] Merge Kafka-related PRs after feature freeze
Thanks for starting this discussion, Arvid.

I'd be fine with merging them, as long as there are no other objections and the PRs are ready for merging on Friday. If this takes more time than that, I'd rather consider it a distraction from the release stabilization, and it should be moved to the next release.

Thank you~
Xintong Song

On Wed, Aug 18, 2021 at 8:59 PM Arvid Heise wrote: > Dear devs, > > we would like to merge these PRs after features freeze: > FLINK-23838: Add FLIP-33 metrics to new KafkaSink [1] > FLINK-23801: Add FLIP-33 metrics to KafkaSource [2] > FLINK-23640: Create a KafkaRecordSerializationSchemas builder [3] > > All of the 3 PRs are smaller quality of life improvements that are purely > implemented in flink-connector-kafka, so the risk in merging them is > minimal in terms of production stability. They also reuse existing test > infrastructure, so I expect little impact on the test stability. > > We are still polishing the PRs and would be ready to merge them on Friday > when the objection period would be over. > > Happy to hear your thoughts, > > Arvid > > [1] https://github.com/apache/flink/pull/16875 > [2] https://github.com/apache/flink/pull/16838 > [3] https://github.com/apache/flink/pull/16783 >
Re: [DISCUSS] Merge FLINK-23757 after feature freeze
Thanks all for the discussion. Quick question for @Ingo: When do you think the PR will be ready (given that it's still a draft now), and who would review it? Thank you~ Xintong Song On Wed, Aug 18, 2021 at 10:27 PM Dian Fu wrote: > The risk should be very limited and it should not affect other parts of the > functionality. So I'm also in favour of merging it. > > Regards, > Dian > > On Wed, Aug 18, 2021 at 8:07 PM Till Rohrmann > wrote: > > > @Dian Fu could you assess how involved this > > change is? If the change is not very involved and the risk is limited, > then > > I'd be in favour of merging it because feature parity of APIs is quite > > important for our users. > > > > Cheers, > > Till > > > > On Wed, Aug 18, 2021 at 1:46 PM Ingo Bürk wrote: > > > >> Hello dev, > >> > >> I was wondering whether we could also consider merging FLINK-23757[1][2] > >> after the freeze. This is about exposing two built-in functions which we > >> added to Table API & SQL prior to the freeze also for PyFlink. Meaning > >> that > >> the feature itself isn't new, we only expose it on the Python API, and > as > >> such it's also entirely isolated from the rest of PyFlink and Flink > >> itself. > >> As such I'm not sure this is considered a new feature, but I'd rather > ask. > >> The main motivation for this would be to retain parity on the APIs. > >> Thanks! > >> > >> [1] https://issues.apache.org/jira/browse/FLINK-23757 > >> [2] https://github.com/apache/flink/pull/16874 > >> > >> > >> Best > >> Ingo > >> > > >
[jira] [Created] (FLINK-23864) The document for Pulsar Source
Yufan Sheng created FLINK-23864:

Summary: The document for Pulsar Source
Key: FLINK-23864
URL: https://issues.apache.org/jira/browse/FLINK-23864
Project: Flink
Issue Type: Sub-task
Affects Versions: 1.14.0
Reporter: Yufan Sheng
Fix For: 1.14.0

The new Pulsar source has been merged into the Flink master branch. We need detailed documentation on how to use it in Flink.
Re: [DISCUSS] Merge FLINK-23757 after feature freeze
The risk should be very limited and it should not affect other parts of the functionality. So I'm also in favour of merging it. Regards, Dian On Wed, Aug 18, 2021 at 8:07 PM Till Rohrmann wrote: > @Dian Fu could you assess how involved this > change is? If the change is not very involved and the risk is limited, then > I'd be in favour of merging it because feature parity of APIs is quite > important for our users. > > Cheers, > Till > > On Wed, Aug 18, 2021 at 1:46 PM Ingo Bürk wrote: > >> Hello dev, >> >> I was wondering whether we could also consider merging FLINK-23757[1][2] >> after the freeze. This is about exposing two built-in functions which we >> added to Table API & SQL prior to the freeze also for PyFlink. Meaning >> that >> the feature itself isn't new, we only expose it on the Python API, and as >> such it's also entirely isolated from the rest of PyFlink and Flink >> itself. >> As such I'm not sure this is considered a new feature, but I'd rather ask. >> The main motivation for this would be to retain parity on the APIs. >> Thanks! >> >> [1] https://issues.apache.org/jira/browse/FLINK-23757 >> [2] https://github.com/apache/flink/pull/16874 >> >> >> Best >> Ingo >> >
Re: Common type required when creating TableSchema
Hi Dominik, can you maybe share how you are trying to create the Schema and what you are doing with it afterwards? Best Ingo On Wed, Aug 18, 2021 at 4:08 PM Dominik Wosiński wrote: > Hey, > I've stumbled across a weird behavior and was wondering whether this is > intentional for some reason or the result of a weird bug. So, basically > currently if we want to create *org.apache.flink.table.api.Schema *taht has > one of the types defined as *RAW (*AVRO enum in my case) it's probably not > possible ATM. I've created arrays with names and types, but it's impossible > to create the *Schema* due to the following error: > > *Could not find a common type for arguments: [BIGINT, BIGINT, > RAW('org.test.TestEnum', '...'), INT, STRING, BIGINT]* > > Is that intentional ? If so, is there a way to create a *Schema* that has > *RAW* types ? > > > Thanks, > Cheers, > Dom. >
Re: Configuring flink-python in IntelliJ
Hi Ingo,

There is both Java and Python source code in PyFlink, and I use two IDEs at the same time: IntelliJ IDEA & PyCharm.

Regarding ONLY using IntelliJ IDEA for both Python & Java development, I have tried it and it works just as you said. I have done the following:
- Install the Python plugin
- Mark the module flink-python as a Python module: right click "flink-python" -> "Open Module Settings", change the Module SDK to Python
- Click the file setup.py under flink-python; it will then prompt you to install a few Python libraries, just install them
- Run the Python tests under the sub-directories of flink-python/pyflink

Regarding “a) Step (3) will be undone on every Maven import since Maven sets the SDK for the module”, this has not occurred in my environment so far. Are there any differences for you regarding the above steps?

PS: For me, I will still use both IDEs for PyFlink development, as it contains both Java & Python source code under the flink-python directory and IntelliJ doesn't support configuring both a Python and a Java SDK for it. However, if one works with flink-python occasionally, using only IntelliJ IDEA is a good approach and should be enough.

Regards,
Dian

On Wed, Aug 18, 2021 at 7:33 PM Ingo Bürk wrote: > Hi Dian, > > thanks for responding! No, I haven't tried, but I'm aware of it. I don't > work exclusively on PyFlink, but rather just occasionally. So instead of > having a whole separate IDE and project for one module I'm trying to get > the whole Flink project to work as one in IntelliJ. Do PyFlink developers > generally work solely on PyFlink and nothing else / do you switch IDEs for > everything? > > > Best > Ingo > > On Wed, Aug 18, 2021 at 1:20 PM Dian Fu wrote: > > > Hi Ingo, > > > > Thanks a lot for starting up this discussion. I have not tried IntelliJ > and > > I’m using PyCharm for PyFlink development. Have you tried this guideline? > > [1] > > It’s written in 1.9 and I think it should still work. 
> > > > Regards, > > Dian > > > > [1] > > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/flinkdev/ide_setup/#pycharm > > > > On Wed, Aug 18, 2021 at 5:38 PM Till Rohrmann > > wrote: > > > > > Thanks for starting this discussion Ingo. I guess that Dian can > probably > > > answer this question. I am big +1 for updating our documentation for > how > > to > > > set up the IDE since this problem will probably be encountered a couple > > of > > > times. > > > > > > Cheers, > > > Till > > > > > > On Wed, Aug 18, 2021 at 11:03 AM Ingo Bürk wrote: > > > > > > > Hello @dev, > > > > > > > > like probably most of you, I am using IntelliJ to work on Flink. > > Lately I > > > > needed to also work with flink-python, and thus was wondering about > how > > > to > > > > properly set it up to work in IntelliJ. So far, I have done the > > > following: > > > > > > > > 1. Install the Python plugin > > > > 2. Set up a custom Python SDK using a virtualenv > > > > 3. Configure the flink-python module to use this Python SDK (rather > > than > > > a > > > > Java SDK) > > > > 4. Install as many of the Python dependencies as possible when > IntelliJ > > > > prompted me to do so > > > > > > > > This got me into a mostly working state, for example I can run tests > > from > > > > the IDE, at least the ones I tried so far. However, there are two > > > concerns: > > > > > > > > a) Step (3) will be undone on every Maven import since Maven sets the > > SDK > > > > for the module > > > > b) Step (4) installed most, but a couple of Python dependencies could > > not > > > > be installed, though so far that didn't cause noticeable problems > > > > > > > > I'm wondering if there are additional / different steps to do, > > > specifically > > > > for (a), and maybe how the PyFlink developers are configuring this in > > > their > > > > IDE. The IDE setup guide didn't seem to contain information about > that > > > > (only about separately setting up this module in PyCharm). 
Ideally, > we > > > > could even update the IDE setup guide. > > > > > > > > > > > > Best > > > > Ingo > > > > > > > > > >
Common type required when creating TableSchema
Hey,

I've stumbled across a weird behavior and was wondering whether this is intentional for some reason or the result of a bug. So, basically, currently if we want to create an *org.apache.flink.table.api.Schema* that has one of the types defined as *RAW* (an AVRO enum in my case), it's probably not possible ATM. I've created arrays with names and types, but it's impossible to create the *Schema* due to the following error:

*Could not find a common type for arguments: [BIGINT, BIGINT, RAW('org.test.TestEnum', '...'), INT, STRING, BIGINT]*

Is that intentional? If so, is there a way to create a *Schema* that has *RAW* types?

Thanks,
Cheers,
Dom.
Re: Incompatible RAW types in Table API
FYI. I've managed to fix that by switching to using `toDataStream`. It seems to be working fine now. I have created the issue about the UDF though, since it seems to be a different issue. Not sure if an issue should be created for `toAppendStream` if this is meant to be deprecated. On Mon, Aug 9, 2021 at 19:33 Timo Walther wrote: > Sorry, I meant "will be deprecated in Flink 1.14" > > On 09.08.21 19:32, Timo Walther wrote: > > Hi Dominik, > > > > `toAppendStream` is soft deprecated in Flink 1.13 and will be deprecated > > in Flink 1.13. It uses the old type system and might not match perfectly > > with the other reworked type system in new functions and sources. > > > > For SQL, a lot of Avro classes need to be treated as RAW types. But we > > might address this issue soon and further improve Avro support. > > > > I would suggest to continue this discussion in a JIRA issue. Can you > > also share the code for `NewEvent` and the Avro schema or generated Avro > > class for `Event` to have a fully reproducible example? > > > > What can help is to explicitly define the types: > > > > E.g. you can also use `DataTypes.of(TypeInformation)` both in > > `ScalarFunction.getTypeInference` and > > `StreamTableEnvironment.toDataStream()`. > > > > I hope this helps. > > > > Timo > > > > On 09.08.21 16:27, Dominik Wosiński wrote: > >> It should be `id` instead of `licence` in the error, I've copy-pasted it > >> incorrectly :< > >> > >> I've also tried an additional thing, i.e. 
creating the ScalarFunction that > >> does mapping of one Avro-generated enum to another Avro-generated > >> enum: > >> > >> @FunctionHint( > >>input = Array( > >> new DataTypeHint(value = "RAW", bridgedTo = classOf[OneEnum]) > >>), > >>output = new DataTypeHint(value = "RAW", bridgedTo = > >> classOf[OtherEnum]) > >> ) > >> class EnumMappingFunction extends ScalarFunction { > >> > >>def eval(event: OneEnum): OtherEnum = {OtherEnum.DEFAULT_VALUE} > >> } > >> > >> This results in the following error: > >> > >> *Invalid argument type at position 0. Data type RAW('org.test.OneEnum', > >> '...') expected but RAW('org.test.OneEnum', '...') passed.* > >> > >> pon., 9 sie 2021 o 15:13 Dominik Wosiński > napisał(a): > >> > >>> Hey all, > >>> > >>> I think I've hit some weird issue in Flink TypeInformation generation. > I > >>> have the following code: > >>> > >>> val stream: DataStream[Event] = ... > >>> tableEnv.createTemporaryView("TableName",stream) > >>> val table = tableEnv > >>> .sqlQuery("SELECT id, timestamp, eventType from TableName") > >>> tableEnvironment.toAppendStream[NewEvent](table) > >>> > >>> In this particular example *Event* is an Avro-generated class and > >>> *NewEvent* > >>> is just a POJO. This is just a toy example, so please ignore the fact > that > >>> this operation doesn't make much sense.
> >>> > >>> When I try to run the code I am getting the following error: > >>> > >>> *org.apache.flink.table.api.ValidationException: Column types of query > >>> result and sink for unregistered table do not match. Cause: Incompatible > >>> types for sink column 'licence' at position 0. Query schema: [id: > >>> RAW('org.apache.avro.util.Utf8', '...'), timestamp: BIGINT NOT NULL, > >>> kind: RAW('org.test.EventType', '...')]* > >>> *Sink schema: [id: RAW('org.apache.avro.util.Utf8', '?'), timestamp: > >>> BIGINT, kind: RAW('org.test.EventType', '?')]* > >>> > >>> So, it seems that the type is recognized correctly, but for some reason > >>> there is still a mismatch according to Flink, maybe because of a > >>> different type serializer being used? > >>> > >>> Thanks in advance for any help, > >>> Best Regards, > >>> Dom.
[jira] [Created] (FLINK-23863) AVRO Raw types do not match for UDFs
Dominik Wosiński created FLINK-23863: Summary: AVRO Raw types do not match for UDFs Key: FLINK-23863 URL: https://issues.apache.org/jira/browse/FLINK-23863 Project: Flink Issue Type: Improvement Components: API / DataStream, Table SQL / API Affects Versions: 1.13.2 Reporter: Dominik Wosiński It seems that for some reason when using a UDF, we can't properly use RAW types, at least for AVRO-generated classes. The following code:
```
@FunctionHint(
  input = Array(
    new DataTypeHint(value = "RAW", bridgedTo = classOf[TestEnum])
  ),
  output = new DataTypeHint(value = "RAW", bridgedTo = classOf[SecondEnum])
)
class EnumMappingFunction extends ScalarFunction {
  def eval(event: TestEnum): SecondEnum = {SecondEnum.ANOTHER_TEST_VALUE}
}
```
will result in this rather unhelpful error:
```
Caused by: org.apache.flink.table.api.ValidationException: Invalid argument type at position 0. Data type RAW('org.test.TestEnum', '...') expected but RAW('org.test.TestEnum', '...') passed.
```
Perhaps it is due to a different type serializer being generated by Flink SQL and the `DataTypeHint`? I've created the following minimal reproducible example to make it easier to identify the error: https://github.com/Wosin/FlinkRepro. -- This message was sent by Atlassian Jira (v8.3.4#803005)
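The "different type serializer" hypothesis explains why the error message looks self-contradictory: a RAW type's identity includes its serializer, which the message elides as '...'. The following toy model is an illustration only, not Flink's internals; the serializer names ("kryo-generic", "avro-specific") are made up for the sketch.

```java
import java.util.Objects;

// Illustration (not Flink internals): a RAW type is identified by its class
// AND the serializer attached to it. Two RAW types bridging the same enum
// class still mismatch if their serializers differ, while the error message
// prints only the class and elides the serializer as '...'.
public class RawTypeMismatchDemo {

    public static final class RawType {
        public final String className;
        public final String serializerId; // stands in for the serializer snapshot

        public RawType(String className, String serializerId) {
            this.className = className;
            this.serializerId = serializerId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RawType)) {
                return false;
            }
            RawType other = (RawType) o;
            return className.equals(other.className)
                    && serializerId.equals(other.serializerId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(className, serializerId);
        }

        @Override
        public String toString() {
            // Like the Flink error message: the serializer part is elided.
            return "RAW('" + className + "', '...')";
        }
    }

    public static void main(String[] args) {
        // Hypothetically, the UDF's DataTypeHint derives one serializer...
        RawType expected = new RawType("org.test.TestEnum", "kryo-generic");
        // ...while the table source derived another for the same class.
        RawType passed = new RawType("org.test.TestEnum", "avro-specific");

        // Both print identically, yet they are not equal.
        System.out.println(expected + " vs " + passed + " equal? " + expected.equals(passed));
    }
}
```

Under this model, "RAW('org.test.TestEnum', '...') expected but RAW('org.test.TestEnum', '...') passed" is exactly what one would see: identical printouts, unequal types.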
[jira] [Created] (FLINK-23862) Race condition while cancelling task during initialization
Roman Khachatryan created FLINK-23862: - Summary: Race condition while cancelling task during initialization Key: FLINK-23862 URL: https://issues.apache.org/jira/browse/FLINK-23862 Project: Flink Issue Type: Bug Components: Runtime / Task Affects Versions: 1.14.0 Reporter: Roman Khachatryan Fix For: 1.14.0 While debugging the recent failures in FLINK-22889, I see that sometimes the operator chain is not closed if the task is cancelled while it is being initialized. The reason is that on restore(), cleanUpInvoke() is only called if there was an exception, including CancelTaskException. The latter is only thrown if StreamTask.canceled is set, i.e. TaskCanceler has called StreamTask.cancel(). So if the StreamTask is cancelled between restore and normal invoke, it may neither close the operator chain nor do other cleanup. One solution is to make StreamTask.cleanup visible to and called from Task. cc: [~akalashnikov], [~pnowojski] -- This message was sent by Atlassian Jira (v8.3.4#803005)
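The race and the proposed fix direction can be sketched with a stripped-down model. The class and method names below are illustrative and much simplified, not Flink's actual Task/StreamTask API: the point is only that exception-driven cleanup misses the window between a successful restore and a skipped invoke, while caller-side cleanup in a finally block does not.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simplification of the race in FLINK-23862: if cancel() lands
// after restore() succeeded but before invoke() starts, nothing throws, so a
// cleanup path that only runs on exceptions never fires.
public class CancelCleanupDemo {

    static class Task {
        final AtomicBoolean canceled = new AtomicBoolean(false);
        final AtomicBoolean cleanedUp = new AtomicBoolean(false);

        void restore() { /* state restore succeeds, nothing is thrown */ }

        void invoke() {
            // Skipped entirely when the task was canceled in between.
            if (canceled.get()) {
                return;
            }
        }

        void cleanup() {
            cleanedUp.set(true); // close operator chain, release resources
        }

        // Buggy shape: cleanup only runs when something throws.
        void runExceptionDrivenCleanup() {
            try {
                restore();
                invoke();
            } catch (Exception e) {
                cleanup();
            }
        }

        // Fixed shape: the caller always runs cleanup.
        void runWithGuaranteedCleanup() {
            try {
                restore();
                invoke();
            } finally {
                cleanup();
            }
        }
    }

    public static void main(String[] args) {
        Task buggy = new Task();
        buggy.canceled.set(true); // cancel arrives between restore and invoke
        buggy.runExceptionDrivenCleanup();
        System.out.println("buggy cleaned up: " + buggy.cleanedUp.get());

        Task fixed = new Task();
        fixed.canceled.set(true);
        fixed.runWithGuaranteedCleanup();
        System.out.println("fixed cleaned up: " + fixed.cleanedUp.get());
    }
}
```

Making cleanup the caller's responsibility, as the ticket suggests for Task, closes the window regardless of when cancellation arrives.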
[DISCUSS] Merge Kafka-related PRs after feature freeze
Dear devs, we would like to merge these PRs after the feature freeze: FLINK-23838: Add FLIP-33 metrics to new KafkaSink [1] FLINK-23801: Add FLIP-33 metrics to KafkaSource [2] FLINK-23640: Create a KafkaRecordSerializationSchemas builder [3] All three PRs are smaller quality-of-life improvements that are purely implemented in flink-connector-kafka, so the risk of merging them is minimal in terms of production stability. They also reuse existing test infrastructure, so I expect little impact on test stability. We are still polishing the PRs and would be ready to merge them on Friday, when the objection period is over. Happy to hear your thoughts, Arvid [1] https://github.com/apache/flink/pull/16875 [2] https://github.com/apache/flink/pull/16838 [3] https://github.com/apache/flink/pull/16783
[jira] [Created] (FLINK-23861) flink sql client support dynamic params
zhangbinzaifendou created FLINK-23861: - Summary: flink sql client support dynamic params Key: FLINK-23861 URL: https://issues.apache.org/jira/browse/FLINK-23861 Project: Flink Issue Type: Improvement Components: Table SQL / Client Affects Versions: 1.13.2 Reporter: zhangbinzaifendou 1. Every time the set command is executed, the method call chain is very long and a new TableEnvironment object is created via createTableEnvironment. 2. As a result of the previous discussion in [FLINK-22770|https://issues.apache.org/jira/browse/FLINK-22770], I don't think that adding quotation marks to key and value is a good habit for users. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23860) Conversion to relational algebra failed to preserve datatypes
lixu created FLINK-23860: Summary: Conversion to relational algebra failed to preserve datatypes Key: FLINK-23860 URL: https://issues.apache.org/jira/browse/FLINK-23860 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.13.2, 1.13.1 Reporter: lixu Fix For: 1.14.0, 1.13.3
{code:java}
// code placeholder
StreamExecutionEnvironment streamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(streamExecutionEnvironment);
tableEnvironment.executeSql("CREATE TABLE datagen (\n"
    + " f_sequence INT,\n"
    + " f_random INT,\n"
    + " f_random_str STRING,\n"
    + " ts AS localtimestamp,\n"
    + " WATERMARK FOR ts AS ts\n"
    + ") WITH (\n"
    + " 'connector' = 'datagen',\n"
    + " 'rows-per-second'='5',\n"
    + " 'fields.f_sequence.kind'='sequence',\n"
    + " 'fields.f_sequence.start'='1',\n"
    + " 'fields.f_sequence.end'='1000',\n"
    + " 'fields.f_random.min'='1',\n"
    + " 'fields.f_random.max'='1000',\n"
    + " 'fields.f_random_str.length'='10'\n"
    + ")");
Table table = tableEnvironment.sqlQuery("select row(f_sequence, f_random) as c from datagen");
Table table1 = tableEnvironment.sqlQuery("select * from " + table);
table1.execute().print();
{code}
{code:java}
// exception
Exception in thread "main" java.lang.AssertionError: Conversion to relational algebra failed to preserve datatypes:
validated type:
RecordType(RecordType:peek_no_expand(INTEGER EXPR$0, INTEGER EXPR$1) NOT NULL c) NOT NULL
converted type:
RecordType(RecordType(INTEGER EXPR$0, INTEGER EXPR$1) NOT NULL c) NOT NULL
rel:
LogicalProject(c=[ROW($0, $1)])
  LogicalWatermarkAssigner(rowtime=[ts], watermark=[$3])
    LogicalProject(f_sequence=[$0], f_random=[$1], f_random_str=[$2], ts=[LOCALTIMESTAMP])
      LogicalTableScan(table=[[default_catalog, default_database, datagen]])
 at
org.apache.calcite.sql2rel.SqlToRelConverter.checkConvertedType(SqlToRelConverter.java:467)
 at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:582)
 at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:169)
 at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:161)
 at org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:989)
 at org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:958)
 at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:283)
 at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:101)
 at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:704)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23859) [typo][flink-core][flink-connectors]fix typo for code
wuguihu created FLINK-23859: --- Summary: [typo][flink-core][flink-connectors]fix typo for code Key: FLINK-23859 URL: https://issues.apache.org/jira/browse/FLINK-23859 Project: Flink Issue Type: Bug Reporter: wuguihu There are some typo issues in these modules. {code:java} # Use the codespell tool to check for typo issues. pip install codespell codespell -h {code} 1. codespell flink-java/src {code:java} flink-java/src/main/java/org/apache/flink/api/java/operators/PartitionOperator.java:125: partioning ==> partitioning flink-java/src/main/java/org/apache/flink/api/java/operators/PartitionOperator.java:128: neccessary ==> necessary {code} 2. codespell flink-clients/ {code:java} flink-clients/src/test/java/org/apache/flink/client/program/DefaultPackagedProgramRetrieverTest.java:545: acessible ==> accessible {code} 3. codespell flink-connectors/ -S '*.xml' -S '*.iml' -S '*.txt' {code:java} flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderOptions.java:25: tht ==> that flink-connectors/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalogTestBase.java:192: doens't ==> doesn't flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/dialect/JdbcDialect.java:96: PostgresSQL ==> postgresql flink-connectors/flink-connector-cassandra/src/test/java/org/apache/flink/batch/connectors/cassandra/CustomCassandraAnnotatedPojo.java:38: instanciation ==> instantiation flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParserCalcitePlanner.java:822: partion ==> partition flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParserTypeCheckProcFactory.java:943: funtion ==> function flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/copy/HiveASTParseDriver.java:55: funtion ==> function
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/copy/HiveASTParseDriver.java:51: characteres ==> characters flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java:436: paremeters ==> parameters flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/copy/HiveParserBaseSemanticAnalyzer.java:2369: Unkown ==> Unknown flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ConsumerConfigConstants.java:75: reprsents ==> represents flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/functions/hive/HiveFunctionWrapper.java:28: functino ==> function flink-connectors/flink-connector-hbase-2.2/src/main/java/org/apache/flink/connector/hbase2/source/HBaseRowDataAsyncLookupFunction.java:62: implemenation flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/connector/pulsar/source/enumerator/cursor/StartCursor.java:70: ture ==> true flink-connectors/flink-connector-pulsar/src/test/resources/containers/txnStandalone.conf:907: partions ==> partitions flink-connectors/flink-connector-pulsar/src/test/resources/containers/txnStandalone.conf:468: implementatation ==> implementation flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/enumerate/BlockSplittingRecursiveEnumerator.java:141: bloc ==> block flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/SimpleStreamFormat.java:37: te ==> the flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/deserializer/KafkaRecordDeserializationSchema.java:70: determin ==> determine flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/enumerator/subscriber/TopicListSubscriber.java:36: hav ==> have 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/copy/HiveParserQBSubQuery.java:555: correlatd ==> correlated flink-connectors/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBaseTest.java:263: intial ==> initial flink-connectors/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBaseTest.java:302: intial ==> initial flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java:249: wth ==> with flink-connectors/flink-connector-files/src/test/java/org/apache/flink/connector/file/sink/writer/FileWriterBucketStateSerializerMigrationTest.java:232: comitted ==> committed flink-connectors/flink-connector-rabbitmq/src/main/java/org/apache/flink/streaming/connec
[jira] [Created] (FLINK-23858) Clarify StreamRecord#timestamp.
Arvid Heise created FLINK-23858: --- Summary: Clarify StreamRecord#timestamp. Key: FLINK-23858 URL: https://issues.apache.org/jira/browse/FLINK-23858 Project: Flink Issue Type: Technical Debt Components: Runtime / Network Reporter: Arvid Heise The new Source apparently changed the way we specify records without timestamps. Previously, we used separate methods to create and maintain timestamp-less records. Now, we are shifting towards using a special value (TimestampAssigner#NO_TIMESTAMP). We first of all need to document that somewhere, at the very least in the JavaDoc of StreamRecord. We should also revise the consequences: - Do we want to encode it in the {{StreamElementSerializer}}? Currently, we use a flag to indicate no-timestamp on the old path, but on the new path we now use 9 bytes to encode NO_TIMESTAMP. - We should check that all code paths deal with `hasTimestamp() == true && getTimestamp() == TimestampAssigner#NO_TIMESTAMP`, in particular sinks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
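The 1-byte vs. 9-byte trade-off mentioned for the serializer can be checked with plain java.io. This is a back-of-the-envelope sketch, not the real StreamElementSerializer: a tag byte that omits the long entirely costs 1 byte for a timestamp-less record, while always writing a sentinel long costs the tag plus 8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Rough size comparison for the two encodings discussed in FLINK-23858.
public class TimestampEncodingDemo {

    // Sentinel value, in the spirit of TimestampAssigner#NO_TIMESTAMP.
    static final long NO_TIMESTAMP = Long.MIN_VALUE;

    // Old-style: a tag byte distinguishes the cases; the long is only
    // written when a timestamp is present.
    static byte[] encodeWithFlag(boolean hasTimestamp, long ts) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(hasTimestamp ? 1 : 0);
            if (hasTimestamp) {
                out.writeLong(ts);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sentinel-style: tag + long are always written, even for NO_TIMESTAMP.
    static byte[] encodeWithSentinel(long ts) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(1);
            out.writeLong(ts); // NO_TIMESTAMP still costs a full 8 bytes here
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flag, no ts:     " + encodeWithFlag(false, 0).length + " bytes");
        System.out.println("sentinel, no ts: " + encodeWithSentinel(NO_TIMESTAMP).length + " bytes");
    }
}
```

So for timestamp-less records the flag encoding saves 8 bytes per record, which is the overhead the ticket asks about.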
[jira] [Created] (FLINK-23857) insert overwrite table select * from t where 1 != 1, Unable to clear table data
lixu created FLINK-23857: Summary: insert overwrite table select * from t where 1 != 1, Unable to clear table data Key: FLINK-23857 URL: https://issues.apache.org/jira/browse/FLINK-23857 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.13.1 Reporter: lixu Fix For: 1.14.0, 1.13.3 "insert overwrite table select * from t where 1 != 1" is unable to clear the table data, unlike Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
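The semantics the report expects (the Hive-like behavior) can be illustrated with a toy in-memory "table". This is an illustration of the expectation only, not Flink or Hive code: INSERT OVERWRITE replaces the table contents with the query result, so a query that matches no rows ("WHERE 1 != 1") should leave the table empty.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of INSERT OVERWRITE semantics: the table ends up holding exactly
// the query result, whatever it held before.
public class InsertOverwriteDemo {

    static List<String> overwrite(List<String> table, List<String> queryResult) {
        // Copy first so an aliased queryResult survives the clear().
        List<String> newContents = new ArrayList<>(queryResult);
        table.clear();
        table.addAll(newContents);
        return table;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(List.of("a", "b", "c"));
        List<String> emptyResult = new ArrayList<>(); // SELECT * FROM t WHERE 1 != 1
        overwrite(table, emptyResult);
        System.out.println("rows after overwrite: " + table.size());
    }
}
```

The bug report is that Flink's INSERT OVERWRITE keeps the old rows in this empty-result case instead of clearing them.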
Re: [DISCUSS] Merge FLINK-23757 after feature freeze
@Dian Fu could you assess how involved this change is? If the change is not very involved and the risk is limited, then I'd be in favour of merging it because feature parity of APIs is quite important for our users. Cheers, Till On Wed, Aug 18, 2021 at 1:46 PM Ingo Bürk wrote: > Hello dev, > > I was wondering whether we could also consider merging FLINK-23757[1][2] > after the freeze. This is about exposing two built-in functions which we > added to Table API & SQL prior to the freeze also for PyFlink. Meaning that > the feature itself isn't new, we only expose it on the Python API, and as > such it's also entirely isolated from the rest of PyFlink and Flink itself. > As such I'm not sure this is considered a new feature, but I'd rather ask. > The main motivation for this would be to retain parity on the APIs. Thanks! > > [1] https://issues.apache.org/jira/browse/FLINK-23757 > [2] https://github.com/apache/flink/pull/16874 > > > Best > Ingo >
[DISCUSS] Merge FLINK-23757 after feature freeze
Hello dev, I was wondering whether we could also consider merging FLINK-23757[1][2] after the freeze. This is about exposing two built-in functions which we added to Table API & SQL prior to the freeze also for PyFlink. Meaning that the feature itself isn't new, we only expose it on the Python API, and as such it's also entirely isolated from the rest of PyFlink and Flink itself. As such I'm not sure this is considered a new feature, but I'd rather ask. The main motivation for this would be to retain parity on the APIs. Thanks! [1] https://issues.apache.org/jira/browse/FLINK-23757 [2] https://github.com/apache/flink/pull/16874 Best Ingo
[jira] [Created] (FLINK-23856) Support all JSON methods in PyFlink
Ingo Bürk created FLINK-23856: - Summary: Support all JSON methods in PyFlink Key: FLINK-23856 URL: https://issues.apache.org/jira/browse/FLINK-23856 Project: Flink Issue Type: Sub-task Reporter: Ingo Bürk -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Configuring flink-python in IntelliJ
Hi Dian, thanks for responding! No, I haven't tried, but I'm aware of it. I don't work exclusively on PyFlink, but rather just occasionally. So instead of having a whole separate IDE and project for one module I'm trying to get the whole Flink project to work as one in IntelliJ. Do PyFlink developers generally work solely on PyFlink and nothing else / do you switch IDEs for everything? Best Ingo On Wed, Aug 18, 2021 at 1:20 PM Dian Fu wrote: > Hi Ingo, > > Thanks a lot for starting up this discussion. I have not tried IntelliJ and > I’m using PyCharm for PyFlink development. Have you tried this guideline? > [1] > It’s written in 1.9 and I think it should still work. > > Regards, > Dian > > [1] > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/flinkdev/ide_setup/#pycharm > > On Wed, Aug 18, 2021 at 5:38 PM Till Rohrmann > wrote: > > > Thanks for starting this discussion Ingo. I guess that Dian can probably > > answer this question. I am big +1 for updating our documentation for how > to > > set up the IDE since this problem will probably be encountered a couple > of > > times. > > > > Cheers, > > Till > > > > On Wed, Aug 18, 2021 at 11:03 AM Ingo Bürk wrote: > > > > > Hello @dev, > > > > > > like probably most of you, I am using IntelliJ to work on Flink. > Lately I > > > needed to also work with flink-python, and thus was wondering about how > > to > > > properly set it up to work in IntelliJ. So far, I have done the > > following: > > > > > > 1. Install the Python plugin > > > 2. Set up a custom Python SDK using a virtualenv > > > 3. Configure the flink-python module to use this Python SDK (rather > than > > a > > > Java SDK) > > > 4. Install as many of the Python dependencies as possible when IntelliJ > > > prompted me to do so > > > > > > This got me into a mostly working state, for example I can run tests > from > > > the IDE, at least the ones I tried so far. 
However, there are two > > concerns: > > > > > > a) Step (3) will be undone on every Maven import since Maven sets the > SDK > > > for the module > > > b) Step (4) installed most, but a couple of Python dependencies could > not > > > be installed, though so far that didn't cause noticeable problems > > > > > > I'm wondering if there are additional / different steps to do, > > specifically > > > for (a), and maybe how the PyFlink developers are configuring this in > > their > > > IDE. The IDE setup guide didn't seem to contain information about that > > > (only about separately setting up this module in PyCharm). Ideally, we > > > could even update the IDE setup guide. > > > > > > > > > Best > > > Ingo > > > > > >
Re: Configuring flink-python in IntelliJ
Hi Ingo, Thanks a lot for starting up this discussion. I have not tried IntelliJ and I’m using PyCharm for PyFlink development. Have you tried this guideline? [1] It’s written in 1.9 and I think it should still work. Regards, Dian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/flinkdev/ide_setup/#pycharm On Wed, Aug 18, 2021 at 5:38 PM Till Rohrmann wrote: > Thanks for starting this discussion Ingo. I guess that Dian can probably > answer this question. I am big +1 for updating our documentation for how to > set up the IDE since this problem will probably be encountered a couple of > times. > > Cheers, > Till > > On Wed, Aug 18, 2021 at 11:03 AM Ingo Bürk wrote: > > > Hello @dev, > > > > like probably most of you, I am using IntelliJ to work on Flink. Lately I > > needed to also work with flink-python, and thus was wondering about how > to > > properly set it up to work in IntelliJ. So far, I have done the > following: > > > > 1. Install the Python plugin > > 2. Set up a custom Python SDK using a virtualenv > > 3. Configure the flink-python module to use this Python SDK (rather than > a > > Java SDK) > > 4. Install as many of the Python dependencies as possible when IntelliJ > > prompted me to do so > > > > This got me into a mostly working state, for example I can run tests from > > the IDE, at least the ones I tried so far. However, there are two > concerns: > > > > a) Step (3) will be undone on every Maven import since Maven sets the SDK > > for the module > > b) Step (4) installed most, but a couple of Python dependencies could not > > be installed, though so far that didn't cause noticeable problems > > > > I'm wondering if there are additional / different steps to do, > specifically > > for (a), and maybe how the PyFlink developers are configuring this in > their > > IDE. The IDE setup guide didn't seem to contain information about that > > (only about separately setting up this module in PyCharm). 
Ideally, we > > could even update the IDE setup guide. > > > > > > Best > > Ingo > > >
[jira] [Created] (FLINK-23855) Table API & SQL Configuration Not displayed on flink dashboard
simenliuxing created FLINK-23855: Summary: Table API & SQL Configuration Not displayed on flink dashboard Key: FLINK-23855 URL: https://issues.apache.org/jira/browse/FLINK-23855 Project: Flink Issue Type: Improvement Components: Runtime / Web Frontend Affects Versions: 1.13.2 Reporter: simenliuxing Fix For: 1.14.0 Attachments: image-2021-08-18-18-48-10-985.png branch: 1.13.2 planner: blink Hi, when I run a Flink SQL task in standalone mode, I set some parameters starting with "table.", but I can't find them on the dashboard, although I know these parameters are effective. Can these parameters be displayed somewhere? -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Configuring flink-python in IntelliJ
Thanks for starting this discussion Ingo. I guess that Dian can probably answer this question. I am big +1 for updating our documentation for how to set up the IDE since this problem will probably be encountered a couple of times. Cheers, Till On Wed, Aug 18, 2021 at 11:03 AM Ingo Bürk wrote: > Hello @dev, > > like probably most of you, I am using IntelliJ to work on Flink. Lately I > needed to also work with flink-python, and thus was wondering about how to > properly set it up to work in IntelliJ. So far, I have done the following: > > 1. Install the Python plugin > 2. Set up a custom Python SDK using a virtualenv > 3. Configure the flink-python module to use this Python SDK (rather than a > Java SDK) > 4. Install as many of the Python dependencies as possible when IntelliJ > prompted me to do so > > This got me into a mostly working state, for example I can run tests from > the IDE, at least the ones I tried so far. However, there are two concerns: > > a) Step (3) will be undone on every Maven import since Maven sets the SDK > for the module > b) Step (4) installed most, but a couple of Python dependencies could not > be installed, though so far that didn't cause noticeable problems > > I'm wondering if there are additional / different steps to do, specifically > for (a), and maybe how the PyFlink developers are configuring this in their > IDE. The IDE setup guide didn't seem to contain information about that > (only about separately setting up this module in PyCharm). Ideally, we > could even update the IDE setup guide. > > > Best > Ingo >
Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)
Hi, Mingliang, Thank you for providing a real-world case of heterogeneous topology in the training and inference phases; Becket has given you two options to choose from. Personally, I think Becket's two options are over-simplified in their description and may be somewhat misleading. Here, I would like to add some of my thoughts: Proposal Option-2 does NOT have to implement two DAGs in ALL cases. In most cases, the best practice in Proposal Option-2 is to put the common part (the inference part) into a Pipeline. In the training phase, the data is preprocessed by AlgoOps or another pipeline and then fed to Pipeline.fit(). The output PipelineModel can be directly used in the inference phase. The code will be much clearer and cleaner than the complicated manipulation of estimatorInputs and transformerInput in the Graph API. Proposal Option-1 can NOT ALWAYS encapsulate a heterogeneous topology with the Graph/GraphBuilder API. In [1], we already list some cases where the Graph API fails to encapsulate a complicated topology, and we also presented the concrete scenarios we encountered. Such an incapability could bring extra effort when incrementally developing your ML task. Even if Mingliang's cases happen to fall into the rare situations where both of Becket's options apply, the actual differences between the two options are shown in code snippets in [1]. I personally do not think implementing two DAGs brings much overhead. You may check those code snippets if you would like. As far as I can see, most inference/predict pipelines are used for online serving (as in offline inference, there is no need to export models). In the situation of online serving, the corresponding pipeline can only accept one dataset and produce one dataset. This means item 1 above applies: Proposal Option-2 does the same thing in a clear and clean way. So, Mingliang, if it does not bother you much, you may give more information about your scenarios, keeping the supplementary information above in mind.
[1] https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do Sincerely, Fan Hong -- From: 青雉(祁明良) Sent: Tuesday, August 10, 2021, 11:36 To: dev@flink.apache.org Subject: Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML) Vote for option 2. It is similar to what we are doing with Tensorflow. 1. Define the graph in the training phase 2. Export the model with different input/output specs for online inference Thanks, Mingliang On Aug 10, 2021, at 9:39 AM, Becket Qin <becket@gmail.com> wrote:
[jira] [Created] (FLINK-23854) KafkaSink error when restart from the checkpoint with a lower parallelism
Hang Ruan created FLINK-23854: - Summary: KafkaSink error when restart from the checkpoint with a lower parallelism Key: FLINK-23854 URL: https://issues.apache.org/jira/browse/FLINK-23854 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.14.0 Reporter: Hang Ruan The KafkaSink throws an exception when restarted with a lower parallelism. The exception is like this:
{code:java}
java.lang.IllegalStateException: Internal error: It is expected that state from previous executions is distributed to the same subtask id.
 at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
 at org.apache.flink.connector.kafka.sink.KafkaWriter.recoverAndInitializeState(KafkaWriter.java:178)
 at org.apache.flink.connector.kafka.sink.KafkaWriter.<init>(KafkaWriter.java:130)
 at org.apache.flink.connector.kafka.sink.KafkaSink.createWriter(KafkaSink.java:99)
 at org.apache.flink.streaming.runtime.operators.sink.SinkOperator.initializeState(SinkOperator.java:134)
 at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:118)
 at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:286)
 at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:109)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:690)
 at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:666)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:785)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:638)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:572)
 at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.NullPointerException
 at org.apache.flink.streaming.runtime.operators.sink.SinkOperator.close(SinkOperator.java:195)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:141)
 at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.closeAllOperators(RegularOperatorChain.java:127)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.closeAllOperators(StreamTask.java:1028)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.runAndSuppressThrowable(StreamTask.java:1014)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUpInvoke(StreamTask.java:927)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:797)
 ... 4 more
{code}
I started the Kafka cluster (kafka_2.13-2.8.0) and the Flink cluster on my own Mac. I changed the parallelism from 4 to 2 and restarted the job from a completed checkpoint; then the error occurs. The CLI command and the code are as follows.
{code:java}
// cli command
./bin/flink run -d -c com.test.KafkaExactlyOnceScaleDownTest -s /Users/test/checkpointDir/ExactlyOnceTest1/67105fcc1724e147fc6208af0dd90618/chk-1 /Users/test/project/self/target/test.jar
{code}
{code:java}
public class KafkaExactlyOnceScaleDownTest {
    public static void main(String[] args) throws Exception {
        final String kafkaSourceTopic = "flinkSourceTest";
        final String kafkaSinkTopic = "flinkSinkExactlyTest1";
        final String groupId = "ExactlyOnceTest1";
        final String brokers = "localhost:9092";
        final String ckDir = "file:///Users/hangruan/checkpointDir/" + groupId;

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(6);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        env.getCheckpointConfig().setCheckpointStorage(ckDir);
        env.setParallelism(4);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers(brokers)
                .setTopics(kafkaSourceTopic)
                .setGroupId(groupId)
                .setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> flintstones = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");
        DataStream<String> adults = flintstones.filter(s -> s != null && s.length() > 2);

        Properties props = new Properties();
        props.setProperty("transaction.timeout.ms", "90");
        adults.sinkTo(KafkaSink.<String>builder()
                .setBootstrapServers(brokers)
                .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactional
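Why the precondition must trip when going from parallelism 4 to 2 can be modeled with a hypothetical round-robin redistribution of writer states (the names and the exact assignment strategy are illustrative, not Flink's actual code): each state remembers the subtask id that produced it, and with fewer subtasks some states inevitably land on a subtask with a different id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the failing check in FLINK-23854: on recovery, states
// produced by oldParallelism subtasks are spread round-robin over
// newParallelism subtasks. With a scale-down, a state produced by subtask 3
// can land on subtask 1, so "state's subtask id == current subtask id" fails.
public class ScaleDownStateDemo {

    // assignment.get(i) holds the producer subtask ids whose state lands on subtask i.
    static List<List<Integer>> redistribute(int oldParallelism, int newParallelism) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            assignment.add(new ArrayList<>());
        }
        for (int producer = 0; producer < oldParallelism; producer++) {
            assignment.get(producer % newParallelism).add(producer);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 -> 2: subtask 0 receives states from {0, 2}, subtask 1 from {1, 3}.
        List<List<Integer>> assignment = redistribute(4, 2);
        for (int subtask = 0; subtask < assignment.size(); subtask++) {
            for (int producer : assignment.get(subtask)) {
                if (producer != subtask) {
                    System.out.println("subtask " + subtask + " got state from subtask "
                            + producer + " -> precondition violated");
                }
            }
        }
    }
}
```

With equal old and new parallelism every state stays on its own subtask id, which is why the job only fails after the scale-down.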
Configuring flink-python in IntelliJ
Hello @dev,

like probably most of you, I am using IntelliJ to work on Flink. Lately I also needed to work with flink-python, and thus was wondering how to properly set it up in IntelliJ. So far, I have done the following:

1. Install the Python plugin
2. Set up a custom Python SDK using a virtualenv
3. Configure the flink-python module to use this Python SDK (rather than a Java SDK)
4. Install as many of the Python dependencies as possible when IntelliJ prompted me to do so

This got me into a mostly working state; for example, I can run tests from the IDE, at least the ones I have tried so far. However, there are two concerns:

a) Step (3) is undone on every Maven import, since Maven sets the SDK for the module.
b) Step (4) installed most dependencies, but a couple of the Python dependencies could not be installed, though so far that hasn't caused noticeable problems.

I'm wondering whether there are additional or different steps, specifically for (a), and how the PyFlink developers configure this in their IDEs. The IDE setup guide didn't seem to contain information about that (only about separately setting up this module in PyCharm). Ideally, we could even update the IDE setup guide.

Best
Ingo
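(For reference, step (2) above can be approximated on the command line; this is a hedged sketch, not from the thread. The virtualenv name is made up, and the editable-install command for step (4) is an assumption about how flink-python's setup.py dependencies could be installed outside the IDE.)

```shell
# Step (2): create a virtualenv to use as the flink-python module SDK.
python3 -m venv .venv-flink-python
. .venv-flink-python/bin/activate

# Step (4) could then be done manually from the Flink checkout root
# (assumed path and approach, not from the thread):
# python -m pip install -e flink-python
```

The resulting interpreter at `.venv-flink-python/bin/python` is what would be registered as the Python SDK in IntelliJ.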
Flink 1.14 Bi-weekly 2021-08-17 Feature Freeze done, now stabilise!
Dear Flink Community,

Thanks for putting all that effort into the 1.14 release. We are now past the feature freeze, and it looks like most of the features made it. This will be a decent release. Here's a summary of today's bi-weekly, which was a bit longer than usual.

*Exceptions to the feature freeze*
There are two efforts, around FLIP-147 and the Kafka source, which have some pending PRs. The people driving them forward will (or already did) reach out to the mailing list and ask for an exception to merge them.

*1.14 release wiki page*
Please keep the page [1] up to date. To mark well-documented and well-tested features, we are introducing the new state "done done". Please mark a feature as "done done" when it is well documented and tested. Please add issues or PRs for documentation to the corresponding column, as well as tickets that can be picked up by other teams for cross-team testing.

*Cleaning up issues*
If a parent issue targeting the 1.14 release still has open subtasks, please move them to a new ticket and close the old one for the sake of transparency and clarity.

*Documentation*
During the stabilisation phase, documentation issues will be handled as blockers, as they also have an impact on testability.

*Cross-team testing*
As mentioned above, those responsible for a feature should create a ticket for cross-team testing containing instructions for testing the newly developed feature. This should be linked on the wiki page. Others should pick some of those tickets and test features they did not contribute to.

*Build stability*
Over the last weeks, build stability has been poor, which is somewhat natural given that a lot of features have just been merged. However, this is now blocking us from moving towards a first release candidate. We encourage every contributor to look into these issues and pick some up to improve the general CI experience. The release managers will try to find assignees for blockers as soon as they appear, and for critical issues during the syncs.
Please make sure to report issues with the right severity and fix version. You can use the Kanban board to get an overview and pick up issues. There are ready-to-use filters for 1.14-related blockers, critical issues, instabilities, and cross-team testing issues. So once again: make sure the fixVersion is set to "1.14.0" so that issues appear on this board. [2]

*Syncs*
From now on, the syncs will happen weekly (Tuesdays, same time, same place). If the load on the release managers becomes too high, we will move to two meetings per week. In the syncs we will go through the blockers and assign unassigned critical issues.

*Release branch and RC0*
As we want to collect user feedback as early as possible, we'd like to push for branching off the release and creating the first release candidate as soon as possible. This requires work on the stability issues (see above). We will decide on the timing in next week's sync.

Thanks for making this happen.

Best
Xintong, Dawid & Joe

[1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
[2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=468
[jira] [Created] (FLINK-23853) Update StateFun's Flink dependency to 1.13.2
Tzu-Li (Gordon) Tai created FLINK-23853: --- Summary: Update StateFun's Flink dependency to 1.13.2 Key: FLINK-23853 URL: https://issues.apache.org/jira/browse/FLINK-23853 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Tzu-Li (Gordon) Tai Assignee: Tzu-Li (Gordon) Tai Fix For: statefun-3.1.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23852) In sql-client mode, could not change the properties.
Shuaishuai Guo created FLINK-23852: -- Summary: In sql-client mode, could not change the properties. Key: FLINK-23852 URL: https://issues.apache.org/jira/browse/FLINK-23852 Project: Flink Issue Type: Improvement Components: Documentation, Table SQL / Client Affects Versions: 1.13.0 Reporter: Shuaishuai Guo Attachments: 微信图片_20210818152914.png

When we set a property as follows, it does not work:

{code}
SET 'sql-client.execution.result-mode' = 'tableau';
{code}

The reason is that the single quotation marks around the property key are needless. It should be used like this:

{code}
SET sql-client.execution.result-mode = 'tableau';
{code}

Note: all of the properties have this problem.

-- This message was sent by Atlassian Jira (v8.3.4#803005)