[jira] [Created] (FLINK-24443) IntervalJoinITCase.testRowTimeInnerJoinWithEquiTimeAttrs fail with output mismatch
Dawid Wysakowicz created FLINK-24443:

Summary: IntervalJoinITCase.testRowTimeInnerJoinWithEquiTimeAttrs fail with output mismatch
Key: FLINK-24443
URL: https://issues.apache.org/jira/browse/FLINK-24443
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Dawid Wysakowicz

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=24716&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=9811

{code}
Oct 02 01:08:36 [ERROR] Tests run: 42, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 23.361 s <<< FAILURE! - in org.apache.flink.table.planner.runtime.stream.sql.IntervalJoinITCase
Oct 02 01:08:36 [ERROR] testRowTimeInnerJoinWithEquiTimeAttrs[StateBackend=ROCKSDB] Time elapsed: 0.408 s <<< FAILURE!
Oct 02 01:08:36 java.lang.AssertionError: expected: but was:
Oct 02 01:08:36 at org.junit.Assert.fail(Assert.java:89)
Oct 02 01:08:36 at org.junit.Assert.failNotEquals(Assert.java:835)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24444) OperatorCoordinatorSchedulerTest#shutdownScheduler fails with IllegalState
Dawid Wysakowicz created FLINK-24444:

Summary: OperatorCoordinatorSchedulerTest#shutdownScheduler fails with IllegalState
Key: FLINK-24444
URL: https://issues.apache.org/jira/browse/FLINK-24444
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.14.0
Reporter: Dawid Wysakowicz

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=24727&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=8053

{code}
java.util.concurrent.ExecutionException: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (c803a5d701b4e6830a9d7c538fec843e)
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCacheTest.testImmediateCacheInvalidationAfterFailure(DefaultExecutionGraphCacheTest.java:147)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
{code}
Re: The Apache Flink should pay more attention to ensuring API compatibility.
Hi Jark,

I also don't see it as a blocker issue at all. If you want to access the metric group across 1.13 and 1.14, you can use

(MetricGroup) context.getClass().getMethod("metricGroup").invoke(context)

But of course, you will not be able to access the new standardized metrics. For that you will need to maintain two different source/binary builds, since it's a new feature.

I agree with Piotr, the issue is that we need a more standardized process around PublicEvolving. Ideally, with every new minor release, we should convert PublicEvolving to Public and Experimental to PublicEvolving. We could extend the interfaces to capture a target version and a comment for why the API is not Public yet. Before every release, we would go through the annotated classes and either find a specific reason to keep the annotation or move it towards Public. If we have a specific reason to keep it Experimental/PublicEvolving, we should plan to address that reason with the next release. We do have good checks in place for Public; we are just too slow with ensuring that new API becomes Public.

On Fri, Oct 1, 2021 at 9:41 PM Piotr Nowojski wrote:

> Hi,
>
> I don't understand why we are talking about this being a blocker issue? New
> sources were not marked as @Public for a good reason until 1.14. I agree,
> we should try better at making APIs @Public sooner. I was even proposing to
> create strict timeout rules (enforced via some automated checks) like
> (unless for a very very good reason) everything marked @PublicEvolving
> or @Experimental should be upgraded to @Public after for example 2 years
> [1]. But for example the new Sink API IMO is too fresh to make it
> `@Public`.
>
> It doesn't change the fact that if we could provide a compatibility layer
> between 1.13.x and 1.14.x for this SourceReaderContext issue, it would be a
> nice thing to do. I would be -1 for keeping it forever, but trying to
> support forward compatibility of `@PublicEvolving` APIs for one or two
> releases into the future might be a good rule of thumb.
>
> Best,
> Piotrek
>
> [1] "[DISCUSS] Dealing with deprecated and legacy code in Flink" on the dev
> mailing list
>
> Fri., 1 Oct 2021 at 16:56 Jark Wu wrote:
>
> > Hi Arvid,
> >
> > > Should we expect connector devs to release different connector binaries
> > > for different Flink minors?
> > From the discussion of this thread, I think the answer is obviously
> > "not", otherwise OpenInx won't start this discussion. As a maintainer of
> > flink-cdc-connector, I have to say that it's very hard to release
> > connectors for different flink versions. Usually, the connector community
> > doesn't have so much time to maintain different branches/modules/code for
> > different flink versions.
> >
> > > If we change it back, then a specific connector would work for 1.14.1
> > > and 1.13.X but not for 1.14.0 and this would be even more confusing.
> > I think this is fine. IMO, this is a blocker issue of 1.14.0 which breaks
> > Source connectors. We should suggest users to use 1.14.1 if they use
> > Source connectors.
> >
> > Best,
> > Jark
> >
> > On Fri, 1 Oct 2021 at 19:05, Arvid Heise wrote:
> >
> > > The issue is that if we do not mark them as Public, we will always have
> > > incompatibilities. The change of SourceReaderContext#metricGroup is
> > > perfectly fine according to the annotation. The implications that we see
> > > here just mean that the interfaces have been expected to be Public.
> > >
> > > And now the question is what do we expect?
> > > Should we expect connector devs to release different connector binaries
> > > for different Flink minors? Then PublicEvolving is fine.
> > > If we expect that the same connector can work across multiple Flink
> > > versions, we need to go into Public.
> > >
> > > It doesn't make sense to keep them PublicEvolving on the annotation but
> > > implicitly assume them to be Public.
> > >
> > > @Jark Wu I don't see a way to revert the change of
> > > SourceReaderContext#metricGroup. For now, connector devs that expose
> > > metrics need to release 2 versions. If we change it back, then a specific
> > > connector would work for 1.14.1 and 1.13.X but not for 1.14.0 and this
> > > would be even more confusing.
> > >
> > > On Fri, Oct 1, 2021 at 10:49 AM Ingo Bürk wrote:
> > >
> > >> Hi,
> > >>
> > >> > [...] but also the new Source/Sink APIs as public
> > >>
> > >> I'm not really involved in the new Source/Sink APIs and will happily
> > >> listen to the developers working with them here, but since they are
> > >> new, we should also be careful not to mark them as stable too quickly.
> > >> We've only begun updating the existing connectors to these interfaces
> > >> at the moment. Making more progress here and keeping new APIs as
> > >> Evolving for a couple of minor releases is probably still a good idea.
> > >> Maybe we should even have actual rules on when APIs can/should be
> > >> promoted?
> > >>
> > >> More actively checki
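The reflective workaround mentioned at the top of this thread can be packaged as a small helper. The sketch below is illustrative only: `Context` and `MetricGroup` are hypothetical stand-ins for Flink's `SourceReaderContext` and metric group types, not the real classes. The point is that resolving `metricGroup` by name at runtime avoids linking against a method whose declared return type changed between 1.13 and 1.14, so one compiled helper works against either version.

```java
import java.lang.reflect.Method;

public class ReflectiveMetricAccess {

    // Hypothetical stand-ins for Flink's MetricGroup / SourceReaderContext.
    public interface MetricGroup {
    }

    public static class Context {
        public MetricGroup metricGroup() {
            return new MetricGroup() { };
        }
    }

    /**
     * Looks up and invokes {@code metricGroup()} reflectively. Because the
     * method is resolved by name at runtime, the caller never links against
     * its declared return type, which is what changed across Flink versions.
     */
    public static MetricGroup getMetricGroup(Object context) {
        try {
            Method m = context.getClass().getMethod("metricGroup");
            return (MetricGroup) m.invoke(context);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("metricGroup() not accessible", e);
        }
    }

    public static void main(String[] args) {
        // prints true: the reflective lookup found and invoked the method
        System.out.println(getMetricGroup(new Context()) != null);
    }
}
```

The trade-off, as noted above, is that a purely reflective caller cannot use the new standardized metrics added in 1.14 without a second build.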
[jira] [Created] (FLINK-24445) Move RPC System packaging to package phase
Chesnay Schepler created FLINK-24445:

Summary: Move RPC System packaging to package phase
Key: FLINK-24445
URL: https://issues.apache.org/jira/browse/FLINK-24445
Project: Flink
Issue Type: Technical Debt
Components: Build System
Affects Versions: 1.14.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Fix For: 1.15.0, 1.14.1

{{mvn compile/test}} currently fails because the copying of the flink-rpc-akka jar is done in the generate-sources phase. We should move this copying to the packaging phase.
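For context, such a copy step is typically bound to a lifecycle phase via a plugin execution. The fragment below is a hedged sketch of what moving the binding to `package` could look like with the standard maven-dependency-plugin; the execution id is invented and the real flink-rpc pom (including its `configuration` block, omitted here) may differ:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <!-- hypothetical id; the actual pom may use a different one -->
      <id>copy-flink-rpc-akka</id>
      <!-- bound to "package" instead of "generate-sources", so plain
           `mvn compile` / `mvn test` no longer requires the copied jar -->
      <phase>package</phase>
      <goals>
        <goal>copy</goal>
      </goals>
      <!-- <configuration> with the artifactItems to copy omitted -->
    </execution>
  </executions>
</plugin>
```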
[jira] [Created] (FLINK-24446) Casting from STRING to TIMESTAMP_TZ loses fractional seconds
Marios Trivyzas created FLINK-24446:

Summary: Casting from STRING to TIMESTAMP_TZ loses fractional seconds
Key: FLINK-24446
URL: https://issues.apache.org/jira/browse/FLINK-24446
Project: Flink
Issue Type: Sub-task
Reporter: Marios Trivyzas

Currently the method *toTimestamp(str, tz)* from *SqlDateTimeUtils* doesn't accept more than 23 chars in the input, and it also returns a long holding the milliseconds since epoch, so the rest of the fractional seconds are ignored.
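This class of bug is easy to reproduce with the standard library alone. The sketch below uses `java.time` rather than Flink's `SqlDateTimeUtils`, and the helper names are invented for illustration; it shows that a timestamp parsed with nanosecond precision loses everything below the millisecond the moment the fraction is reduced to a long of milliseconds:

```java
import java.time.LocalDateTime;

public class FractionalSecondsDemo {

    /** Parses an ISO-8601 local timestamp with up to nanosecond precision. */
    public static LocalDateTime parse(String s) {
        return LocalDateTime.parse(s);
    }

    /**
     * Reduces the fractional-second part to whole milliseconds, mimicking
     * any conversion that stores a timestamp as a long of millis: the
     * integer division silently drops the micro- and nanosecond digits.
     */
    public static long fractionAsMillis(LocalDateTime t) {
        return t.getNano() / 1_000_000;
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("2021-10-04T12:00:00.123456789");
        System.out.println(t.getNano());         // 123456789 — full precision survives parsing
        System.out.println(fractionAsMillis(t)); // 123 — only milliseconds survive the long
    }
}
```

The fix direction implied by the ticket is therefore to carry the fractional part at higher precision instead of funneling it through epoch milliseconds.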
[jira] [Created] (FLINK-24447) Bundle netty in flink-shaded-zookeeper 3.5+
Chesnay Schepler created FLINK-24447:

Summary: Bundle netty in flink-shaded-zookeeper 3.5+
Key: FLINK-24447
URL: https://issues.apache.org/jira/browse/FLINK-24447
Project: Flink
Issue Type: Improvement
Components: BuildSystem / Shaded
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Fix For: shaded-15.0

The client-server SSL support added in ZooKeeper 3.5 requires Netty, but we currently exclude it as we assumed it not to be necessary.
[jira] [Created] (FLINK-24448) Tumbling Window Not working with EPOCH Time converted using TO_TIMESTAMP_LTZ
Mehul Batra created FLINK-24448:

Summary: Tumbling Window Not working with EPOCH Time converted using TO_TIMESTAMP_LTZ
Key: FLINK-24448
URL: https://issues.apache.org/jira/browse/FLINK-24448
Project: Flink
Issue Type: Bug
Reporter: Mehul Batra

*When I run my code with connector = 'print' to inspect the window-aggregated data, nothing is printed; when I exclude the tumbling window, data is printed. The same code works with the tumble window in Flink 1.13.1.*

SQL API connectors for reference:

tableEnv.executeSql("CREATE TABLE IF NOT EXISTS Teamtopic (\n"
    + "  eventName STRING,\n"
    + "  ingestion_time BIGINT,\n"
    + "  t_ltz AS TO_TIMESTAMP_LTZ(ingestion_time, 3),\n"
    + "  WATERMARK FOR t_ltz AS t_ltz - INTERVAL '5' SECOND -- event-time attribute\n"
    + ") WITH (\n"
    + "  'connector' = 'kafka'

tableEnv.executeSql("CREATE TABLE minutess (\n"
    + "  `minute` TIMESTAMP(3),\n"
    + "  hits BIGINT,\n"
    + "  type STRING\n"
    + ") WITH (\n"
    + "  'connector' = 'print'"
    + ")");

tableEnv.createStatementSet()
    .addInsertSql("INSERT INTO minutess\n"
        + " SELECT"
        + " TUMBLE_END(t_ltz, INTERVAL '1' MINUTE) AS windowmin,"
        + " COUNT(eventName) AS hits,"
        + " 'team_save_failed_minute_error_types' AS type\n"
        + " FROM TeamSaveFailed\n"
        + " GROUP BY TUMBLE(t_ltz, INTERVAL '1' MINUTE), eventName")
    .execute();
Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)
Hi David,

Many thanks for the feedback, and glad to see that we have the same opinions on a lot of points of the iteration! :)

As for the checkpoint: basically, to support checkpoints for a job with feedback edges, we need to also include the records on the feedback edges in the checkpoint snapshot, as described in [1]. We do this by exploiting the reference-count mechanism provided by the raw states, so that the asynchronous phase waits until we finish writing all the feedback records into the raw states; this is also similar to the implementation in Stateful Functions.

Including the feedback records in the snapshot is enough for the unbounded iteration, but for the bounded iteration we also need:

1. Checkpoints after tasks finish: since for an iteration job with bounded inputs most of the execution time is spent after all the sources have finished and the iteration body is executing, we need to support checkpoints during this period. Fortunately, in 1.14 we implemented the first version of this functionality.

2. Exactly-once notification of round increments: for the bounded iteration we notify each operator of the end of a round via onEpochWatermarkIncrement(); this is done by inserting epoch watermarks at the end of each round. We would like to keep the notification via onEpochWatermarkIncrement() exactly-once to simplify the development of algorithms. This is done by ensuring that epoch watermarks with the same epoch value and the barriers of the same checkpoint always arrive in the same order when transmitted through the iteration body. Under this condition, after a failover all the operators inside the iteration body must have received the same number of notifications, and we can start with the next one. Also, since epoch watermarks might be snapshotted in the feedback-edge snapshot, we disable rescaling of the head/tail operators for the bounded iteration.
Best,
Yun

[1] https://arxiv.org/abs/1506.08603

------------------------------------------------------------------
From: David Morávek
Send Time: 2021 Oct. 4 (Mon.) 14:05
To: dev ; Yun Gao
Subject: Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)

Hi Yun,

I did a quick pass over the design doc and it addresses all of the problems with the current iterations I'm aware of. It's great to see that you've been able to work around the need for vectorized watermarks by giving up nested iterations (which IMO is more of an academic concept than something with a solid use-case).

I'll try to give it some more thought, but from a first pass it looks great +1 ;)

One thing that I'm unsure about: how do you plan to implement exactly-once checkpointing of the feedback edge?

Best,
D.

On Mon, Oct 4, 2021 at 4:42 AM Yun Gao wrote:

> Hi all,
>
> If we do not have other concerns on this FLIP, we would like to start the
> voting by the end of Oct 8th.
>
> Best,
> Yun.
>
> ------------------------------------------------------------------
> From: Yun Gao
> Send Time: 2021 Sep. 15 (Wed.) 20:47
> To: dev
> Subject: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)
>
> Hi all,
>
> DongLin, ZhiPeng and I are opening this thread to propose designing and
> implementing a new iteration library inside the Flink-ML project, as
> described in FLIP-176[1].
>
> Iteration serves as a fundamental functionality required to support the
> implementation of ML algorithms. Previously Flink supported bounded
> iteration on top of the DataSet API and unbounded iteration on top of the
> DataStream API. However, since we are going to deprecate the DataSet API
> and the current unbounded iteration API on top of the DataStream API is
> not fully complete, we are proposing to add the new unified iteration
> library on top of the DataStream API to support both unbounded and bounded
> iterations.
>
> Very thanks for your feedbacks!
>
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
>
> Best,
> Yun
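To make the feedback-edge checkpointing question concrete, here is a toy model of why the feedback edge needs special treatment in a snapshot. This is explicitly not the FLIP's implementation (which writes feedback records into raw state guarded by a reference count, as described above); all class and method names are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of an iteration with a feedback edge: records emitted back into
 * the loop sit in a queue until processed. A consistent checkpoint must
 * capture both the operator state and the in-flight feedback records;
 * otherwise the queued records are silently lost on restore.
 */
public class FeedbackLoop {
    private final ArrayDeque<String> feedbackEdge = new ArrayDeque<>();
    private long processedCount = 0;

    public void emitFeedback(String record) {
        feedbackEdge.add(record);
    }

    /** Processes one queued feedback record, or returns null if none. */
    public String processOne() {
        String r = feedbackEdge.poll();
        if (r != null) {
            processedCount++;
        }
        return r;
    }

    /** The "operator state" part of the snapshot. */
    public long snapshotState() {
        return processedCount;
    }

    /** The in-flight feedback records that must also go into the snapshot. */
    public List<String> snapshotFeedback() {
        return new ArrayList<>(feedbackEdge);
    }

    /** Restores from both snapshot parts, so no in-flight record is lost. */
    public static FeedbackLoop restore(long state, List<String> inFlight) {
        FeedbackLoop loop = new FeedbackLoop();
        loop.processedCount = state;
        loop.feedbackEdge.addAll(inFlight);
        return loop;
    }
}
```

Dropping `snapshotFeedback()` from the model would make the restore lose every record still "on the wire" inside the loop, which is exactly the failure mode the FLIP's raw-state mechanism is designed to prevent.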