[jira] [Created] (FLINK-36356) HadoopRecoverableWriterTest.testRecoverWithState due to IOException
Matthias Pohl created FLINK-36356: - Summary: HadoopRecoverableWriterTest.testRecoverWithState due to IOException Key: FLINK-36356 URL: https://issues.apache.org/jira/browse/FLINK-36356 Project: Flink Issue Type: Bug Components: Connectors / Hadoop Compatibility Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62378&view=logs&j=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819&t=2dd510a3-5041-5201-6dc3-54d310f68906&l=10514 {code} Sep 23 07:55:16 07:55:16.451 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.05 s <<< FAILURE! -- in org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest Sep 23 07:55:16 07:55:16.451 [ERROR] org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest.testRecoverWithState -- Time elapsed: 2.694 s <<< ERROR! Sep 23 07:55:16 java.io.IOException: All datanodes [DatanodeInfoWithStorage[127.0.0.1:45240,DS-13a30476-dff5-4f3a-88b1-887571521a95,DISK]] are bad. Aborting... 
Sep 23 07:55:16 at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1537) Sep 23 07:55:16 at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1472) Sep 23 07:55:16 at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1244) Sep 23 07:55:16 at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:663) {code} The Maven logs reveal a bit more (I attached the extract of the failed build): {code} 07:55:13,491 [DataXceiver for client DFSClient_NONMAPREDUCE_211593080_35 at /127.0.0.1:59360 [Receiving block BP-289839883-172.27.0.2-1727078098659:blk_1073741832_1016]] ERROR org.apache.hadoop.hdfs.server.datanode.DataNode [] - 127.0.0.1:46429:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:59360 dst: /127.0.0.1:46429 java.nio.channels.ClosedByInterruptException: null at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[?:1.8.0_292] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:406) ~[?:1.8.0_292] at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.10.2.jar:?] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.10.2.jar:?] at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.10.2.jar:?] at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.10.2.jar:?] at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_292] at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_292] at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_292] at java.io.DataInputStream.read(DataInputStream.java:149) ~[?:1.8.0_292] at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:209) ~[hadoop-common-2.10.2.jar:?] 
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:211) ~[hadoop-hdfs-client-2.10.2.jar:?] at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) ~[hadoop-hdfs-client-2.10.2.jar:?] at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) ~[hadoop-hdfs-client-2.10.2.jar:?] at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:528) ~[hadoop-hdfs-2.10.2.jar:?] at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:968) ~[hadoop-hdfs-2.10.2.jar:?] at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:877) ~[hadoop-hdfs-2.10.2.jar:?] at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166) ~[hadoop-hdfs-2.10.2.jar:?] at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) ~[hadoop-hdfs-2.10.2.jar:?] at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290) [hadoop-hdfs-2.10.2.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] 07:55:13,491 [DataXceiver for client DFSClient_NONMAPREDUCE_211593080_35 at /127.0.0.1:39968 [Receiving block BP-289839883-172.27.0.2-1727078098659:blk_1073741832_1016]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode [] - Exception for BP-289839883-172.27.0.2-1727078098659:blk_1073741832_1017 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:211) ~[hadoop-common-2.10.2.jar:?] at org.apache.hadoop.hdfs.protocol.datatransfer.P
[jira] [Created] (FLINK-36350) IllegalAccessError detected in JDK17+ runs
Matthias Pohl created FLINK-36350: - Summary: IllegalAccessError detected in JDK17+ runs Key: FLINK-36350 URL: https://issues.apache.org/jira/browse/FLINK-36350 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 2.0-preview Reporter: Matthias Pohl UnalignedCheckpointRescaleITCase and GroupReduceITCase are affected in JDK17 and JDK21 test profiles. https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62359&view=logs&j=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3&t=712ade8c-ca16-5b76-3acd-14df33bc1cb1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36349) ClassNotFoundException due to org.apache.flink.runtime.types.FlinkScalaKryoInstantiator missing
Matthias Pohl created FLINK-36349: - Summary: ClassNotFoundException due to org.apache.flink.runtime.types.FlinkScalaKryoInstantiator missing Key: FLINK-36349 URL: https://issues.apache.org/jira/browse/FLINK-36349 Project: Flink Issue Type: Bug Components: API / Type Serialization System Affects Versions: 2.0-preview Reporter: Matthias Pohl This is most likely caused by FLINK-29741 which was recently merged. https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62359&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=17558 {code} Sep 23 01:58:51 01:58:50,533 12326 [AsyncOperations-thread-1] INFO org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer [] - Kryo serializer scala extensions are not available. Sep 23 01:58:51 java.lang.ClassNotFoundException: org.apache.flink.runtime.types.FlinkScalaKryoInstantiator Sep 23 01:58:51 at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_292] Sep 23 01:58:51 at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_292] Sep 23 01:58:51 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_292] Sep 23 01:58:51 at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_292] Sep 23 01:58:51 at java.lang.Class.forName0(Native Method) ~[?:1.8.0_292] Sep 23 01:58:51 at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_292] [...] {code} It causes ClosureCleanerITCase to fail in the AdaptiveScheduler test profile. -- This message was sent by Atlassian Jira (v8.20.10#820010)
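The INFO line followed by a `ClassNotFoundException` out of `Class.forName` suggests that the class is looked up reflectively and that its absence is tolerated rather than treated as a failure. A minimal sketch of that kind of optional lookup follows. It is not Flink's `KryoSerializer` code; the `OptionalClassLookup` class and the `tryLoad` helper are hypothetical names used only for illustration.

```java
import java.util.Optional;

public class OptionalClassLookup {

    // Hypothetical helper: returns the class if it can be found, empty otherwise.
    static Optional<Class<?>> tryLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            return Optional.<Class<?>>of(Class.forName(className, false, loader));
        } catch (ClassNotFoundException e) {
            // Treated as "optional extension missing", not as an error
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = OptionalClassLookup.class.getClassLoader();
        // Class name taken from the log above
        String name = "org.apache.flink.runtime.types.FlinkScalaKryoInstantiator";
        if (tryLoad(name, loader).isPresent()) {
            System.out.println("Scala extensions found");
        } else {
            System.out.println("Scala extensions are not available, using defaults");
        }
    }
}
```

With a lookup like this, a missing class only produces a log line. Whether that is enough for ClosureCleanerITCase depends on how the caller reacts to the missing extension, which the log extract does not show.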
[jira] [Created] (FLINK-36324) MiscAggFunctionITCase expected to raise Throwable
Matthias Pohl created FLINK-36324: - Summary: MiscAggFunctionITCase expected to raise Throwable Key: FLINK-36324 URL: https://issues.apache.org/jira/browse/FLINK-36324 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62234&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=11810 {code} Sep 19 02:06:44 02:06:44.447 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.654 s <<< FAILURE! -- in org.apache.flink.table.planner.functions.MiscAggFunctionITCase Sep 19 02:06:44 02:06:44.448 [ERROR] org.apache.flink.table.planner.functions.MiscAggFunctionITCase.test(TestCase)[2] -- Time elapsed: 0.294 s <<< FAILURE! Sep 19 02:06:44 java.lang.AssertionError: Sep 19 02:06:44 Sep 19 02:06:44 Expecting code to raise a throwable. Sep 19 02:06:44 at org.apache.flink.table.planner.functions.BuiltInAggregateFunctionTestBase$ErrorTestItem.execute(BuiltInAggregateFunctionTestBase.java:607) Sep 19 02:06:44 at org.apache.flink.table.planner.functions.BuiltInAggregateFunctionTestBase$TestSpec.lambda$createTestItemExecutable$0(BuiltInAggregateFunctionTestBase.java:323) Sep 19 02:06:44 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119) Sep 19 02:06:44 at org.apache.flink.table.planner.functions.BuiltInAggregateFunctionTestBase.test(BuiltInAggregateFunctionTestBase.java:96) Sep 19 02:06:44 at java.lang.reflect.Method.invoke(Method.java:498) Sep 19 02:06:44 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Sep 19 02:06:44 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Sep 19 02:06:44 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Sep 19 02:06:44 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Sep 19 02:06:44 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {code}
[jira] [Created] (FLINK-36317) Populate the ArchivedExecutionGraph with CheckpointStatsSnapshot data if in WaitingForResources state with a previousExecutionGraph being set
Matthias Pohl created FLINK-36317: - Summary: Populate the ArchivedExecutionGraph with CheckpointStatsSnapshot data if in WaitingForResources state with a previousExecutionGraph being set Key: FLINK-36317 URL: https://issues.apache.org/jira/browse/FLINK-36317 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 2.0-preview Reporter: Matthias Pohl In FLINK-36295, we noticed an issue with the WaitingForResources state that follows a restartable failure: the CheckpointStatistics are available but are not exposed through the ArchivedExecutionGraph. We should think about adding these stats in {{WaitingForResources#getJob}} so that they remain accessible even while the job isn't running.
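The proposal can be pictured with simplified stand-in types. The sketch below is not Flink code: `StatsSnapshot`, `ArchivedJob` and `WaitingState` are hypothetical placeholders for CheckpointStatsSnapshot, ArchivedExecutionGraph and WaitingForResources, and the status label is made up. It only shows the idea of carrying the statistics of the previous graph over into the view returned while the job waits for resources.

```java
public class WaitingForResourcesSketch {

    // Stand-in for a checkpoint statistics snapshot (hypothetical, not Flink's class).
    static final class StatsSnapshot {
        final long completedCheckpoints;

        StatsSnapshot(long completedCheckpoints) {
            this.completedCheckpoints = completedCheckpoints;
        }
    }

    // Stand-in for an archived job view (hypothetical, not Flink's class).
    static final class ArchivedJob {
        final String status;
        final StatsSnapshot stats; // null when no statistics are exposed

        ArchivedJob(String status, StatsSnapshot stats) {
            this.status = status;
            this.stats = stats;
        }
    }

    // Stand-in for the waiting state; keeps the previous job view if there is one.
    static final class WaitingState {
        private final ArchivedJob previous; // null on the very first start

        WaitingState(ArchivedJob previous) {
            this.previous = previous;
        }

        // Builds the view for the waiting job and reuses the previous statistics if present.
        ArchivedJob getJob() {
            StatsSnapshot carriedOver = previous == null ? null : previous.stats;
            return new ArchivedJob("WAITING_FOR_RESOURCES", carriedOver);
        }
    }

    public static void main(String[] args) {
        ArchivedJob beforeFailure = new ArchivedJob("RUNNING", new StatsSnapshot(3));
        ArchivedJob view = new WaitingState(beforeFailure).getJob();
        System.out.println(view.status + " completed=" + view.stats.completedCheckpoints);
    }
}
```

In this model a client that queries the job between a restartable failure and the next deployment still sees the checkpoint history instead of an empty result.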
[jira] [Created] (FLINK-36302) FileSourceTextLinesITCase timed out
Matthias Pohl created FLINK-36302: - Summary: FileSourceTextLinesITCase timed out Key: FLINK-36302 URL: https://issues.apache.org/jira/browse/FLINK-36302 Project: Flink Issue Type: Bug Components: Connectors / Common Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62064&view=logs&j=1c002d28-a73d-5309-26ee-10036d8476b4&t=d1c117a6-8f13-5466-55f0-d48dbb767fcd&l=12386 {code} "ForkJoinPool-1-worker-1" #15 daemon prio=5 os_prio=0 tid=0x7f6c0c8b5800 nid=0xda34 waiting on condition [0x7f6bf0dfc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xff6d4038> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.isJobTerminated(CollectResultFetcher.java:213) at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:120) at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126) at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100) at org.apache.flink.streaming.api.datastream.DataStreamUtils.collectRecordsFromUnboundedStream(DataStreamUtils.java:142) at org.apache.flink.connector.file.src.FileSourceTextLinesITCase.testContinuousTextFileSource(FileSourceTextLinesITCase.java:252) at org.apache.flink.connector.file.src.FileSourceTextLinesITCase.testContinuousTextFileSource(FileSourceTextLinesITCase.java:192) [...] 
{code}
[jira] [Created] (FLINK-36301) TPC-H end-to-end test fails due to TimeoutException
Matthias Pohl created FLINK-36301: - Summary: TPC-H end-to-end test fails due to TimeoutException Key: FLINK-36301 URL: https://issues.apache.org/jira/browse/FLINK-36301 Project: Flink Issue Type: Bug Components: Runtime / Coordination, Tests Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=fb37c667-81b7-5c22-dd91-846535e99a97&t=011e961e-597c-5c96-04fe-7941c8b83f23&l=8589 The JobManager logs reveal a TimeoutException: {code} 2024-09-15 01:37:53,628 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job insert-into_default_catalog.default_database.q5 (f40185b602e2336cba7299165d7078fa) switched from state RUNNING to FAILING. org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:281) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:272) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.handleTaskFailure(AdaptiveBatchScheduler.java:413) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:265) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at 
org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.onTaskFailed(AdaptiveBatchScheduler.java:405) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:800) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:777) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:51) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(DefaultExecutionGraph.java:1675) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1190) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1130) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.executiongraph.Execution.fail(Execution.java:831) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.signalPayloadRelease(SingleLogicalSlot.java:195) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.release(SingleLogicalSlot.java:182) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.scheduler.SimpleExecutionSlotAllocator$LogicalSlotHolder.release(SimpleExecutionSlotAllocator.java:203) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.AllocatedSlot.releasePayload(AllocatedSlot.java:152) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releasePayload(DefaultDeclarativeSlotPool.java:515) 
~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:478) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT] at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseTaskManager(DeclarativeSlotPoolService.j
[jira] [Created] (FLINK-36300) TableEnvHiveConnectorITCase.testDateTimestampPartitionColumns times out
Matthias Pohl created FLINK-36300: - Summary: TableEnvHiveConnectorITCase.testDateTimestampPartitionColumns times out Key: FLINK-36300 URL: https://issues.apache.org/jira/browse/FLINK-36300 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=25538 {code} "main" #1 prio=5 os_prio=0 tid=0x7f309e8a2000 nid=0x1b181 waiting on condition [0x7f30a2104000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xff529778> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.isJobTerminated(CollectResultFetcher.java:213) at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:120) at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126) at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100) at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247) at java.util.Iterator.forEachRemaining(Iterator.java:115) at org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:133) at 
org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase.lambda$testDateTimestampPartitionColumns$4(TableEnvHiveConnectorITCase.java:248) at org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase$$Lambda$8669/2110765445.call(Unknown Source) at org.apache.flink.connectors.hive.TableEnvExecutorUtil.executeInSeparateDatabase(TableEnvExecutorUtil.java:53) at org.apache.flink.connectors.hive.TableEnvExecutorUtil.executeInSeparateDatabase(TableEnvExecutorUtil.java:30) at org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase.testDateTimestampPartitionColumns(TableEnvHiveConnectorITCase.java:214) {code}
[jira] [Created] (FLINK-36299) AdaptiveSchedulerTest.testStatusMetrics times out
Matthias Pohl created FLINK-36299: - Summary: AdaptiveSchedulerTest.testStatusMetrics times out Key: FLINK-36299 URL: https://issues.apache.org/jira/browse/FLINK-36299 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=be5a4b15-4b23-56b1-7582-795f58a645a2&l=9849 {code} Sep 15 02:28:22 "ForkJoinPool-495-worker-25" #9352 daemon prio=5 os_prio=0 tid=0x7fcdde409000 nid=0x77f4 waiting on condition [0x7fcd5c52c000] Sep 15 02:28:22 java.lang.Thread.State: WAITING (parking) Sep 15 02:28:22 at sun.misc.Unsafe.park(Native Method) Sep 15 02:28:22 - parking to wait for <0xf8d7d0b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) Sep 15 02:28:22 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) Sep 15 02:28:22 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) Sep 15 02:28:22 at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403) Sep 15 02:28:22 at org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest$SubmissionBufferingTaskManagerGateway.waitForSubmissions(AdaptiveSchedulerTest.java:2593) Sep 15 02:28:22 at org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest.testStatusMetrics(AdaptiveSchedulerTest.java:732) {code}
[jira] [Created] (FLINK-36298) NullPointerException in Calcite causes a PyFlink test failure
Matthias Pohl created FLINK-36298: - Summary: NullPointerException in Calcite causes a PyFlink test failure Key: FLINK-36298 URL: https://issues.apache.org/jira/browse/FLINK-36298 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06&t=b68c9f5c-04c9-5c75-3862-a3a27aabbce3&l=25458 {code} java.lang.NullPointerException: metadataHandlerProvider Sep 15 03:14:04 E at java.base/java.util.Objects.requireNonNull(Objects.java:235) Sep 15 03:14:04 E at org.apache.calcite.rel.metadata.RelMetadataQueryBase.getMetadataHandlerProvider(RelMetadataQueryBase.java:122) Sep 15 03:14:04 E at org.apache.calcite.rel.metadata.RelMetadataQueryBase.revise(RelMetadataQueryBase.java:118) Sep 15 03:14:04 E at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:333) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:727) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.getCostOrInfinite(VolcanoPlanner.java:714) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.propagateCostImprovements(VolcanoPlanner.java:971) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1408) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1368) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:95) Sep 15 03:14:04 E at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:274) Sep 15 03:14:04 E at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:95) Sep 15 03:14:04 E at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:274) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:95) Sep 15 03:14:04 E at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:274) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613) Sep 15 03:14:04 E at org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498) Sep 15 03:14:04 E at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315) Sep 15 03:14:04 E at org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62) Sep 15 03:14:04 E at org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.$anonfun$optimize$1(FlinkChainedProgram.scala:59) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36297) SIGSEGV caused CI failure
Matthias Pohl created FLINK-36297: - Summary: SIGSEGV caused CI failure Key: FLINK-36297 URL: https://issues.apache.org/jira/browse/FLINK-36297 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=11876 {code} Sep 15 02:53:42 02:53:42.535 [WARNING] Tests run: 145, Failures: 0, Errors: 0, Skipped: 13, Time elapsed: 3.133 s -- in org.apache.flink.state.changelog.ChangelogDelegateFileStateBackendTest Sep 15 02:53:44 02:53:44.620 [WARNING] Tests run: 96, Failures: 0, Errors: 0, Skipped: 9, Time elapsed: 9.998 s -- in org.apache.flink.state.changelog.ChangelogStateBackendMigrationTest Sep 15 02:57:58 # Sep 15 02:57:58 # A fatal error has been detected by the Java Runtime Environment: Sep 15 02:57:58 # Sep 15 02:57:58 # SIGSEGV (0xb) at pc=0x7f7c539b7c84, pid=21641, tid=0x7f7c549ff700 Sep 15 02:57:58 # Sep 15 02:57:58 # JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10) Sep 15 02:57:58 # Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 compressed oops) Sep 15 02:57:58 # Problematic frame: Sep 15 02:57:58 # C [librocksdbjni-linux64.so+0x31bc84] Java_org_rocksdb_WriteBatch_getDataSize+0x4 Sep 15 02:57:58 # Sep 15 02:57:58 # Core dump written. 
Default location: /__w/1/s/flink-state-backends/flink-statebackend-changelog/core or core.21641 Sep 15 02:57:58 # Sep 15 02:57:58 # An error report file with more information is saved as: Sep 15 02:57:58 # /__w/1/s/flink-state-backends/flink-statebackend-changelog/hs_err_pid21641.log Sep 15 02:57:58 Compiled method (nm) 265136 7875 n 0 org.rocksdb.WriteBatch::getDataSize (native) Sep 15 02:57:58 total in heap [0x7f7c8dff9e50,0x7f7c8dffa1a0] = 848 Sep 15 02:57:58 relocation [0x7f7c8dff9f78,0x7f7c8dff9fc0] = 72 Sep 15 02:57:58 main code [0x7f7c8dff9fc0,0x7f7c8dffa198] = 472 Sep 15 02:57:58 oops [0x7f7c8dffa198,0x7f7c8dffa1a0] = 8 Sep 15 02:57:58 Compiled method (c1) 265136 7876 3 org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper::ensureNotCancelled (27 bytes) Sep 15 02:57:58 total in heap [0x7f7c8dff9490,0x7f7c8dff9a88] = 1528 Sep 15 02:57:58 relocation [0x7f7c8dff95b8,0x7f7c8dff9618] = 96 Sep 15 02:57:58 main code [0x7f7c8dff9620,0x7f7c8dff9880] = 608 Sep 15 02:57:58 stub code [0x7f7c8dff9880,0x7f7c8dff9938] = 184 Sep 15 02:57:58 oops [0x7f7c8dff9938,0x7f7c8dff9940] = 8 Sep 15 02:57:58 metadata [0x7f7c8dff9940,0x7f7c8dff9958] = 24 Sep 15 02:57:58 scopes data[0x7f7c8dff9958,0x7f7c8dff99c8] = 112 Sep 15 02:57:58 scopes pcs [0x7f7c8dff99c8,0x7f7c8dff9a68] = 160 Sep 15 02:57:58 dependencies [0x7f7c8dff9a68,0x7f7c8dff9a70] = 8 Sep 15 02:57:58 nul chk table [0x7f7c8dff9a70,0x7f7c8dff9a88] = 24 Sep 15 02:57:58 # Sep 15 02:57:58 # If you would like to submit a bug report, please visit: Sep 15 02:57:58 # http://bugreport.java.com/bugreport/crash.jsp Sep 15 02:57:58 # The crash happened outside the Java Virtual Machine in native code. Sep 15 02:57:58 # See problematic frame for where to report the bug. 
Sep 15 02:57:58 # Aborted (core dumped) {code} with 134 exit code: {code} Sep 15 02:57:59 02:57:59.692 [ERROR] Process Exit Code: 134 Sep 15 02:57:59 02:57:59.692 [ERROR] Crashed tests: Sep 15 02:57:59 02:57:59.692 [ERROR] org.apache.flink.state.changelog.ChangelogDelegateEmbeddedRocksDBStateBackendTest Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:358) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:296) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1240) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1089) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:905) Sep 15 02:57:59 02:57:59.692 [ERROR]at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPlugin
[jira] [Created] (FLINK-36295) AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale failed with
Matthias Pohl created FLINK-36295: - Summary: AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale failed with Key: FLINK-36295 URL: https://issues.apache.org/jira/browse/FLINK-36295 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=10234 {code} Sep 16 03:06:30 03:06:30.168 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.275 s <<< FAILURE! -- in org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase Sep 16 03:06:30 03:06:30.168 [ERROR] org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale -- Time elapsed: 0.676 s <<< ERROR! Sep 16 03:06:30 java.lang.IndexOutOfBoundsException: Index: -1 Sep 16 03:06:30 at java.base/java.util.Collections$EmptyList.get(Collections.java:4586) Sep 16 03:06:30 at org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale(AdaptiveSchedulerClusterITCase.java:214) Sep 16 03:06:30 at java.base/java.lang.reflect.Method.invoke(Method.java:568) Sep 16 03:06:30 at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194) Sep 16 03:06:30 at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) Sep 16 03:06:30 at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) Sep 16 03:06:30 at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) Sep 16 03:06:30 at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) Sep 16 03:06:30 at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) Sep 16 03:06:30 {code}
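An `Index: -1` raised from `Collections$EmptyList.get` is what one would expect when the last element of an empty list is read via `list.get(list.size() - 1)`. That the test does this at line 214 is an assumption, since the test source is not part of this report. The sketch below reproduces the exception in isolation; `lastOf` and `lastOfOrEmpty` are hypothetical helpers and `checkpointIds` is an illustrative name.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class EmptyListLastElement {

    // Naive "last element" access: the index becomes -1 for an empty list.
    static <T> T lastOf(List<T> items) {
        return items.get(items.size() - 1);
    }

    // Guarded variant: reports absence instead of throwing (elements must be non-null).
    static <T> Optional<T> lastOfOrEmpty(List<T> items) {
        return items.isEmpty() ? Optional.empty() : Optional.of(items.get(items.size() - 1));
    }

    public static void main(String[] args) {
        List<Long> checkpointIds = Collections.emptyList();
        try {
            lastOf(checkpointIds);
        } catch (IndexOutOfBoundsException e) {
            // Print the message so it can be compared with the log above
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("guarded lookup present: " + lastOfOrEmpty(checkpointIds).isPresent());
    }
}
```

If the list in the test is filled asynchronously, waiting until it is non-empty before reading the last entry would avoid the exception. Whether that matches the actual cause is something FLINK-36317 touches on from the WaitingForResources side.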
[jira] [Created] (FLINK-36294) table stage failed with general junit5 TestEngine failure
Matthias Pohl created FLINK-36294: - Summary: table stage failed with general junit5 TestEngine failure Key: FLINK-36294 URL: https://issues.apache.org/jira/browse/FLINK-36294 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=12216 {code} Sep 16 04:19:46 04:19:46.181 [ERROR] Errors: Sep 16 04:19:46 04:19:46.182 [ERROR] TestEngine with ID 'junit-jupiter' failed to execute tests {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36293) RocksDBWriteBatchWrapperTest.testAsyncCancellation
Matthias Pohl created FLINK-36293: - Summary: RocksDBWriteBatchWrapperTest.testAsyncCancellation Key: FLINK-36293 URL: https://issues.apache.org/jira/browse/FLINK-36293 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=11508 {code} Sep 16 02:20:08 02:20:08.194 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.724 s <<< FAILURE! -- in org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest Sep 16 02:20:08 02:20:08.194 [ERROR] org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest.testAsyncCancellation -- Time elapsed: 0.121 s <<< ERROR! Sep 16 02:20:08 java.lang.Exception: Unexpected exception, expected but was Sep 16 02:20:08 Caused by: java.lang.AssertionError: Sep 16 02:20:08 Expecting actual: Sep 16 02:20:08 2 Sep 16 02:20:08 to be less than: Sep 16 02:20:08 2 Sep 16 02:20:08 at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest.testAsyncCancellation(RocksDBWriteBatchWrapperTest.java:98) Sep 16 02:20:08 at java.lang.reflect.Method.invoke(Method.java:498) Sep 16 02:20:08 Suppressed: org.apache.flink.runtime.execution.CancelTaskException Sep 16 02:20:08 at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.ensureNotCancelled(RocksDBWriteBatchWrapper.java:199) Sep 16 02:20:08 at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.close(RocksDBWriteBatchWrapper.java:188) Sep 16 02:20:08 at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest.testAsyncCancellation(RocksDBWriteBatchWrapperTest.java:100) Sep 16 02:20:08 ... 1 more {code} This test was added by FLINK-35580 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36292) SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher times out
Matthias Pohl created FLINK-36292: - Summary: SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher times out Key: FLINK-36292 URL: https://issues.apache.org/jira/browse/FLINK-36292 Project: Flink Issue Type: Bug Components: Connectors / Common Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62173&view=logs&j=b6f8a893-8f59-51d5-fe28-fb56a8b0932c&t=095f1730-efbe-5303-c4a3-b5e3696fc4e2&l=10914 {code} Sep 17 01:15:16 01:15:16.318 [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 32.65 s <<< FAILURE! -- in org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManagerTest Sep 17 01:15:16 01:15:16.318 [ERROR] org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher -- Time elapsed: 30.02 s <<< ERROR! Sep 17 01:15:16 org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds Sep 17 01:15:16 at sun.misc.Unsafe.park(Native Method) Sep 17 01:15:16 at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) Sep 17 01:15:16 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) Sep 17 01:15:16 at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475) Sep 17 01:15:16 at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.close(SplitFetcherManager.java:344) Sep 17 01:15:16 at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher(SplitFetcherManagerTest.java:97) Sep 17 01:15:16 at java.lang.reflect.Method.invoke(Method.java:498) Sep 17 01:15:16 at java.util.concurrent.FutureTask.run(FutureTask.java:266) Sep 17 01:15:16 at java.lang.Thread.run(Thread.java:748) {code} The test was added by FLINK-35924 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36291) java.lang.IllegalMonitorStateException causing a fatal error on the TaskManager side
Matthias Pohl created FLINK-36291: - Summary: java.lang.IllegalMonitorStateException causing a fatal error on the TaskManager side Key: FLINK-36291 URL: https://issues.apache.org/jira/browse/FLINK-36291 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 2.0-preview Reporter: Matthias Pohl HiveDynamicPartitionPruningITCase failed due to the TM timeout. Checking the logs, though, revealed a fatal error on the TaskManager's side: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62173&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=24046 {code} 03:18:32,209 [taskmanager_72-main-scheduler-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'taskmanager_72-main-scheduler-thread-1' produced an uncaught exception. Stopping the process... java.lang.IllegalMonitorStateException: null at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939) ~[?:1.8.0_292] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103) ~[?:1.8.0_292] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) ~[?:1.8.0_292] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) ~[?:1.8.0_292] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) ~[?:1.8.0_292] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_292] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] {code} But there's also an OutOfMemoryError reported just a line below: {code} 03:19:01,060 [Source Data Fetcher for Source: part[62] (2/2)#0] ERROR org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager [] - Received uncaught exception. 
java.lang.OutOfMemoryError: Java heap space {code} So that might be related to FLINK-36290 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36290) OutOfMemoryError in connect test run
Matthias Pohl created FLINK-36290: - Summary: OutOfMemoryError in connect test run Key: FLINK-36290 URL: https://issues.apache.org/jira/browse/FLINK-36290 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests Affects Versions: 2.0-preview Reporter: Matthias Pohl We saw an OOM in the connect stage that caused a fatal error: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62173&view=logs&j=1c002d28-a73d-5309-26ee-10036d8476b4&t=d1c117a6-8f13-5466-55f0-d48dbb767fcd&l=12182 {code} 03:19:59,975 [ flink-scheduler-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'flink-scheduler-1' produced an uncaught exception. Stopping the process... java.lang.OutOfMemoryError: Java heap space [...] 03:19:59,981 [jobmanager_62-main-scheduler-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'jobmanager_62-main-scheduler-thread-1' produced an uncaught exception. Stopping the process... java.lang.OutOfMemoryError: Java heap space [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36279) RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails
Matthias Pohl created FLINK-36279: - Summary: RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails Key: FLINK-36279 URL: https://issues.apache.org/jira/browse/FLINK-36279 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 2.0-preview Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287 {code} Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x7f973f0c2800 nid=0x31a1 waiting on condition [0x7f97089fc000] Sep 13 17:16:55java.lang.Thread.State: TIMED_WAITING (sleeping) Sep 13 17:16:55 at java.lang.Thread.sleep(Native Method) Sep 13 17:16:55 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152) Sep 13 17:16:55 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145) Sep 13 17:16:55 at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219) Sep 13 17:16:55 at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139) Sep 13 17:16:55 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Sep 13 17:16:55 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36272) YarnFileStageTestS3ITCase fails on master
Matthias Pohl created FLINK-36272: - Summary: YarnFileStageTestS3ITCase fails on master Key: FLINK-36272 URL: https://issues.apache.org/jira/browse/FLINK-36272 Project: Flink Issue Type: Bug Components: Deployment / YARN, Tests Affects Versions: 2.0.0 Reporter: Matthias Pohl The issue was introduced by FLINK-34085 where the test failure wasn't discovered because the test didn't run (see [logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=61954&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=28206]). I would suspect that this is due to the fact that we're not enabling S3 in PR CI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36207) Disabling japicmp plugin for deprecated APIs
Matthias Pohl created FLINK-36207: - Summary: Disabling japicmp plugin for deprecated APIs Key: FLINK-36207 URL: https://issues.apache.org/jira/browse/FLINK-36207 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 2.0.0 Reporter: Matthias Pohl The Apache Flink 2.0 release allows for the removal of public API. The japicmp plugin usually checks for this kind of change. To avoid adding explicit excludes for each change, this Jira issue suggests disabling the API check for APIs that are marked as deprecated. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36194) Shutdown hook for ExecutionGraphInfo store runs concurrently to cluster shutdown hook causing race conditions
Matthias Pohl created FLINK-36194: - Summary: Shutdown hook for ExecutionGraphInfo store runs concurrently to cluster shutdown hook causing race conditions Key: FLINK-36194 URL: https://issues.apache.org/jira/browse/FLINK-36194 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Affects Versions: 1.19.1, 1.20.0, 2.0.0 Reporter: Matthias Pohl There is a {{FileNotFoundException}} being logged when shutting down the cluster with currently running jobs: {code} /tmp/executionGraphStore-b2cb1190-2c4d-4021-a73d-8b15027860df/8f6abf294a46345d331590890f7e7c37 (No such file or directory) java.io.FileNotFoundException: /tmp/executionGraphStore-b2cb1190-2c4d-4021-a73d-8b15027860df/8f6abf294a46345d331590890f7e7c37 (No such file or directory) at java.base/java.io.FileOutputStream.open0(Native Method) at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298) at java.base/java.io.FileOutputStream.(FileOutputStream.java:237) at java.base/java.io.FileOutputStream.(FileOutputStream.java:187) at org.apache.flink.runtime.dispatcher.FileExecutionGraphInfoStore.storeExecutionGraphInfo(FileExecutionGraphInfoStore.java:281) at org.apache.flink.runtime.dispatcher.FileExecutionGraphInfoStore.put(FileExecutionGraphInfoStore.java:203) at org.apache.flink.runtime.dispatcher.Dispatcher.writeToExecutionGraphInfoStore(Dispatcher.java:1427) at org.apache.flink.runtime.dispatcher.Dispatcher.jobReachedTerminalState(Dispatcher.java:1357) at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:750) at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$6(Dispatcher.java:700) at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [...] 
{code} This is caused by concurrent shutdown logic being triggered through the {{FileExecutionGraphInfoStore}} shutdown hook. The shutdown hook calls close on the store, which deletes its temporary directory. The concurrently performed cluster shutdown tries to suspend all running jobs. The JobManagerRunners then try to write their {{ExecutionGraphInfo}} to the store, which fails because the temporary folder has already been deleted. This doesn't have any impact because the JobManager goes away anyway. But the log message is confusing, and the shutdown hook is (IMHO) not needed. Instead, the {{ExecutionGraphInfoStore}}'s close logic should be called by the {{ClusterEntrypoint}} as part of its graceful shutdown. -- This message was sent by Atlassian Jira (v8.20.10#820010)
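The ordering problem described in FLINK-36194 can be illustrated with a small, self-contained Java sketch. It is not Flink code: {{TempDirStore}}, {{tryPut}} and {{shutDownGracefully}} are hypothetical stand-ins, and the sketch uses {{java.nio.file.Files}} (which reports a {{NoSuchFileException}}) rather than the {{FileOutputStream}} seen in the trace. It only shows that a write fails once the store's directory is gone, and that closing the store last avoids the failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OrderedShutdownSketch {

    /** Hypothetical stand-in for a file-based store backed by a temporary directory. */
    static final class TempDirStore implements AutoCloseable {
        private final Path dir;

        TempDirStore() throws IOException {
            this.dir = Files.createTempDirectory("executionGraphStore-sketch-");
        }

        void put(String jobId, String payload) throws IOException {
            // Throws an IOException if the directory has already been deleted.
            Files.writeString(dir.resolve(jobId), payload);
        }

        @Override
        public void close() throws IOException {
            // Delete children before the directory itself.
            try (Stream<Path> paths = Files.walk(dir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
    }

    /** Returns true if the final job state could be written to the store. */
    static boolean tryPut(TempDirStore store) {
        try {
            store.put("job-1", "SUSPENDED");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Ordered shutdown: persist the job state first, close the store last. */
    static boolean shutDownGracefully(TempDirStore store) throws IOException {
        try {
            return tryPut(store);
        } finally {
            store.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulates the hook winning the race: the store is closed before the write.
        TempDirStore racy = new TempDirStore();
        racy.close();
        System.out.println("write after early close succeeded: " + tryPut(racy));

        // Simulates the proposed ordering: one owner writes, then closes.
        System.out.println("write with ordered shutdown succeeded: "
                + shutDownGracefully(new TempDirStore()));
    }
}
```

The design point is that a single owner sequences both steps, so no separate shutdown hook has to race with it.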
[jira] [Created] (FLINK-36168) AdaptiveSchedulerTest doesn't follow the production lifecycle
Matthias Pohl created FLINK-36168: - Summary: AdaptiveSchedulerTest doesn't follow the production lifecycle Key: FLINK-36168 URL: https://issues.apache.org/jira/browse/FLINK-36168 Project: Flink Issue Type: Sub-task Reporter: Matthias Pohl The {{AdaptiveSchedulerTest}} doesn't follow the production lifecycle properly: The executor representing the main thread is shut down before the AdaptiveScheduler is closed (or more precisely, the scheduler isn't closed at all in most of the tests). This can cause issues when the executor is shut down while tasks are still scheduled and have not been properly cleaned up. This issue is about fixing the test in this regard. -- This message was sent by Atlassian Jira (v8.20.10#820010)
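To illustrate the lifecycle ordering FLINK-36168 refers to, here is a minimal Java sketch using only {{java.util.concurrent}}. {{SchedulingComponent}} is a hypothetical stand-in for a scheduler that keeps a delayed task on its main-thread executor; it is not the AdaptiveScheduler. Enabling the remove-on-cancel policy is a choice made for this sketch so that a cancelled task leaves the queue immediately.

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LifecycleOrderSketch {

    /** Hypothetical component that keeps a delayed task on the main-thread executor. */
    static final class SchedulingComponent implements AutoCloseable {
        private final ScheduledFuture<?> pending;

        SchedulingComponent(ScheduledExecutorService mainThreadExecutor) {
            this.pending = mainThreadExecutor.schedule(() -> { }, 1, TimeUnit.HOURS);
        }

        @Override
        public void close() {
            // Clean up what this component scheduled.
            pending.cancel(false);
        }
    }

    /** Returns how many tasks were still queued when the executor was shut down. */
    static int leftoverTasks(boolean closeComponentFirst) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Sketch-specific choice: cancelled tasks are removed from the queue right away.
        executor.setRemoveOnCancelPolicy(true);

        SchedulingComponent component = new SchedulingComponent(executor);
        if (closeComponentFirst) {
            component.close();
        }
        // shutdownNow() returns the tasks that were still awaiting execution.
        List<Runnable> leftovers = executor.shutdownNow();
        return leftovers.size();
    }

    public static void main(String[] args) {
        System.out.println("leftovers without closing the component: " + leftoverTasks(false));
        System.out.println("leftovers when closing the component first: " + leftoverTasks(true));
    }
}
```

A test following the production lifecycle would close the scheduler first and only then shut down the executor.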
[jira] [Created] (FLINK-36147) Removes deprecated location field
Matthias Pohl created FLINK-36147: - Summary: Removes deprecated location field Key: FLINK-36147 URL: https://issues.apache.org/jira/browse/FLINK-36147 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Affects Versions: 2.0.0 Reporter: Matthias Pohl Assignee: Matthias Pohl Fix For: 2.0.0 FLINK-33147 introduced a new endpoint field and deprecated the corresponding location field in 1.19. This issue is about removing the deprecated field. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36099) JobIDLoggingITCase fails due to "Cannot find task to fail for execution [...]" info log message in TM logs
Matthias Pohl created FLINK-36099: - Summary: JobIDLoggingITCase fails due to "Cannot find task to fail for execution [...]" info log message in TM logs Key: FLINK-36099 URL: https://issues.apache.org/jira/browse/FLINK-36099 Project: Flink Issue Type: Bug Components: Runtime / Task Affects Versions: 1.19.1, 1.20.0, 1.18.1, 2.0.0 Reporter: Matthias Pohl {{JobIDLoggingITCase}} can fail (observed with the {{AdaptiveScheduler}} enabled): {code} Test org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging[testJobIDLogging(ClusterClient, Path, MiniCluster)] failed with: java.lang.AssertionError: [too many events without Job ID logged by org.apache.flink.runtime.taskexecutor.TaskExecutor] Expecting empty but was: [Logger=org.apache.flink.runtime.taskexecutor.TaskExecutor Level=INFO Message=Cannot find task to fail for execution 5447dca7a6c7f9679346cad41dc8e3be_cbc357ccb763df2852fee8c4fc7d55f2_0_0 with exception:] at org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:267) at org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:155) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:138) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at 
org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.comp
[jira] [Created] (FLINK-35748) DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime with MiniBatch mode and RocksDB backend enabled
Matthias Pohl created FLINK-35748: - Summary: DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime with MiniBatch mode and RocksDB backend enabled Key: FLINK-35748 URL: https://issues.apache.org/jira/browse/FLINK-35748 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60613&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=12259 {code} Jul 02 14:44:36 14:44:36.737 [ERROR] Tests run: 40, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 18.45 s <<< FAILURE! -- in org.apache.flink.table.planner.runtime.stream.sql.DeduplicateITCase Jul 02 14:44:36 14:44:36.737 [ERROR] org.apache.flink.table.planner.runtime.stream.sql.DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime -- Time elapsed: 0.860 s <<< FAILURE! Jul 02 14:44:36 org.opentest4j.AssertionFailedError: Jul 02 14:44:36 Jul 02 14:44:36 expected: List(+I(1,1,Hi,1970-01-01T00:00:00.001), +I(1,2,Hello world,1970-01-01T00:00:00.002), +I(2,3,I am fine.,1970-01-01T00:00:00.003), +I(2,6,Comment#1,1970-01-01T00:00:00.006), +I(3,5,Comment#2,1970-01-01T00:00:00.005), +I(4,4,Comment#3,1970-01-01T00:00:00.004)) Jul 02 14:44:36 but was: ArrayBuffer(+I(1,1,Hi,1970-01-01T00:00:00.001), +I(1,2,Hello world,1970-01-01T00:00:00.002), +I(1,3,Hello,1970-01-01T00:00:00.003), +I(2,6,Comment#1,1970-01-01T00:00:00.006), +I(3,5,Comment#2,1970-01-01T00:00:00.005), +I(4,4,Comment#3,1970-01-01T00:00:00.004), +U(2,3,I am fine.,1970-01-01T00:00:00.003), -U(1,3,Hello,1970-01-01T00:00:00.003)) Jul 02 14:44:36 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Jul 02 14:44:36 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) Jul 02 14:44:36 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) Jul 02 14:44:36 at 
org.apache.flink.table.planner.runtime.stream.sql.DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime(DeduplicateITCase.scala:364) Jul 02 14:44:36 at java.lang.reflect.Method.invoke(Method.java:498) [...] {code} The test failure appeared in a CI run for FLINK-35553, which changes how checkpointing is triggered. I checked the logs and couldn't find any evidence that the test run included the FLINK-35553 change (no restoring from checkpoint happens in the failed and successful runs of the test; see attached logs). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35729) HiveITCase.testReadWriteHive
Matthias Pohl created FLINK-35729: - Summary: HiveITCase.testReadWriteHive Key: FLINK-35729 URL: https://issues.apache.org/jira/browse/FLINK-35729 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60534&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=10d6732b-d79a-5c68-62a5-668516de5313&l=16589 {code} Jun 28 04:35:00 04:35:00.945 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 249.7 s <<< FAILURE! -- in org.apache.flink.tests.hive.HiveITCase Jun 28 04:35:00 04:35:00.945 [ERROR] org.apache.flink.tests.hive.HiveITCase.testReadWriteHive -- Time elapsed: 165.8 s <<< ERROR! Jun 28 04:35:00 java.io.IOException: Process failed due to timeout. Jun 28 04:35:00 at org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:145) Jun 28 04:35:00 at org.apache.flink.tests.util.flink.FlinkDistribution.submitSQLJobWithSQLClient(FlinkDistribution.java:342) Jun 28 04:35:00 at org.apache.flink.tests.util.flink.FlinkDistribution.submitSQLJob(FlinkDistribution.java:273) Jun 28 04:35:00 at org.apache.flink.tests.util.flink.LocalStandaloneFlinkResource$StandaloneClusterController.submitSQLJob(LocalStandaloneFlinkResource.java:241) Jun 28 04:35:00 at org.apache.flink.tests.hive.HiveITCase.executeSqlStatements(HiveITCase.java:231) Jun 28 04:35:00 at org.apache.flink.tests.hive.HiveITCase.runAndCheckSQL(HiveITCase.java:157) Jun 28 04:35:00 at org.apache.flink.tests.hive.HiveITCase.testReadWriteHive(HiveITCase.java:121) Jun 28 04:35:00 at java.base/java.lang.reflect.Method.invoke(Method.java:566) Jun 28 04:35:00 at org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:48) Jun 28 04:35:00 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) Jun 28 04:35:00 at 
org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35728) PyFlink end-to-end test because miniconda couldn't be downloaded
Matthias Pohl created FLINK-35728: - Summary: PyFlink end-to-end test because miniconda couldn't be downloaded Key: FLINK-35728 URL: https://issues.apache.org/jira/browse/FLINK-35728 Project: Flink Issue Type: Bug Components: Test Infrastructure Affects Versions: 1.19.1, 1.18.1, 2.0.0, 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60533&view=logs&j=87489130-75dc-54e4-1f45-80c30aa367a3&t=efbee0b1-38ac-597d-6466-1ea8fc908c50&l=8931 {code} Jun 28 02:16:31 Detected machine: x86_64 Jun 28 02:16:31 download miniconda from https://repo.continuum.io/miniconda/Miniconda3-py310_23.5.2-0-Linux-x86_64.sh... Jun 28 02:16:32 Download failed.You can try again Jun 28 02:16:34 No taskexecutor daemon to stop on host fv-az43-235. Jun 28 02:16:36 No standalonesession daemon to stop on host fv-az43-235. Jun 28 02:16:36 [FAIL] Test script contains errors. Jun 28 02:16:36 Checking of logs skipped. Jun 28 02:16:36 Jun 28 02:16:36 [FAIL] 'PyFlink end-to-end test' failed after 0 minutes and 6 seconds! Test exited with exit code 1 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35727) "Run kubernetes pyflink application test" failed due to access denied issue
Matthias Pohl created FLINK-35727: - Summary: "Run kubernetes pyflink application test" failed due to access denied issue Key: FLINK-35727 URL: https://issues.apache.org/jira/browse/FLINK-35727 Project: Flink Issue Type: Bug Components: Test Infrastructure Affects Versions: 1.19.1, 1.18.1, 2.0.0, 1.20.0 Reporter: Matthias Pohl {{Run kubernetes pyflink application test}} fails due to some permission issue: {code} Jun 28 10:46:15 Volumes: Jun 28 10:46:15user-artifacts-volume: Jun 28 10:46:15 Normal Scheduled 61sdefault-scheduler Successfully assigned default/flink-native-k8s-pyflink-application-1-55b44fdbff-jscks to fv-az86-828 Jun 28 10:46:15 Normal Pulling20s (x3 over 60s) kubelet Pulling image "test_kubernetes_pyflink_application" Jun 28 10:46:15 Warning Failed 19s (x3 over 59s) kubelet Failed to pull image "test_kubernetes_pyflink_application": rpc error: code = Unknown desc = Error response from daemon: pull access denied for test_kubernetes_pyflink_application, repository does not exist or may require 'docker login': denied: requested access to the resource is denied Jun 28 10:46:15 Warning Failed 19s (x3 over 59s) kubelet Error: ErrImagePull Jun 28 10:46:15 Normal BackOff7s (x4 over 59s) kubelet Back-off pulling image "test_kubernetes_pyflink_application" Jun 28 10:46:15 Warning Failed 7s (x4 over 59s) kubelet Error: ImagePullBackOff {code} https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60538&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=43ba8ce7-ebbf-57cd-9163-444305d74117&l=10846 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35722) CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint fails because of missed operator event
Matthias Pohl created FLINK-35722: - Summary: CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint fails because of missed operator event Key: FLINK-35722 URL: https://issues.apache.org/jira/browse/FLINK-35722 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 2.0.0, 1.20.0 Reporter: Matthias Pohl A test instability in {{CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint}} was observed where an expected {{OperatorEvent}} was missed: {code:java} Test org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint[testCheckpoint()] failed with: java.lang.AssertionError: Expecting actual: [0, 1, 3, 4, [...] 98, 99] to contain exactly (and in same order): [0, 1, 2, 3, 4, [...] but could not find the following elements: [2] at org.apache.flink.runtime.operators.coordination.CoordinatorEventsExactlyOnceITCase.checkListContainsSequence(CoordinatorEventsExactlyOnceITCase.java:175) at org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.executeAndVerifyResults(CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.java:178) at org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint(CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.java:124) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code} The [build failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60530&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8] happened on commit [2e853ce39a|https://github.com/flink-ci/flink/commit/2e853ce39aa2db8212402de3dcc0f049397887fd] for FLINK-35552. I attached the logs for further investigation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35553) Integrate newly added trigger interface with checkpointing
Matthias Pohl created FLINK-35553: - Summary: Integrate newly added trigger interface with checkpointing Key: FLINK-35553 URL: https://issues.apache.org/jira/browse/FLINK-35553 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing, Runtime / Coordination Reporter: Matthias Pohl This connects the newly introduced trigger logic (FLINK-35551) with the newly added checkpoint lifecycle listening feature (FLINK-35552). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35552) Move CheckpointStatsTracker out of ExecutionGraph into Scheduler
Matthias Pohl created FLINK-35552: - Summary: Move CheckpointStatsTracker out of ExecutionGraph into Scheduler Key: FLINK-35552 URL: https://issues.apache.org/jira/browse/FLINK-35552 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing, Runtime / Coordination Reporter: Matthias Pohl The scheduler needs to know about the CheckpointStatsTracker to allow listening to checkpoint failures and completion. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35551) Introduces RescaleManager#onTrigger endpoint
Matthias Pohl created FLINK-35551: - Summary: Introduces RescaleManager#onTrigger endpoint Key: FLINK-35551 URL: https://issues.apache.org/jira/browse/FLINK-35551 Project: Flink Issue Type: Sub-task Reporter: Matthias Pohl The new endpoint would allow us to separate observing change events from actually triggering the rescale operation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
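The separation FLINK-35551 describes can be sketched in a few lines of Java. Only the name {{RescaleManager#onTrigger}} comes from the issue; the {{onChange}} method, the {{DeferredRescaleManager}} class and the {{Runnable}} rescale action are assumptions made up for illustration and do not claim to match Flink's actual interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RescaleTriggerSketch {

    /** Hypothetical interface; only the onTrigger name is taken from the issue. */
    interface RescaleManagerSketch {
        void onChange();   // assumed: records that a relevant change was observed
        void onTrigger();  // performs the rescale if a change is pending
    }

    /** Remembers observed changes and acts on them only when triggered. */
    static final class DeferredRescaleManager implements RescaleManagerSketch {
        private final Runnable rescaleAction;
        private boolean changePending;

        DeferredRescaleManager(Runnable rescaleAction) {
            this.rescaleAction = rescaleAction;
        }

        @Override
        public void onChange() {
            changePending = true;
        }

        @Override
        public void onTrigger() {
            if (changePending) {
                changePending = false;
                rescaleAction.run();
            }
        }
    }

    /** Replays a small event sequence and returns how often the rescale action ran. */
    static int countRescales() {
        AtomicInteger rescales = new AtomicInteger();
        DeferredRescaleManager manager = new DeferredRescaleManager(rescales::incrementAndGet);

        manager.onTrigger(); // nothing observed yet: no rescale
        manager.onChange();  // e.g. new resources became available
        manager.onChange();  // a second change before the trigger is coalesced
        manager.onTrigger(); // e.g. a checkpoint completed: rescale once
        manager.onTrigger(); // nothing pending anymore: no rescale
        return rescales.get();
    }

    public static void main(String[] args) {
        System.out.println("rescale operations performed: " + countRescales());
    }
}
```

With this split, the trigger can be wired to any event source (for instance a checkpoint completion, as FLINK-35553 suggests) without changing how changes are observed.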
[jira] [Created] (FLINK-35550) Introduce new component RescaleManager
Matthias Pohl created FLINK-35550: - Summary: Introduce new component RescaleManager Key: FLINK-35550 URL: https://issues.apache.org/jira/browse/FLINK-35550 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Matthias Pohl The goal here is to collect the rescaling logic in a single component to improve testability. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35549) FLIP-461: Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler
Matthias Pohl created FLINK-35549: - Summary: FLIP-461: Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler Key: FLINK-35549 URL: https://issues.apache.org/jira/browse/FLINK-35549 Project: Flink Issue Type: Improvement Components: Runtime / Checkpointing, Runtime / Coordination Affects Versions: 1.20.0 Reporter: Matthias Pohl This is the umbrella issue for implementing [FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35000) PullRequest template doesn't use the correct format to refer to the testing code convention
Matthias Pohl created FLINK-35000: - Summary: PullRequest template doesn't use the correct format to refer to the testing code convention Key: FLINK-35000 URL: https://issues.apache.org/jira/browse/FLINK-35000 Project: Flink Issue Type: Bug Components: Build System / CI, Project Website Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The PR template refers to https://flink.apache.org/contributing/code-style-and-quality-common.html#testing rather than https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34999) PR CI stopped operating
Matthias Pohl created FLINK-34999: - Summary: PR CI stopped operating Key: FLINK-34999 URL: https://issues.apache.org/jira/browse/FLINK-34999 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There are no [new PR CI runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] being picked up anymore. [Recently updated PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not picked up by the @flinkbot. In the meantime there was a notification sent from GitHub that the password of the @flinkbot was reset for security reasons. It's quite likely that these two events are related. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
Matthias Pohl created FLINK-34989: - Summary: Apache Infra requests to reduce the runner usage for a project Key: FLINK-34989 URL: https://issues.apache.org/jira/browse/FLINK-34989 Project: Flink Issue Type: Sub-task Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions&project=flink&hours=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34988) Class loading issues in JDK17 and JDK21
Matthias Pohl created FLINK-34988: - Summary: Class loading issues in JDK17 and JDK21 Key: FLINK-34988 URL: https://issues.apache.org/jira/browse/FLINK-34988 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.20.0 Reporter: Matthias Pohl * JDK 17 (core; NoClassDefFoundError caused by ExceptionInInitializerError): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=12942 * JDK 17 (misc; ExceptionInInitializerError): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d871f0ce-7328-5d00-023b-e7391f5801c8&t=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6&l=22548 * JDK 21 (core; same as above): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d06b80b4-9e88-5d40-12a2-18072cf60528&t=609ecd5a-3f6e-5d0c-2239-2096b155a4d0&l=12963 * JDK 21 (misc; same as above): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=59a2b95a-736b-5c46-b3e0-cee6e587fd86&t=c301da75-e699-5c06-735f-778207c16f50&l=22506 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34961) GitHub Actions statistics can be monitored per workflow name
Matthias Pohl created FLINK-34961: - Summary: GitHub Actions statistics can be monitored per workflow name Key: FLINK-34961 URL: https://issues.apache.org/jira/browse/FLINK-34961 Project: Flink Issue Type: Improvement Components: Build System / CI Reporter: Matthias Pohl Apache Infra allows the monitoring of runner usage per workflow (see [report for Flink|https://infra-reports.apache.org/#ghactions&project=flink&hours=168&limit=10]; only accessible with Apache committer rights). They accumulate the data by workflow name. The Flink space has multiple repositories that use the generic workflow name {{CI}}. That makes the differentiation in the report harder. This Jira issue is about identifying all Flink-related projects with a CI workflow (Kubernetes operator and the JDBC connector were identified, for instance) and giving each workflow a more distinct name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34940) LeaderContender implementations handle invalid state
Matthias Pohl created FLINK-34940: - Summary: LeaderContender implementations handle invalid state Key: FLINK-34940 URL: https://issues.apache.org/jira/browse/FLINK-34940 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Reporter: Matthias Pohl Currently, LeaderContender implementations (e.g. see [ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219]) allow the handling of leader events of the same type happening after each other which shouldn't be the case. Two subsequent leadership grants indicate that the leading instance which received the leadership grant again missed the leadership revocation event causing an invalid state of the overall deployment (i.e. split brain scenario). We should fail fatally in these scenarios rather than handling them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
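The fail-fast behavior proposed in FLINK-34940 can be sketched as a small state guard. The class StrictContenderSketch below is hypothetical and heavily simplified; it is not Flink's LeaderContender, and a real implementation would presumably forward the error to a fatal error handler rather than throw to the caller.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical sketch; this is NOT Flink's LeaderContender. */
public class StrictContenderSketch {

    // Session ID of the current leadership; null while not leading.
    private UUID currentSession;

    public void grantLeadership(UUID sessionId) {
        Objects.requireNonNull(sessionId, "sessionId");
        if (currentSession != null) {
            // A second grant without a revocation in between means that the
            // revocation event was missed: fail instead of handling it.
            throw new IllegalStateException(
                    "Leadership granted twice without revocation (possible split brain)");
        }
        currentSession = sessionId;
    }

    public void revokeLeadership() {
        if (currentSession == null) {
            throw new IllegalStateException("Leadership revoked without a prior grant");
        }
        currentSession = null;
    }

    public boolean isLeader() {
        return currentSession != null;
    }

    public static void main(String[] args) {
        StrictContenderSketch contender = new StrictContenderSketch();
        contender.grantLeadership(UUID.randomUUID());
        try {
            contender.grantLeadership(UUID.randomUUID());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```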
[jira] [Created] (FLINK-34939) Harden TestingLeaderElection
Matthias Pohl created FLINK-34939: - Summary: Harden TestingLeaderElection Key: FLINK-34939 URL: https://issues.apache.org/jira/browse/FLINK-34939 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The {{TestingLeaderElection}} implementation does not follow the interface contract of {{LeaderElection}} in all of its facets (e.g. leadership acquisition and revocation events should be alternating). This issue is about hardening the {{LeaderElection}} contract in the test implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34937) Apache Infra GHA policy update
Matthias Pohl created FLINK-34937: - Summary: Apache Infra GHA policy update Key: FLINK-34937 URL: https://issues.apache.org/jira/browse/FLINK-34937 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There is a policy update [announced in the infra ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which asked Apache projects to limit the number of runners per job. Additionally, the [GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced which I wasn't aware of when working on the action workflow. This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34933) JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly
Matthias Pohl created FLINK-34933: - Summary: JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly Key: FLINK-34933 URL: https://issues.apache.org/jira/browse/FLINK-34933 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.20.0 Reporter: Matthias Pohl {{testResultFutureCompletionOfOutdatedLeaderIsIgnored}} doesn't test the desired behavior: The {{TestingJobMasterService#closeAsync()}} callback throws an {{UnsupportedOperationException}} by default which prevents the test from properly finalizing the leadership revocation. The test is still passing because the test checks implicitly for this error. Instead, we should verify that the runner's resultFuture doesn't complete until the runner is closed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34921) SystemProcessingTimeServiceTest fails due to missing output
Matthias Pohl created FLINK-34921: - Summary: SystemProcessingTimeServiceTest fails due to missing output Key: FLINK-34921 URL: https://issues.apache.org/jira/browse/FLINK-34921 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.20.0 Reporter: Matthias Pohl This PR CI build with {{AdaptiveScheduler}} enabled failed: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58476&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=11224 {code} "ForkJoinPool-61-worker-25" #863 daemon prio=5 os_prio=0 tid=0x7f8c19eba000 nid=0x60a5 waiting on condition [0x7f8bc2cf9000] Mar 21 17:19:42 java.lang.Thread.State: WAITING (parking) Mar 21 17:19:42 at sun.misc.Unsafe.park(Native Method) Mar 21 17:19:42 - parking to wait for <0xd81959b8> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) Mar 21 17:19:42 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) Mar 21 17:19:42 at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) Mar 21 17:19:42 at java.util.concurrent.FutureTask.get(FutureTask.java:191) Mar 21 17:19:42 at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest$$Lambda$1443/1477662666.call(Unknown Source) Mar 21 17:19:42 at org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63) Mar 21 17:19:42 at org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:892) Mar 21 17:19:42 at org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1366) Mar 21 17:19:42 at org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1210) Mar 21 17:19:42 at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest.testQuiesceAndAwaitingCancelsScheduledAtFixRateFuture(SystemProcessingTimeServiceTest.java:92) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34897) JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip needs to be enabled again
Matthias Pohl created FLINK-34897: - Summary: JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip needs to be enabled again Key: FLINK-34897 URL: https://issues.apache.org/jira/browse/FLINK-34897 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.20.0 Reporter: Matthias Pohl While working on FLINK-34672 I noticed that {{JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip}} is disabled without a reason. It looks like I disabled it accidentally as part of FLINK-31783. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34695) Move Flink's CI docker container into a public repo
Matthias Pohl created FLINK-34695: - Summary: Move Flink's CI docker container into a public repo Key: FLINK-34695 URL: https://issues.apache.org/jira/browse/FLINK-34695 Project: Flink Issue Type: Improvement Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl Currently, Flink's CI (GitHub Actions and Azure Pipelines) uses a container to run the logic. The intention behind it is to have a way to mimic the CI setup locally as well. The current Docker image is maintained from the [zentol/flink-ci-docker|https://github.com/zentol/flink-ci-docker] fork (owned by [~chesnay]) of [flink-ci/flink-ci-docker|https://github.com/flink-ci/flink-ci-docker] (owned by Ververica) which is not ideal. We should move this repo into an Apache-owned repository. Additionally, there's no workflow pushing the image automatically to a registry from where it can be used. Instead, the images were pushed to personal Docker Hub repos in the past (rmetzger, chesnay, mapohl). This is also not ideal. We should use a public repo using a GHA workflow to push the image to that repo. Questions to answer here: # Where shall the Docker image code be located? # Which Docker registry should be used? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34646) AggregateITCase.testDistinctWithRetract timed out
Matthias Pohl created FLINK-34646: - Summary: AggregateITCase.testDistinctWithRetract timed out Key: FLINK-34646 URL: https://issues.apache.org/jira/browse/FLINK-34646 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/8211401561/job/22460442229#step:10:17161 {code} "main" #1 prio=5 os_prio=0 tid=0x7f70abeb7000 nid=0x4cff3 waiting on condition [0x7f70ac3f6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xcd24c690> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077) at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876) at org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testDistinctWithRetract(AggregateITCase.scala:345) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34645) StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount fails
Matthias Pohl created FLINK-34645: - Summary: StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount fails Key: FLINK-34645 URL: https://issues.apache.org/jira/browse/FLINK-34645 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.18.1 Reporter: Matthias Pohl {code} Error: 02:27:17 02:27:17.025 [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.658 s <<< FAILURE! - in org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest Error: 02:27:17 02:27:17.025 [ERROR] org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount Time elapsed: 0.3 s <<< FAILURE! Mar 09 02:27:17 java.lang.AssertionError: Mar 09 02:27:17 Mar 09 02:27:17 Expected size: 8 but was: 6 in: Mar 09 02:27:17 [Record @ (undef) : +I(c1,0,1969-12-31T23:59:55,1970-01-01T00:00:05), Mar 09 02:27:17 Record @ (undef) : +I(c2,3,1969-12-31T23:59:55,1970-01-01T00:00:05), Mar 09 02:27:17 Record @ (undef) : +I(c2,3,1970-01-01T00:00,1970-01-01T00:00:10), Mar 09 02:27:17 Record @ (undef) : +I(c1,0,1970-01-01T00:00,1970-01-01T00:00:10), Mar 09 02:27:17 Watermark @ 1, Mar 09 02:27:17 Watermark @ 2] Mar 09 02:27:17 at org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:110) Mar 09 02:27:17 at org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:70) Mar 09 02:27:17 at org.apache.flink.table.runtime.operators.python.aggregate.arrow.ArrowPythonAggregateFunctionOperatorTestBase.assertOutputEquals(ArrowPythonAggregateFunctionOperatorTestBase.java:62) Mar 09 02:27:17 at 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount(StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.java:326) Mar 09 02:27:17 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34644) RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing failed with ConnectionClosedException
Matthias Pohl created FLINK-34644: - Summary: RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing failed with ConnectionClosedException Key: FLINK-34644 URL: https://issues.apache.org/jira/browse/FLINK-34644 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/8189958608/job/22396362238#step:10:9215 {code} Error: 15:13:33 15:13:33.779 [ERROR] Tests run: 68, Failures: 0, Errors: 1, Skipped: 4, Time elapsed: 17.81 s <<< FAILURE! -- in org.apache.flink.runtime.rest.RestServerEndpointITCase Error: 15:13:33 15:13:33.779 [ERROR] org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing -- Time elapsed: 0.329 s <<< ERROR! Mar 07 15:13:33 java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.ConnectionClosedException: Channel became inactive. Mar 07 15:13:33 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) Mar 07 15:13:33 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) Mar 07 15:13:33 at org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing(RestServerEndpointITCase.java:592) Mar 07 15:13:33 at java.lang.reflect.Method.invoke(Method.java:498) Mar 07 15:13:33 at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) Mar 07 15:13:33 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) Mar 07 15:13:33 at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) Mar 07 15:13:33 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) Mar 07 15:13:33 at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) Mar 07 15:13:33 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) Mar 07 15:13:33 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) Mar 07 15:13:33 at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) Mar 07 15:13:33 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) Mar 07 15:13:33 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) Mar 07 15:13:33 at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) Mar 07 15:13:33 at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) Mar 07 15:13:33 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) Mar 07 15:13:33 at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) Mar 07 15:13:33 at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272) Mar 07 15:13:33 at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) Mar 07 15:13:33 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) Mar 07 15:13:33 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) Mar 07 15:13:33 at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) Mar 07 15:13:33 at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) Mar 07 15:13:33 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) Mar 07 15:13:33 at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) Mar 07 15:13:33 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Mar 07 15:13:33 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Mar 07 15:13:33 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Mar 07 15:13:33 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Mar 07 15:13:33 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Mar 07 15:13:33 Caused by: org.apache.flink.runtime.rest.ConnectionClosedException: Channel became inactive. 
Mar 07 15:13:33 at org.apache.flink.runtime.rest.RestClient$ClientHandler.channelInactive(RestClient.java:749) Mar 07 15:13:33 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) Mar 07 15:13:33 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) Mar 07 15:13:33 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) Mar 07 15:13:33
[jira] [Created] (FLINK-34643) JobIDLoggingITCase failed
Matthias Pohl created FLINK-34643: - Summary: JobIDLoggingITCase failed Key: FLINK-34643 URL: https://issues.apache.org/jira/browse/FLINK-34643 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=7897 {code} Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in org.apache.flink.test.misc.JobIDLoggingITCase Mar 09 01:24:23 01:24:23.498 [ERROR] org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) -- Time elapsed: 1.459 s <<< ERROR! Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in the test code Mar 09 01:24:23 at org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) Mar 09 01:24:23 at org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) Mar 09 01:24:23 at org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) Mar 09 01:24:23 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Mar 09 01:24:23 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Mar 09 01:24:23 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Mar 09 01:24:23 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Mar 09 01:24:23 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Mar 09 01:24:23 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34589) FineGrainedSlotManager doesn't handle errors in the resource reconciliation step
Matthias Pohl created FLINK-34589: - Summary: FineGrainedSlotManager doesn't handle errors in the resource reconciliation step Key: FLINK-34589 URL: https://issues.apache.org/jira/browse/FLINK-34589 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl I noticed during my work on FLINK-34427 that the reconciliation is scheduled periodically when starting the {{SlotManager}}. But it doesn't handle errors in this step. I see two options here: 1. Fail fatally because such an error might indicate a major issue with the RM backend. 2. Log the failure and continue the scheduled task even in case of an error. My understanding is that we're just not able to recreate TaskManagers which should be a transient issue and could be resolved in the backend (YARN, k8s). That's why I would lean towards option 2. [~xtsong] WDYT? -- This message was sent by Atlassian Jira (v8.20.10#820010)
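Option 2 from FLINK-34589 can be illustrated with plain java.util.concurrent. The JDK's ScheduledExecutorService suppresses all subsequent executions of a periodic task once one execution throws, so a periodic task has to catch its own errors to keep running. The sketch below is a generic illustration under assumptions; the class ResilientPeriodicTask and the helper logAndContinue are hypothetical names and do not reflect how the SlotManager schedules its reconciliation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of "log and continue" for a periodic task. */
public class ResilientPeriodicTask {

    /** Wraps a task so that a failure is logged instead of cancelling the schedule. */
    public static Runnable logAndContinue(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                // Without this catch, the executor would silently stop rescheduling the task.
                System.err.println("Periodic task failed, retrying on next run: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Fails on the first run only, imitating a transient backend error.
        Runnable flaky = () -> {
            if (runs.incrementAndGet() == 1) {
                throw new IllegalStateException("backend unavailable");
            }
        };

        executor.scheduleWithFixedDelay(logAndContinue(flaky), 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(200);
        executor.shutdownNow();

        System.out.println("task kept running after the failure: " + (runs.get() > 1));
    }
}
```

Option 1 (failing fatally) would instead forward the caught exception to whatever fatal error handling the component uses.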
[jira] [Created] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result
Matthias Pohl created FLINK-34588: - Summary: FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result Key: FLINK-34588 URL: https://issues.apache.org/jira/browse/FLINK-34588 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There are a few locations in {{FineGrainedSlotManager}} where we check whether resources can/need to be reconciled but don't care about the result and just trigger the resource update (e.g. in [FineGrainedSlotManager:620|https://github.com/apache/flink/blob/c0d3e495f4c2316a80f251de77b05b943b5be1f8/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L620] and [FineGrainedSlotManager:676|https://github.com/apache/flink/blob/c0d3e495f4c2316a80f251de77b05b943b5be1f8/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L676]). Looks like we could reduce the calls to the backend here. -- This message was sent by Atlassian Jira (v8.20.10#820010)
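The improvement hinted at in FLINK-34588 amounts to acting on the check's result before calling the backend. A minimal sketch, assuming a boolean check and a separate update action; ReconcileGuardSketch and updateIfNeeded are hypothetical names and are not taken from FineGrainedSlotManager:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; not code from FineGrainedSlotManager. */
public class ReconcileGuardSketch {

    /**
     * Runs the resource update only if the check reports that it is needed.
     *
     * @return true if the update was triggered
     */
    public static boolean updateIfNeeded(BooleanSupplier needsReconcile, Runnable resourceUpdate) {
        if (!needsReconcile.getAsBoolean()) {
            return false; // nothing to do, the backend call is skipped
        }
        resourceUpdate.run();
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();

        updateIfNeeded(() -> false, backendCalls::incrementAndGet); // skipped
        updateIfNeeded(() -> true, backendCalls::incrementAndGet);  // triggers the update

        System.out.println("backend calls = " + backendCalls.get());
    }
}
```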
[jira] [Created] (FLINK-34571) SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError failed due to an assertion
Matthias Pohl created FLINK-34571: - Summary: SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError failed due to an assertion Key: FLINK-34571 URL: https://issues.apache.org/jira/browse/FLINK-34571 Project: Flink Issue Type: Bug Components: Runtime / Network Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/8134965216/job/8875618#step:10:8586 {code} Error: 02:39:36 02:39:36.688 [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.68 s <<< FAILURE! -- in org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionReadSchedulerTest Error: 02:39:36 02:39:36.689 [ERROR] org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError -- Time elapsed: 0.174 s <<< FAILURE! Mar 04 02:39:36 org.opentest4j.AssertionFailedError: Mar 04 02:39:36 Mar 04 02:39:36 Expecting value to be true but was false Mar 04 02:39:36 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Mar 04 02:39:36 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) Mar 04 02:39:36 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) Mar 04 02:39:36 at org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError(SortMergeResultPartitionReadSchedulerTest.java:225) Mar 04 02:39:36 at java.lang.reflect.Method.invoke(Method.java:498) Mar 04 02:39:36 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Mar 04 02:39:36 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Mar 04 02:39:36 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Mar 04 02:39:36 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Mar 04 02:39:36 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34570) JoinITCase.testLeftJoinWithEqualPk times out
Matthias Pohl created FLINK-34570: - Summary: JoinITCase.testLeftJoinWithEqualPk times out Key: FLINK-34570 URL: https://issues.apache.org/jira/browse/FLINK-34570 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/8127069912/job/22211928085#step:10:14479 {code} "main" #1 prio=5 os_prio=0 tid=0x7ff4ae2b7000 nid=0x2168b waiting on condition [0x7ff4affdc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xab096950> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077) at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876) at org.apache.flink.table.planner.runtime.stream.sql.JoinITCase.testLeftJoinWithEqualPk(JoinITCase.scala:705) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34569) 'Streaming File Sink s3 end-to-end test' failed
Matthias Pohl created FLINK-34569: - Summary: 'Streaming File Sink s3 end-to-end test' failed Key: FLINK-34569 URL: https://issues.apache.org/jira/browse/FLINK-34569 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 1.19.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58026&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3957 {code} Mar 02 04:12:57 Waiting until all values have been produced Unable to find image 'stedolan/jq:latest' locally Error: No such container: docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": read tcp 10.1.0.97:42214->54.236.113.205:443: read: connection reset by peer. See 'docker run --help'. Mar 02 04:12:58 Number of produced values 0/6 Error: No such container: Unable to find image 'stedolan/jq:latest' locally latest: Pulling from stedolan/jq [DEPRECATION NOTICE] Docker Image Format v1, and Docker Image manifest version 2, schema 1 support will be removed in an upcoming release. Suggest the author of docker.io/stedolan/jq:latest to upgrade the image to the OCI Format, or Docker Image manifest v2, schema 2. More information at https://docs.docker.com/go/deprecated-image-specs/ 237d5fcd25cf: Pulling fs layer [...] 4dae4fd48813: Pull complete Digest: sha256:a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09 Status: Downloaded newer image for stedolan/jq:latest parse error: Invalid numeric literal at line 1, column 6 Error: No such container: parse error: Invalid numeric literal at line 1, column 6 Error: No such container: parse error: Invalid numeric literal at line 1, column 6 [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34568) YarnFileStageTest.destroyHDFS timed out
Matthias Pohl created FLINK-34568: - Summary: YarnFileStageTest.destroyHDFS timed out Key: FLINK-34568 URL: https://issues.apache.org/jira/browse/FLINK-34568 Project: Flink Issue Type: Bug Components: Connectors / Hadoop Compatibility Affects Versions: 1.17.2 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58024&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=26698 {code} Mar 02 07:28:56 "Listener at localhost/33933" #25 daemon prio=5 os_prio=0 tid=0x7f08490be000 nid=0x12cae runnable [0x7f082ebfc000] Mar 02 07:28:56 java.lang.Thread.State: RUNNABLE Mar 02 07:28:56 at org.mortbay.io.nio.SelectorManager$SelectSet.stop(SelectorManager.java:879) Mar 02 07:28:56 - locked <0xd7ae0030> (a org.mortbay.io.nio.SelectorManager$SelectSet) [...] Mar 02 07:28:56 at org.apache.hadoop.hdfs.MiniDFSCluster.stopAndJoinNameNode(MiniDFSCluster.java:2123) Mar 02 07:28:56 at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2060) Mar 02 07:28:56 at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2031) Mar 02 07:28:56 at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2024) Mar 02 07:28:56 at org.apache.flink.yarn.YarnFileStageTest.destroyHDFS(YarnFileStageTest.java:90) [...] {code} Looks like an HDFS issue during shutdown? This will most likely also affect newer versions because not much has been done in the Yarn space since 1.17 (Hadoop was bumped in 1.17 itself; FLINK-29710). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34560) JoinITCase seems to fail on a broader scale (MiniCluster issue?)
Matthias Pohl created FLINK-34560: - Summary: JoinITCase seems to fail on a broader scale (MiniCluster issue?) Key: FLINK-34560 URL: https://issues.apache.org/jira/browse/FLINK-34560 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/8105495458/job/22154140154#step:10:11906 The actual cause still needs to be investigated. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34551) Align retry mechanisms of FutureUtils
Matthias Pohl created FLINK-34551: - Summary: Align retry mechanisms of FutureUtils Key: FLINK-34551 URL: https://issues.apache.org/jira/browse/FLINK-34551 Project: Flink Issue Type: Technical Debt Components: API / Core Affects Versions: 1.20.0 Reporter: Matthias Pohl The retry mechanisms of FutureUtils include quite a bit of redundant code, which makes them hard to understand and to extend. The logic should be aligned properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
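One possible reading of "aligned" is that every retry variant delegates to a single core loop. The following is a minimal sketch of that idea using only the JDK. It is not Flink's FutureUtils code: the class RetrySketch, its method names, and its parameters are hypothetical and chosen for illustration only.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch: one core retry loop that all variants delegate to. */
public final class RetrySketch {

    private RetrySketch() {}

    /** Core variant: retries up to {@code retries} times with a fixed delay. */
    public static <T> CompletableFuture<T> retry(
            Supplier<CompletableFuture<T>> operation,
            int retries,
            Duration delay,
            Predicate<Throwable> retryable,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, retries, delay, retryable, scheduler, result);
        return result;
    }

    /** Convenience variant: no delay, every failure is retryable. */
    public static <T> CompletableFuture<T> retryImmediately(
            Supplier<CompletableFuture<T>> operation,
            int retries,
            ScheduledExecutorService scheduler) {
        return retry(operation, retries, Duration.ZERO, t -> true, scheduler);
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation,
            int remainingRetries,
            Duration delay,
            Predicate<Throwable> retryable,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        if (result.isDone()) {
            return; // the caller completed or cancelled the result in the meantime
        }
        CompletableFuture<T> attemptFuture;
        try {
            attemptFuture = operation.get();
        } catch (Throwable t) {
            // The supplier itself failed before returning a future.
            result.completeExceptionally(t);
            return;
        }
        attemptFuture.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (remainingRetries > 0 && retryable.test(error)) {
                // Schedule the next attempt; the error is evaluated as reported by the attempt.
                scheduler.schedule(
                        () -> attempt(operation, remainingRetries - 1, delay,
                                retryable, scheduler, result),
                        delay.toMillis(),
                        TimeUnit.MILLISECONDS);
            } else {
                result.completeExceptionally(error);
            }
        });
    }
}
```

With this shape, a new variant (for example a different predicate or delay) only adds a thin wrapper like retryImmediately instead of another copy of the loop.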
[jira] [Created] (FLINK-34527) Deprecate Time classes also in PyFlink
Matthias Pohl created FLINK-34527: - Summary: Deprecate Time classes also in PyFlink Key: FLINK-34527 URL: https://issues.apache.org/jira/browse/FLINK-34527 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.20.0 Reporter: Matthias Pohl FLINK-32570 deprecated the Time classes. But we missed touching the PyFlink-related APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34514) e2e (1) times out because of an error that's most likely caused by a networking issue
Matthias Pohl created FLINK-34514: - Summary: e2e (1) times out because of an error that's most likely caused by a networking issue Key: FLINK-34514 URL: https://issues.apache.org/jira/browse/FLINK-34514 Project: Flink Issue Type: Bug Components: Test Infrastructure Affects Versions: 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/8027473891/job/21931649433 {code} Sat, 24 Feb 2024 03:35:54 GMT ERROR: failed to solve: process "/bin/sh -c set -ex; wget -nv -O /usr/local/bin/gosu \"https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg --print-architecture)\"; wget -nv -O /usr/local/bin/gosu.asc \"https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg --print-architecture).asc\"; export GNUPGHOME=\"$(mktemp -d)\"; for server in ha.pool.sks-keyservers.net $(shuf -e hkp://p80.pool.sks-keyservers.net:80 keyserver.ubuntu.com hkp://keyserver.ubuntu.com:80 pgp.mit.edu) ; do gpg --batch --keyserver \"$server\" --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 && break || : ; done && gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu; gpgconf --kill all; rm -rf \"$GNUPGHOME\" /usr/local/bin/gosu.asc; chmod +x /usr/local/bin/gosu; gosu nobody true" did not complete successfully: exit code: 4 Sat, 24 Feb 2024 07:10:28 GMT == Sat, 24 Feb 2024 07:10:28 GMT === WARNING: This task took already 95% of the available time budget of 299 minutes === Sat, 24 Feb 2024 07:10:28 GMT == {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34513) GroupAggregateRestoreTest.testRestore fails
Matthias Pohl created FLINK-34513: - Summary: GroupAggregateRestoreTest.testRestore fails Key: FLINK-34513 URL: https://issues.apache.org/jira/browse/FLINK-34513 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57828&view=logs&j=26b84117-e436-5720-913e-3e280ce55cae&t=77cc7e77-39a0-5007-6d65-4137ac13a471&l=10881 {code} Feb 24 01:12:01 01:12:01.384 [ERROR] Tests run: 10, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 2.957 s <<< FAILURE! -- in org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest Feb 24 01:12:01 01:12:01.384 [ERROR] org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest.testRestore(TableTestProgram, ExecNodeMetadata)[4] -- Time elapsed: 0.653 s <<< FAILURE! Feb 24 01:12:01 java.lang.AssertionError: Feb 24 01:12:01 Feb 24 01:12:01 Expecting actual: Feb 24 01:12:01 ["+I[3, 1, 2, 8, 31, 10.0, 3]", Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]", Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]", Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]", Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]", Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 1]", Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]", Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]", Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]", Feb 24 01:12:01 "+U[7, 0, 1, 7, 7, 7.0, 2]"] Feb 24 01:12:01 to contain exactly in any order: Feb 24 01:12:01 ["+I[3, 1, 2, 8, 31, 10.0, 3]", Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]", Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]", Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]", Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]", Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]", Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]", Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 2]", Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]"] Feb 24 01:12:01 elements not found: Feb 24 01:12:01 ["+I[7, 0, 1, 7, 7, 7.0, 2]"] Feb 24 01:12:01 and 
elements not expected: Feb 24 01:12:01 ["+I[7, 0, 1, 7, 7, 7.0, 1]", "+U[7, 0, 1, 7, 7, 7.0, 2]"] Feb 24 01:12:01 Feb 24 01:12:01 at org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:313) Feb 24 01:12:01 at java.base/java.lang.reflect.Method.invoke(Method.java:580) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio
Matthias Pohl created FLINK-34508: - Summary: Migrate S3-related ITCases and e2e tests to Minio Key: FLINK-34508 URL: https://issues.apache.org/jira/browse/FLINK-34508 Project: Flink Issue Type: Sub-task Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl This covers anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34495) Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failure
Matthias Pohl created FLINK-34495: - Summary: Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failure Key: FLINK-34495 URL: https://issues.apache.org/jira/browse/FLINK-34495 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57760&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=5d91035e-8022-55f2-2d4f-ab121508bf7e&l=2010 I guess the failure occurred due to the existence of a checkpoint failure: {code} Feb 22 00:49:16 2024-02-22 00:49:04,305 WARN org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to trigger or complete checkpoint 12 for job 3c9ffc670ead2cb3c4118410cbef3b72. (0 consecutive failed attempts so far) Feb 22 00:49:16 org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator is suspending. Feb 22 00:49:16 at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.stopCheckpointScheduler(CheckpointCoordinator.java:2056) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.scheduler.SchedulerBase.stopCheckpointScheduler(SchedulerBase.java:960) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.scheduler.SchedulerBase.stopWithSavepoint(SchedulerBase.java:1030) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.jobmaster.JobMaster.stopWithSavepoint(JobMaster.java:901) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?] Feb 22 00:49:16 at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?] Feb 22 00:49:16 at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] Feb 22 00:49:16 at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] 
Feb 22 00:49:16 at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309) ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307) ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222) ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85) ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168) ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT] Feb 22 00:49:16 at org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) [flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar
[jira] [Created] (FLINK-34489) New File Sink end-to-end test timed out
Matthias Pohl created FLINK-34489: - Summary: New File Sink end-to-end test timed out Key: FLINK-34489 URL: https://issues.apache.org/jira/browse/FLINK-34489 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57707&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3726 {code} Feb 21 07:26:03 Number of produced values 10770/6 Feb 21 07:39:50 Test (pid: 151375) did not finish after 900 seconds. Feb 21 07:39:50 Printing Flink logs and killing it: [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34488) Integrate snapshot deployment into GHA nightly workflow
Matthias Pohl created FLINK-34488: - Summary: Integrate snapshot deployment into GHA nightly workflow Key: FLINK-34488 URL: https://issues.apache.org/jira/browse/FLINK-34488 Project: Flink Issue Type: Sub-task Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl Analogously to the [Azure Pipelines nightly config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L103] we want to deploy the snapshot artifacts in the GHA nightly workflow as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow
Matthias Pohl created FLINK-34487: - Summary: Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow Key: FLINK-34487 URL: https://issues.apache.org/jira/browse/FLINK-34487 Project: Flink Issue Type: Sub-task Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl Analogously to the [Azure Pipelines nightly config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183] we want to generate the wheels artifacts in the GHA nightly workflow as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34486) Add documentation on how to add the shared utils as a submodule to the connector repo
Matthias Pohl created FLINK-34486: - Summary: Add documentation on how to add the shared utils as a submodule to the connector repo Key: FLINK-34486 URL: https://issues.apache.org/jira/browse/FLINK-34486 Project: Flink Issue Type: Improvement Components: Connectors / Common Affects Versions: connector-parent-1.1.0 Reporter: Matthias Pohl [apache/flink-connector-shared-utils:README.md|https://github.com/apache/flink-connector-shared-utils/blob/release_utils/README.md] doesn't state how the shared utils should be added as a submodule to a connector repository. But this is expected by the [connector release documentation|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release#Creatingaflinkconnectorrelease-Buildareleasecandidate]: {quote} The following sections assume that the release_utils branch from flink-connector-shared-utils is mounted as a git submodule under tools/releasing/shared, you can update the submodule by running git submodule update --remote (or git submodule update --init --recursive if the submodule wasn't initialized, yet) to use latest release utils, you need to mount the flink-connector-shared-utils as a submodule under the tools/releasing/shared if it hasn't been mounted in the connector repository. See the README for details. {quote} Let's update the README accordingly and add a link to the {{README}} in the connector release documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34475) ZooKeeperLeaderElectionDriverTest failed with exit code 2
Matthias Pohl created FLINK-34475: - Summary: ZooKeeperLeaderElectionDriverTest failed with exit code 2 Key: FLINK-34475 URL: https://issues.apache.org/jira/browse/FLINK-34475 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1 Reporter: Matthias Pohl [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57649&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8746] {code:java} Feb 20 01:20:02 01:20:02.369 [ERROR] Process Exit Code: 2 Feb 20 01:20:02 01:20:02.369 [ERROR] Crashed tests: Feb 20 01:20:02 01:20:02.369 [ERROR] org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriverTest Feb 20 01:20:02 01:20:02.369 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34464) actions/cache@v4 times out
Matthias Pohl created FLINK-34464: - Summary: actions/cache@v4 times out Key: FLINK-34464 URL: https://issues.apache.org/jira/browse/FLINK-34464 Project: Flink Issue Type: Bug Components: Build System / CI, Test Infrastructure Reporter: Matthias Pohl [https://github.com/apache/flink/actions/runs/7953599167/job/21710058433#step:4:125] Pulling the Docker image stalled. This should be a temporary issue: {code:java} /usr/bin/docker exec 601a5a6e68acf3ba38940ec7a07e08d7c57e763ca0364070124f71bc2f708bc3 sh -c "cat /etc/*release | grep ^ID" Received 260046848 of 1429155280 (18.2%), 248.0 MBs/sec Received 545259520 of 1429155280 (38.2%), 260.0 MBs/sec [...] Received 914358272 of 1429155280 (64.0%), 0.0 MBs/sec Received 914358272 of 1429155280 (64.0%), 0.0 MBs/sec Error: The operation was canceled. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
Matthias Pohl created FLINK-34450: - Summary: TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed Key: FLINK-34450 URL: https://issues.apache.org/jira/browse/FLINK-34450 Project: Flink Issue Type: Bug Components: Runtime / Task Affects Versions: 1.20.0 Reporter: Matthias Pohl https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880 {code} Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest Error: 07:48:06 07:48:06.646 [ERROR] org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding -- Time elapsed: 0.036 s <<< FAILURE! Feb 16 07:48:06 Output was not correct.: array lengths differed, expected.length=8 actual.length=7; arrays first differed at element [6]; expected: but was: Feb 16 07:48:06 at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78) Feb 16 07:48:06 at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28) Feb 16 07:48:06 at org.junit.Assert.internalArrayEquals(Assert.java:534) Feb 16 07:48:06 at org.junit.Assert.assertArrayEquals(Assert.java:285) Feb 16 07:48:06 at org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59) Feb 16 07:48:06 at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248) Feb 16 07:48:06 at java.lang.reflect.Method.invoke(Method.java:498) Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: but was: Feb 16 07:48:06 at org.junit.Assert.fail(Assert.java:89) Feb 16 07:48:06 at org.junit.Assert.failNotEquals(Assert.java:835) Feb 16 07:48:06 at org.junit.Assert.assertEquals(Assert.java:120) Feb 16 07:48:06 at org.junit.Assert.assertEquals(Assert.java:146) Feb 16 07:48:06 at 
org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8) Feb 16 07:48:06 at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76) Feb 16 07:48:06 ... 6 more {code} I couldn't reproduce it locally with 2 runs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34449) Flink build took too long
Matthias Pohl created FLINK-34449: - Summary: Flink build took too long Key: FLINK-34449 URL: https://issues.apache.org/jira/browse/FLINK-34449 Project: Flink Issue Type: Bug Components: Build System / CI, Test Infrastructure Reporter: Matthias Pohl We saw a timeout when building Flink in the e2e1 stage. No logs are available to investigate the issue: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57551&view=logs&j=bbb1e2a2-a43c-55c8-fb48-5cfe7a8a0ca6 {code} Nothing to show. Final logs are missing. This can happen when the job is cancelled or times out. {code} I'd consider this an infrastructure issue but created the Jira issue for documentation purposes. Let's see whether it pops up again. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34448) ChangelogLocalRecoveryITCase failed fatally with 127 exit code
Matthias Pohl created FLINK-34448: - Summary: ChangelogLocalRecoveryITCase failed fatally with 127 exit code Key: FLINK-34448 URL: https://issues.apache.org/jira/browse/FLINK-34448 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae&l=8897 {code} Feb 16 02:43:47 02:43:47.142 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) on project flink-tests: Feb 16 02:43:47 02:43:47.142 [ERROR] Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to /__w/1/s/flink-tests/target/surefire-reports for the individual test results. Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream. Feb 16 02:43:47 02:43:47.142 [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? 
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp' Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check output in log Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127 Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests: Feb 16 02:43:47 02:43:47.142 [ERROR] org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase Feb 16 02:43:47 02:43:47.142 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? 
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp' Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check output in log Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127 Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests: Feb 16 02:43:47 02:43:47.142 [ERROR] org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase Feb 16 02:43:47 02:43:47.142 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34447) ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines
Matthias Pohl created FLINK-34447: - Summary: ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines Key: FLINK-34447 URL: https://issues.apache.org/jira/browse/FLINK-34447 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl This appeared in this [PR CI run|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57529&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=7997] of FLINK-34427. {code} Feb 14 18:50:01 18:50:01.283 [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.665 s <<< FAILURE! -- in org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest Feb 14 18:50:01 18:50:01.283 [ERROR] org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime -- Time elapsed: 0.197 s <<< FAILURE! Feb 14 18:50:01 java.lang.AssertionError: Feb 14 18:50:01 Feb 14 18:50:01 Expecting Feb 14 18:50:01 Feb 14 18:50:01 not to be done. 
Feb 14 18:50:01 Be aware that the state of the future in this message might not reflect the one at the time when the assertion was performed as it is evaluated later on Feb 14 18:50:01 at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.lambda$new$3(ActiveResourceManagerTest.java:982) Feb 14 18:50:01 at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$Context.runTest(ActiveResourceManagerTest.java:1133) Feb 14 18:50:01 at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.(ActiveResourceManagerTest.java:963) Feb 14 18:50:01 at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime(ActiveResourceManagerTest.java:946) Feb 14 18:50:01 at java.lang.reflect.Method.invoke(Method.java:498) Feb 14 18:50:01 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Feb 14 18:50:01 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Feb 14 18:50:01 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Feb 14 18:50:01 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Feb 14 18:50:01 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {code} But I was able to reproduce it locally as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34443) YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster
Matthias Pohl created FLINK-34443: - Summary: YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster Key: FLINK-34443 URL: https://issues.apache.org/jira/browse/FLINK-34443 Project: Flink Issue Type: Bug Components: Build System / CI, Runtime / Coordination, Test Infrastructure Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804 {code} Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in org.apache.flink.yarn.YARNFileReplicationITCase Error: 03:04:05 03:04:05.067 [ERROR] org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication -- Time elapsed: 1.982 s <<< ERROR! Feb 14 03:04:05 org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster. Feb 14 03:04:05 at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566) Feb 14 03:04:05 at org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109) Feb 14 03:04:05 at org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73) Feb 14 03:04:05 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303) Feb 14 03:04:05 at org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73) Feb 14 03:04:05 at java.lang.reflect.Method.invoke(Method.java:498) Feb 14 03:04:05 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Feb 14 03:04:05 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Feb 14 03:04:05 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Feb 14 03:04:05 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Feb 14 03:04:05 Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation. Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260) Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813) Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908) Feb 14 03:04:05 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577) Feb 14 03:04:05 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518) Feb 14 03:04:05 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) Feb 14 03:04:05 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) Feb 14 03:04:05 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) Feb 14 03:04:05 at java.security.AccessController.doPrivileged(Native Method) Feb 14 03:04:05 at javax.security.auth.Subject.doAs(Subject.java:422) Feb 14 03:04:05 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) Feb 14 03:04:05 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) Feb 14 03:04:05 Feb 14 
03:04:05 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579) Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1525) Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1422) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) Feb 14 03:04:05 at com.sun.proxy.$Proxy113.addBlock(Unknown Source) Feb 14 03:04:05 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.ad
[jira] [Created] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future
Matthias Pohl created FLINK-34434: - Summary: DefaultSlotStatusSyncer doesn't complete the returned future Key: FLINK-34434 URL: https://issues.apache.org/jira/browse/FLINK-34434 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.17.2, 1.19.0, 1.20.0 Reporter: Matthias Pohl When looking into FLINK-34427 (unrelated), I noticed an odd line in [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155] where we complete a future that should already be completed (because the callback is triggered after the {{requestFuture}} has completed in some way). Shouldn't we complete the {{returnedFuture}} instead? I'm keeping the priority at {{Major}} because it doesn't seem to have been an issue in the past. -- This message was sent by Atlassian Jira (v8.20.10#820010)
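The concern in FLINK-34434 can be illustrated with a minimal sketch based on plain {{java.util.concurrent.CompletableFuture}}. This is not the actual {{DefaultSlotStatusSyncer}} code: the class {{ReturnedFutureSketch}} and both method names are hypothetical, and only the variable names {{requestFuture}} and {{returnedFuture}} are taken from the issue text. It shows that completing the already-completed request future inside its own callback has no effect, so the future handed back to the caller never completes.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; not taken from the Flink code base.
class ReturnedFutureSketch {

    /** Suspected pattern: the callback completes the future that triggered it. */
    static CompletableFuture<Void> completeWrongFuture(CompletableFuture<Void> requestFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.whenComplete((ignored, error) -> {
            // requestFuture is already done when this callback runs,
            // so complete(...) returns false and changes nothing.
            requestFuture.complete(null);
        });
        return returnedFuture; // nothing ever completes this future
    }

    /** Variant suggested by the issue: forward the outcome to the returned future. */
    static CompletableFuture<Void> completeReturnedFuture(CompletableFuture<Void> requestFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.whenComplete((ignored, error) -> {
            if (error != null) {
                returnedFuture.completeExceptionally(error);
            } else {
                returnedFuture.complete(null);
            }
        });
        return returnedFuture;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> request = CompletableFuture.completedFuture(null);
        System.out.println("wrong future variant done:    " + completeWrongFuture(request).isDone());
        System.out.println("returned future variant done: " + completeReturnedFuture(request).isDone());
    }
}
```

A caller that blocks on the result of the first variant without a timeout would wait indefinitely, while the second variant propagates both success and failure of the request.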
[jira] [Created] (FLINK-34433) CollectionFunctionsITCase.test failed due to job restart
Matthias Pohl created FLINK-34433: - Summary: CollectionFunctionsITCase.test failed due to job restart Key: FLINK-34433 URL: https://issues.apache.org/jira/browse/FLINK-34433 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7880739697/job/21503460772#step:10:11312 {code} Error: 02:33:24 02:33:24.955 [ERROR] Tests run: 439, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 56.57 s <<< FAILURE! -- in org.apache.flink.table.planner.functions.CollectionFunctionsITCase Error: 02:33:24 02:33:24.956 [ERROR] org.apache.flink.table.planner.functions.CollectionFunctionsITCase.test(TestCase)[81] -- Time elapsed: 1.141 s <<< ERROR! Feb 13 02:33:24 java.lang.RuntimeException: Job restarted Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:42) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:124) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100) Feb 13 02:33:24 at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247) Feb 13 02:33:24 at org.assertj.core.internal.Iterators.assertHasNext(Iterators.java:49) Feb 13 02:33:24 at org.assertj.core.api.AbstractIteratorAssert.hasNext(AbstractIteratorAssert.java:60) Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$ResultTestItem.test(BuiltInFunctionTestBase.java:383) Feb 13 02:33:24 
at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestSetSpec.lambda$getTestCase$4(BuiltInFunctionTestBase.java:341) Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119) Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase.test(BuiltInFunctionTestBase.java:99) Feb 13 02:33:24 at java.lang.reflect.Method.invoke(Method.java:498) Feb 13 02:33:24 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Feb 13 02:33:24 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Feb 13 02:33:24 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Feb 13 02:33:24 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Feb 13 02:33:24 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34428) WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out
Matthias Pohl created FLINK-34428: - Summary: WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out Key: FLINK-34428 URL: https://issues.apache.org/jira/browse/FLINK-34428 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7866453368/job/21460921339#step:10:15127 {code} "main" #1 prio=5 os_prio=0 tid=0x7f1770cb7000 nid=0x4ad4d waiting on condition [0x7f17711f6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xab48e3a0> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077) at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876) at org.apache.flink.table.planner.runtime.stream.sql.WindowAggregateITCase.testTumbleWindowWithoutOutputWindowColumns(WindowAggregateITCase.scala:477) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)
Matthias Pohl created FLINK-34427: - Summary: ResourceManagerTaskExecutorTest fails fatally (exit code 239) Key: FLINK-34427 URL: https://issues.apache.org/jira/browse/FLINK-34427 Project: Flink Issue Type: Bug Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 {code} Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: Error: 02:28:53 02:28:53.220 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest Error: 02:28:53 02:28:53.220 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd '/root/flink/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.net=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' '/root/flink/flink-runtime/target/surefire' '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 'surefire_26-20240212022332296_91tmp' Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check output in log Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: Error: 02:28:53 02:28:53.221 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest Error: 02:28:53 02:28:53.221 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34426) HybridShuffleITCase.testHybridSelectiveExchangesRestart times out
Matthias Pohl created FLINK-34426: - Summary: HybridShuffleITCase.testHybridSelectiveExchangesRestart times out Key: FLINK-34426 URL: https://issues.apache.org/jira/browse/FLINK-34426 Project: Flink Issue Type: Bug Components: Runtime / Network Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7851900779/job/21429781783#step:10:9052 {code} "ForkJoinPool-1-worker-3" #16 daemon prio=5 os_prio=0 cpu=3397.79ms elapsed=11462.88s tid=0x7f48966b3800 nid=0x7a303 waiting on condition [0x7f486e97a000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.19/Native Method) - parking to wait for <0xa2faa230> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.19/LockSupport.java:194) at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.19/CompletableFuture.java:1796) at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.19/ForkJoinPool.java:3118) at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.19/CompletableFuture.java:1823) at java.util.concurrent.CompletableFuture.get(java.base@11.0.19/CompletableFuture.java:1998) at org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36) at org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:61) at org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:117) at org.apache.flink.test.runtime.HybridShuffleITCase.testHybridSelectiveExchangesRestart(HybridShuffleITCase.java:79) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.19/Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
Matthias Pohl created FLINK-34425: - Summary: TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out Key: FLINK-34425 URL: https://issues.apache.org/jira/browse/FLINK-34425 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844 {code} Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition [0x7fbd2b9f3000] Feb 10 03:21:45 java.lang.Thread.State: WAITING (parking) Feb 10 03:21:45 at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method) Feb 10 03:21:45 - parking to wait for <0xae6199f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) Feb 10 03:21:45 at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371) Feb 10 03:21:45 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519) Feb 10 03:21:45 at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780) Feb 10 03:21:45 at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725) Feb 10 03:21:45 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707) Feb 10 03:21:45 at java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425) Feb 10 03:21:45 at org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126) Feb 10 03:21:45 at java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH) Feb 10 03:21:45 at java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH) [...]
{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
Matthias Pohl created FLINK-34424: - Summary: BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out Key: FLINK-34424 URL: https://issues.apache.org/jira/browse/FLINK-34424 Project: Flink Issue Type: Bug Components: Runtime / Network Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9151 {code} Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000] Feb 11 13:55:29 java.lang.Thread.State: WAITING (on object monitor) Feb 11 13:55:29 at java.lang.Object.wait(Native Method) Feb 11 13:55:29 at java.lang.Thread.join(Thread.java:1252) Feb 11 13:55:29 - locked <0xe2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader) Feb 11 13:55:29 at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104) Feb 11 13:55:29 at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92) Feb 11 13:55:29 at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81) Feb 11 13:55:29 at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177) Feb 11 13:55:29 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34423) Make tool/ci/compile_ci.sh not necessarily rely on clean phase
Matthias Pohl created FLINK-34423: - Summary: Make tool/ci/compile_ci.sh not necessarily rely on clean phase Key: FLINK-34423 URL: https://issues.apache.org/jira/browse/FLINK-34423 Project: Flink Issue Type: Sub-task Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The GHA job {{Test packaging/licensing}} runs [.github/workflows/template.flink-ci.yml:169|https://github.com/apache/flink/blob/85edd784fc72c1784849e2b122cbf3215f89817c/.github/workflows/template.flink-ci.yml#L169] which enables Maven's {{clean}} phase. This triggers redundant work because the {{Test packaging/licensing}} job doesn't reuse the build artifacts of the previous {{Compile}} job and instead reruns {{test-compile}}. Disabling {{clean}} should improve the runtime of the {{Test packaging/licensing}} job. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21
Matthias Pohl created FLINK-34419: - Summary: flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21 Key: FLINK-34419 URL: https://issues.apache.org/jira/browse/FLINK-34419 Project: Flink Issue Type: Technical Debt Components: Build System / CI Reporter: Matthias Pohl [.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40] needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736). JDK 21 support was added in 1.19 (FLINK-33163). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fail
Matthias Pohl created FLINK-34418: - Summary: YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space Key: FLINK-34418 URL: https://issues.apache.org/jira/browse/FLINK-34418 Project: Flink Issue Type: Bug Components: Test Infrastructure Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] {code:java} [...] Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) Feb 09 03:00:13 at java.io.FileOutputStream.write(FileOutputStream.java:326) Feb 09 03:00:13 at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) Feb 09 03:00:13 ... 39 more [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34416) "Local recovery and sticky scheduling end-to-end test" still doesn't work with AdaptiveScheduler
Matthias Pohl created FLINK-34416: - Summary: "Local recovery and sticky scheduling end-to-end test" still doesn't work with AdaptiveScheduler Key: FLINK-34416 URL: https://issues.apache.org/jira/browse/FLINK-34416 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl We tried to enable all {{AdaptiveScheduler}}-related tests in FLINK-34409 because it appeared that all Jira issues that were referenced are resolved. That's not the case for the {{"Local recovery and sticky scheduling end-to-end test"}} tests, though. With the {{AdaptiveScheduler}} being enabled, we run into issues where the test runs forever due to a {{NullPointerException}} continuously triggering a failure: {code} Feb 07 19:02:59 2024-02-07 19:02:21,706 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Flat Map -> Sink: Unnamed (3/4) (54075d3d22edb729e5f396726f777860_20ba6b65f97481d5570070de90e4e791_2_16292) switched from INITIALIZING to FAILED on localhost:40893-09ff7> Feb 07 19:02:59 java.lang.NullPointerException: Expected to find info here. Feb 07 19:02:59 at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.tests.StickyAllocationAndLocalRecoveryTestJob$StateCreatingFlatMap.initializeState(StickyAllocationAndLocalRecoveryTestJob.java:340) ~[?:?] 
Feb 07 19:02:59 at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:187) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:169) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:134) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:285) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateAndGates(StreamTask.java:799) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$restoreInternal$3(StreamTask.java:753) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:753) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:712) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT] Feb 07 19:02:59 at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402] {code} This error is caused by a Precondition in [StickyAllocationAndLocalRecoveryTestJob:340|https://github.com/apache/flink/blob/0f3470db83c1fddba9ac9a7299b1e61baab4ff12/flink-end-to-end-tests/flink-local-recovery-and-allocation-test/src/main/java/org/apache/flink/streaming/tests/StickyAllocationAndLocalRecoveryTestJob.java#L340] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34412) ResultPartitionDeploymentDescriptorTest fails due to fatal error (239 exit code)
Matthias Pohl created FLINK-34412: - Summary: ResultPartitionDeploymentDescriptorTest fails due to fatal error (239 exit code) Key: FLINK-34412 URL: https://issues.apache.org/jira/browse/FLINK-34412 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.17.2 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57388&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=8323 {code} Feb 08 04:56:31 [ERROR] org.apache.flink.runtime.deployment.ResultPartitionDeploymentDescriptorTest Feb 08 04:56:31 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? Feb 08 04:56:31 [ERROR] Command was /bin/sh -c cd /__w/1/s/flink-runtime && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -Xmx768m -jar /__w/1/s/flink-runtime/target/surefire/surefirebooter6684124987290515696.jar /__w/1/s/flink-runtime/target/surefire 2024-02-08T04-45-49_396-jvmRun4 surefire6142105262662423760tmp surefire_245661504424247139476tmp Feb 08 04:56:31 [ERROR] Error occurred in starting fork, check output in log Feb 08 04:56:31 [ERROR] Process Exit Code: 239 Feb 08 04:56:31 [ERROR] Crashed tests: Feb 08 04:56:31 [ERROR] org.apache.flink.runtime.deployment.ResultPartitionDeploymentDescriptorTest Feb 08 04:56:31 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532) Feb 08 04:56:31 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:405) Feb 08 04:56:31 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:321) Feb 08 04:56:31 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266) Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314) Feb 08 04:56:31 [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159) Feb 08 04:56:31 [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34411) "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up
Matthias Pohl created FLINK-34411: - Summary: "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up Key: FLINK-34411 URL: https://issues.apache.org/jira/browse/FLINK-34411 Project: Flink Issue Type: Bug Components: Test Infrastructure Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57380&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=43ba8ce7-ebbf-57cd-9163-444305d74117&l=5802 {code} Feb 07 15:22:39 == Feb 07 15:22:39 Running 'Wordcount on Docker test (custom fs plugin)' Feb 07 15:22:39 == Feb 07 15:22:39 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 Feb 07 15:22:40 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT Feb 07 15:22:40 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT Feb 07 15:22:41 Docker version 24.0.7, build afdd53b Feb 07 15:22:44 docker-compose version 1.29.2, build 5becea4c Feb 07 15:22:44 Starting fileserver for Flink distribution Feb 07 15:22:44 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin ~/work/1/s Feb 07 15:23:07 ~/work/1/s Feb 07 15:23:07 ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 ~/work/1/s Feb 07 15:23:07 Preparing Dockeriles Feb 07 15:23:07 Executing command: git clone https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch Cloning into 'flink-docker'... /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: line 65: ./add-custom.sh: No such file or directory Feb 07 15:23:07 Building images ERROR: unable to prepare context: path "dev/test_docker_embedded_job-ubuntu" not found Feb 07 15:23:09 ~/work/1/s Feb 07 15:23:09 Command: build_image test_docker_embedded_job failed. Retrying... 
Feb 07 15:23:14 Starting fileserver for Flink distribution Feb 07 15:23:14 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin ~/work/1/s Feb 07 15:23:36 ~/work/1/s Feb 07 15:23:36 ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 ~/work/1/s Feb 07 15:23:36 Preparing Dockeriles Feb 07 15:23:36 Executing command: git clone https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch fatal: destination path 'flink-docker' already exists and is not an empty directory. Feb 07 15:23:36 Retry 1/5 exited 128, retrying in 1 seconds... Traceback (most recent call last): File "/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/python3_fileserver.py", line 26, in httpd = socketserver.TCPServer(("", ), handler) File "/usr/lib/python3.8/socketserver.py", line 452, in __init__ self.server_bind() File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind self.socket.bind(self.server_address) OSError: [Errno 98] Address already in use [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34410) Disable nightly trigger in forks
Matthias Pohl created FLINK-34410: - Summary: Disable nightly trigger in forks Key: FLINK-34410 URL: https://issues.apache.org/jira/browse/FLINK-34410 Project: Flink Issue Type: Technical Debt Components: Build System / CI Affects Versions: 1.20.0 Reporter: Matthias Pohl We can disable the automatic triggering of the nightly trigger workflow in forks (see the [GHA docs|https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions]): {code} if: github.repository == 'octo-org/octo-repo-prod' {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34409) Increase test coverage for AdaptiveScheduler
Matthias Pohl created FLINK-34409: - Summary: Increase test coverage for AdaptiveScheduler Key: FLINK-34409 URL: https://issues.apache.org/jira/browse/FLINK-34409 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There are still several tests disabled for the {{AdaptiveScheduler}} which we can enable now. All the issues seem to have been fixed. We can even remove the annotation {{@FailsWithAdaptiveScheduler}} now. It's not needed anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34408) VeryBigPbProtoToRowTest#testSimple fails with OOM
Matthias Pohl created FLINK-34408: - Summary: VeryBigPbProtoToRowTest#testSimple fails with OOM Key: FLINK-34408 URL: https://issues.apache.org/jira/browse/FLINK-34408 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.20.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57371&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=23861 {code} Feb 07 09:40:16 09:40:16.314 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 29.58 s <<< FAILURE! -- in org.apache.flink.formats.protobuf.VeryBigPbProtoToRowTest Feb 07 09:40:16 09:40:16.314 [ERROR] org.apache.flink.formats.protobuf.VeryBigPbProtoToRowTest.testSimple -- Time elapsed: 29.57 s <<< ERROR! Feb 07 09:40:16 org.apache.flink.util.FlinkRuntimeException: Error in serialization. Feb 07 09:40:16 at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) Feb 07 09:40:16 at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) Feb 07 09:40:16 at org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) Feb 07 09:40:16 at org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) Feb 07 09:40:16 at org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) Feb 07 09:40:16 at org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) Feb 07 09:40:16 at org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) Feb 07 09:40:16 at org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) Feb 07 09:40:16 at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) Feb 07 09:40:16 at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) Feb 07 09:40:16 at org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) Feb 07 09:40:16 at org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) Feb 07 09:40:16 at org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) Feb 07 09:40:16 at org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) Feb 07 09:40:16 at org.apache.flink.formats.protobuf.ProtobufTestHelper.pbBytesToRow(ProtobufTestHelper.java:121) Feb 07 09:40:16 at org.apache.flink.formats.protobuf.ProtobufTestHelper.pbBytesToRow(ProtobufTestHelper.java:103) Feb 07 09:40:16 at org.apache.flink.formats.protobuf.ProtobufTestHelper.pbBytesToRow(ProtobufTestHelper.java:98) Feb 07 09:40:16 at org.apache.flink.formats.protobuf.VeryBigPbProtoToRowTest.testSimple(VeryBigPbProtoToRowTest.java:36) Feb 07 09:40:16 at java.lang.reflect.Method.invoke(Method.java:498) Feb 07 09:40:16 Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space Feb 07 09:40:16 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) Feb 07 09:40:16 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) Feb 07 09:40:16 at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) Feb 07 09:40:16 ... 
18 more Feb 07 09:40:16 Caused by: java.lang.OutOfMemoryError: Java heap space Feb 07 09:40:16 at java.util.Arrays.copyOf(Arrays.java:3236) Feb 07 09:40:16 at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191) Feb 07 09:40:16 at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:555) Feb 07 09:40:16 at org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486) Feb 07 09:40:16 at org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182) Feb 07 09:40:16 at org.apache.flink.streaming.api.graph.StreamConfig$$Lambda$1582/1961611609.accept(Unknown Source) Feb 07 09:40:16 at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) Feb 07 09:40:16 at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) Feb 07 09:40:16 at java.util.concurrent.CompletableFuture$Completion.run(CompletableFutu
[jira] [Created] (FLINK-34405) RightOuterJoinTaskTest#testCancelOuterJoinTaskWhileSort2 fails
Matthias Pohl created FLINK-34405:

Summary: RightOuterJoinTaskTest#testCancelOuterJoinTaskWhileSort2 fails
Key: FLINK-34405
URL: https://issues.apache.org/jira/browse/FLINK-34405
Project: Flink
Issue Type: Bug
Components: API / Core
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=be5a4b15-4b23-56b1-7582-795f58a645a2&l=9027

{code}
Feb 07 03:20:16 03:20:16.223 [ERROR] Failures:
Feb 07 03:20:16 03:20:16.223 [ERROR] org.apache.flink.runtime.operators.RightOuterJoinTaskTest.testCancelOuterJoinTaskWhileSort2
Feb 07 03:20:16 03:20:16.223 [ERROR] Run 1: RightOuterJoinTaskTest>AbstractOuterJoinTaskTest.testCancelOuterJoinTaskWhileSort2:435
Feb 07 03:20:16 expected:
Feb 07 03:20:16 null
Feb 07 03:20:16 but was:
Feb 07 03:20:16 java.lang.Exception: The data preparation caused an error: Interrupted
Feb 07 03:20:16 at org.apache.flink.runtime.operators.testutils.BinaryOperatorTestBase.testDriverInternal(BinaryOperatorTestBase.java:209)
Feb 07 03:20:16 at org.apache.flink.runtime.operators.testutils.BinaryOperatorTestBase.testDriver(BinaryOperatorTestBase.java:189)
Feb 07 03:20:16 at org.apache.flink.runtime.operators.AbstractOuterJoinTaskTest.access$100(AbstractOuterJoinTaskTest.java:48)
Feb 07 03:20:16 ...(1 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed)
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34404) RestoreTestBase#testRestore times out
Matthias Pohl created FLINK-34404:

Summary: RestoreTestBase#testRestore times out
Key: FLINK-34404
URL: https://issues.apache.org/jira/browse/FLINK-34404
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357&view=logs&j=32715a4c-21b8-59a3-4171-744e5ab107eb&t=ff64056b-5320-5afe-c22c-6fa339e59586&l=11603

{code}
Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 cpu=282.22ms elapsed=961.78s tid=0x7f880a485c00 nid=0x6745 waiting on condition [0x7f878a6f9000]
Feb 07 02:17:40    java.lang.Thread.State: WAITING (parking)
Feb 07 02:17:40 at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
Feb 07 02:17:40 - parking to wait for <0xff73d060> (a java.util.concurrent.CompletableFuture$Signaller)
Feb 07 02:17:40 at java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
Feb 07 02:17:40 at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
Feb 07 02:17:40 at java.util.concurrent.ForkJoinPool.compensatedBlock(java.base@17.0.7/ForkJoinPool.java:3449)
Feb 07 02:17:40 at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3432)
Feb 07 02:17:40 at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
Feb 07 02:17:40 at java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
Feb 07 02:17:40 at org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292)
Feb 07 02:17:40 at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.7/Native Method)
Feb 07 02:17:40 at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.7/NativeMethodAccessorImpl.java:77)
Feb 07 02:17:40 at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.7/DelegatingMethodAccessorImpl.java:43)
Feb 07 02:17:40 at java.lang.reflect.Method.invoke(java.base@17.0.7/Method.java:568)
Feb 07 02:17:40 at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
[...]
{code}
[jira] [Created] (FLINK-34361) PyFlink end-to-end test fails in GHA
Matthias Pohl created FLINK-34361:

Summary: PyFlink end-to-end test fails in GHA
Key: FLINK-34361
URL: https://issues.apache.org/jira/browse/FLINK-34361
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.19.0
Reporter: Matthias Pohl

"PyFlink end-to-end test" fails:
https://github.com/apache/flink/actions/runs/7778642859/job/21208811659#step:14:7420

The only error I could identify is:
{code}
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
conda 23.5.2 requires ruamel-yaml<0.18,>=0.11.14, but you have ruamel-yaml 0.18.5 which is incompatible.
Feb 05 03:31:54 Successfully installed apache-beam-2.48.0 avro-python3-1.10.2 cloudpickle-2.2.1 crcmod-1.7 cython-3.0.8 dill-0.3.1.1 dnspython-2.5.0 docopt-0.6.2 exceptiongroup-1.2.0 fastavro-1.9.3 fasteners-0.19 find-libpython-0.3.1 grpcio-1.50.0 grpcio-tools-1.50.0 hdfs-2.7.3 httplib2-0.22.0 iniconfig-2.0.0 numpy-1.24.4 objsize-0.6.1 orjson-3.9.13 pandas-2.2.0 pemja-0.4.1 proto-plus-1.23.0 protobuf-4.23.4 py4j-0.10.9.7 pyarrow-11.0.0 pydot-1.4.2 pymongo-4.6.1 pyparsing-3.1.1 pytest-7.4.4 python-dateutil-2.8.2 pytz-2024.1 regex-2023.12.25 ruamel.yaml-0.18.5 ruamel.yaml.clib-0.2.8 tomli-2.0.1 typing-extensions-4.9.0 tzdata-2023.4
/home/runner/work/flink/flink/flink-python/dev/.conda/lib/python3.10/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /home/runner/work/flink/flink/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd
  tree = Parsing.p_module(s, pxd, full_module_name)
{code}

Not sure whether that's the actual cause.
[jira] [Created] (FLINK-34360) GHA e2e test failure due to no space left on device error
Matthias Pohl created FLINK-34360:

Summary: GHA e2e test failure due to no space left on device error
Key: FLINK-34360
URL: https://issues.apache.org/jira/browse/FLINK-34360
Project: Flink
Issue Type: Bug
Components: Tests
Reporter: Matthias Pohl

https://github.com/apache/flink/actions/runs/7763815214

{code}
AdaptiveScheduler / E2E (group 2) Process completed with exit code 1.
AdaptiveScheduler / E2E (group 2) You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 35 MB
{code}
[jira] [Created] (FLINK-34359) "Kerberized YARN per-job on Docker test (default input)" failed due to IllegalStateException
Matthias Pohl created FLINK-34359:

Summary: "Kerberized YARN per-job on Docker test (default input)" failed due to IllegalStateException
Key: FLINK-34359
URL: https://issues.apache.org/jira/browse/FLINK-34359
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Affects Versions: 1.18.1
Reporter: Matthias Pohl

This looks similar to FLINK-34357 because it is also caused by a YARN issue. Here, however, the failing e2e test is "Kerberized YARN per-job on Docker test (default input)":
{code}
[...]
Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:208)
at org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2801)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2776)
at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2654)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2636)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
at org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
{code}

https://github.com/apache/flink/actions/runs/7770984519/job/21191905887#step:14:11720
[jira] [Created] (FLINK-34357) IllegalAnnotationsException causes "PyFlink YARN per-job on Docker test" e2e test to fail
Matthias Pohl created FLINK-34357:

Summary: IllegalAnnotationsException causes "PyFlink YARN per-job on Docker test" e2e test to fail
Key: FLINK-34357
URL: https://issues.apache.org/jira/browse/FLINK-34357
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Affects Versions: 1.18.1
Reporter: Matthias Pohl

https://github.com/apache/flink/actions/runs/7763815214/job/21176570116#step:14:10009

{code}
Feb 03 03:29:04 SEVERE: Failed to generate the schema for the JAX-B elements
Feb 03 03:29:04 javax.xml.bind.JAXBException
Feb 03 03:29:04 - with linked exception:
Feb 03 03:29:04 [java.lang.reflect.InvocationTargetException]
Feb 03 03:29:04 at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:262)
Feb 03 03:29:04 at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:234)
[...]
Feb 03 03:29:04 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Feb 03 03:29:04 Caused by: java.lang.reflect.InvocationTargetException
Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 03 03:29:04 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 03 03:29:04 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 03 03:29:04 at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java:44)
Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 03 03:29:04 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 03 03:29:04 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 03 03:29:04 at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:247)
Feb 03 03:29:04 ... 57 more
Feb 03 03:29:04 Caused by: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 1 counts of IllegalAnnotationExceptions
Feb 03 03:29:04 java.util.Set is an interface, and JAXB can't handle interfaces.
Feb 03 03:29:04 this problem is related to the following location:
Feb 03 03:29:04 at java.util.Set
Feb 03 03:29:04 at public java.util.HashMap org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB()
Feb 03 03:29:04 at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
Feb 03 03:29:04 at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
Feb 03 03:29:04 at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
Feb 03 03:29:04
Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91)
Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445)
Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:277)
Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:124)
Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123)
Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147)
Feb 03 03:29:04 ... 67 more
{code}
[jira] [Created] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
Matthias Pohl created FLINK-34343:

Summary: ResourceManager registration is not completed when registering the JobMaster
Key: FLINK-34343
URL: https://issues.apache.org/jira/browse/FLINK-34343
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination, Runtime / RPC
Affects Versions: 1.19.0
Reporter: Matthias Pohl

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203&view=logs&j=64debf87-ecdb-5aef-788d-8720d341b5cb&t=2302fb98-0839-5df2-3354-bbae636f81a7&l=8066

The test run failed due to a NullPointerException:
{code}
Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor [] - The rpc endpoint org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not been started yet. Discarding message LocalFencedMessage(000 0, LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, ResourceID, String, JobID, Time))) until processing is started.
Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN org.apache.flink.runtime.rpc.pekko.SupervisorActor [] - RpcActor pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because "this.rpcServer" is null
Feb 02 01:11:55 at org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool.scan(Unknown Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[?:?]
{code}
[jira] [Created] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18
Matthias Pohl created FLINK-34333:

Summary: Fix FLINK-34007 LeaderElector bug in 1.18
Key: FLINK-34333
URL: https://issues.apache.org/jira/browse/FLINK-34333
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.18.1
Reporter: Matthias Pohl

FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using since Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19, which required an update of the k8s client to v6.9.0.

This Jira issue is about finding a solution in Flink 1.18 for the very same problem FLINK-34007 covered. It's a dedicated Jira issue because we want to unblock the release of 1.19 by resolving FLINK-34007.
[jira] [Created] (FLINK-34332) Investigate the permissions
Matthias Pohl created FLINK-34332:

Summary: Investigate the permissions
Key: FLINK-34332
URL: https://issues.apache.org/jira/browse/FLINK-34332
Project: Flink
Issue Type: Sub-task
Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0
Reporter: Matthias Pohl

We're currently using {{read-all}} for our workflows. We might want to limit the scope and document why certain reads are needed (see [GHA docs|https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs]).
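For illustration, a minimal sketch of what a narrowed scope could look like in a workflow file. The workflow name, the job, and the scopes chosen here are assumptions made up for the example, not the project's actual configuration; which reads are really needed is what this issue is meant to find out.

```yaml
# Hypothetical workflow; name, job, and chosen scopes are illustrative only.
name: example-ci

on: [push]

# Current state: every scope is readable by the workflow token.
# permissions: read-all

# Narrowed state: grant only the scopes that are needed, with a reason for each.
permissions:
  contents: read   # needed to check out the repository
  actions: read    # assumption: needed to query workflow run metadata

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "build steps go here"
```

Once any scope is listed explicitly under `permissions`, the unlisted scopes default to `none`, so each entry can double as the documentation of why a read is required.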
[jira] [Created] (FLINK-34331) Enable Apache INFRA runners for nightly builds
Matthias Pohl created FLINK-34331:

Summary: Enable Apache INFRA runners for nightly builds
Key: FLINK-34331
URL: https://issues.apache.org/jira/browse/FLINK-34331
Project: Flink
Issue Type: Sub-task
Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0
Reporter: Matthias Pohl

The nightly CI still uses the GitHub runners. We want to switch to Apache INFRA runners or ephemeral runners.