
[jira] [Created] (FLINK-36356) HadoopRecoverableWriterTest.testRecoverWithState due to IOException

2024-09-24 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36356:
-

 Summary: HadoopRecoverableWriterTest.testRecoverWithState due to 
IOException
 Key: FLINK-36356
 URL: https://issues.apache.org/jira/browse/FLINK-36356
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hadoop Compatibility
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62378&view=logs&j=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819&t=2dd510a3-5041-5201-6dc3-54d310f68906&l=10514

{code}
Sep 23 07:55:16 07:55:16.451 [ERROR] Tests run: 12, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 20.05 s <<< FAILURE! -- in 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest
Sep 23 07:55:16 07:55:16.451 [ERROR] 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest.testRecoverWithState
 -- Time elapsed: 2.694 s <<< ERROR!
Sep 23 07:55:16 java.io.IOException: All datanodes 
[DatanodeInfoWithStorage[127.0.0.1:45240,DS-13a30476-dff5-4f3a-88b1-887571521a95,DISK]]
 are bad. Aborting...
Sep 23 07:55:16 at 
org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1537)
Sep 23 07:55:16 at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1472)
Sep 23 07:55:16 at 
org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1244)
Sep 23 07:55:16 at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:663)
{code}

The Maven logs reveal a bit more (I attached the extract of the failed build):
{code}
07:55:13,491 [DataXceiver for client DFSClient_NONMAPREDUCE_211593080_35 at 
/127.0.0.1:59360 [Receiving block 
BP-289839883-172.27.0.2-1727078098659:blk_1073741832_1016]] ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode  [] - 
127.0.0.1:46429:DataXceiver error processing WRITE_BLOCK operation  src: 
/127.0.0.1:59360 dst: /127.0.0.1:46429
java.nio.channels.ClosedByInterruptException: null
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
 ~[?:1.8.0_292]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:406) 
~[?:1.8.0_292]
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
 ~[hadoop-common-2.10.2.jar:?]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) 
~[hadoop-common-2.10.2.jar:?]
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
~[hadoop-common-2.10.2.jar:?]
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
~[hadoop-common-2.10.2.jar:?]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
~[?:1.8.0_292]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
~[?:1.8.0_292]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
~[?:1.8.0_292]
at java.io.DataInputStream.read(DataInputStream.java:149) ~[?:1.8.0_292]
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:209) 
~[hadoop-common-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:211)
 ~[hadoop-hdfs-client-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
 ~[hadoop-hdfs-client-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
 ~[hadoop-hdfs-client-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:528)
 ~[hadoop-hdfs-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:968)
 ~[hadoop-hdfs-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:877)
 ~[hadoop-hdfs-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
 ~[hadoop-hdfs-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
 ~[hadoop-hdfs-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290) 
[hadoop-hdfs-2.10.2.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
07:55:13,491 [DataXceiver for client DFSClient_NONMAPREDUCE_211593080_35 at 
/127.0.0.1:39968 [Receiving block 
BP-289839883-172.27.0.2-1727078098659:blk_1073741832_1016]] INFO  
org.apache.hadoop.hdfs.server.datanode.DataNode  [] - Exception for 
BP-289839883-172.27.0.2-1727078098659:blk_1073741832_1017
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:211) 
~[hadoop-common-2.10.2.jar:?]
at 
org.apache.hadoop.hdfs.protocol.datatransfer.P

[jira] [Created] (FLINK-36350) IllegalAccessError detected in JDK17+ runs

2024-09-23 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36350:
-

 Summary: IllegalAccessError detected in JDK17+ runs
 Key: FLINK-36350
 URL: https://issues.apache.org/jira/browse/FLINK-36350
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


UnalignedCheckpointRescaleITCase and GroupReduceITCase are affected in JDK17 
and JDK21 test profiles.

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62359&view=logs&j=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3&t=712ade8c-ca16-5b76-3acd-14df33bc1cb1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-36349) ClassNotFoundException due to org.apache.flink.runtime.types.FlinkScalaKryoInstantiator missing

2024-09-23 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36349:
-

 Summary: ClassNotFoundException due to 
org.apache.flink.runtime.types.FlinkScalaKryoInstantiator missing
 Key: FLINK-36349
 URL: https://issues.apache.org/jira/browse/FLINK-36349
 Project: Flink
  Issue Type: Bug
  Components: API / Type Serialization System
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


This is most likely caused by FLINK-29741, which was recently merged.
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62359&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=17558

{code}
Sep 23 01:58:51 01:58:50,533 12326 [AsyncOperations-thread-1] INFO  
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer [] - Kryo 
serializer scala extensions are not available.
Sep 23 01:58:51 java.lang.ClassNotFoundException: 
org.apache.flink.runtime.types.FlinkScalaKryoInstantiator
Sep 23 01:58:51 at 
java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_292]
Sep 23 01:58:51 at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_292]
Sep 23 01:58:51 at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_292]
Sep 23 01:58:51 at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_292]
Sep 23 01:58:51 at java.lang.Class.forName0(Native Method) 
~[?:1.8.0_292]
Sep 23 01:58:51 at java.lang.Class.forName(Class.java:264) 
~[?:1.8.0_292]
[...]
{code}

It causes ClosureCleanerITCase to fail in the AdaptiveScheduler test profile.
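
The log above shows an INFO message followed by a ClassNotFoundException raised from Class.forName, which is what optional reflective loading typically looks like. The following is a minimal sketch of that general pattern, not a copy of KryoSerializer; the class name OptionalClassLoadingSketch and the helper tryLoad are hypothetical.

```java
import java.util.Optional;

/** Hypothetical sketch of optional reflective class loading; not Flink code. */
public class OptionalClassLoadingSketch {

    /** Returns the class if it is on the classpath, or an empty Optional if it is missing. */
    public static Optional<Class<?>> tryLoad(String className) {
        try {
            return Optional.<Class<?>>of(Class.forName(className));
        } catch (ClassNotFoundException e) {
            // The caller decides whether a missing optional class is fatal or only worth a log line.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above; whether it resolves depends on the classpath.
        String name = "org.apache.flink.runtime.types.FlinkScalaKryoInstantiator";
        System.out.println(name + " available: " + tryLoad(name).isPresent());
    }
}
```

With this shape, a missing class surfaces as an empty result rather than as an exception in the caller.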




[jira] [Created] (FLINK-36324) MiscAggFunctionITCase expected to raise Throwable

2024-09-19 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36324:
-

 Summary: MiscAggFunctionITCase expected to raise Throwable
 Key: FLINK-36324
 URL: https://issues.apache.org/jira/browse/FLINK-36324
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62234&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=11810

{code}
Sep 19 02:06:44 02:06:44.447 [ERROR] Tests run: 2, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 0.654 s <<< FAILURE! -- in 
org.apache.flink.table.planner.functions.MiscAggFunctionITCase
Sep 19 02:06:44 02:06:44.448 [ERROR] 
org.apache.flink.table.planner.functions.MiscAggFunctionITCase.test(TestCase)[2]
 -- Time elapsed: 0.294 s <<< FAILURE!
Sep 19 02:06:44 java.lang.AssertionError: 
Sep 19 02:06:44 
Sep 19 02:06:44 Expecting code to raise a throwable.
Sep 19 02:06:44 at 
org.apache.flink.table.planner.functions.BuiltInAggregateFunctionTestBase$ErrorTestItem.execute(BuiltInAggregateFunctionTestBase.java:607)
Sep 19 02:06:44 at 
org.apache.flink.table.planner.functions.BuiltInAggregateFunctionTestBase$TestSpec.lambda$createTestItemExecutable$0(BuiltInAggregateFunctionTestBase.java:323)
Sep 19 02:06:44 at 
org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119)
Sep 19 02:06:44 at 
org.apache.flink.table.planner.functions.BuiltInAggregateFunctionTestBase.test(BuiltInAggregateFunctionTestBase.java:96)
Sep 19 02:06:44 at java.lang.reflect.Method.invoke(Method.java:498)
Sep 19 02:06:44 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Sep 19 02:06:44 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Sep 19 02:06:44 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Sep 19 02:06:44 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Sep 19 02:06:44 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}




[jira] [Created] (FLINK-36317) Populate the ArchivedExecutionGraph with CheckpointStatsSnapshot data if in WaitingForResources state with a previousExecutionGraph being set

2024-09-18 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36317:
-

 Summary: Populate the ArchivedExecutionGraph with 
CheckpointStatsSnapshot data if in WaitingForResources state with a 
previousExecutionGraph being set
 Key: FLINK-36317
 URL: https://issues.apache.org/jira/browse/FLINK-36317
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


In FLINK-36295, we noticed an issue with the WaitingForResources state that 
follows a restartable failure: the CheckpointStatistics are available but are 
not exposed through the ArchivedExecutionGraph.

We should think about adding these stats in {{WaitingForResources#getJob}} to 
have them accessible even if the job isn't running at the moment.
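
A minimal sketch of the idea, not Flink's actual code: every type below (CheckpointStats, ArchivedGraph, WaitingForResourcesSketch) is a hypothetical stand-in for CheckpointStatsSnapshot, ArchivedExecutionGraph and the WaitingForResources state, and the real signatures may well differ. It only illustrates getJob carrying over the stats of a previous execution graph when one is set.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these types are Flink's real classes. */
public class WaitingForResourcesSketch {

    /** Stand-in for CheckpointStatsSnapshot (assumed shape). */
    public static final class CheckpointStats {
        public final long completedCheckpoints;

        public CheckpointStats(long completedCheckpoints) {
            this.completedCheckpoints = completedCheckpoints;
        }
    }

    /** Stand-in for ArchivedExecutionGraph (assumed shape). */
    public static final class ArchivedGraph {
        public final Optional<CheckpointStats> checkpointStats;

        public ArchivedGraph(Optional<CheckpointStats> checkpointStats) {
            this.checkpointStats = checkpointStats;
        }
    }

    /** Graph of the execution that ran before this state was entered; null on first start. */
    private final ArchivedGraph previousExecutionGraph;

    public WaitingForResourcesSketch(ArchivedGraph previousExecutionGraph) {
        this.previousExecutionGraph = previousExecutionGraph;
    }

    /** Builds the archived view of the job, carrying over earlier checkpoint stats if present. */
    public ArchivedGraph getJob() {
        Optional<CheckpointStats> stats =
                previousExecutionGraph == null
                        ? Optional.<CheckpointStats>empty()
                        : previousExecutionGraph.checkpointStats;
        return new ArchivedGraph(stats);
    }

    public static void main(String[] args) {
        ArchivedGraph previous = new ArchivedGraph(Optional.of(new CheckpointStats(3)));
        ArchivedGraph view = new WaitingForResourcesSketch(previous).getJob();
        System.out.println("stats exposed while waiting: " + view.checkpointStats.isPresent());
    }
}
```

Using an Optional keeps the first-start case (no previous graph) explicit instead of returning null stats.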




[jira] [Created] (FLINK-36302) FileSourceTextLinesITCase timed out

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36302:
-

 Summary: FileSourceTextLinesITCase timed out
 Key: FLINK-36302
 URL: https://issues.apache.org/jira/browse/FLINK-36302
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Common
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62064&view=logs&j=1c002d28-a73d-5309-26ee-10036d8476b4&t=d1c117a6-8f13-5466-55f0-d48dbb767fcd&l=12386

{code}

"ForkJoinPool-1-worker-1" #15 daemon prio=5 os_prio=0 tid=0x7f6c0c8b5800 
nid=0xda34 waiting on condition [0x7f6bf0dfc000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xff6d4038> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.isJobTerminated(CollectResultFetcher.java:213)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:120)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100)
at 
org.apache.flink.streaming.api.datastream.DataStreamUtils.collectRecordsFromUnboundedStream(DataStreamUtils.java:142)
at 
org.apache.flink.connector.file.src.FileSourceTextLinesITCase.testContinuousTextFileSource(FileSourceTextLinesITCase.java:252)
at 
org.apache.flink.connector.file.src.FileSourceTextLinesITCase.testContinuousTextFileSource(FileSourceTextLinesITCase.java:192)
[...]
{code}





[jira] [Created] (FLINK-36301) TPC-H end-to-end test fails due to TimeoutException

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36301:
-

 Summary: TPC-H end-to-end test fails due to TimeoutException
 Key: FLINK-36301
 URL: https://issues.apache.org/jira/browse/FLINK-36301
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Tests
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=fb37c667-81b7-5c22-dd91-846535e99a97&t=011e961e-597c-5c96-04fe-7941c8b83f23&l=8589

The JobManager logs reveal a TimeoutException:
{code}
2024-09-15 01:37:53,628 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job 
insert-into_default_catalog.default_database.q5 
(f40185b602e2336cba7299165d7078fa) switched from state RUNNING to FAILING.
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:281)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:272)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.handleTaskFailure(AdaptiveBatchScheduler.java:413)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:265)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.onTaskFailed(AdaptiveBatchScheduler.java:405)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:800)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:777)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:51)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(DefaultExecutionGraph.java:1675)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1190)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1130)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.executiongraph.Execution.fail(Execution.java:831) 
~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.signalPayloadRelease(SingleLogicalSlot.java:195)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.release(SingleLogicalSlot.java:182)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.SimpleExecutionSlotAllocator$LogicalSlotHolder.release(SimpleExecutionSlotAllocator.java:203)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.AllocatedSlot.releasePayload(AllocatedSlot.java:152)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releasePayload(DefaultDeclarativeSlotPool.java:515)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:478)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
 ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
at 
org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseTaskManager(DeclarativeSlotPoolService.j

[jira] [Created] (FLINK-36300) TableEnvHiveConnectorITCase.testDateTimestampPartitionColumns times out

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36300:
-

 Summary: 
TableEnvHiveConnectorITCase.testDateTimestampPartitionColumns times out
 Key: FLINK-36300
 URL: https://issues.apache.org/jira/browse/FLINK-36300
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=25538

{code}
"main" #1 prio=5 os_prio=0 tid=0x7f309e8a2000 nid=0x1b181 waiting on 
condition [0x7f30a2104000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xff529778> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.isJobTerminated(CollectResultFetcher.java:213)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:120)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100)
at 
org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at 
org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:133)
at 
org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase.lambda$testDateTimestampPartitionColumns$4(TableEnvHiveConnectorITCase.java:248)
at 
org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase$$Lambda$8669/2110765445.call(Unknown
 Source)
at 
org.apache.flink.connectors.hive.TableEnvExecutorUtil.executeInSeparateDatabase(TableEnvExecutorUtil.java:53)
at 
org.apache.flink.connectors.hive.TableEnvExecutorUtil.executeInSeparateDatabase(TableEnvExecutorUtil.java:30)
at 
org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase.testDateTimestampPartitionColumns(TableEnvHiveConnectorITCase.java:214)
{code}




[jira] [Created] (FLINK-36299) AdaptiveSchedulerTest.testStatusMetrics times out

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36299:
-

 Summary: AdaptiveSchedulerTest.testStatusMetrics times out
 Key: FLINK-36299
 URL: https://issues.apache.org/jira/browse/FLINK-36299
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=be5a4b15-4b23-56b1-7582-795f58a645a2&l=9849

{code}
Sep 15 02:28:22 "ForkJoinPool-495-worker-25" #9352 daemon prio=5 os_prio=0 
tid=0x7fcdde409000 nid=0x77f4 waiting on condition [0x7fcd5c52c000]
Sep 15 02:28:22    java.lang.Thread.State: WAITING (parking)
Sep 15 02:28:22 at sun.misc.Unsafe.park(Native Method)
Sep 15 02:28:22 - parking to wait for  <0xf8d7d0b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
Sep 15 02:28:22 at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
Sep 15 02:28:22 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
Sep 15 02:28:22 at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
Sep 15 02:28:22 at 
org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest$SubmissionBufferingTaskManagerGateway.waitForSubmissions(AdaptiveSchedulerTest.java:2593)
Sep 15 02:28:22 at 
org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest.testStatusMetrics(AdaptiveSchedulerTest.java:732)
{code}




[jira] [Created] (FLINK-36298) NullPointerException in Calcite causes a PyFlink test failure

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36298:
-

 Summary: NullPointerException in Calcite causes a PyFlink test 
failure
 Key: FLINK-36298
 URL: https://issues.apache.org/jira/browse/FLINK-36298
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06&t=b68c9f5c-04c9-5c75-3862-a3a27aabbce3&l=25458

{code}
java.lang.NullPointerException: metadataHandlerProvider
Sep 15 03:14:04 E   at 
java.base/java.util.Objects.requireNonNull(Objects.java:235)
Sep 15 03:14:04 E   at 
org.apache.calcite.rel.metadata.RelMetadataQueryBase.getMetadataHandlerProvider(RelMetadataQueryBase.java:122)
Sep 15 03:14:04 E   at 
org.apache.calcite.rel.metadata.RelMetadataQueryBase.revise(RelMetadataQueryBase.java:118)
Sep 15 03:14:04 E   at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:333)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:727)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.getCostOrInfinite(VolcanoPlanner.java:714)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.propagateCostImprovements(VolcanoPlanner.java:971)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1408)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1368)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:95)
Sep 15 03:14:04 E   at 
org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:274)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:95)
Sep 15 03:14:04 E   at 
org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:274)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:95)
Sep 15 03:14:04 E   at 
org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:274)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
Sep 15 03:14:04 E   at 
org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
Sep 15 03:14:04 E   at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
Sep 15 03:14:04 E   at 
org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
Sep 15 03:14:04 E   at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.$anonfun$optimize$1(FlinkChainedProgram.scala:59)
[...]
{code}




[jira] [Created] (FLINK-36297) SIGSEGV caused CI failure

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36297:
-

 Summary: SIGSEGV caused CI failure
 Key: FLINK-36297
 URL: https://issues.apache.org/jira/browse/FLINK-36297
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62146&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=11876

{code}
Sep 15 02:53:42 02:53:42.535 [WARNING] Tests run: 145, Failures: 0, Errors: 0, 
Skipped: 13, Time elapsed: 3.133 s -- in 
org.apache.flink.state.changelog.ChangelogDelegateFileStateBackendTest
Sep 15 02:53:44 02:53:44.620 [WARNING] Tests run: 96, Failures: 0, Errors: 0, 
Skipped: 9, Time elapsed: 9.998 s -- in 
org.apache.flink.state.changelog.ChangelogStateBackendMigrationTest
Sep 15 02:57:58 #
Sep 15 02:57:58 # A fatal error has been detected by the Java Runtime 
Environment:
Sep 15 02:57:58 #
Sep 15 02:57:58 #  SIGSEGV (0xb) at pc=0x7f7c539b7c84, pid=21641, 
tid=0x7f7c549ff700
Sep 15 02:57:58 #
Sep 15 02:57:58 # JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 
1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10)
Sep 15 02:57:58 # Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode 
linux-amd64 compressed oops)
Sep 15 02:57:58 # Problematic frame:
Sep 15 02:57:58 # C  [librocksdbjni-linux64.so+0x31bc84]  
Java_org_rocksdb_WriteBatch_getDataSize+0x4
Sep 15 02:57:58 #
Sep 15 02:57:58 # Core dump written. Default location: 
/__w/1/s/flink-state-backends/flink-statebackend-changelog/core or core.21641
Sep 15 02:57:58 #
Sep 15 02:57:58 # An error report file with more information is saved as:
Sep 15 02:57:58 # 
/__w/1/s/flink-state-backends/flink-statebackend-changelog/hs_err_pid21641.log
Sep 15 02:57:58 Compiled method (nm)  265136 7875 n 0   
org.rocksdb.WriteBatch::getDataSize (native)
Sep 15 02:57:58  total in heap  [0x7f7c8dff9e50,0x7f7c8dffa1a0] = 848
Sep 15 02:57:58  relocation [0x7f7c8dff9f78,0x7f7c8dff9fc0] = 72
Sep 15 02:57:58  main code  [0x7f7c8dff9fc0,0x7f7c8dffa198] = 472
Sep 15 02:57:58  oops   [0x7f7c8dffa198,0x7f7c8dffa1a0] = 8
Sep 15 02:57:58 Compiled method (c1)  265136 7876   3   
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper::ensureNotCancelled
 (27 bytes)
Sep 15 02:57:58  total in heap  [0x7f7c8dff9490,0x7f7c8dff9a88] = 1528
Sep 15 02:57:58  relocation [0x7f7c8dff95b8,0x7f7c8dff9618] = 96
Sep 15 02:57:58  main code  [0x7f7c8dff9620,0x7f7c8dff9880] = 608
Sep 15 02:57:58  stub code  [0x7f7c8dff9880,0x7f7c8dff9938] = 184
Sep 15 02:57:58  oops   [0x7f7c8dff9938,0x7f7c8dff9940] = 8
Sep 15 02:57:58  metadata   [0x7f7c8dff9940,0x7f7c8dff9958] = 24
Sep 15 02:57:58  scopes data[0x7f7c8dff9958,0x7f7c8dff99c8] = 112
Sep 15 02:57:58  scopes pcs [0x7f7c8dff99c8,0x7f7c8dff9a68] = 160
Sep 15 02:57:58  dependencies   [0x7f7c8dff9a68,0x7f7c8dff9a70] = 8
Sep 15 02:57:58  nul chk table  [0x7f7c8dff9a70,0x7f7c8dff9a88] = 24
Sep 15 02:57:58 #
Sep 15 02:57:58 # If you would like to submit a bug report, please visit:
Sep 15 02:57:58 #   http://bugreport.java.com/bugreport/crash.jsp
Sep 15 02:57:58 # The crash happened outside the Java Virtual Machine in native 
code.
Sep 15 02:57:58 # See problematic frame for where to report the bug.
Sep 15 02:57:58 #
Aborted (core dumped)
{code}

The test process then exited with code 134:
{code}
Sep 15 02:57:59 02:57:59.692 [ERROR] Process Exit Code: 134
Sep 15 02:57:59 02:57:59.692 [ERROR] Crashed tests:
Sep 15 02:57:59 02:57:59.692 [ERROR] 
org.apache.flink.state.changelog.ChangelogDelegateEmbeddedRocksDBStateBackendTest
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:358)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:296)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1240)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1089)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:905)
Sep 15 02:57:59 02:57:59.692 [ERROR]at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPlugin

[jira] [Created] (FLINK-36295) AdaptiveSchedulerClusterITCase. testCheckpointStatsPersistedAcrossRescale failed with

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36295:
-

 Summary: AdaptiveSchedulerClusterITCase. 
testCheckpointStatsPersistedAcrossRescale failed with 
 Key: FLINK-36295
 URL: https://issues.apache.org/jira/browse/FLINK-36295
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=10234

{code}
Sep 16 03:06:30 03:06:30.168 [ERROR] Tests run: 3, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 5.275 s <<< FAILURE! -- in 
org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase
Sep 16 03:06:30 03:06:30.168 [ERROR] 
org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale
 -- Time elapsed: 0.676 s <<< ERROR!
Sep 16 03:06:30 java.lang.IndexOutOfBoundsException: Index: -1
Sep 16 03:06:30 at 
java.base/java.util.Collections$EmptyList.get(Collections.java:4586)
Sep 16 03:06:30 at 
org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale(AdaptiveSchedulerClusterITCase.java:214)
Sep 16 03:06:30 at 
java.base/java.lang.reflect.Method.invoke(Method.java:568)
Sep 16 03:06:30 at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
Sep 16 03:06:30 at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
Sep 16 03:06:30 at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
Sep 16 03:06:30 at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
Sep 16 03:06:30 at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
Sep 16 03:06:30 at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Sep 16 03:06:30
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-36294) table stage failed with general junit5 TestEngine failure

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36294:
-

 Summary: table stage failed with general junit5 TestEngine failure
 Key: FLINK-36294
 URL: https://issues.apache.org/jira/browse/FLINK-36294
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=12216

{code}
Sep 16 04:19:46 04:19:46.181 [ERROR] Errors: 
Sep 16 04:19:46 04:19:46.182 [ERROR]   TestEngine with ID 'junit-jupiter' 
failed to execute tests
{code}




[jira] [Created] (FLINK-36293) RocksDBWriteBatchWrapperTest.testAsyncCancellation

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36293:
-

 Summary: RocksDBWriteBatchWrapperTest.testAsyncCancellation 
 Key: FLINK-36293
 URL: https://issues.apache.org/jira/browse/FLINK-36293
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=11508

{code}
Sep 16 02:20:08 02:20:08.194 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 0.724 s <<< FAILURE! -- in 
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest
Sep 16 02:20:08 02:20:08.194 [ERROR] 
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest.testAsyncCancellation
 -- Time elapsed: 0.121 s <<< ERROR!
Sep 16 02:20:08 java.lang.Exception: Unexpected exception, 
expected but 
was
Sep 16 02:20:08 Caused by: java.lang.AssertionError: 
Sep 16 02:20:08 Expecting actual:
Sep 16 02:20:08   2
Sep 16 02:20:08 to be less than:
Sep 16 02:20:08   2 
Sep 16 02:20:08 at 
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest.testAsyncCancellation(RocksDBWriteBatchWrapperTest.java:98)
Sep 16 02:20:08 at java.lang.reflect.Method.invoke(Method.java:498)
Sep 16 02:20:08 Suppressed: 
org.apache.flink.runtime.execution.CancelTaskException
Sep 16 02:20:08 at 
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.ensureNotCancelled(RocksDBWriteBatchWrapper.java:199)
Sep 16 02:20:08 at 
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.close(RocksDBWriteBatchWrapper.java:188)
Sep 16 02:20:08 at 
org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapperTest.testAsyncCancellation(RocksDBWriteBatchWrapperTest.java:100)
Sep 16 02:20:08 ... 1 more
{code}

This test was added by FLINK-35580




[jira] [Created] (FLINK-36292) SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher times out

2024-09-17 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36292:
-

 Summary: 
SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher times out
 Key: FLINK-36292
 URL: https://issues.apache.org/jira/browse/FLINK-36292
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Common
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62173&view=logs&j=b6f8a893-8f59-51d5-fe28-fb56a8b0932c&t=095f1730-efbe-5303-c4a3-b5e3696fc4e2&l=10914

{code}
Sep 17 01:15:16 01:15:16.318 [ERROR] Tests run: 5, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 32.65 s <<< FAILURE! -- in 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManagerTest
Sep 17 01:15:16 01:15:16.318 [ERROR] 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher
 -- Time elapsed: 30.02 s <<< ERROR!
Sep 17 01:15:16 org.junit.runners.model.TestTimedOutException: test timed out 
after 3 milliseconds
Sep 17 01:15:16 at sun.misc.Unsafe.park(Native Method)
Sep 17 01:15:16 at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
Sep 17 01:15:16 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
Sep 17 01:15:16 at 
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
Sep 17 01:15:16 at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.close(SplitFetcherManager.java:344)
Sep 17 01:15:16 at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManagerTest.testCloseCleansUpPreviouslyClosedFetcher(SplitFetcherManagerTest.java:97)
Sep 17 01:15:16 at java.lang.reflect.Method.invoke(Method.java:498)
Sep 17 01:15:16 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
Sep 17 01:15:16 at java.lang.Thread.run(Thread.java:748)
{code}

The test was added by FLINK-35924




[jira] [Created] (FLINK-36291) java.lang.IllegalMonitorStateException causing a fatal error on the TaskManager side

2024-09-16 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36291:
-

 Summary: java.lang.IllegalMonitorStateException causing a fatal 
error on the TaskManager side
 Key: FLINK-36291
 URL: https://issues.apache.org/jira/browse/FLINK-36291
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


HiveDynamicPartitionPruningITCase failed due to the TM timeout. Checking the 
logs though revealed a fatal error on the taskmanager's side:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62173&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=24046

{code}
03:18:32,209 [taskmanager_72-main-scheduler-thread-1] ERROR 
org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'taskmanager_72-main-scheduler-thread-1' produced an uncaught exception. 
Stopping the process...
java.lang.IllegalMonitorStateException: null
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939)
 ~[?:1.8.0_292]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103)
 ~[?:1.8.0_292]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 ~[?:1.8.0_292]
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) 
~[?:1.8.0_292]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) 
~[?:1.8.0_292]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_292]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}

But there's also an OutOfMemoryError reported just a line below:
{code}
03:19:01,060 [Source Data Fetcher for Source: part[62] (2/2)#0] ERROR 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager [] - 
Received uncaught exception.
java.lang.OutOfMemoryError: Java heap space
{code}

So that might be related to FLINK-36290




[jira] [Created] (FLINK-36290) OutOfMemoryError in connect test run

2024-09-16 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36290:
-

 Summary: OutOfMemoryError in connect test run
 Key: FLINK-36290
 URL: https://issues.apache.org/jira/browse/FLINK-36290
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


We saw an OOM in the connect stage that caused a fatal error:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62173&view=logs&j=1c002d28-a73d-5309-26ee-10036d8476b4&t=d1c117a6-8f13-5466-55f0-d48dbb767fcd&l=12182

{code}
03:19:59,975 [   flink-scheduler-1] ERROR 
org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'flink-scheduler-1' produced an uncaught exception. Stopping the process...
java.lang.OutOfMemoryError: Java heap space
[...]
03:19:59,981 [jobmanager_62-main-scheduler-thread-1] ERROR 
org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'jobmanager_62-main-scheduler-thread-1' produced an uncaught exception. 
Stopping the process...
java.lang.OutOfMemoryError: Java heap space
[...]
{code}




[jira] [Created] (FLINK-36279) RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails

2024-09-13 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36279:
-

 Summary: RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails
 Key: FLINK-36279
 URL: https://issues.apache.org/jira/browse/FLINK-36279
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 2.0-preview
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287

{code}
Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 
tid=0x7f973f0c2800 nid=0x31a1 waiting on condition [0x7f97089fc000]
Sep 13 17:16:55java.lang.Thread.State: TIMED_WAITING (sleeping)
Sep 13 17:16:55 at java.lang.Thread.sleep(Native Method)
Sep 13 17:16:55 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
Sep 13 17:16:55 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
Sep 13 17:16:55 at 
org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
Sep 13 17:16:55 at 
org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
Sep 13 17:16:55 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Sep 13 17:16:55 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[...]
{code}




[jira] [Created] (FLINK-36272) YarnFileStageTestS3ITCase fails on master

2024-09-12 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36272:
-

 Summary: YarnFileStageTestS3ITCase fails on master
 Key: FLINK-36272
 URL: https://issues.apache.org/jira/browse/FLINK-36272
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Tests
Affects Versions: 2.0.0
Reporter: Matthias Pohl


The issue was introduced by FLINK-34085 where the test failure wasn't 
discovered because the test didn't run (see 
[logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=61954&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=28206]).

I suspect that this is because we're not enabling S3 in PR CI.




[jira] [Created] (FLINK-36207) Disabling japicmp plugin for deprecated APIs

2024-09-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36207:
-

 Summary: Disabling japicmp plugin for deprecated APIs
 Key: FLINK-36207
 URL: https://issues.apache.org/jira/browse/FLINK-36207
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 2.0.0
Reporter: Matthias Pohl


The Apache Flink 2.0 release allows for the removal of public API. The japicmp 
plugin usually checks for these kinds of changes. To avoid adding explicit 
excludes for each change, this Jira issue suggests disabling the API check for 
APIs that are marked as deprecated.




[jira] [Created] (FLINK-36194) Shutdown hook for ExecutionGraphInfo store runs concurrently to cluster shutdown hook causing race conditions

2024-09-02 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36194:
-

 Summary: Shutdown hook for ExecutionGraphInfo store runs 
concurrently to cluster shutdown hook causing race conditions
 Key: FLINK-36194
 URL: https://issues.apache.org/jira/browse/FLINK-36194
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.19.1, 1.20.0, 2.0.0
Reporter: Matthias Pohl


There is a {{FileNotFoundException}} being logged when shutting down the 
cluster with currently running jobs:
{code}
/tmp/executionGraphStore-b2cb1190-2c4d-4021-a73d-8b15027860df/8f6abf294a46345d331590890f7e7c37
 (No such file or directory)

java.io.FileNotFoundException: 
/tmp/executionGraphStore-b2cb1190-2c4d-4021-a73d-8b15027860df/8f6abf294a46345d331590890f7e7c37
 (No such file or directory)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
at 
org.apache.flink.runtime.dispatcher.FileExecutionGraphInfoStore.storeExecutionGraphInfo(FileExecutionGraphInfoStore.java:281)
at 
org.apache.flink.runtime.dispatcher.FileExecutionGraphInfoStore.put(FileExecutionGraphInfoStore.java:203)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.writeToExecutionGraphInfoStore(Dispatcher.java:1427)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.jobReachedTerminalState(Dispatcher.java:1357)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:750)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$6(Dispatcher.java:700)
at 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at 
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
[...]
{code}

This is caused by concurrent shutdown logic being triggered through the 
{{FileExecutionGraphInfoStore}} shutdown hook. The shutdown hook calls close on 
the store which will delete its temporary directory. 

The concurrently performed cluster shutdown will try to suspend all running 
jobs. The JobManagerRunners are trying to write their {{ExecutionGraphInfo}} to 
the store which fails (because the temporary folder is deleted).

This doesn't have any impact because the JobManager goes away anyway. But the 
log message is confusing and the shutdown hook is (IMHO) not needed. Instead, 
the {{ExecutionGraphInfoStore}}'s close logic should be called gracefully as 
part of the {{ClusterEntrypoint}} shutdown.
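
A minimal sketch of the ordering described above, assuming a simplified file-backed store. {{TempDirStore}} and {{OrderedShutdownSketch}} are hypothetical names for illustration and are not Flink's {{FileExecutionGraphInfoStore}} or {{ClusterEntrypoint}}. It shows that writes succeed when the store is closed only after all jobs have written their final state, and that a write after close fails in the same way as in the log above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OrderedShutdownSketch {
    // Lets every "job" write its final state first, then closes the store.
    // Returns the jobs whose state was written successfully.
    static List<String> shutDown(TempDirStore store, List<String> runningJobs) throws IOException {
        List<String> written = new ArrayList<>();
        for (String job : runningJobs) {
            store.put(job, "SUSPENDED");
            written.add(job);
        }
        store.close(); // only now is the temporary directory removed
        return written;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(shutDown(new TempDirStore(), List.of("job-a", "job-b")));
    }
}

// Hypothetical stand-in for a store that keeps its entries in a temporary directory.
final class TempDirStore implements AutoCloseable {
    private final Path dir;

    TempDirStore() throws IOException {
        this.dir = Files.createTempDirectory("store-sketch-");
    }

    // Writes one entry; throws an IOException if the directory is already gone.
    void put(String key, String value) throws IOException {
        Files.writeString(dir.resolve(key), value);
    }

    // Deletes all entries and then the directory itself; safe to call twice.
    @Override
    public void close() throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> entries;
        try (Stream<Path> stream = Files.list(dir)) {
            entries = stream.collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        Files.delete(dir);
    }
}
```

With a JVM shutdown hook calling {{close()}} concurrently, the {{put}} calls may run after the directory was deleted, which is the race this issue describes.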




[jira] [Created] (FLINK-36168) AdaptiveSchedulerTest doesn't follow the production lifecycle

2024-08-28 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36168:
-

 Summary: AdaptiveSchedulerTest doesn't follow the production 
lifecycle
 Key: FLINK-36168
 URL: https://issues.apache.org/jira/browse/FLINK-36168
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl


The {{AdaptiveSchedulerTest}} doesn't follow the production lifecycle properly: 
The executor representing the main thread is shut down before the 
AdaptiveScheduler is closed (or more precisely, the scheduler isn't closed at 
all in most of the tests).

This can cause issues when the executor is shut down while tasks are still 
scheduled and have not been properly cleaned up. This issue is about fixing the 
test in this regard.
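
A minimal sketch of why the order matters, assuming a scheduler that keeps a periodic task on the main-thread executor. {{SketchScheduler}} is a hypothetical stand-in and not Flink's {{AdaptiveScheduler}}. Closing the scheduler first cancels its task, so nothing is left pending when the executor is shut down:

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LifecycleOrderSketch {
    // Hypothetical scheduler that registers one periodic task on the given executor.
    static final class SketchScheduler implements AutoCloseable {
        private final ScheduledFuture<?> periodicTask;

        SketchScheduler(ScheduledThreadPoolExecutor mainThreadExecutor) {
            // The long delay ensures the task never starts during the sketch.
            this.periodicTask =
                    mainThreadExecutor.scheduleAtFixedRate(() -> {}, 1, 1, TimeUnit.HOURS);
        }

        // Cancels the periodic task so it no longer occupies the executor's queue.
        @Override
        public void close() {
            periodicTask.cancel(false);
        }
    }

    // Returns how many tasks were still pending when the executor was shut down.
    static int runLifecycle(boolean closeSchedulerFirst) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Remove cancelled tasks from the queue immediately instead of lazily.
        executor.setRemoveOnCancelPolicy(true);
        SketchScheduler scheduler = new SketchScheduler(executor);
        if (closeSchedulerFirst) {
            scheduler.close();
        }
        return executor.shutdownNow().size();
    }

    public static void main(String[] args) {
        System.out.println("closed first: " + runLifecycle(true) + " pending task(s)");
        System.out.println("not closed:   " + runLifecycle(false) + " pending task(s)");
    }
}
```

A test following the production lifecycle would correspond to the {{closeSchedulerFirst = true}} path.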




[jira] [Created] (FLINK-36147) Removes deprecated location field

2024-08-23 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36147:
-

 Summary: Removes deprecated location field
 Key: FLINK-36147
 URL: https://issues.apache.org/jira/browse/FLINK-36147
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 2.0.0
Reporter: Matthias Pohl
Assignee: Matthias Pohl
 Fix For: 2.0.0


FLINK-33147 introduced a new endpoint field and deprecated the corresponding 
location field in 1.19. This issue is about removing the deprecated field.




[jira] [Created] (FLINK-36099) JobIDLoggingITCase fails due to "Cannot find task to fail for execution [...]" info log message in TM logs

2024-08-19 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-36099:
-

 Summary: JobIDLoggingITCase fails due to "Cannot find task to fail 
for execution [...]" info log message in TM logs
 Key: FLINK-36099
 URL: https://issues.apache.org/jira/browse/FLINK-36099
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Affects Versions: 1.19.1, 1.20.0, 1.18.1, 2.0.0
Reporter: Matthias Pohl


{{JobIDLoggingITCase}} can fail (observed with the {{AdaptiveScheduler}} 
enabled):
{code}
Test 
org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging[testJobIDLogging(ClusterClient,
 Path, MiniCluster)] failed with:
java.lang.AssertionError: [too many events without Job ID logged by 
org.apache.flink.runtime.taskexecutor.TaskExecutor]
Expecting empty but was: 
[Logger=org.apache.flink.runtime.taskexecutor.TaskExecutor Level=INFO 
Message=Cannot find task to fail for execution 
5447dca7a6c7f9679346cad41dc8e3be_cbc357ccb763df2852fee8c4fc7d55f2_0_0 with 
exception:]
at 
org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:267)
at 
org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:155)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217)
at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:138)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
at 
org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.comp

[jira] [Created] (FLINK-35748) DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime with MiniBatch mode and RocksDB backend enabled

2024-07-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35748:
-

 Summary: DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime 
with MiniBatch mode and RocksDB backend enabled
 Key: FLINK-35748
 URL: https://issues.apache.org/jira/browse/FLINK-35748
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60613&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=12259

{code}
Jul 02 14:44:36 14:44:36.737 [ERROR] Tests run: 40, Failures: 1, Errors: 0, 
Skipped: 4, Time elapsed: 18.45 s <<< FAILURE! -- in 
org.apache.flink.table.planner.runtime.stream.sql.DeduplicateITCase
Jul 02 14:44:36 14:44:36.737 [ERROR] 
org.apache.flink.table.planner.runtime.stream.sql.DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime
 -- Time elapsed: 0.860 s <<< FAILURE!
Jul 02 14:44:36 org.opentest4j.AssertionFailedError: 
Jul 02 14:44:36 
Jul 02 14:44:36 expected: List(+I(1,1,Hi,1970-01-01T00:00:00.001), +I(1,2,Hello 
world,1970-01-01T00:00:00.002), +I(2,3,I am fine.,1970-01-01T00:00:00.003), 
+I(2,6,Comment#1,1970-01-01T00:00:00.006), 
+I(3,5,Comment#2,1970-01-01T00:00:00.005), 
+I(4,4,Comment#3,1970-01-01T00:00:00.004))
Jul 02 14:44:36  but was: ArrayBuffer(+I(1,1,Hi,1970-01-01T00:00:00.001), 
+I(1,2,Hello world,1970-01-01T00:00:00.002), 
+I(1,3,Hello,1970-01-01T00:00:00.003), 
+I(2,6,Comment#1,1970-01-01T00:00:00.006), 
+I(3,5,Comment#2,1970-01-01T00:00:00.005), 
+I(4,4,Comment#3,1970-01-01T00:00:00.004), +U(2,3,I am 
fine.,1970-01-01T00:00:00.003), -U(1,3,Hello,1970-01-01T00:00:00.003))
Jul 02 14:44:36 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Jul 02 14:44:36 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Jul 02 14:44:36 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Jul 02 14:44:36 at 
org.apache.flink.table.planner.runtime.stream.sql.DeduplicateITCase.testLastRowWithoutAllChangelogOnRowtime(DeduplicateITCase.scala:364)
Jul 02 14:44:36 at java.lang.reflect.Method.invoke(Method.java:498)
[...]
{code}

The test failure appeared in a CI run for FLINK-35553, which changes how 
checkpointing is triggered. I checked the logs and couldn't find any 
evidence that the test run included the FLINK-35553 change (no restoring from 
checkpoint happens in the failed and the successful run of the test; see 
attached logs).




[jira] [Created] (FLINK-35729) HiveITCase.testReadWriteHive

2024-06-28 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35729:
-

 Summary: HiveITCase.testReadWriteHive
 Key: FLINK-35729
 URL: https://issues.apache.org/jira/browse/FLINK-35729
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60534&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=10d6732b-d79a-5c68-62a5-668516de5313&l=16589
{code}
Jun 28 04:35:00 04:35:00.945 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 249.7 s <<< FAILURE! -- in 
org.apache.flink.tests.hive.HiveITCase
Jun 28 04:35:00 04:35:00.945 [ERROR] 
org.apache.flink.tests.hive.HiveITCase.testReadWriteHive -- Time elapsed: 165.8 
s <<< ERROR!
Jun 28 04:35:00 java.io.IOException: Process failed due to timeout.
Jun 28 04:35:00 at 
org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:145)
Jun 28 04:35:00 at 
org.apache.flink.tests.util.flink.FlinkDistribution.submitSQLJobWithSQLClient(FlinkDistribution.java:342)
Jun 28 04:35:00 at 
org.apache.flink.tests.util.flink.FlinkDistribution.submitSQLJob(FlinkDistribution.java:273)
Jun 28 04:35:00 at 
org.apache.flink.tests.util.flink.LocalStandaloneFlinkResource$StandaloneClusterController.submitSQLJob(LocalStandaloneFlinkResource.java:241)
Jun 28 04:35:00 at 
org.apache.flink.tests.hive.HiveITCase.executeSqlStatements(HiveITCase.java:231)
Jun 28 04:35:00 at 
org.apache.flink.tests.hive.HiveITCase.runAndCheckSQL(HiveITCase.java:157)
Jun 28 04:35:00 at 
org.apache.flink.tests.hive.HiveITCase.testReadWriteHive(HiveITCase.java:121)
Jun 28 04:35:00 at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
Jun 28 04:35:00 at 
org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:48)
Jun 28 04:35:00 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Jun 28 04:35:00 at 
org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
{code}




[jira] [Created] (FLINK-35728) PyFlink end-to-end test because miniconda couldn't be downloaded

2024-06-28 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35728:
-

 Summary: PyFlink end-to-end test because miniconda couldn't be 
downloaded
 Key: FLINK-35728
 URL: https://issues.apache.org/jira/browse/FLINK-35728
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.19.1, 1.18.1, 2.0.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60533&view=logs&j=87489130-75dc-54e4-1f45-80c30aa367a3&t=efbee0b1-38ac-597d-6466-1ea8fc908c50&l=8931

{code}
Jun 28 02:16:31 Detected machine: x86_64
Jun 28 02:16:31 download miniconda from 
https://repo.continuum.io/miniconda/Miniconda3-py310_23.5.2-0-Linux-x86_64.sh...
Jun 28 02:16:32 Download failed.You can try again
Jun 28 02:16:34 No taskexecutor daemon to stop on host fv-az43-235.
Jun 28 02:16:36 No standalonesession daemon to stop on host fv-az43-235.
Jun 28 02:16:36 [FAIL] Test script contains errors.
Jun 28 02:16:36 Checking of logs skipped.
Jun 28 02:16:36
Jun 28 02:16:36 [FAIL] 'PyFlink end-to-end test' failed after 0 minutes and 6 
seconds! Test exited with exit code 1
{code}




[jira] [Created] (FLINK-35727) "Run kubernetes pyflink application test" failed due to access denied issue

2024-06-28 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35727:
-

 Summary: "Run kubernetes pyflink application test" failed due to 
access denied issue
 Key: FLINK-35727
 URL: https://issues.apache.org/jira/browse/FLINK-35727
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.19.1, 1.18.1, 2.0.0, 1.20.0
Reporter: Matthias Pohl


{{Run kubernetes pyflink application test}} fails due to a permission issue:

{code}
Jun 28 10:46:15   Volumes:
Jun 28 10:46:15user-artifacts-volume:
Jun 28 10:46:15   Normal   Scheduled  61sdefault-scheduler  
Successfully assigned 
default/flink-native-k8s-pyflink-application-1-55b44fdbff-jscks to fv-az86-828
Jun 28 10:46:15   Normal   Pulling20s (x3 over 60s)  kubelet
Pulling image "test_kubernetes_pyflink_application"
Jun 28 10:46:15   Warning  Failed 19s (x3 over 59s)  kubelet
Failed to pull image "test_kubernetes_pyflink_application": rpc error: code = 
Unknown desc = Error response from daemon: pull access denied for 
test_kubernetes_pyflink_application, repository does not exist or may require 
'docker login': denied: requested access to the resource is denied
Jun 28 10:46:15   Warning  Failed 19s (x3 over 59s)  kubelet
Error: ErrImagePull
Jun 28 10:46:15   Normal   BackOff7s (x4 over 59s)   kubelet
Back-off pulling image "test_kubernetes_pyflink_application"
Jun 28 10:46:15   Warning  Failed 7s (x4 over 59s)   kubelet
Error: ImagePullBackOff
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60538&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=43ba8ce7-ebbf-57cd-9163-444305d74117&l=10846




[jira] [Created] (FLINK-35722) CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint fails because of missed operator event

2024-06-28 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35722:
-

 Summary: 
CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint 
fails because of missed operator event
 Key: FLINK-35722
 URL: https://issues.apache.org/jira/browse/FLINK-35722
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 2.0.0, 1.20.0
Reporter: Matthias Pohl


A test instability in 
{{CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint}} 
was observed where an expected {{OperatorEvent}} was missed:
{code:java}
Test 
org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint[testCheckpoint()]
 failed with:
java.lang.AssertionError:
Expecting actual:
  [0,
    1,
    3,
    4,
[...]
    98,
    99]
to contain exactly (and in same order):
  [0,
    1,
    2,
    3,
    4,
[...]
but could not find the following elements:
  [2]        at 
org.apache.flink.runtime.operators.coordination.CoordinatorEventsExactlyOnceITCase.checkListContainsSequence(CoordinatorEventsExactlyOnceITCase.java:175)
        at 
org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.executeAndVerifyResults(CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.java:178)
        at 
org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint(CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.java:124)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code}
The [build 
failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60530&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8]
 happened on commit 
[2e853ce39a|https://github.com/flink-ci/flink/commit/2e853ce39aa2db8212402de3dcc0f049397887fd]
 for FLINK-35552.

I attached the logs for further investigation.




[jira] [Created] (FLINK-35553) Integrate newly added trigger interface with checkpointing

2024-06-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35553:
-

 Summary: Integrate newly added trigger interface with checkpointing
 Key: FLINK-35553
 URL: https://issues.apache.org/jira/browse/FLINK-35553
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing, Runtime / Coordination
Reporter: Matthias Pohl


This connects the newly introduced trigger logic (FLINK-35551) with the newly 
added checkpoint lifecycle listening feature (FLINK-35552).




[jira] [Created] (FLINK-35552) Move CheckpointStatsTracker out of ExecutionGraph into Scheduler

2024-06-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35552:
-

 Summary: Move CheckpointStatsTracker out of ExecutionGraph into 
Scheduler
 Key: FLINK-35552
 URL: https://issues.apache.org/jira/browse/FLINK-35552
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing, Runtime / Coordination
Reporter: Matthias Pohl


The scheduler needs to know about the CheckpointStatsTracker to allow listening 
to checkpoint failures and completion.




[jira] [Created] (FLINK-35551) Introduces RescaleManager#onTrigger endpoint

2024-06-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35551:
-

 Summary: Introduces RescaleManager#onTrigger endpoint
 Key: FLINK-35551
 URL: https://issues.apache.org/jira/browse/FLINK-35551
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl


The new endpoint would allow us to separate observing change events from 
actually triggering the rescale operation.
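
A rough illustration of that separation follows. It is a hypothetical sketch, not the actual Flink interface: the class name {{SketchRescaleManager}}, the {{onChange}} method and the {{Runnable}} rescale action are assumptions made up for this example. Observing a change only records it; only {{onTrigger}} acts on it.

```java
/** Hypothetical sketch: observing changes is decoupled from triggering the rescale. */
final class SketchRescaleManager {
    // Assumed callback that performs the actual rescale operation.
    private final Runnable rescaleAction;
    private boolean changeObserved;

    SketchRescaleManager(Runnable rescaleAction) {
        this.rescaleAction = rescaleAction;
    }

    /** Records that something changed; does not rescale by itself. */
    void onChange() {
        changeObserved = true;
    }

    /** Rescales only if a change was observed since the last trigger. */
    void onTrigger() {
        if (changeObserved) {
            changeObserved = false;
            rescaleAction.run();
        }
    }
}
```

With this shape, several change events in a row result in at most one rescale per trigger.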




[jira] [Created] (FLINK-35550) Introduce new component RescaleManager

2024-06-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35550:
-

 Summary: Introduce new component RescaleManager
 Key: FLINK-35550
 URL: https://issues.apache.org/jira/browse/FLINK-35550
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Matthias Pohl


The goal here is to collect the rescaling logic in a single component to 
improve testability.




[jira] [Created] (FLINK-35549) FLIP-461: Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler

2024-06-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35549:
-

 Summary: FLIP-461: Synchronize rescaling with checkpoint creation 
to minimize reprocessing for the AdaptiveScheduler
 Key: FLINK-35549
 URL: https://issues.apache.org/jira/browse/FLINK-35549
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing, Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


This is the umbrella issue for implementing 
[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler]




[jira] [Created] (FLINK-35000) PullRequest template doesn't use the correct format to refer to the testing code convention

2024-04-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-35000:
-

 Summary: PullRequest template doesn't use the correct format to 
refer to the testing code convention
 Key: FLINK-35000
 URL: https://issues.apache.org/jira/browse/FLINK-35000
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI, Project Website
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


The PR template refers to 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
 rather than 
https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing




[jira] [Created] (FLINK-34999) PR CI stopped operating

2024-04-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34999:
-

 Summary: PR CI stopped operating
 Key: FLINK-34999
 URL: https://issues.apache.org/jira/browse/FLINK-34999
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


There are no [new PR CI 
runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] 
being picked up anymore. [Recently updated 
PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not picked 
up by the @flinkbot.

In the meantime there was a notification sent from GitHub that the password of 
the @flinkbot was reset for security reasons. It's quite likely that these two 
events are related.




[jira] [Created] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project

2024-04-02 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34989:
-

 Summary: Apache Infra requests to reduce the runner usage for a 
project
 Key: FLINK-34989
 URL: https://issues.apache.org/jira/browse/FLINK-34989
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


The GitHub Actions CI utilizes runners that are hosted by Apache Infra right 
now. These runners are limited. The runner usage can be monitored via the 
following links:
* [Flink-specific 
report|https://infra-reports.apache.org/#ghactions&project=flink&hours=168] 
(needs ASF committer rights). This project-specific report can only be modified 
through the HTTP GET parameters of the URL.
* [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF 
membership)
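
The parameters of the project-specific report can be put together as follows. This is only a small sketch: the class and method names are made up, and only the {{project}} and {{hours}} parameters that appear in the link above are covered.

```java
/** Sketch: builds the project-specific runner usage report URL. */
final class InfraReportUrl {
    private static final String BASE = "https://infra-reports.apache.org/#ghactions";

    /** Appends the project name and the reporting window (in hours) to the base URL. */
    static String forProject(String project, int hours) {
        return BASE + "&project=" + project + "&hours=" + hours;
    }
}
```

For example, {{InfraReportUrl.forProject("flink", 168)}} yields the weekly Flink report linked above.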




[jira] [Created] (FLINK-34988) Class loading issues in JDK17 and JDK21

2024-04-02 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34988:
-

 Summary: Class loading issues in JDK17 and JDK21
 Key: FLINK-34988
 URL: https://issues.apache.org/jira/browse/FLINK-34988
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.20.0
Reporter: Matthias Pohl


* JDK 17 (core; NoClassDefFoundError caused by ExceptionInInitializerError): 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=12942
* JDK 17 (misc; ExceptionInInitializerError): 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d871f0ce-7328-5d00-023b-e7391f5801c8&t=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6&l=22548
* JDK 21 (core; same as above): 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d06b80b4-9e88-5d40-12a2-18072cf60528&t=609ecd5a-3f6e-5d0c-2239-2096b155a4d0&l=12963
* JDK 21 (misc; same as above): 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=59a2b95a-736b-5c46-b3e0-cee6e587fd86&t=c301da75-e699-5c06-735f-778207c16f50&l=22506




[jira] [Created] (FLINK-34961) GitHub Actions statistcs can be monitored per workflow name

2024-03-28 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34961:
-

 Summary: GitHub Actions statistcs can be monitored per workflow 
name
 Key: FLINK-34961
 URL: https://issues.apache.org/jira/browse/FLINK-34961
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Reporter: Matthias Pohl


Apache Infra allows the monitoring of runner usage per workflow (see [report 
for 
Flink|https://infra-reports.apache.org/#ghactions&project=flink&hours=168&limit=10];
  only accessible with Apache committer rights). They accumulate the data by 
workflow name. The Flink space has multiple repositories that use the generic 
workflow name {{CI}}, which makes differentiation in the report harder.

This Jira issue is about identifying all Flink-related projects with a CI 
workflow (Kubernetes operator and the JDBC connector were identified, for 
instance) and adding a more distinct name.




[jira] [Created] (FLINK-34940) LeaderContender implementations handle invalid state

2024-03-26 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34940:
-

 Summary: LeaderContender implementations handle invalid state
 Key: FLINK-34940
 URL: https://issues.apache.org/jira/browse/FLINK-34940
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Matthias Pohl


Currently, LeaderContender implementations (e.g. see 
[ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219])
 allow leader events of the same type to be handled one after another, which 
shouldn't be the case.

Two subsequent leadership grants indicate that the leading instance, which 
received the leadership grant again, missed the leadership revocation event, 
causing an invalid state of the overall deployment (i.e. a split-brain scenario). 
We should fail fatally in these scenarios rather than handling them.
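
A minimal sketch of the stricter behavior could look like this. It does not implement the real {{LeaderContender}} interface: the class name {{StrictContender}} is hypothetical, and throwing an {{IllegalStateException}} merely stands in for whatever fatal error handling the component would actually use.

```java
import java.util.UUID;

/** Hypothetical contender that rejects two leadership events of the same type in a row. */
final class StrictContender {
    // null while this instance is not the leader
    private UUID currentSessionId;

    void grantLeadership(UUID sessionId) {
        if (currentSessionId != null) {
            // A second grant means the revocation in between was missed.
            throw new IllegalStateException(
                    "Leadership granted twice without revocation: "
                            + currentSessionId + " -> " + sessionId);
        }
        currentSessionId = sessionId;
    }

    void revokeLeadership() {
        if (currentSessionId == null) {
            // A revocation without a preceding grant is equally invalid.
            throw new IllegalStateException("Leadership revoked without a preceding grant");
        }
        currentSessionId = null;
    }
}
```

The point of the sketch is only that grant and revoke events have to alternate; anything else is treated as an error instead of being handled silently.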




[jira] [Created] (FLINK-34939) Harden TestingLeaderElection

2024-03-26 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34939:
-

 Summary: Harden TestingLeaderElection
 Key: FLINK-34939
 URL: https://issues.apache.org/jira/browse/FLINK-34939
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


The {{TestingLeaderElection}} implementation does not follow the interface 
contract of {{LeaderElection}} in all of its facets (e.g. leadership acquisition 
and revocation events should alternate).

This issue is about hardening the {{LeaderElection}} contract in the test 
implementation.




[jira] [Created] (FLINK-34937) Apache Infra GHA policy update

2024-03-26 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34937:
-

 Summary: Apache Infra GHA policy update
 Key: FLINK-34937
 URL: https://issues.apache.org/jira/browse/FLINK-34937
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


There is a policy update [announced in the infra 
ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which 
asked Apache projects to limit the number of runners per job. Additionally, the 
[GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced 
which I wasn't aware of when working on the action workflow.

This issue is about applying the policy to the Flink GHA workflows.




[jira] [Created] (FLINK-34933) JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly

2024-03-25 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34933:
-

 Summary: 
JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored
 isn't implemented properly
 Key: FLINK-34933
 URL: https://issues.apache.org/jira/browse/FLINK-34933
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.20.0
Reporter: Matthias Pohl


{{testResultFutureCompletionOfOutdatedLeaderIsIgnored}} doesn't test the 
desired behavior: The {{TestingJobMasterService#closeAsync()}} callback throws 
an {{UnsupportedOperationException}} by default which prevents the test from 
properly finalizing the leadership revocation.

The test still passes because it implicitly checks for this error. 
Instead, we should verify that the runner's resultFuture doesn't complete until 
the runner is closed.




[jira] [Created] (FLINK-34921) SystemProcessingTimeServiceTest fails due to missing output

2024-03-22 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34921:
-

 Summary: SystemProcessingTimeServiceTest fails due to missing 
output
 Key: FLINK-34921
 URL: https://issues.apache.org/jira/browse/FLINK-34921
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.20.0
Reporter: Matthias Pohl


This PR CI build with {{AdaptiveScheduler}} enabled failed:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58476&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=11224

{code}
"ForkJoinPool-61-worker-25" #863 daemon prio=5 os_prio=0 tid=0x7f8c19eba000 
nid=0x60a5 waiting on condition [0x7f8bc2cf9000]
Mar 21 17:19:42    java.lang.Thread.State: WAITING (parking)
Mar 21 17:19:42 at sun.misc.Unsafe.park(Native Method)
Mar 21 17:19:42 - parking to wait for  <0xd81959b8> (a 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
Mar 21 17:19:42 at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
Mar 21 17:19:42 at 
java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
Mar 21 17:19:42 at 
java.util.concurrent.FutureTask.get(FutureTask.java:191)
Mar 21 17:19:42 at 
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest$$Lambda$1443/1477662666.call(Unknown
 Source)
Mar 21 17:19:42 at 
org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63)
Mar 21 17:19:42 at 
org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:892)
Mar 21 17:19:42 at 
org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1366)
Mar 21 17:19:42 at 
org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1210)
Mar 21 17:19:42 at 
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest.testQuiesceAndAwaitingCancelsScheduledAtFixRateFuture(SystemProcessingTimeServiceTest.java:92)
{code}




[jira] [Created] (FLINK-34897) JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip needs to be enabled again

2024-03-20 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34897:
-

 Summary: 
JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip
 needs to be enabled again
 Key: FLINK-34897
 URL: https://issues.apache.org/jira/browse/FLINK-34897
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.20.0
Reporter: Matthias Pohl


While working on FLINK-34672 I noticed that 
{{JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip}}
 is disabled without a reason.

It looks like I disabled it accidentally as part of FLINK-31783.




[jira] [Created] (FLINK-34695) Move Flink's CI docker container into a public repo

2024-03-15 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34695:
-

 Summary: Move Flink's CI docker container into a public repo
 Key: FLINK-34695
 URL: https://issues.apache.org/jira/browse/FLINK-34695
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


Currently, Flink's CI (GitHub Actions and Azure Pipelines) uses a container to 
run the logic. The intention behind it is to have a way to mimic the CI setup 
locally as well.

The current Docker image is maintained from the 
[zentol/flink-ci-docker|https://github.com/zentol/flink-ci-docker] fork (owned 
by [~chesnay]) of 
[flink-ci/flink-ci-docker|https://github.com/flink-ci/flink-ci-docker] (owned 
by Ververica) which is not ideal. We should move this repo into an Apache-owned 
repository.

Additionally, there's no workflow pushing the image automatically to a 
registry from where it can be used. Instead, the images were pushed to personal 
Docker Hub repos in the past (rmetzger, chesnay, mapohl). This is also not 
ideal. We should use a public repo and a GHA workflow that pushes the image to 
that repo.

Questions to answer here:
# Where shall the Docker image code be located?
# Which Docker registry should be used?




[jira] [Created] (FLINK-34646) AggregateITCase.testDistinctWithRetract timed out

2024-03-11 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34646:
-

 Summary: AggregateITCase.testDistinctWithRetract timed out
 Key: FLINK-34646
 URL: https://issues.apache.org/jira/browse/FLINK-34646
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/8211401561/job/22460442229#step:10:17161
{code}
"main" #1 prio=5 os_prio=0 tid=0x7f70abeb7000 nid=0x4cff3 waiting on 
condition [0x7f70ac3f6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xcd24c690> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
at 
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
at 
org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testDistinctWithRetract(AggregateITCase.scala:345)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
{code}




[jira] [Created] (FLINK-34645) StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount fails

2024-03-11 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34645:
-

 Summary: 
StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
 fails
 Key: FLINK-34645
 URL: https://issues.apache.org/jira/browse/FLINK-34645
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.18.1
Reporter: Matthias Pohl


{code}
Error: 02:27:17 02:27:17.025 [ERROR] Tests run: 3, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 0.658 s <<< FAILURE! - in 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest
Error: 02:27:17 02:27:17.025 [ERROR] 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
  Time elapsed: 0.3 s  <<< FAILURE!
Mar 09 02:27:17 java.lang.AssertionError: 
Mar 09 02:27:17 
Mar 09 02:27:17 Expected size: 8 but was: 6 in:
Mar 09 02:27:17 [Record @ (undef) : 
+I(c1,0,1969-12-31T23:59:55,1970-01-01T00:00:05),
Mar 09 02:27:17 Record @ (undef) : 
+I(c2,3,1969-12-31T23:59:55,1970-01-01T00:00:05),
Mar 09 02:27:17 Record @ (undef) : 
+I(c2,3,1970-01-01T00:00,1970-01-01T00:00:10),
Mar 09 02:27:17 Record @ (undef) : 
+I(c1,0,1970-01-01T00:00,1970-01-01T00:00:10),
Mar 09 02:27:17 Watermark @ 1,
Mar 09 02:27:17 Watermark @ 2]
Mar 09 02:27:17 at 
org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:110)
Mar 09 02:27:17 at 
org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:70)
Mar 09 02:27:17 at 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.ArrowPythonAggregateFunctionOperatorTestBase.assertOutputEquals(ArrowPythonAggregateFunctionOperatorTestBase.java:62)
Mar 09 02:27:17 at 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount(StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.java:326)
Mar 09 02:27:17 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
[...]
{code}




[jira] [Created] (FLINK-34644) RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing failed with ConnectionClosedException

2024-03-11 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34644:
-

 Summary: 
RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing failed with 
ConnectionClosedException
 Key: FLINK-34644
 URL: https://issues.apache.org/jira/browse/FLINK-34644
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/8189958608/job/22396362238#step:10:9215

{code}
Error: 15:13:33 15:13:33.779 [ERROR] Tests run: 68, Failures: 0, Errors: 1, 
Skipped: 4, Time elapsed: 17.81 s <<< FAILURE! -- in 
org.apache.flink.runtime.rest.RestServerEndpointITCase
Error: 15:13:33 15:13:33.779 [ERROR] 
org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing
 -- Time elapsed: 0.329 s <<< ERROR!
Mar 07 15:13:33 java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.rest.ConnectionClosedException: Channel became 
inactive.
Mar 07 15:13:33 at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
Mar 07 15:13:33 at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
Mar 07 15:13:33 at 
org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing(RestServerEndpointITCase.java:592)
Mar 07 15:13:33 at java.lang.reflect.Method.invoke(Method.java:498)
Mar 07 15:13:33 at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
Mar 07 15:13:33 at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
Mar 07 15:13:33 at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
Mar 07 15:13:33 at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
Mar 07 15:13:33 at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
Mar 07 15:13:33 at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
Mar 07 15:13:33 at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
Mar 07 15:13:33 at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
Mar 07 15:13:33 at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
Mar 07 15:13:33 at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
Mar 07 15:13:33 at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
Mar 07 15:13:33 at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
Mar 07 15:13:33 at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
Mar 07 15:13:33 at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
Mar 07 15:13:33 at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
Mar 07 15:13:33 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Mar 07 15:13:33 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Mar 07 15:13:33 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Mar 07 15:13:33 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Mar 07 15:13:33 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Mar 07 15:13:33 Caused by: 
org.apache.flink.runtime.rest.ConnectionClosedException: Channel became 
inactive.
Mar 07 15:13:33 at 
org.apache.flink.runtime.rest.RestClient$ClientHandler.channelInactive(RestClient.java:749)
Mar 07 15:13:33 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)
Mar 07 15:13:33 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
Mar 07 15:13:33 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
Mar 07 15:13:33

[jira] [Created] (FLINK-34643) JobIDLoggingITCase failed

2024-03-11 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34643:
-

 Summary: JobIDLoggingITCase failed
 Key: FLINK-34643
 URL: https://issues.apache.org/jira/browse/FLINK-34643
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=7897

{code}
Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in 
org.apache.flink.test.misc.JobIDLoggingITCase
Mar 09 01:24:23 01:24:23.498 [ERROR] 
org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) 
-- Time elapsed: 1.459 s <<< ERROR!
Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded 
for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in 
the test code
Mar 09 01:24:23 at 
org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
Mar 09 01:24:23 at 
org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148)
Mar 09 01:24:23 at 
org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132)
Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498)
Mar 09 01:24:23 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Mar 09 01:24:23 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Mar 09 01:24:23 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Mar 09 01:24:23 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Mar 09 01:24:23 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Mar 09 01:24:23 
{code}




[jira] [Created] (FLINK-34589) FineGrainedSlotManager doesn't handle errors in the resource reconcilliation step

2024-03-06 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34589:
-

 Summary: FineGrainedSlotManager doesn't handle errors in the 
resource reconcilliation step
 Key: FLINK-34589
 URL: https://issues.apache.org/jira/browse/FLINK-34589
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


I noticed during my work on FLINK-34427 that the reconciliation is scheduled 
periodically when starting the {{SlotManager}}. But it doesn't handle errors in 
this step. I see two options here:
1. Fail fatally because such an error might indicate a major issue with the RM 
backend.
2. Log the failure and continue the scheduled task even in case of an error.

My understanding is that we're just not able to recreate TaskManagers which 
should be a transient issue and could be resolved in the backend (YARN, k8s). 
That's why I would lean towards option 2.
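
To illustrate option 2, here is a sketch in plain {{java.util.concurrent}} (not Flink's own executor abstractions; the class name {{ResilientPeriodicTask}} and the helper {{logAndContinue}} are made up for this example). It relies on the documented behavior of {{ScheduledExecutorService}} that an exception thrown by a periodic task suppresses all subsequent executions, so the failure has to be caught inside the task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ResilientPeriodicTask {

    /** Wraps a task so that a failure is logged instead of propagating to the executor. */
    static Runnable logAndContinue(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Stand-in for proper logging; the next scheduled run still happens.
                System.err.println("Periodic task failed, continuing: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch threeRuns = new CountDownLatch(3);
        executor.scheduleWithFixedDelay(
                logAndContinue(() -> {
                    threeRuns.countDown();
                    throw new IllegalStateException("simulated backend error");
                }),
                0, 10, TimeUnit.MILLISECONDS);
        // Only reachable because the failures did not cancel the schedule.
        threeRuns.await();
        executor.shutdownNow();
    }
}
```

Without the wrapper, the first failure would silently stop the periodic task, which is neither option 1 nor option 2.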

[~xtsong] WDYT?




[jira] [Created] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result

2024-03-06 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34588:
-

 Summary: FineGrainedSlotManager checks whether resources need to 
reconcile but doesn't act on the result
 Key: FLINK-34588
 URL: https://issues.apache.org/jira/browse/FLINK-34588
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


There are a few locations in {{FineGrainedSlotManager}} where we check whether 
resources can/need to be reconciled but don't care about the result and just 
trigger the resource update (e.g. in 
[FineGrainedSlotManager:620|https://github.com/apache/flink/blob/c0d3e495f4c2316a80f251de77b05b943b5be1f8/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L620]
 and 
[FineGrainedSlotManager:676|https://github.com/apache/flink/blob/c0d3e495f4c2316a80f251de77b05b943b5be1f8/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L676]).
 Looks like we could reduce the calls to the backend here.
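
A sketch of what acting on the result could look like is shown below. All names are hypothetical and not taken from {{FineGrainedSlotManager}}; the check and the backend update are passed in as plain functional interfaces.

```java
import java.util.function.BooleanSupplier;

final class GuardedReconciliation {

    /**
     * Runs the backend update only when the check reports that reconciliation is needed.
     * Returns whether the update was triggered.
     */
    static boolean reconcileIfNeeded(BooleanSupplier needsReconcile, Runnable updateBackend) {
        if (!needsReconcile.getAsBoolean()) {
            return false; // nothing to do, skip the call to the backend
        }
        updateBackend.run();
        return true;
    }
}
```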




[jira] [Created] (FLINK-34571) SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError failed due an assertion

2024-03-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34571:
-

 Summary: 
SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError failed 
due an assertion
 Key: FLINK-34571
 URL: https://issues.apache.org/jira/browse/FLINK-34571
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/8134965216/job/8875618#step:10:8586
{code}
Error: 02:39:36 02:39:36.688 [ERROR] Tests run: 9, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 13.68 s <<< FAILURE! -- in 
org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionReadSchedulerTest
Error: 02:39:36 02:39:36.689 [ERROR] 
org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError
 -- Time elapsed: 0.174 s <<< FAILURE!
Mar 04 02:39:36 org.opentest4j.AssertionFailedError: 
Mar 04 02:39:36 
Mar 04 02:39:36 Expecting value to be true but was false
Mar 04 02:39:36 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Mar 04 02:39:36 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Mar 04 02:39:36 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Mar 04 02:39:36 at 
org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionReadSchedulerTest.testOnReadBufferRequestError(SortMergeResultPartitionReadSchedulerTest.java:225)
Mar 04 02:39:36 at java.lang.reflect.Method.invoke(Method.java:498)
Mar 04 02:39:36 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Mar 04 02:39:36 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Mar 04 02:39:36 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Mar 04 02:39:36 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Mar 04 02:39:36 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}




[jira] [Created] (FLINK-34570) JoinITCase.testLeftJoinWithEqualPk times out

2024-03-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34570:
-

 Summary: JoinITCase.testLeftJoinWithEqualPk times out
 Key: FLINK-34570
 URL: https://issues.apache.org/jira/browse/FLINK-34570
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/8127069912/job/22211928085#step:10:14479

{code}
"main" #1 prio=5 os_prio=0 tid=0x7ff4ae2b7000 nid=0x2168b waiting on 
condition [0x7ff4affdc000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xab096950> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
at 
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
at 
org.apache.flink.table.planner.runtime.stream.sql.JoinITCase.testLeftJoinWithEqualPk(JoinITCase.scala:705)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
{code}




[jira] [Created] (FLINK-34569) 'Streaming File Sink s3 end-to-end test' failed

2024-03-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34569:
-

 Summary: 'Streaming File Sink s3 end-to-end test' failed
 Key: FLINK-34569
 URL: https://issues.apache.org/jira/browse/FLINK-34569
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.19.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58026&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3957

{code}
Mar 02 04:12:57 Waiting until all values have been produced
Unable to find image 'stedolan/jq:latest' locally
Error: No such container: 
docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": 
read tcp 10.1.0.97:42214->54.236.113.205:443: read: connection reset by peer.
See 'docker run --help'.
Mar 02 04:12:58 Number of produced values 0/6
Error: No such container: 
Unable to find image 'stedolan/jq:latest' locally
latest: Pulling from stedolan/jq
[DEPRECATION NOTICE] Docker Image Format v1, and Docker Image manifest version 
2, schema 1 support will be removed in an upcoming release. Suggest the author 
of docker.io/stedolan/jq:latest to upgrade the image to the OCI Format, or 
Docker Image manifest v2, schema 2. More information at 
https://docs.docker.com/go/deprecated-image-specs/
237d5fcd25cf: Pulling fs layer
[...]
4dae4fd48813: Pull complete
Digest: sha256:a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09
Status: Downloaded newer image for stedolan/jq:latest
parse error: Invalid numeric literal at line 1, column 6
Error: No such container: 
parse error: Invalid numeric literal at line 1, column 6
Error: No such container: 
parse error: Invalid numeric literal at line 1, column 6
[...]
{code}




[jira] [Created] (FLINK-34568) YarnFileStageTest.destroyHDFS timed out

2024-03-03 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34568:
-

 Summary: YarnFileStageTest.destroyHDFS timed out
 Key: FLINK-34568
 URL: https://issues.apache.org/jira/browse/FLINK-34568
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hadoop Compatibility
Affects Versions: 1.17.2
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58024&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=26698

{code}
Mar 02 07:28:56 "Listener at localhost/33933" #25 daemon prio=5 os_prio=0 
tid=0x7f08490be000 nid=0x12cae runnable [0x7f082ebfc000]
Mar 02 07:28:56    java.lang.Thread.State: RUNNABLE
Mar 02 07:28:56 at 
org.mortbay.io.nio.SelectorManager$SelectSet.stop(SelectorManager.java:879)
Mar 02 07:28:56 - locked <0xd7ae0030> (a 
org.mortbay.io.nio.SelectorManager$SelectSet)
[...]
Mar 02 07:28:56 at 
org.apache.hadoop.hdfs.MiniDFSCluster.stopAndJoinNameNode(MiniDFSCluster.java:2123)
Mar 02 07:28:56 at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2060)
Mar 02 07:28:56 at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2031)
Mar 02 07:28:56 at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2024)
Mar 02 07:28:56 at 
org.apache.flink.yarn.YarnFileStageTest.destroyHDFS(YarnFileStageTest.java:90)
[...]
{code}

Looks like an HDFS issue during shutdown? This will most likely also affect 
newer versions because not much has changed in the YARN space since 1.17 
(Hadoop was bumped in 1.17 itself; FLINK-29710).




[jira] [Created] (FLINK-34560) JoinITCase seems to fail on a broader scale (MiniCluster issue?)

2024-03-01 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34560:
-

 Summary: JoinITCase seems to fail on a broader scale (MiniCluster 
issue?)
 Key: FLINK-34560
 URL: https://issues.apache.org/jira/browse/FLINK-34560
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/8105495458/job/22154140154#step:10:11906

The actual cause still needs to be investigated.




[jira] [Created] (FLINK-34551) Align retry mechanisms of FutureUtils

2024-02-29 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34551:
-

 Summary: Align retry mechanisms of FutureUtils
 Key: FLINK-34551
 URL: https://issues.apache.org/jira/browse/FLINK-34551
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Core
Affects Versions: 1.20.0
Reporter: Matthias Pohl


The retry mechanisms of FutureUtils include quite a bit of redundant code, which 
makes them hard to understand and to extend. The logic should be aligned properly.
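
A minimal sketch of what an aligned retry helper could look like, assuming a single generic retry loop that is parameterized by a backoff strategy. All names in it ({{RetrySketch}}, {{Backoff}}, {{fixedDelay}}, {{exponential}}) are hypothetical and are not taken from Flink's FutureUtils; it only illustrates the idea and assumes that the operation reports failures through the returned future.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are taken from Flink's FutureUtils.
final class RetrySketch {

    /** Hypothetical strategy: the only part that differs between retry variants. */
    interface Backoff {
        /** Delay before the given retry (1-based), or null once retries are exhausted. */
        Duration delayBefore(int retry);
    }

    static Backoff fixedDelay(int maxRetries, Duration delay) {
        return retry -> retry <= maxRetries ? delay : null;
    }

    static Backoff exponential(int maxRetries, Duration initial) {
        // Doubles the delay with every retry: initial, 2x, 4x, ...
        return retry -> retry <= maxRetries ? initial.multipliedBy(1L << (retry - 1)) : null;
    }

    /** Single retry loop shared by all strategies. */
    static <T> CompletableFuture<T> retry(
            Supplier<CompletableFuture<T>> operation,
            Backoff backoff,
            Predicate<Throwable> retryable,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, backoff, retryable, scheduler, result, 1);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation,
            Backoff backoff,
            Predicate<Throwable> retryable,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result,
            int nextRetry) {
        if (result.isDone()) {
            return; // the caller cancelled or completed the result in the meantime
        }
        operation.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            // Futures derived via supplyAsync/thenApply wrap failures in CompletionException.
            Throwable cause = error instanceof CompletionException && error.getCause() != null
                    ? error.getCause() : error;
            Duration delay = retryable.test(cause) ? backoff.delayBefore(nextRetry) : null;
            if (delay == null) {
                result.completeExceptionally(cause); // not retryable or retries exhausted
                return;
            }
            scheduler.schedule(
                    () -> attempt(operation, backoff, retryable, scheduler, result, nextRetry + 1),
                    delay.toMillis(), TimeUnit.MILLISECONDS);
        });
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        // Fails twice with a retryable error, then succeeds.
        Supplier<CompletableFuture<String>> flaky = () -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (calls.incrementAndGet() < 3) {
                f.completeExceptionally(new java.io.IOException("transient"));
            } else {
                f.complete("ok");
            }
            return f;
        };
        String value = retry(flaky, fixedDelay(5, Duration.ofMillis(10)),
                e -> e instanceof java.io.IOException, scheduler).get(5, TimeUnit.SECONDS);
        System.out.println(value + " after " + calls.get() + " calls");
        scheduler.shutdown();
    }
}
```

With this shape, fixed-delay and exponential retries differ only in the {{Backoff}} that is passed in, so the completion handling, cancellation check and scheduling exist once.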




[jira] [Created] (FLINK-34527) Deprecate Time classes also in PyFlink

2024-02-27 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34527:
-

 Summary: Deprecate Time classes also in PyFlink
 Key: FLINK-34527
 URL: https://issues.apache.org/jira/browse/FLINK-34527
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.20.0
Reporter: Matthias Pohl


FLINK-32570 deprecated the Time classes. But we missed touching the 
PyFlink-related APIs.




[jira] [Created] (FLINK-34514) e2e (1) times out because of an error that's most likely caused by a networking issue

2024-02-25 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34514:
-

 Summary: e2e (1) times out because of an error that's most likely 
caused by a networking issue
 Key: FLINK-34514
 URL: https://issues.apache.org/jira/browse/FLINK-34514
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/8027473891/job/21931649433

{code}
Sat, 24 Feb 2024 03:35:54 GMT
ERROR: failed to solve: process "/bin/sh -c set -ex;   wget -nv -O 
/usr/local/bin/gosu 
\"https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg 
--print-architecture)\";   wget -nv -O /usr/local/bin/gosu.asc 
\"https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg 
--print-architecture).asc\";   export GNUPGHOME=\"$(mktemp -d)\";   for server 
in ha.pool.sks-keyservers.net $(shuf -e   
hkp://p80.pool.sks-keyservers.net:80   
keyserver.ubuntu.com   hkp://keyserver.ubuntu.com:80
   pgp.mit.edu) ; do   gpg --batch --keyserver 
\"$server\" --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 && break || : 
;   done &&   gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu; 
  gpgconf --kill all;   rm -rf \"$GNUPGHOME\" /usr/local/bin/gosu.asc;   chmod 
+x /usr/local/bin/gosu;   gosu nobody true" did not complete successfully: exit 
code: 4
Sat, 24 Feb 2024 07:10:28 GMT
==
Sat, 24 Feb 2024 07:10:28 GMT
=== WARNING: This task took already 95% of the available time budget of 299 
minutes ===
Sat, 24 Feb 2024 07:10:28 GMT
==
{code}




[jira] [Created] (FLINK-34513) GroupAggregateRestoreTest.testRestore fails

2024-02-25 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34513:
-

 Summary: GroupAggregateRestoreTest.testRestore fails
 Key: FLINK-34513
 URL: https://issues.apache.org/jira/browse/FLINK-34513
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57828&view=logs&j=26b84117-e436-5720-913e-3e280ce55cae&t=77cc7e77-39a0-5007-6d65-4137ac13a471&l=10881

{code}
Feb 24 01:12:01 01:12:01.384 [ERROR] Tests run: 10, Failures: 1, Errors: 0, 
Skipped: 1, Time elapsed: 2.957 s <<< FAILURE! -- in 
org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest
Feb 24 01:12:01 01:12:01.384 [ERROR] 
org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest.testRestore(TableTestProgram,
 ExecNodeMetadata)[4] -- Time elapsed: 0.653 s <<< FAILURE!
Feb 24 01:12:01 java.lang.AssertionError: 
Feb 24 01:12:01 
Feb 24 01:12:01 Expecting actual:
Feb 24 01:12:01   ["+I[3, 1, 2, 8, 31, 10.0, 3]",
Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]",
Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]",
Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]",
Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]",
Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 1]",
Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]",
Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]",
Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]",
Feb 24 01:12:01 "+U[7, 0, 1, 7, 7, 7.0, 2]"]
Feb 24 01:12:01 to contain exactly in any order:
Feb 24 01:12:01   ["+I[3, 1, 2, 8, 31, 10.0, 3]",
Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]",
Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]",
Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]",
Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]",
Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]",
Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]",
Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 2]",
Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]"]
Feb 24 01:12:01 elements not found:
Feb 24 01:12:01   ["+I[7, 0, 1, 7, 7, 7.0, 2]"]
Feb 24 01:12:01 and elements not expected:
Feb 24 01:12:01   ["+I[7, 0, 1, 7, 7, 7.0, 1]", "+U[7, 0, 1, 7, 7, 7.0, 2]"]
Feb 24 01:12:01 
Feb 24 01:12:01 at 
org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:313)
Feb 24 01:12:01 at 
java.base/java.lang.reflect.Method.invoke(Method.java:580)
[...]
{code}
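
To read the assertion output above, it can help to recompute the difference by hand. The sketch below is a plain multiset diff over the rows copied from the log; {{AnyOrderDiff}} and {{minus}} are hypothetical names and not AssertJ or Flink APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for reading the log; not part of AssertJ or Flink.
final class AnyOrderDiff {

    // Rows copied from the "Expecting actual" part of the log.
    static final List<String> ACTUAL = Arrays.asList(
            "+I[3, 1, 2, 8, 31, 10.0, 3]",
            "+I[2, 1, 4, 14, 42, 7.0, 6]",
            "+I[1, 1, 4, 12, 24, 6.0, 4]",
            "+U[2, 1, 4, 14, 57, 8.0, 7]",
            "+U[1, 1, 4, 12, 32, 6.0, 5]",
            "+I[7, 0, 1, 7, 7, 7.0, 1]",
            "+U[2, 1, 4, 14, 57, 7.0, 7]",
            "+U[1, 1, 4, 12, 32, 5.0, 5]",
            "+U[3, 1, 2, 8, 31, 9.0, 3]",
            "+U[7, 0, 1, 7, 7, 7.0, 2]");

    // Rows copied from the "to contain exactly in any order" part of the log.
    static final List<String> EXPECTED = Arrays.asList(
            "+I[3, 1, 2, 8, 31, 10.0, 3]",
            "+I[2, 1, 4, 14, 42, 7.0, 6]",
            "+I[1, 1, 4, 12, 24, 6.0, 4]",
            "+U[2, 1, 4, 14, 57, 8.0, 7]",
            "+U[1, 1, 4, 12, 32, 6.0, 5]",
            "+U[3, 1, 2, 8, 31, 9.0, 3]",
            "+U[2, 1, 4, 14, 57, 7.0, 7]",
            "+I[7, 0, 1, 7, 7, 7.0, 2]",
            "+U[1, 1, 4, 12, 32, 5.0, 5]");

    /** Returns the rows of left that have no matching occurrence in right (multiset difference). */
    static List<String> minus(List<String> left, List<String> right) {
        List<String> remaining = new ArrayList<>(right);
        List<String> unmatched = new ArrayList<>();
        for (String row : left) {
            // remove(Object) drops one occurrence, so duplicates are matched one by one
            if (!remaining.remove(row)) {
                unmatched.add(row);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        System.out.println("elements not found: " + minus(EXPECTED, ACTUAL));
        System.out.println("elements not expected: " + minus(ACTUAL, EXPECTED));
    }
}
```

The actual output has ten rows and the expected output nine: key 7 arrives as {{+I[7, 0, 1, 7, 7, 7.0, 1]}} followed by {{+U[7, 0, 1, 7, 7, 7.0, 2]}}, where the test expects a single {{+I[7, 0, 1, 7, 7, 7.0, 2]}}.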




[jira] [Created] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-02-23 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34508:
-

 Summary: Migrate S3-related ITCases and e2e tests to Minio 
 Key: FLINK-34508
 URL: https://issues.apache.org/jira/browse/FLINK-34508
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


This covers anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.




[jira] [Created] (FLINK-34495) Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failure

2024-02-21 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34495:
-

 Summary: Resuming Savepoint (rocks, scale up, heap timers) 
end-to-end test failure
 Key: FLINK-34495
 URL: https://issues.apache.org/jira/browse/FLINK-34495
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57760&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=5d91035e-8022-55f2-2d4f-ab121508bf7e&l=2010

I guess the test failed because of a checkpoint failure:
{code}
Feb 22 00:49:16 2024-02-22 00:49:04,305 WARN  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger or complete checkpoint 12 for job 3c9ffc670ead2cb3c4118410cbef3b72. (0 
consecutive failed attempts so far)
Feb 22 00:49:16 org.apache.flink.runtime.checkpoint.CheckpointException: 
Checkpoint Coordinator is suspending.
Feb 22 00:49:16 at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.stopCheckpointScheduler(CheckpointCoordinator.java:2056)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.scheduler.SchedulerBase.stopCheckpointScheduler(SchedulerBase.java:960)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.scheduler.SchedulerBase.stopWithSavepoint(SchedulerBase.java:1030)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.jobmaster.JobMaster.stopWithSavepoint(JobMaster.java:901)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
Feb 22 00:49:16 at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 ~[?:?]
Feb 22 00:49:16 at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:?]
Feb 22 00:49:16 at java.lang.reflect.Method.invoke(Method.java:566) 
~[?:?]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar
[...]
{code}

[jira] [Created] (FLINK-34489) New File Sink end-to-end test timed out

2024-02-21 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34489:
-

 Summary: New File Sink end-to-end test timed out
 Key: FLINK-34489
 URL: https://issues.apache.org/jira/browse/FLINK-34489
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57707&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3726

{code}
Feb 21 07:26:03 Number of produced values 10770/6
Feb 21 07:39:50 Test (pid: 151375) did not finish after 900 seconds.
Feb 21 07:39:50 Printing Flink logs and killing it:
[...]
{code}




[jira] [Created] (FLINK-34488) Integrate snapshot deployment into GHA nightly workflow

2024-02-21 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34488:
-

 Summary: Integrate snapshot deployment into GHA nightly workflow
 Key: FLINK-34488
 URL: https://issues.apache.org/jira/browse/FLINK-34488
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


Analogously to the [Azure Pipelines nightly 
config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L103]
 we want to deploy the snapshot artifacts in the GHA nightly workflow as well.




[jira] [Created] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-02-21 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34487:
-

 Summary: Integrate tools/azure-pipelines/build-python-wheels.yml 
into GHA nightly workflow
 Key: FLINK-34487
 URL: https://issues.apache.org/jira/browse/FLINK-34487
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


Analogously to the [Azure Pipelines nightly 
config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
 we want to generate the wheels artifacts in the GHA nightly workflow as well.




[jira] [Created] (FLINK-34486) Add documentation on how to add the shared utils as a submodule to the connector repo

2024-02-21 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34486:
-

 Summary: Add documentation on how to add the shared utils as a 
submodule to the connector repo
 Key: FLINK-34486
 URL: https://issues.apache.org/jira/browse/FLINK-34486
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Affects Versions: connector-parent-1.1.0
Reporter: Matthias Pohl


[apache/flink-connector-shared-utils:README.md|https://github.com/apache/flink-connector-shared-utils/blob/release_utils/README.md]
 doesn't state how the shared utils shall be added as a submodule to a 
connector repository. But the [connector release 
documentation|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release#Creatingaflinkconnectorrelease-Buildareleasecandidate]
 assumes that setup:
{quote}
The following sections assume that the release_utils branch from 
flink-connector-shared-utils is mounted as a git submodule under 
tools/releasing/shared, you can update the submodule by running  git submodule 
update --remote (or git submodule update --init --recursive if the submodule 
wasn't initialized, yet) to use latest release utils, you need to mount the  
flink-connector-shared-utils  as a submodule under the tools/releasing/shared 
if it hasn't been mounted in the connector repository. See the README for 
details.
{quote}

Let's update the README accordingly and add a link to the {{README}} in the 
connector release documentation.




[jira] [Created] (FLINK-34475) ZooKeeperLeaderElectionDriverTest failed with exit code 2

2024-02-20 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34475:
-

 Summary: ZooKeeperLeaderElectionDriverTest failed with exit code 2
 Key: FLINK-34475
 URL: https://issues.apache.org/jira/browse/FLINK-34475
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1
Reporter: Matthias Pohl


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57649&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8746]
{code:java}
Feb 20 01:20:02 01:20:02.369 [ERROR] Process Exit Code: 2
Feb 20 01:20:02 01:20:02.369 [ERROR] Crashed tests:
Feb 20 01:20:02 01:20:02.369 [ERROR] 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriverTest
Feb 20 01:20:02 01:20:02.369 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748)
{code}




[jira] [Created] (FLINK-34464) actions/cache@v4 times out

2024-02-19 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34464:
-

 Summary: actions/cache@v4 times out
 Key: FLINK-34464
 URL: https://issues.apache.org/jira/browse/FLINK-34464
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI, Test Infrastructure
Reporter: Matthias Pohl


[https://github.com/apache/flink/actions/runs/7953599167/job/21710058433#step:4:125]

Pulling the docker image stalled. This should be a temporary issue:
{code:java}
/usr/bin/docker exec  
601a5a6e68acf3ba38940ec7a07e08d7c57e763ca0364070124f71bc2f708bc3 sh -c "cat 
/etc/*release | grep ^ID"
Received 260046848 of 1429155280 (18.2%), 248.0 MBs/sec
Received 545259520 of 1429155280 (38.2%), 260.0 MBs/sec
[...]
Received 914358272 of 1429155280 (64.0%), 0.0 MBs/sec
Received 914358272 of 1429155280 (64.0%), 0.0 MBs/sec
Error: The operation was canceled.
{code}




[jira] [Created] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-02-16 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34450:
-

 Summary: 
TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
 Key: FLINK-34450
 URL: https://issues.apache.org/jira/browse/FLINK-34450
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880

{code}
Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
Error: 07:48:06 07:48:06.646 [ERROR] 
org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
 -- Time elapsed: 0.036 s <<< FAILURE!
Feb 16 07:48:06 Output was not correct.: array lengths differed, 
expected.length=8 actual.length=7; arrays first differed at element [6]; 
expected: but was:
Feb 16 07:48:06 at 
org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
Feb 16 07:48:06 at 
org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
Feb 16 07:48:06 at org.junit.Assert.internalArrayEquals(Assert.java:534)
Feb 16 07:48:06 at org.junit.Assert.assertArrayEquals(Assert.java:285)
Feb 16 07:48:06 at 
org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
Feb 16 07:48:06 at 
org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
Feb 16 07:48:06 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
but was:
Feb 16 07:48:06 at org.junit.Assert.fail(Assert.java:89)
Feb 16 07:48:06 at org.junit.Assert.failNotEquals(Assert.java:835)
Feb 16 07:48:06 at org.junit.Assert.assertEquals(Assert.java:120)
Feb 16 07:48:06 at org.junit.Assert.assertEquals(Assert.java:146)
Feb 16 07:48:06 at 
org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
Feb 16 07:48:06 at 
org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
Feb 16 07:48:06 ... 6 more
{code}

I couldn't reproduce it locally with 2 runs.




[jira] [Created] (FLINK-34449) Flink build took too long

2024-02-16 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34449:
-

 Summary: Flink build took too long
 Key: FLINK-34449
 URL: https://issues.apache.org/jira/browse/FLINK-34449
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI, Test Infrastructure
Reporter: Matthias Pohl


We saw a timeout when building Flink in the e2e1 stage. No logs are available to 
investigate the issue:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57551&view=logs&j=bbb1e2a2-a43c-55c8-fb48-5cfe7a8a0ca6

{code}
Nothing to show. Final logs are missing. This can happen when the job is 
cancelled or times out.
{code}

I'd consider this an infrastructure issue but created the Jira issue for 
documentation purposes. Let's see whether that pops up again.




[jira] [Created] (FLINK-34448) ChangelogLocalRecoveryITCase failed fatally with 127 exit code

2024-02-15 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34448:
-

 Summary: ChangelogLocalRecoveryITCase failed fatally with 127 exit 
code
 Key: FLINK-34448
 URL: https://issues.apache.org/jira/browse/FLINK-34448
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae&l=8897
{code}
Feb 16 02:43:47 02:43:47.142 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) 
on project flink-tests: 
Feb 16 02:43:47 02:43:47.142 [ERROR] 
Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to 
/__w/1/s/flink-tests/target/surefire-reports for the individual test results.
Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to dump files (if any exist) 
[date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Feb 16 02:43:47 02:43:47.142 [ERROR] ExecutionException The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd 
'/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' 
'-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
'/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' 
'/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 
'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check 
output in log
Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
Feb 16 02:43:47 02:43:47.142 [ERROR] 
org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
Feb 16 02:43:47 02:43:47.142 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd 
'/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' 
'-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
'/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' 
'/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 
'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check 
output in log
Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
Feb 16 02:43:47 02:43:47.142 [ERROR] 
org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
Feb 16 02:43:47 02:43:47.142 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)

{code}




[jira] [Created] (FLINK-34447) ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines

2024-02-15 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34447:
-

 Summary: 
ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime
 still fails on slow machines
 Key: FLINK-34447
 URL: https://issues.apache.org/jira/browse/FLINK-34447
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


This appeared in this [PR CI 
run|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57529&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=7997]
 of FLINK-34427.
{code}
Feb 14 18:50:01 18:50:01.283 [ERROR] Tests run: 18, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 0.665 s <<< FAILURE! -- in 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest
Feb 14 18:50:01 18:50:01.283 [ERROR] 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime
 -- Time elapsed: 0.197 s <<< FAILURE!
Feb 14 18:50:01 java.lang.AssertionError: 
Feb 14 18:50:01 
Feb 14 18:50:01 Expecting
Feb 14 18:50:01   
Feb 14 18:50:01 not to be done.
Feb 14 18:50:01 Be aware that the state of the future in this message might not 
reflect the one at the time when the assertion was performed as it is evaluated 
later on
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.lambda$new$3(ActiveResourceManagerTest.java:982)
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$Context.runTest(ActiveResourceManagerTest.java:1133)
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.<init>(ActiveResourceManagerTest.java:963)
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime(ActiveResourceManagerTest.java:946)
Feb 14 18:50:01 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 14 18:50:01 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}

But I was able to reproduce it locally as well.




[jira] [Created] (FLINK-34443) YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster

2024-02-14 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34443:
-

 Summary: 
YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed 
when deploying job cluster
 Key: FLINK-34443
 URL: https://issues.apache.org/jira/browse/FLINK-34443
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI, Runtime / Coordination, Test 
Infrastructure
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804

{code}
Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in 
org.apache.flink.yarn.YARNFileReplicationITCase
Error: 03:04:05 03:04:05.067 [ERROR] 
org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication
 -- Time elapsed: 1.982 s <<< ERROR!
Feb 14 03:04:05 org.apache.flink.client.deployment.ClusterDeploymentException: 
Could not deploy Yarn job cluster.
Feb 14 03:04:05 at 
org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73)
Feb 14 03:04:05 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 14 03:04:05 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Feb 14 03:04:05 Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could 
only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) 
running and 2 node(s) are excluded in this operation.
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
Feb 14 03:04:05 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
Feb 14 03:04:05 at java.security.AccessController.doPrivileged(Native 
Method)
Feb 14 03:04:05 at javax.security.auth.Subject.doAs(Subject.java:422)
Feb 14 03:04:05 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
Feb 14 03:04:05 
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1525)
Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1422)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
Feb 14 03:04:05 at com.sun.proxy.$Proxy113.addBlock(Unknown Source)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.ad

[jira] [Created] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future

2024-02-13 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34434:
-

 Summary: DefaultSlotStatusSyncer doesn't complete the returned 
future
 Key: FLINK-34434
 URL: https://issues.apache.org/jira/browse/FLINK-34434
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.17.2, 1.19.0, 1.20.0
Reporter: Matthias Pohl


When looking into FLINK-34427 (unrelated), I noticed an odd line in 
[DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155]
 where we complete a future that should already be completed (because the 
callback is triggered only after the {{requestFuture}} has completed in some 
way). Shouldn't we complete the {{returnedFuture}} instead?

I'm keeping the priority at {{Major}} because it doesn't seem to have been an 
issue in the past.
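
A minimal sketch of the suspected pattern, not the actual {{DefaultSlotStatusSyncer}} code. The class name, the method names and the {{String}} acknowledgement type are assumptions for illustration; only {{requestFuture}} and {{returnedFuture}} are taken from this ticket:

```java
import java.util.concurrent.CompletableFuture;

public class FutureCompletionSketch {

    // Suspected shape: the failure branch completes the future that just fired.
    static CompletableFuture<Void> handleSuspected(CompletableFuture<String> requestFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.whenComplete(
                (ack, failure) -> {
                    if (failure != null) {
                        // No effect: requestFuture is already completed when this callback runs.
                        requestFuture.completeExceptionally(failure);
                    } else {
                        returnedFuture.complete(null);
                    }
                });
        return returnedFuture;
    }

    // Proposed shape: the failure is forwarded to the future handed to the caller.
    static CompletableFuture<Void> handleProposed(CompletableFuture<String> requestFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.whenComplete(
                (ack, failure) -> {
                    if (failure != null) {
                        returnedFuture.completeExceptionally(failure);
                    } else {
                        returnedFuture.complete(null);
                    }
                });
        return returnedFuture;
    }

    public static void main(String[] args) {
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<Void> suspected = handleSuspected(first);
        first.completeExceptionally(new IllegalStateException("request failed"));
        System.out.println("suspected variant done: " + suspected.isDone());

        CompletableFuture<String> second = new CompletableFuture<>();
        CompletableFuture<Void> proposed = handleProposed(second);
        second.completeExceptionally(new IllegalStateException("request failed"));
        System.out.println("proposed variant failed: " + proposed.isCompletedExceptionally());
    }
}
```

In the suspected variant a caller waiting on the returned future would never be notified of the failure, which is the concern raised above.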




[jira] [Created] (FLINK-34433) CollectionFunctionsITCase.test failed due to job restart

2024-02-13 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34433:
-

 Summary: CollectionFunctionsITCase.test failed due to job restart
 Key: FLINK-34433
 URL: https://issues.apache.org/jira/browse/FLINK-34433
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7880739697/job/21503460772#step:10:11312

{code}
Error: 02:33:24 02:33:24.955 [ERROR] Tests run: 439, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 56.57 s <<< FAILURE! -- in 
org.apache.flink.table.planner.functions.CollectionFunctionsITCase
Error: 02:33:24 02:33:24.956 [ERROR] 
org.apache.flink.table.planner.functions.CollectionFunctionsITCase.test(TestCase)[81]
 -- Time elapsed: 1.141 s <<< ERROR!
Feb 13 02:33:24 java.lang.RuntimeException: Job restarted
Feb 13 02:33:24 at 
org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:42)
Feb 13 02:33:24 at 
org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87)
Feb 13 02:33:24 at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:124)
Feb 13 02:33:24 at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126)
Feb 13 02:33:24 at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100)
Feb 13 02:33:24 at 
org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247)
Feb 13 02:33:24 at 
org.assertj.core.internal.Iterators.assertHasNext(Iterators.java:49)
Feb 13 02:33:24 at 
org.assertj.core.api.AbstractIteratorAssert.hasNext(AbstractIteratorAssert.java:60)
Feb 13 02:33:24 at 
org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$ResultTestItem.test(BuiltInFunctionTestBase.java:383)
Feb 13 02:33:24 at 
org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestSetSpec.lambda$getTestCase$4(BuiltInFunctionTestBase.java:341)
Feb 13 02:33:24 at 
org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119)
Feb 13 02:33:24 at 
org.apache.flink.table.planner.functions.BuiltInFunctionTestBase.test(BuiltInFunctionTestBase.java:99)
Feb 13 02:33:24 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 13 02:33:24 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 13 02:33:24 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 13 02:33:24 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 13 02:33:24 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 13 02:33:24 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}




[jira] [Created] (FLINK-34428) WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out

2024-02-12 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34428:
-

 Summary: WindowAggregateITCase#testEventTimeHopWindow_GroupingSets 
times out
 Key: FLINK-34428
 URL: https://issues.apache.org/jira/browse/FLINK-34428
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7866453368/job/21460921339#step:10:15127

{code}
"main" #1 prio=5 os_prio=0 tid=0x7f1770cb7000 nid=0x4ad4d waiting on 
condition [0x7f17711f6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xab48e3a0> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
at 
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
at 
org.apache.flink.table.planner.runtime.stream.sql.WindowAggregateITCase.testTumbleWindowWithoutOutputWindowColumns(WindowAggregateITCase.scala:477)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
{code}




[jira] [Created] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)

2024-02-12 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34427:
-

 Summary: ResourceManagerTaskExecutorTest fails fatally (exit code 
239)
 Key: FLINK-34427
 URL: https://issues.apache.org/jira/browse/FLINK-34427
 Project: Flink
  Issue Type: Bug
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959

{code}
Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
Error: 02:28:53 02:28:53.220 [ERROR] 
org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
Error: 02:28:53 02:28:53.220 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
'/root/flink/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
'-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.lang=ALL-UNNAMED' 
'--add-opens=java.base/java.net=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' 
'--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
'/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
 '/root/flink/flink-runtime/target/surefire' '2024-02-12T02-21-39_495-jvmRun3' 
'surefire-20240212022332296_88tmp' 'surefire_26-20240212022332296_91tmp'
Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
output in log
Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
Error: 02:28:53 02:28:53.221 [ERROR] 
org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
Error: 02:28:53 02:28:53.221 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
[...]
{code}




[jira] [Created] (FLINK-34426) HybridShuffleITCase.testHybridSelectiveExchangesRestart times out

2024-02-12 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34426:
-

 Summary: HybridShuffleITCase.testHybridSelectiveExchangesRestart 
times out
 Key: FLINK-34426
 URL: https://issues.apache.org/jira/browse/FLINK-34426
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7851900779/job/21429781783#step:10:9052

{code}
"ForkJoinPool-1-worker-3" #16 daemon prio=5 os_prio=0 cpu=3397.79ms 
elapsed=11462.88s tid=0x7f48966b3800 nid=0x7a303 waiting on condition  
[0x7f486e97a000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.19/Native Method)
- parking to wait for  <0xa2faa230> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.19/LockSupport.java:194)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.19/CompletableFuture.java:1796)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.19/ForkJoinPool.java:3118)
at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.19/CompletableFuture.java:1823)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.19/CompletableFuture.java:1998)
at 
org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36)
at 
org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:61)
at 
org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:117)
at 
org.apache.flink.test.runtime.HybridShuffleITCase.testHybridSelectiveExchangesRestart(HybridShuffleITCase.java:79)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.19/Native 
Method)
[...]
{code}




[jira] [Created] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out

2024-02-12 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34425:
-

 Summary: 
TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure
 times out
 Key: FLINK-34425
 URL: https://issues.apache.org/jira/browse/FLINK-34425
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844

{code}
Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms 
elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition  
[0x7fbd2b9f3000]
Feb 10 03:21:45    java.lang.Thread.State: WAITING (parking)
Feb 10 03:21:45 at 
jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
Feb 10 03:21:45 - parking to wait for  <0xae6199f0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
Feb 10 03:21:45 at 
java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
Feb 10 03:21:45 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
Feb 10 03:21:45 at 
java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
Feb 10 03:21:45 at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
Feb 10 03:21:45 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
Feb 10 03:21:45 at 
java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
Feb 10 03:21:45 at 
org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
Feb 10 03:21:45 at 
java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
Feb 10 03:21:45 at 
java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
[...]
{code}




[jira] [Created] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-11 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34424:
-

 Summary: 
BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
 Key: FLINK-34424
 URL: https://issues.apache.org/jira/browse/FLINK-34424
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9151

{code}
Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 
tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
Feb 11 13:55:29 at java.lang.Object.wait(Native Method)
Feb 11 13:55:29 at java.lang.Thread.join(Thread.java:1252)
Feb 11 13:55:29 - locked <0xe2e019a8> (a 
org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
Feb 11 13:55:29 at 
org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
Feb 11 13:55:29 at 
org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
Feb 11 13:55:29 at 
org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
Feb 11 13:55:29 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
[...]
{code}




[jira] [Created] (FLINK-34423) Make tool/ci/compile_ci.sh not necessarily rely on clean phase

2024-02-11 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34423:
-

 Summary: Make tool/ci/compile_ci.sh not necessarily rely on clean 
phase
 Key: FLINK-34423
 URL: https://issues.apache.org/jira/browse/FLINK-34423
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


The GHA job {{Test packaging/licensing}} runs 
[.github/workflows/template.flink-ci.yml:169|https://github.com/apache/flink/blob/85edd784fc72c1784849e2b122cbf3215f89817c/.github/workflows/template.flink-ci.yml#L169]
 which enables Maven's {{clean}} phase. This triggers redundant work because 
the {{Test packaging/licensing}} job doesn't reuse the build artifacts of 
the previous {{Compile}} job but reruns {{test-compile}} once more.

Disabling {{clean}} should improve the runtime of the {{Test 
packaging/licensing}} job.




[jira] [Created] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21

2024-02-09 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34419:
-

 Summary: flink-docker's .github/workflows/snapshot.yml doesn't 
support JDK 17 and 21
 Key: FLINK-34419
 URL: https://issues.apache.org/jira/browse/FLINK-34419
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Reporter: Matthias Pohl


[.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40]
 needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736), and 
JDK 21 support was added in 1.19 (FLINK-33163).




[jira] [Created] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fail

2024-02-09 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34418:
-

 Summary: 
YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
 failed due to disk space
 Key: FLINK-34418
 URL: https://issues.apache.org/jira/browse/FLINK-34418
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


[https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
{code:java}
[...]
Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method)
Feb 09 03:00:13 at 
java.io.FileOutputStream.write(FileOutputStream.java:326)
Feb 09 03:00:13 at 
org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
Feb 09 03:00:13 ... 39 more
[...] {code}




[jira] [Created] (FLINK-34416) "Local recovery and sticky scheduling end-to-end test" still doesn't work with AdaptiveScheduler

2024-02-08 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34416:
-

 Summary: "Local recovery and sticky scheduling end-to-end test" 
still doesn't work with AdaptiveScheduler
 Key: FLINK-34416
 URL: https://issues.apache.org/jira/browse/FLINK-34416
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


We tried to enable all {{AdaptiveScheduler}}-related tests in FLINK-34409 
because it appeared that all referenced Jira issues had been resolved. 
That's not the case for the {{"Local recovery and sticky scheduling end-to-end 
test"}}, though.

With the {{AdaptiveScheduler}} being enabled, we run into issues where the test 
runs forever due to a {{NullPointerException}} continuously triggering a 
failure:
{code}
Feb 07 19:02:59 2024-02-07 19:02:21,706 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Flat Map -> 
Sink: Unnamed (3/4) 
(54075d3d22edb729e5f396726f777860_20ba6b65f97481d5570070de90e4e791_2_16292) 
switched from INITIALIZING to FAILED on localhost:40893-09ff7>
Feb 07 19:02:59 java.lang.NullPointerException: Expected to find info here.
Feb 07 19:02:59 at 
org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76) 
~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.tests.StickyAllocationAndLocalRecoveryTestJob$StateCreatingFlatMap.initializeState(StickyAllocationAndLocalRecoveryTestJob.java:340)
 ~[?:?]
Feb 07 19:02:59 at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:187)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:169)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:134)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:285)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateAndGates(StreamTask.java:799)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$restoreInternal$3(StreamTask.java:753)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:753)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:712)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751) 
~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) 
~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 07 19:02:59 at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
{code}

This error is caused by a Precondition in 
[StickyAllocationAndLocalRecoveryTestJob:340|https://github.com/apache/flink/blob/0f3470db83c1fddba9ac9a7299b1e61baab4ff12/flink-end-to-end-tests/flink-local-recovery-and-allocation-test/src/main/java/org/apache/flink/streaming/tests/StickyAllocationAndLocalRecoveryTestJob.java#L340]
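
For readers unfamiliar with the pattern, here is a minimal sketch of how such a precondition turns a missing value into the {{NullPointerException}} seen above. The {{checkNotNull}} helper is a self-contained stand-in for Flink's {{Preconditions.checkNotNull}}, and {{restoreInfo}} with its map argument is purely hypothetical; what the test job actually looks up at line 340 is not shown in this ticket:

```java
import java.util.HashMap;
import java.util.Map;

public class PreconditionSketch {

    // Stand-in for a checkNotNull-style precondition: null input becomes an NPE with a message.
    static <T> T checkNotNull(T reference, String errorMessage) {
        if (reference == null) {
            throw new NullPointerException(errorMessage);
        }
        return reference;
    }

    // Hypothetical restore step that expects info for the given subtask to be present.
    static String restoreInfo(Map<Integer, String> infoBySubtask, int subtaskIndex) {
        return checkNotNull(infoBySubtask.get(subtaskIndex), "Expected to find info here.");
    }

    public static void main(String[] args) {
        Map<Integer, String> restored = new HashMap<>();
        restored.put(0, "info-0");

        System.out.println("subtask 0: " + restoreInfo(restored, 0));
        try {
            restoreInfo(restored, 2);
        } catch (NullPointerException e) {
            // If every restart hits the same missing entry, the failure repeats indefinitely.
            System.out.println("subtask 2 failed: " + e.getMessage());
        }
    }
}
```

If the state the job expects is absent after each restart, the same check fails again on every attempt, which would match the endless failure loop described above.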




[jira] [Created] (FLINK-34412) ResultPartitionDeploymentDescriptorTest fails due to fatal error (239 exit code)

2024-02-08 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34412:
-

 Summary: ResultPartitionDeploymentDescriptorTest fails due to 
fatal error (239 exit code)
 Key: FLINK-34412
 URL: https://issues.apache.org/jira/browse/FLINK-34412
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.17.2
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57388&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=8323

{code}
Feb 08 04:56:31 [ERROR] 
org.apache.flink.runtime.deployment.ResultPartitionDeploymentDescriptorTest
Feb 08 04:56:31 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
Feb 08 04:56:31 [ERROR] Command was /bin/sh -c cd /__w/1/s/flink-runtime && 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -Xmx768m 
-jar 
/__w/1/s/flink-runtime/target/surefire/surefirebooter6684124987290515696.jar 
/__w/1/s/flink-runtime/target/surefire 2024-02-08T04-45-49_396-jvmRun4 
surefire6142105262662423760tmp surefire_245661504424247139476tmp
Feb 08 04:56:31 [ERROR] Error occurred in starting fork, check output in log
Feb 08 04:56:31 [ERROR] Process Exit Code: 239
Feb 08 04:56:31 [ERROR] Crashed tests:
Feb 08 04:56:31 [ERROR] 
org.apache.flink.runtime.deployment.ResultPartitionDeploymentDescriptorTest
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532)
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:405)
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:321)
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266)
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
Feb 08 04:56:31 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
{code}




[jira] [Created] (FLINK-34411) "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up

2024-02-08 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34411:
-

 Summary: "Wordcount on Docker test (custom fs plugin)" timed out 
with some strange issue while setting the test up
 Key: FLINK-34411
 URL: https://issues.apache.org/jira/browse/FLINK-34411
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57380&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=43ba8ce7-ebbf-57cd-9163-444305d74117&l=5802

{code}
Feb 07 15:22:39 
==
Feb 07 15:22:39 Running 'Wordcount on Docker test (custom fs plugin)'
Feb 07 15:22:39 
==
Feb 07 15:22:39 TEST_DATA_DIR: 
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
Feb 07 15:22:40 Flink dist directory: 
/home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
Feb 07 15:22:40 Flink dist directory: 
/home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
Feb 07 15:22:41 Docker version 24.0.7, build afdd53b
Feb 07 15:22:44 docker-compose version 1.29.2, build 5becea4c
Feb 07 15:22:44 Starting fileserver for Flink distribution
Feb 07 15:22:44 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin ~/work/1/s
Feb 07 15:23:07 ~/work/1/s
Feb 07 15:23:07 
~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 
~/work/1/s
Feb 07 15:23:07 Preparing Dockeriles
Feb 07 15:23:07 Executing command: git clone 
https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch
Cloning into 'flink-docker'...
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: line 
65: ./add-custom.sh: No such file or directory
Feb 07 15:23:07 Building images
ERROR: unable to prepare context: path "dev/test_docker_embedded_job-ubuntu" 
not found
Feb 07 15:23:09 ~/work/1/s
Feb 07 15:23:09 Command: build_image test_docker_embedded_job failed. 
Retrying...
Feb 07 15:23:14 Starting fileserver for Flink distribution
Feb 07 15:23:14 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin ~/work/1/s
Feb 07 15:23:36 ~/work/1/s
Feb 07 15:23:36 
~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 
~/work/1/s
Feb 07 15:23:36 Preparing Dockeriles
Feb 07 15:23:36 Executing command: git clone 
https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch
fatal: destination path 'flink-docker' already exists and is not an empty 
directory.
Feb 07 15:23:36 Retry 1/5 exited 128, retrying in 1 seconds...
Traceback (most recent call last):
  File 
"/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/python3_fileserver.py",
 line 26, in <module>
httpd = socketserver.TCPServer(("", ), handler)
  File "/usr/lib/python3.8/socketserver.py", line 452, in __init__
self.server_bind()
  File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind
self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use
[...]
{code}
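
The `OSError: [Errno 98]` at the end of the log is what `socketserver.TCPServer` raises when the requested port is still held by another socket. That would be consistent with the fileserver from the first attempt still running when the retry started it again, though the log alone does not prove it. A minimal sketch of that behaviour, assuming a loopback address and an OS-assigned port (neither is taken from `python3_fileserver.py`); `try_bind` is a hypothetical helper:

```python
import errno
import socketserver


def try_bind(port):
    """Return a TCPServer bound to 127.0.0.1:port, or None if the port is in use.

    Hypothetical helper for illustration; not part of the Flink test scripts.
    """
    try:
        return socketserver.TCPServer(
            ("127.0.0.1", port), socketserver.BaseRequestHandler
        )
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return None
        raise


# Port 0 lets the OS pick a free port, so the sketch does not rely on a fixed one.
first = socketserver.TCPServer(("127.0.0.1", 0), socketserver.BaseRequestHandler)
port = first.server_address[1]

# A second server on the same port cannot bind while the first one is open.
second = try_bind(port)

# Once the first server is closed, the same port can be bound again.
first.server_close()
third = try_bind(port)
if third is not None:
    third.server_close()
```

If that is what happened in this build, stopping the first fileserver (and removing the half-cloned `flink-docker` directory) before retrying would avoid both follow-up errors.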




[jira] [Created] (FLINK-34410) Disable nightly trigger in forks

2024-02-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34410:
-

 Summary: Disable nightly trigger in forks
 Key: FLINK-34410
 URL: https://issues.apache.org/jira/browse/FLINK-34410
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Matthias Pohl


We can disable the automatic triggering of the nightly trigger workflow in forks 
(see [GHA 
docs|https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions]):
{code}
if: github.repository == 'octo-org/octo-repo-prod'
{code}




[jira] [Created] (FLINK-34409) Increase test coverage for AdaptiveScheduler

2024-02-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34409:
-

 Summary: Increase test coverage for AdaptiveScheduler
 Key: FLINK-34409
 URL: https://issues.apache.org/jira/browse/FLINK-34409
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


There are still several tests disabled for the {{AdaptiveScheduler}} which we 
can enable now. All the issues seem to have been fixed.

We can even remove the annotation {{@FailsWithAdaptiveScheduler}} now. It's not 
needed anymore.




[jira] [Created] (FLINK-34408) VeryBigPbProtoToRowTest#testSimple fails with OOM

2024-02-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34408:
-

 Summary: VeryBigPbProtoToRowTest#testSimple fails with OOM
 Key: FLINK-34408
 URL: https://issues.apache.org/jira/browse/FLINK-34408
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57371&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=23861

{code}
Feb 07 09:40:16 09:40:16.314 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 29.58 s <<< FAILURE! -- in 
org.apache.flink.formats.protobuf.VeryBigPbProtoToRowTest
Feb 07 09:40:16 09:40:16.314 [ERROR] 
org.apache.flink.formats.protobuf.VeryBigPbProtoToRowTest.testSimple -- Time 
elapsed: 29.57 s <<< ERROR!
Feb 07 09:40:16 org.apache.flink.util.FlinkRuntimeException: Error in 
serialization.
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
Feb 07 09:40:16 at 
org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
Feb 07 09:40:16 at 
org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
Feb 07 09:40:16 at 
org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
Feb 07 09:40:16 at 
org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
Feb 07 09:40:16 at 
org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
Feb 07 09:40:16 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
Feb 07 09:40:16 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.pbBytesToRow(ProtobufTestHelper.java:121)
Feb 07 09:40:16 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.pbBytesToRow(ProtobufTestHelper.java:103)
Feb 07 09:40:16 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.pbBytesToRow(ProtobufTestHelper.java:98)
Feb 07 09:40:16 at 
org.apache.flink.formats.protobuf.VeryBigPbProtoToRowTest.testSimple(VeryBigPbProtoToRowTest.java:36)
Feb 07 09:40:16 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 07 09:40:16 Caused by: java.util.concurrent.ExecutionException: 
java.lang.OutOfMemoryError: Java heap space
Feb 07 09:40:16 at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
Feb 07 09:40:16 at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
Feb 07 09:40:16 ... 18 more
Feb 07 09:40:16 Caused by: java.lang.OutOfMemoryError: Java heap space
Feb 07 09:40:16 at java.util.Arrays.copyOf(Arrays.java:3236)
Feb 07 09:40:16 at 
java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
Feb 07 09:40:16 at 
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:555)
Feb 07 09:40:16 at 
org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182)
Feb 07 09:40:16 at 
org.apache.flink.streaming.api.graph.StreamConfig$$Lambda$1582/1961611609.accept(Unknown
 Source)
Feb 07 09:40:16 at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
Feb 07 09:40:16 at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
Feb 07 09:40:16 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFutu

[jira] [Created] (FLINK-34405) RightOuterJoinTaskTest#testCancelOuterJoinTaskWhileSort2 fails

2024-02-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34405:
-

 Summary: RightOuterJoinTaskTest#testCancelOuterJoinTaskWhileSort2 
fails
 Key: FLINK-34405
 URL: https://issues.apache.org/jira/browse/FLINK-34405
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=be5a4b15-4b23-56b1-7582-795f58a645a2&l=9027

{code}
Feb 07 03:20:16 03:20:16.223 [ERROR] Failures: 
Feb 07 03:20:16 03:20:16.223 [ERROR] 
org.apache.flink.runtime.operators.RightOuterJoinTaskTest.testCancelOuterJoinTaskWhileSort2
Feb 07 03:20:16 03:20:16.223 [ERROR]   Run 1: 
RightOuterJoinTaskTest>AbstractOuterJoinTaskTest.testCancelOuterJoinTaskWhileSort2:435
 
Feb 07 03:20:16 expected: 
Feb 07 03:20:16   null
Feb 07 03:20:16  but was: 
Feb 07 03:20:16   java.lang.Exception: The data preparation caused an error: 
Interrupted
Feb 07 03:20:16 at 
org.apache.flink.runtime.operators.testutils.BinaryOperatorTestBase.testDriverInternal(BinaryOperatorTestBase.java:209)
Feb 07 03:20:16 at 
org.apache.flink.runtime.operators.testutils.BinaryOperatorTestBase.testDriver(BinaryOperatorTestBase.java:189)
Feb 07 03:20:16 at 
org.apache.flink.runtime.operators.AbstractOuterJoinTaskTest.access$100(AbstractOuterJoinTaskTest.java:48)
Feb 07 03:20:16 ...(1 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed)
{code}




[jira] [Created] (FLINK-34404) RestoreTestBase#testRestore times out

2024-02-07 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34404:
-

 Summary: RestoreTestBase#testRestore times out
 Key: FLINK-34404
 URL: https://issues.apache.org/jira/browse/FLINK-34404
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357&view=logs&j=32715a4c-21b8-59a3-4171-744e5ab107eb&t=ff64056b-5320-5afe-c22c-6fa339e59586&l=11603

{code}
Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 
cpu=282.22ms elapsed=961.78s tid=0x7f880a485c00 nid=0x6745 waiting on 
condition  [0x7f878a6f9000]
Feb 07 02:17:40    java.lang.Thread.State: WAITING (parking)
Feb 07 02:17:40 at 
jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
Feb 07 02:17:40 - parking to wait for  <0xff73d060> (a 
java.util.concurrent.CompletableFuture$Signaller)
Feb 07 02:17:40 at 
java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
Feb 07 02:17:40 at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
Feb 07 02:17:40 at 
java.util.concurrent.ForkJoinPool.compensatedBlock(java.base@17.0.7/ForkJoinPool.java:3449)
Feb 07 02:17:40 at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3432)
Feb 07 02:17:40 at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
Feb 07 02:17:40 at 
java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
Feb 07 02:17:40 at 
org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292)
Feb 07 02:17:40 at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.7/Native 
Method)
Feb 07 02:17:40 at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.7/NativeMethodAccessorImpl.java:77)
Feb 07 02:17:40 at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.7/DelegatingMethodAccessorImpl.java:43)
Feb 07 02:17:40 at 
java.lang.reflect.Method.invoke(java.base@17.0.7/Method.java:568)
Feb 07 02:17:40 at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
[...]
{code}




[jira] [Created] (FLINK-34361) PyFlink end-to-end test fails in GHA

2024-02-05 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34361:
-

 Summary: PyFlink end-to-end test fails in GHA
 Key: FLINK-34361
 URL: https://issues.apache.org/jira/browse/FLINK-34361
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.19.0
Reporter: Matthias Pohl


"PyFlink end-to-end test" fails:
https://github.com/apache/flink/actions/runs/7778642859/job/21208811659#step:14:7420

The only error I could identify is:
{code}
ERROR: pip's dependency resolver does not currently take into account all the 
packages that are installed. This behaviour is the source of the following 
dependency conflicts.
conda 23.5.2 requires ruamel-yaml<0.18,>=0.11.14, but you have ruamel-yaml 
0.18.5 which is incompatible.
Feb 05 03:31:54 Successfully installed apache-beam-2.48.0 avro-python3-1.10.2 
cloudpickle-2.2.1 crcmod-1.7 cython-3.0.8 dill-0.3.1.1 dnspython-2.5.0 
docopt-0.6.2 exceptiongroup-1.2.0 fastavro-1.9.3 fasteners-0.19 
find-libpython-0.3.1 grpcio-1.50.0 grpcio-tools-1.50.0 hdfs-2.7.3 
httplib2-0.22.0 iniconfig-2.0.0 numpy-1.24.4 objsize-0.6.1 orjson-3.9.13 
pandas-2.2.0 pemja-0.4.1 proto-plus-1.23.0 protobuf-4.23.4 py4j-0.10.9.7 
pyarrow-11.0.0 pydot-1.4.2 pymongo-4.6.1 pyparsing-3.1.1 pytest-7.4.4 
python-dateutil-2.8.2 pytz-2024.1 regex-2023.12.25 ruamel.yaml-0.18.5 
ruamel.yaml.clib-0.2.8 tomli-2.0.1 typing-extensions-4.9.0 tzdata-2023.4
/home/runner/work/flink/flink/flink-python/dev/.conda/lib/python3.10/site-packages/Cython/Compiler/Main.py:381:
 FutureWarning: Cython directive 'language_level' not set, using '3str' for now 
(Py3). This has changed from earlier releases! File: 
/home/runner/work/flink/flink/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd
  tree = Parsing.p_module(s, pxd, full_module_name)
{code}
Not sure whether that's the actual cause.
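
Whether or not it is the cause of the failure, the conflict pip reports is real: 0.18.5 lies outside the range `>=0.11.14,<0.18` that conda 23.5.2 declares. A minimal sketch of that range check, assuming plain dotted release numbers only (it is not a full PEP 440 implementation); `parse_version` and `satisfies` are hypothetical helpers, and `0.17.40` is an illustrative version that does not appear in the log:

```python
def parse_version(text):
    """Turn a dotted release string such as '0.18.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower_inclusive, upper_exclusive):
    """Check lower_inclusive <= installed < upper_exclusive on release tuples."""
    version = parse_version(installed)
    return parse_version(lower_inclusive) <= version < parse_version(upper_exclusive)


# Requirement from the log: conda 23.5.2 requires ruamel-yaml<0.18,>=0.11.14
installed_ok = satisfies("0.18.5", "0.11.14", "0.18")

# Illustrative older version (an assumption, not from the log) inside the range.
older_ok = satisfies("0.17.40", "0.11.14", "0.18")
```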





[jira] [Created] (FLINK-34360) GHA e2e test failure due to no space left on device error

2024-02-05 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34360:
-

 Summary: GHA e2e test failure due to no space left on device error
 Key: FLINK-34360
 URL: https://issues.apache.org/jira/browse/FLINK-34360
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7763815214

{code}
AdaptiveScheduler / E2E (group 2)
Process completed with exit code 1.
AdaptiveScheduler / E2E (group 2)
You are running out of disk space. The runner will stop working when the 
machine runs out of disk space. Free space left: 35 MB
{code}
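
One way to catch this earlier would be to check the remaining disk space before a test group starts. The following is only a sketch using Python's standard `shutil.disk_usage`; the helper names and the 1024 MB default threshold are assumptions for illustration and are not taken from the Flink CI scripts:

```python
import shutil


def free_megabytes(path="."):
    """Return the free space of the filesystem containing `path`, in whole MB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def has_enough_space(path=".", required_mb=1024):
    """Hypothetical guard; the 1024 MB default is an arbitrary example value."""
    return free_megabytes(path) >= required_mb


free_mb = free_megabytes(".")
print(f"Free space left: {free_mb} MB")
```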




[jira] [Created] (FLINK-34359) "Kerberized YARN per-job on Docker test (default input)" failed due to IllegalStateException

2024-02-04 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34359:
-

 Summary: "Kerberized YARN per-job on Docker test (default input)" 
failed due to IllegalStateException
 Key: FLINK-34359
 URL: https://issues.apache.org/jira/browse/FLINK-34359
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.18.1
Reporter: Matthias Pohl


This looks similar to FLINK-34357 because it's also due to some YARN issue. But 
here it's the e2e test "Kerberized YARN per-job on Docker test (default input)" 
that fails:
{code}
[...]
Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
access closed classloader. Please check if you store classloaders directly or 
indirectly in static fields. If the stacktrace suggests that the leak occurs in 
a third party library and cannot be fixed immediately, you can disable this 
check with the configuration 'classloader.check-leaked-classloader'.
at 
org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
at 
org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:208)
at 
org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2801)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2776)
at 
org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2654)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2636)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
at 
org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
at 
org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
at 
org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at 
org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
at 
org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
{code}

https://github.com/apache/flink/actions/runs/7770984519/job/21191905887#step:14:11720




[jira] [Created] (FLINK-34357) IllegalAnnotationsException causes "PyFlink YARN per-job on Docker test" e2e test to fail

2024-02-04 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34357:
-

 Summary: IllegalAnnotationsException causes "PyFlink YARN per-job 
on Docker test" e2e test to fail
 Key: FLINK-34357
 URL: https://issues.apache.org/jira/browse/FLINK-34357
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7763815214/job/21176570116#step:14:10009

{code}
Feb 03 03:29:04 SEVERE: Failed to generate the schema for the JAX-B elements
Feb 03 03:29:04 javax.xml.bind.JAXBException
Feb 03 03:29:04  - with linked exception:
Feb 03 03:29:04 [java.lang.reflect.InvocationTargetException]
Feb 03 03:29:04 at 
javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:262)
Feb 03 03:29:04 at 
javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:234)
[...]
Feb 03 03:29:04 at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Feb 03 03:29:04 Caused by: java.lang.reflect.InvocationTargetException
Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Feb 03 03:29:04 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 03 03:29:04 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 03 03:29:04 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 03 03:29:04 at 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java:44)
Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Feb 03 03:29:04 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 03 03:29:04 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 03 03:29:04 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 03 03:29:04 at 
javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:247)
Feb 03 03:29:04 ... 57 more
Feb 03 03:29:04 Caused by: 
com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 1 counts of 
IllegalAnnotationExceptions
Feb 03 03:29:04 java.util.Set is an interface, and JAXB can't handle interfaces.
Feb 03 03:29:04 this problem is related to the following location:
Feb 03 03:29:04 at java.util.Set
Feb 03 03:29:04 at public java.util.HashMap 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB()
Feb 03 03:29:04 at 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
Feb 03 03:29:04 at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
Feb 03 03:29:04 at 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
Feb 03 03:29:04 
Feb 03 03:29:04 at 
com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91)
Feb 03 03:29:04 at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445)
Feb 03 03:29:04 at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:277)
Feb 03 03:29:04 at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:124)
Feb 03 03:29:04 at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123)
Feb 03 03:29:04 at 
com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147)
Feb 03 03:29:04 ... 67 more
{code}




[jira] [Created] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34343:
-

 Summary: ResourceManager registration is not completed when 
registering the JobMaster
 Key: FLINK-34343
 URL: https://issues.apache.org/jira/browse/FLINK-34343
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Runtime / RPC
Affects Versions: 1.19.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203&view=logs&j=64debf87-ecdb-5aef-788d-8720d341b5cb&t=2302fb98-0839-5df2-3354-bbae636f81a7&l=8066

The test run failed due to a NullPointerException:
{code}
Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
endpoint org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager 
has not been started yet. Discarding message 
LocalFencedMessage(000
0, 
LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
ResourceID, String, JobID, Time))) until processing is started.
Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
"org.apache.flink.runtime.rpc.RpcServer.getAddress()" because "this.rpcServer" 
is null
Feb 02 01:11:55 at 
org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
 ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinTask.doExec(Unknown 
Source) ~[?:?]
Feb 02 01:11:55 at 
java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool.scan(Unknown 
Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool.runWorker(Unknown 
Source) ~[?:?]
Feb 02 01:11:55 at 
java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[?:?]
{code}




[jira] [Created] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

2024-02-01 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34333:
-

 Summary: Fix FLINK-34007 LeaderElector bug in 1.18
 Key: FLINK-34333
 URL: https://issues.apache.org/jira/browse/FLINK-34333
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1
Reporter: Matthias Pohl


FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using 
since Flink 1.18. The bug was fixed for Flink 1.19 in FLINK-34007, which 
required updating the k8s client to v6.9.0.

This Jira issue is about finding a solution in Flink 1.18 for the very same 
problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
unblock the release of 1.19 by resolving FLINK-34007.






[jira] [Created] (FLINK-34332) Investigate the permissions

2024-02-01 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34332:
-

 Summary: Investigate the permissions
 Key: FLINK-34332
 URL: https://issues.apache.org/jira/browse/FLINK-34332
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0
Reporter: Matthias Pohl


We're currently using {{read-all}} for our workflows. We might want to limit 
the scope and document why certain reads are needed (see [GHA 
docs|https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs]).




[jira] [Created] (FLINK-34331) Enable Apache INFRA runners for nightly builds

2024-02-01 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-34331:
-

 Summary: Enable Apache INFRA runners for nightly builds
 Key: FLINK-34331
 URL: https://issues.apache.org/jira/browse/FLINK-34331
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0
Reporter: Matthias Pohl


The nightly CI is currently still utilizing the GitHub runners. We want to 
switch to Apache INFRA runners or ephemeral runners.



