[jira] [Updated] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:

2024-02-15 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33555:
--
Priority: Critical  (was: Major)

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> ---
>
> Key: FLINK-33555
> URL: https://issues.apache.org/jira/browse/FLINK-33555
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/XComp/flink/actions/runs/6868936761/job/18680977238#step:12:13492
> {code}
> Error: 21:44:15 21:44:15.144 [ERROR]   
> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:119
>  [The task was deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) but 
> it should have been deployed to 
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) for local recovery., The task 
> was deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) but it should 
> have been deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) for 
> local recovery., The task was deployed to 
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) but it should have been 
> deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) for local 
> recovery.] ==> expected: <true> but was: <false>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:

2024-02-15 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33555:
--
Affects Version/s: 1.19.0
   1.20.0

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> ---
>
> Key: FLINK-33555
> URL: https://issues.apache.org/jira/browse/FLINK-33555
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/6868936761/job/18680977238#step:12:13492
> {code}
> Error: 21:44:15 21:44:15.144 [ERROR]   
> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:119
>  [The task was deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) but 
> it should have been deployed to 
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) for local recovery., The task 
> was deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) but it should 
> have been deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) for 
> local recovery., The task was deployed to 
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) but it should have been 
> deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) for local 
> recovery.] ==> expected: <true> but was: <false>
> {code}





[jira] [Commented] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817860#comment-17817860
 ] 

Matthias Pohl commented on FLINK-33555:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=15485
{code}
Feb 16 01:14:56 01:14:56.299 [ERROR] Tests run: 1, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 39.33 s <<< FAILURE! -- in 
org.apache.flink.test.recovery.LocalRecoveryITCase
Feb 16 01:14:56 01:14:56.299 [ERROR] 
org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
 -- Time elapsed: 39.27 s <<< FAILURE!
Feb 16 01:14:56 org.opentest4j.AssertionFailedError: [The task was deployed to 
AllocationID(34c031bb72931f33a70b6a55fe30501c) but it should have been deployed 
to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) for local recovery., The task 
was deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) but it should 
have been deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) for local 
recovery.] ==> expected: <true> but was: <false>
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
Feb 16 01:14:56 at 
org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
Feb 16 01:14:56 at 
org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:119)
Feb 16 01:14:56 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 16 01:14:56 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}
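Each entry in the assertion message above names a task whose post-crash allocation differs from the one it used before the crash. In this Azure run two tasks simply swapped allocations, while in the GitHub Actions run quoted in the issue description three tasks rotated among three allocations. The following is a minimal, hypothetical sketch of such a check using only the JDK. It assumes the test records a task-to-AllocationID mapping before and after recovery; `AllocationCheck` and `findMismatches` are illustrative names, not the actual LocalRecoveryITCase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: compares where each task ran before the crash with where
// it was redeployed afterwards. This is NOT the actual LocalRecoveryITCase code.
public class AllocationCheck {

    /** Returns one message per task whose new allocation differs from the previous one. */
    static List<String> findMismatches(Map<String, String> before, Map<String, String> after) {
        List<String> mismatches = new ArrayList<>();
        // Iterate in sorted task order so the output is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(before).entrySet()) {
            String expected = e.getValue();
            String actual = after.get(e.getKey());
            if (!expected.equals(actual)) {
                mismatches.add(
                        "The task was deployed to AllocationID(" + actual
                                + ") but it should have been deployed to AllocationID("
                                + expected + ") for local recovery.");
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Two tasks that swapped allocations, as in the Azure failure above.
        Map<String, String> before = Map.of("task-0", "A", "task-1", "B");
        Map<String, String> after = Map.of("task-0", "B", "task-1", "A");
        List<String> mismatches = findMismatches(before, after);
        // A JUnit test would then assert mismatches.isEmpty(), using the list as the message.
        System.out.println(mismatches.isEmpty() ? "OK" : mismatches.toString());
    }
}
```

Collecting all mismatches before asserting, rather than failing on the first one, is what makes the swap or rotation pattern visible in a single failure message.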

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> ---
>
> Key: FLINK-33555
> URL: https://issues.apache.org/jira/browse/FLINK-33555
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/6868936761/job/18680977238#step:12:13492
> {code}
> Error: 21:44:15 21:44:15.144 [ERROR]   
> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:119
>  [The task was deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) but 
> it should have been deployed to 
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) for local recovery., The task 
> was deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) but it should 
> have been deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) for 
> local recovery., The task was deployed to 
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) but it should have been 
> deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) for local 
> recovery.] ==> expected: <true> but was: <false>
> {code}





[jira] [Comment Edited] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817860#comment-17817860
 ] 

Matthias Pohl edited comment on FLINK-33555 at 2/16/24 7:57 AM:


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=15485
{code}
Feb 16 01:14:56 01:14:56.299 [ERROR] Tests run: 1, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 39.33 s <<< FAILURE! -- in 
org.apache.flink.test.recovery.LocalRecoveryITCase
Feb 16 01:14:56 01:14:56.299 [ERROR] 
org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
 -- Time elapsed: 39.27 s <<< FAILURE!
Feb 16 01:14:56 org.opentest4j.AssertionFailedError: [The task was deployed to 
AllocationID(34c031bb72931f33a70b6a55fe30501c) but it should have been deployed 
to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) for local recovery., The task 
was deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) but it should 
have been deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) for local 
recovery.] ==> expected: <true> but was: <false>
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
Feb 16 01:14:56 at 
org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
Feb 16 01:14:56 at 
org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:119)
Feb 16 01:14:56 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 16 01:14:56 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}

I'm moving this one out of FLINK-27075 as it appeared in Azure Pipelines as 
well.


was (Author: mapohl):
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=15485
{code}
Feb 16 01:14:56 01:14:56.299 [ERROR] Tests run: 1, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 39.33 s <<< FAILURE! -- in 
org.apache.flink.test.recovery.LocalRecoveryITCase
Feb 16 01:14:56 01:14:56.299 [ERROR] 
org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
 -- Time elapsed: 39.27 s <<< FAILURE!
Feb 16 01:14:56 org.opentest4j.AssertionFailedError: [The task was deployed to 
AllocationID(34c031bb72931f33a70b6a55fe30501c) but it should have been deployed 
to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) for local recovery., The task 
was deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) but it should 
have been deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) for local 
recovery.] ==> expected: <true> but was: <false>
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
Feb 16 01:14:56 at 
org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
Feb 16 01:14:56 at 
org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
Feb 16 01:14:56 at 
org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:119)
Feb 16 01:14:56 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 16 01:14:56 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 16 01:14:56 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> 

[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817857#comment-17817857
 ] 

Matthias Pohl commented on FLINK-22765:
---

JDK21: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=7806

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> 

[jira] [Commented] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817858#comment-17817858
 ] 

Matthias Pohl commented on FLINK-29114:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=11539

> TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with 
> result mismatch 
> --
>
> Key: FLINK-29114
> URL: https://issues.apache.org/jira/browse/FLINK-29114
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Tests
>Affects Versions: 1.15.0, 1.19.0, 1.20.0
>Reporter: Sergey Nuyanzin
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> It can be reproduced locally by running the test repeatedly. Usually about 100
> iterations are enough to produce several failures.
> {noformat}
> [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 1.664 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase
> [ERROR] 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse
>   Time elapsed: 0.108 s  <<< FAILURE!
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)>
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:120)
>     at org.junit.Assert.assertEquals(Assert.java:146)
>     at 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>     at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>     at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>     at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>     at 
> 
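The description suggests reproducing the failure by repeating the test about 100 times. A minimal, JDK-only sketch of such a loop follows; `RepeatRunner` and `countFailures` are hypothetical helpers, and the "flaky" test is a stand-in. For tests running on JUnit Jupiter, the framework's own `@RepeatedTest` annotation is the more usual way to do this.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs a test body repeatedly and counts how many runs fail.
public class RepeatRunner {

    /** Runs {@code test} {@code iterations} times and returns the number of failed runs. */
    static int countFailures(int iterations, Runnable test) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                test.run();
            } catch (AssertionError | RuntimeException e) {
                // Count both assertion failures and unexpected exceptions as failed runs.
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in for a flaky test: fails on every 25th invocation.
        AtomicInteger calls = new AtomicInteger();
        Runnable flaky = () -> {
            if (calls.incrementAndGet() % 25 == 0) {
                throw new AssertionError("result mismatch");
            }
        };
        int failures = countFailures(100, flaky);
        System.out.println(failures + " of 100 runs failed"); // prints "4 of 100 runs failed"
    }
}
```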

[jira] [Updated] (FLINK-34448) ChangelogLocalRecoveryITCase failed fatally with 127 exit code

2024-02-15 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34448:
--
Priority: Critical  (was: Major)

> ChangelogLocalRecoveryITCase failed fatally with 127 exit code
> --
>
> Key: FLINK-34448
> URL: https://issues.apache.org/jira/browse/FLINK-34448
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8897
> {code}
> Feb 16 02:43:47 02:43:47.142 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) 
> on project flink-tests: 
> Feb 16 02:43:47 02:43:47.142 [ERROR] 
> Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to 
> /__w/1/s/flink-tests/target/surefire-reports for the individual test results.
> Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to dump files (if any 
> exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> Feb 16 02:43:47 02:43:47.142 [ERROR] ExecutionException The forked VM 
> terminated without properly saying goodbye. VM crash or System.exit called?
> Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd 
> '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' 
> '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
> '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar'
>  '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 
> 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
> Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check 
> output in log
> Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
> Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
> Feb 16 02:43:47 02:43:47.142 [ERROR] 
> org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
> Feb 16 02:43:47 02:43:47.142 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd 
> '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' 
> '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
> '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar'
>  '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 
> 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
> Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check 
> output in log
> Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
> Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
> Feb 16 02:43:47 02:43:47.142 [ERROR] 
> org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
> Feb 16 02:43:47 02:43:47.142 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> {code}





[jira] [Created] (FLINK-34448) ChangelogLocalRecoveryITCase failed fatally with 127 exit code

2024-02-15 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34448:
-

 Summary: ChangelogLocalRecoveryITCase failed fatally with 127 exit 
code
 Key: FLINK-34448
 URL: https://issues.apache.org/jira/browse/FLINK-34448
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8897
{code}
Feb 16 02:43:47 02:43:47.142 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) 
on project flink-tests: 
Feb 16 02:43:47 02:43:47.142 [ERROR] 
Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to 
/__w/1/s/flink-tests/target/surefire-reports for the individual test results.
Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to dump files (if any exist) 
[date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Feb 16 02:43:47 02:43:47.142 [ERROR] ExecutionException The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd 
'/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' 
'-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
'/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' 
'/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 
'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check 
output in log
Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
Feb 16 02:43:47 02:43:47.142 [ERROR] 
org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
Feb 16 02:43:47 02:43:47.142 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd 
'/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' 
'-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
'/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' 
'/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 
'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check 
output in log
Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
Feb 16 02:43:47 02:43:47.142 [ERROR] 
org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
Feb 16 02:43:47 02:43:47.142 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)

{code}
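Surefire reports `Process Exit Code: 127` together with "Error occurred in starting fork". Because the fork is launched via `/bin/sh -c`, one possible reading, which the log does not confirm, is the POSIX shell convention in which 127 means "command not found"; the log itself also leaves open a `System.exit` call or a VM crash. The hedged sketch below uses a hypothetical `ExitCodes.describe` helper to map exit statuses to common shell conventions as a triage hint.

```java
// Hypothetical triage helper: interprets a process exit status using common
// POSIX shell conventions. A JVM can also exit with any of these values via
// System.exit, so the result is a hint, not a diagnosis.
public class ExitCodes {

    static String describe(int status) {
        if (status == 0) {
            return "success";
        } else if (status == 126) {
            return "command found but not executable";
        } else if (status == 127) {
            return "command not found";
        } else if (status > 128 && status < 256) {
            // Shells such as bash report death by signal N as 128 + N.
            return "terminated by signal " + (status - 128);
        } else {
            return "exited with status " + status;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(127)); // exit code reported for the crashed fork
        System.out.println(describe(137)); // 128 + 9 (SIGKILL), e.g. the Linux OOM killer
    }
}
```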





[jira] [Commented] (FLINK-34202) python tests take suspiciously long in some of the cases

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817856#comment-17817856
 ] 

Matthias Pohl commented on FLINK-34202:
---

1.20 (master): 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=bf5e383b-9fd3-5f02-ca1c-8f788e2e76d3=85189c57-d8a0-5c9c-b61d-fc05cfac62cf

> python tests take suspiciously long in some of the cases
> 
>
> Key: FLINK-34202
> URL: https://issues.apache.org/jira/browse/FLINK-34202
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.2, 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> [This release-1.18 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=3e4dd1a2-fe2f-5e5d-a581-48087e718d53=b4612f28-e3b5-5853-8a8b-610ae894217a]
>  shows the Python stage running into a timeout for no obvious reason. The 
> [python stage run for 
> JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=b53e1644-5cb4-5a3b-5d48-f523f39bcf06]
>  was also getting close to the 4h timeout.
> I'm creating this issue for documentation purposes.





[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817683#comment-17817683
 ] 

Matthias Pohl edited comment on FLINK-22765 at 2/15/24 2:06 PM:


The following error is reported:
{code}
The system is out of resources.
Consult the following stack trace for details.
java.lang.OutOfMemoryError: Metaspace
at jdk.compiler/com.sun.tools.javac.comp.Flow.analyzeTree(Flow.java:233)
at 
jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1419)
at 
jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1393)
at 
jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:976)
at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:319)
at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:178)
at jdk.compiler/com.sun.tools.javac.Main.compile(Main.java:82)
at 
jdk.compiler/com.sun.tools.javac.api.JavacTool.run(JavacTool.java:214)
at 
org.apache.flink.testutils.ClassLoaderUtils.compileClass(ClassLoaderUtils.java:83)
at 
org.apache.flink.testutils.ClassLoaderUtils.writeAndCompile(ClassLoaderUtils.java:62)
at 
org.apache.flink.testutils.ClassLoaderUtils.access$100(ClassLoaderUtils.java:46)
at 
org.apache.flink.testutils.ClassLoaderUtils$ClassLoaderBuilder.build(ClassLoaderUtils.java:163)
at 
org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.loadDummyClass(ExceptionUtilsITCase.java:180)
at 
org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.main(ExceptionUtilsITCase.java:159)
{code}

This matches other sources (e.g. [this SO
post|https://stackoverflow.com/a/39509720]) stating that the error message "The
system is out of resources." indicates a failure while compiling classes. The
compilation happens in ClassLoaderUtils, and the OutOfMemoryError above shows
that the available Metaspace (not the heap) is insufficient for compiling the
classes.
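In the stack trace above, only the `OutOfMemoryError` message ("Metaspace" rather than, for example, "Java heap space") distinguishes Metaspace exhaustion from heap exhaustion. A minimal sketch of such a message-based check follows. `OomClassifier.isMetaspaceOom` is a hypothetical helper written for illustration and is not Flink's `ExceptionUtils` implementation.

```java
import java.util.Locale;

// Hypothetical sketch: classifies an OutOfMemoryError by the message the JVM
// attaches to it (e.g. "Metaspace" vs. "Java heap space").
public class OomClassifier {

    static boolean isMetaspaceOom(Throwable t) {
        // Walk the cause chain, since the OOM may be wrapped by another exception.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError
                    && cur.getMessage() != null
                    && cur.getMessage().toLowerCase(Locale.ROOT).contains("metaspace")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isMetaspaceOom(new OutOfMemoryError("Metaspace")));       // true
        System.out.println(isMetaspaceOom(new OutOfMemoryError("Java heap space"))); // false
        System.out.println(isMetaspaceOom(new RuntimeException(new OutOfMemoryError("Metaspace")))); // true
    }
}
```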


> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> 

[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817683#comment-17817683
 ] 

Matthias Pohl commented on FLINK-22765:
---

The following error is reported:
{code}
The system is out of resources.
Consult the following stack trace for details.
java.lang.OutOfMemoryError: Metaspace
at jdk.compiler/com.sun.tools.javac.comp.Flow.analyzeTree(Flow.java:233)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1419)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1393)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:976)
at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:319)
at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:178)
at jdk.compiler/com.sun.tools.javac.Main.compile(Main.java:82)
at jdk.compiler/com.sun.tools.javac.api.JavacTool.run(JavacTool.java:214)
at org.apache.flink.testutils.ClassLoaderUtils.compileClass(ClassLoaderUtils.java:83)
at org.apache.flink.testutils.ClassLoaderUtils.writeAndCompile(ClassLoaderUtils.java:62)
at org.apache.flink.testutils.ClassLoaderUtils.access$100(ClassLoaderUtils.java:46)
at org.apache.flink.testutils.ClassLoaderUtils$ClassLoaderBuilder.build(ClassLoaderUtils.java:163)
at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.loadDummyClass(ExceptionUtilsITCase.java:180)
at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.main(ExceptionUtilsITCase.java:159)
{code}


[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676
 ] 

Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:50 PM:


The issue seems to be related to the JDK version. We've seen one failure with JDK11
([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038])
and two with JDK17
([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308],
[GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]).
All the other failures happened with JDK21.

Indeed, it's reproducible locally with JDK21. That would also explain why we've been
seeing this error more often recently, since the introduction of JDK21.
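Failing builds can be grouped by JDK feature release so that per-JDK counts like the ones above are easy to reproduce. This is a minimal sketch. It assumes the JDK version strings of the failing builds are available in the standard Runtime.Version format. The class name and the sample input are hypothetical and are not taken from the linked builds.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: groups failing builds by JDK feature release (11, 17, 21, ...). */
public class JdkFailureTally {

    /** Extracts the feature release from a version string such as "17.0.9+9". */
    static int featureVersion(String version) {
        return Runtime.Version.parse(version).feature();
    }

    /** Counts failures per feature release; TreeMap keeps the releases in ascending order. */
    static Map<Integer, Integer> tally(List<String> failingJdkVersions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String version : failingJdkVersions) {
            counts.merge(featureVersion(version), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("Running on JDK " + Runtime.version().feature());
        // Hypothetical input: one JDK11, two JDK17 and two JDK21 failures.
        List<String> failures = List.of("11.0.21", "17.0.9", "17.0.9+9", "21.0.1", "21.0.2+13");
        System.out.println(tally(failures)); // prints {11=1, 17=2, 21=2}
    }
}
{code}

The first line of output also shows the feature release of the local JVM. That is the number to compare when checking a local reproduction against the CI failures.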



[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676
 ] 

Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:49 PM:


The issue seems to be related to the JDK version. We've seen one failure with JDK11
([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038])
and two with JDK17
([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308],
[GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]).
All the other failures happened with JDK21.

Indeed, it's reproducible locally with JDK21.



[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676
 ] 

Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:49 PM:


The issue seems to be related to the JDK version. We've seen one failure with JDK11
([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038])
and two with JDK17
([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308],
[GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]).
All the other failures happened with JDK21.

Indeed, it's reproducible locally.



[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676
 ] 

Matthias Pohl commented on FLINK-22765:
---

The issue seems to be related to the JDK version. We've seen one failure with JDK11
([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038])
and two with JDK17
([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308],
[GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]).


[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676
 ] 

Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:34 PM:


The issue seems to be related to the JDK version. We've seen one failure with JDK11
([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038])
and two with JDK17
([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308],
[GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]).
All the other failures happened with JDK21.



[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817644#comment-17817644
 ] 

Matthias Pohl commented on FLINK-22765:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57535=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8999

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> 

[jira] [Assigned] (FLINK-34447) ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines

2024-02-15 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-34447:
-

Assignee: Matthias Pohl

> ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime
>  still fails on slow machines
> -
>
> Key: FLINK-34447
> URL: https://issues.apache.org/jira/browse/FLINK-34447
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
>
> This appeared in this [PR CI 
> run|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57529=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=7997]
>  of FLINK-34427.
> {code}
> Feb 14 18:50:01 18:50:01.283 [ERROR] Tests run: 18, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.665 s <<< FAILURE! -- in 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest
> Feb 14 18:50:01 18:50:01.283 [ERROR] 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime
>  -- Time elapsed: 0.197 s <<< FAILURE!
> Feb 14 18:50:01 java.lang.AssertionError: 
> Feb 14 18:50:01 
> Feb 14 18:50:01 Expecting
> Feb 14 18:50:0170e6587e5e4ba9f310031a96bdda2971]>
> Feb 14 18:50:01 not to be done.
> Feb 14 18:50:01 Be aware that the state of the future in this message might 
> not reflect the one at the time when the assertion was performed as it is 
> evaluated later on
> Feb 14 18:50:01   at 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.lambda$new$3(ActiveResourceManagerTest.java:982)
> Feb 14 18:50:01   at 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$Context.runTest(ActiveResourceManagerTest.java:1133)
> Feb 14 18:50:01   at 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.<init>(ActiveResourceManagerTest.java:963)
> Feb 14 18:50:01   at 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime(ActiveResourceManagerTest.java:946)
> Feb 14 18:50:01   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 18:50:01   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Feb 14 18:50:01   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Feb 14 18:50:01   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Feb 14 18:50:01   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Feb 14 18:50:01   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> But I was able to reproduce it locally as well.
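
The assertion expects a future to still be incomplete, so the test is sensitive to how much wall-clock time passes before the check runs. The sketch below reproduces that kind of race with a scheduled timeout. Its names and timings are hypothetical and are not taken from ActiveResourceManagerTest. On a fast machine the check runs before the timeout fires, while a slow step lets the timeout win.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class TimeoutRaceSketch {
    /**
     * Schedules a hypothetical "registration timeout" that completes a future after
     * timeoutMillis, simulates test work taking workMillis, and then reports whether
     * the timeout already fired, i.e. whether an isNotDone()-style check would fail.
     */
    static boolean timeoutFiredBeforeCheck(long timeoutMillis, long workMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<Void> registrationTimeout = new CompletableFuture<>();
            scheduler.schedule(
                    () -> { registrationTimeout.complete(null); },
                    timeoutMillis,
                    TimeUnit.MILLISECONDS);
            try {
                Thread.sleep(workMillis); // stands in for a step that is slow on a loaded machine
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return registrationTimeout.isDone();
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

A common remedy is to drive such timeouts from a manually triggered executor or a fake clock in tests. The outcome then no longer depends on machine speed.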





[jira] [Created] (FLINK-34447) ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines

2024-02-15 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34447:
-

 Summary: 
ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime
 still fails on slow machines
 Key: FLINK-34447
 URL: https://issues.apache.org/jira/browse/FLINK-34447
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


This appeared in this [PR CI 
run|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57529=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=7997]
 of FLINK-34427.
{code}
Feb 14 18:50:01 18:50:01.283 [ERROR] Tests run: 18, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 0.665 s <<< FAILURE! -- in 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest
Feb 14 18:50:01 18:50:01.283 [ERROR] 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime
 -- Time elapsed: 0.197 s <<< FAILURE!
Feb 14 18:50:01 java.lang.AssertionError: 
Feb 14 18:50:01 
Feb 14 18:50:01 Expecting
Feb 14 18:50:01   
Feb 14 18:50:01 not to be done.
Feb 14 18:50:01 Be aware that the state of the future in this message might not 
reflect the one at the time when the assertion was performed as it is evaluated 
later on
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.lambda$new$3(ActiveResourceManagerTest.java:982)
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$Context.runTest(ActiveResourceManagerTest.java:1133)
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.<init>(ActiveResourceManagerTest.java:963)
Feb 14 18:50:01 at 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime(ActiveResourceManagerTest.java:946)
Feb 14 18:50:01 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 14 18:50:01 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 14 18:50:01 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}

But I was able to reproduce it locally as well.





[jira] [Commented] (FLINK-32523) NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted fails with timeout on AZP

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817604#comment-17817604
 ] 

Matthias Pohl commented on FLINK-32523:
---

1.17: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57534=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=7946
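
In the quoted trace the test thread is parked in Object.wait inside OneShotLatch.await. That is an unbounded wait, so the failure only surfaces as the JUnit timeout. The sketch below is a minimal, hypothetical one-shot latch with a bounded await; it is not Flink's OneShotLatch. A bounded await lets a test fail with its own message, for example one naming the operator that was never notified.

```java
import java.util.concurrent.TimeUnit;

final class SimpleOneShotLatch {
    private final Object lock = new Object();
    private boolean triggered;

    /** Releases all current and future waiters. */
    void trigger() {
        synchronized (lock) {
            triggered = true;
            lock.notifyAll();
        }
    }

    /** Waits at most timeoutMillis; returns true if the latch was triggered in time. */
    boolean await(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!triggered) { // loop also guards against spurious wakeups
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // never call wait(0), which would block indefinitely
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```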

> NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted fails with timeout 
> on AZP
> ---
>
> Key: FLINK-32523
> URL: https://issues.apache.org/jira/browse/FLINK-32523
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.16.2, 1.18.0, 1.17.1, 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Hangxiang Yu
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Attachments: failure.log
>
>
> This build
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50795=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8638
>  fails with a timeout
> {noformat}
> Jul 03 01:26:35 org.junit.runners.model.TestTimedOutException: test timed out 
> after 10 milliseconds
> Jul 03 01:26:35   at java.lang.Object.wait(Native Method)
> Jul 03 01:26:35   at java.lang.Object.wait(Object.java:502)
> Jul 03 01:26:35   at 
> org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:61)
> Jul 03 01:26:35   at 
> org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase.verifyAllOperatorsNotifyAborted(NotifyCheckpointAbortedITCase.java:198)
> Jul 03 01:26:35   at 
> org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted(NotifyCheckpointAbortedITCase.java:189)
> Jul 03 01:26:35   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 03 01:26:35   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 03 01:26:35   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 03 01:26:35   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 03 01:26:35   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 03 01:26:35   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 03 01:26:35   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 03 01:26:35   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 03 01:26:35   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 03 01:26:35   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 03 01:26:35   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 03 01:26:35   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Updated] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch

2024-02-15 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-29114:
--
Affects Version/s: 1.20.0
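
The quoted description reproduces the failure by rerunning the test about 100 times. The trace shows the test running on the JUnit Vintage engine (JUnit 4 API), so JUnit 5's @RepeatedTest does not apply directly. One alternative is a plain loop around the test body, such as the hypothetical helper below, which counts the failing iterations.

```java
import java.util.function.BooleanSupplier;

final class RepeatUntilFailureSketch {
    /** Runs the attempt the given number of times and returns how many runs failed. */
    static int countFailures(int iterations, BooleanSupplier attempt) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                if (!attempt.getAsBoolean()) {
                    failures++;
                }
            } catch (AssertionError | RuntimeException e) {
                failures++; // a thrown assertion (e.g. a result mismatch) counts as a failed run
            }
        }
        return failures;
    }
}
```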

> TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with 
> result mismatch 
> --
>
> Key: FLINK-29114
> URL: https://issues.apache.org/jira/browse/FLINK-29114
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Tests
>Affects Versions: 1.15.0, 1.19.0, 1.20.0
>Reporter: Sergey Nuyanzin
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> It can be reproduced locally by running the test repeatedly; about 100 
> iterations are usually enough to produce several failures.
> {noformat}
> [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 1.664 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase
> [ERROR] 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse
>   Time elapsed: 0.108 s  <<< FAILURE!
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)>
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:120)
>     at org.junit.Assert.assertEquals(Assert.java:146)
>     at 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>     at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>     at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>     at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
>     at 
> 

[jira] [Commented] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch

2024-02-15 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817603#comment-17817603
 ] 

Matthias Pohl commented on FLINK-29114:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57533=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=11541

> TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with 
> result mismatch 
> --
>
> Key: FLINK-29114
> URL: https://issues.apache.org/jira/browse/FLINK-29114
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Tests
>Affects Versions: 1.15.0, 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> It can be reproduced locally by running the test repeatedly; about 100 
> iterations are usually enough to produce several failures.
> {noformat}
> [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 1.664 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase
> [ERROR] 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse
>   Time elapsed: 0.108 s  <<< FAILURE!
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)>
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:120)
>     at org.junit.Assert.assertEquals(Assert.java:146)
>     at 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>     at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>     at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>     at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>     at 
> 

[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817602#comment-17817602
 ] 

Matthias Pohl commented on FLINK-22765:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57533=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8998

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> 

[jira] [Resolved] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-34403.
---
Resolution: Fixed

* master
** 
[2298e53f35121602c56845ac8040439fbd1a9ff4|https://github.com/apache/flink/commit/2298e53f35121602c56845ac8040439fbd1a9ff4]
** 
[9a316a5bcc47da7f69e76e0c25ed257adc4298ce|https://github.com/apache/flink/commit/9a316a5bcc47da7f69e76e0c25ed257adc4298ce]
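
The innermost cause in the quoted trace is Throwable.addSuppressed rejecting an attempt to suppress a throwable into itself. One way this can happen is a try-with-resources block whose resource's close() rethrows the very instance that the block body already threw. The compiler-generated cleanup code then calls t.addSuppressed(t). This is an illustration, not a confirmed diagnosis of InstantiationUtil.serializeObject, and the resource class below is hypothetical. When this happens, the original error is masked by an IllegalArgumentException.

```java
final class SelfSuppressionDemo {
    /** Hypothetical resource that rethrows a previously stored failure from close(). */
    static final class RethrowingResource implements AutoCloseable {
        private final RuntimeException storedFailure;

        RethrowingResource(RuntimeException storedFailure) {
            this.storedFailure = storedFailure;
        }

        @Override
        public void close() {
            throw storedFailure;
        }
    }

    /** Returns whatever escapes the try-with-resources block. */
    static Throwable run() {
        RuntimeException shared = new RuntimeException("simulated failure");
        try (RethrowingResource resource = new RethrowingResource(shared)) {
            throw shared; // the body fails with the same instance that close() will throw
        } catch (Throwable escaped) {
            return escaped; // IllegalArgumentException("Self-suppression not permitted")
        }
    }
}
```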

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc tests on GHA fail with an 
> out-of-memory error, throwing the following exception:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
> Feb 07 05:43:21   ... 18 more
> Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: 
> Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.lang.Throwable.addSuppressed(Throwable.java:1072)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)
> Feb 07 05:43:21   at 
> 

[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:12 AM:
-

Argh, all this time I didn't notice that these are two separate tests with very 
similar names (VeryBigPbProtoToRowTest and VeryBigPbRowToProtoTest):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 

[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:12 AM:
-

Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 

[jira] [Comment Edited] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817298#comment-17817298
 ] 

Matthias Pohl edited comment on FLINK-34336 at 2/14/24 11:11 AM:
-

* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548208160#step:10:11190
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=15356


was (Author: mapohl):
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548208160#step:10:11190

> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang sometimes
> -
>
> Key: FLINK-34336
> URL: https://issues.apache.org/jira/browse/FLINK-34336
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang in {{waitForRunningTasks(restClusterClient, jobID, parallelism2);}}
> h2. Reason:
> The job has 2 tasks (vertices). After updateJobResourceRequirements is called, 
> the source parallelism isn't changed (it stays at parallelism), while the 
> FlatMapper+Sink is changed from parallelism to parallelism2.
> So we expect the number of running tasks to be parallelism + parallelism2 
> instead of parallelism2.
>  
> h2. Why does it pass for now?
> Flink 1.19 supports a scaling cooldown, which is 30s by default. This means 
> the Flink job only rescales 30 seconds after 
> updateJobResourceRequirements is called.
>  
> So the running tasks still have the old parallelism when we call 
> {{waitForRunningTasks(restClusterClient, jobID, parallelism2);}}.
> IIUC, this timing cannot be guaranteed, and the behavior is unexpected.
>  
> h2. How to reproduce this bug?
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6]
>  * Disable the cooldown
>  * Sleep for a while before waitForRunningTasks
> Then the job runs with the new parallelism, so `waitForRunningTasks` will hang 
> forever.
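A hypothetical sketch of the reasoning in this description: the source keeps `parallelism` and only the FlatMapper+Sink vertex moves to `parallelism2`, so once rescaling has actually happened the number of running tasks is `parallelism + parallelism2`, and a wait for `parallelism2` alone can never succeed. `RunningTaskWait`, `expectedRunningTasks`, and the `IntSupplier`-based `waitForRunningTasks` are illustrative names only, not Flink's test utilities (the real helper is called with a `restClusterClient` and a `jobID`). A timeout is added so the mismatch shows up as `false` instead of a hang:

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical model of the test's wait condition; plain integers stand in for the job's
// running tasks instead of a real REST client.
final class RunningTaskWait {

    /** The source keeps its parallelism; only the FlatMapper+Sink vertex is rescaled. */
    static int expectedRunningTasks(int sourceParallelism, int flatMapSinkParallelism) {
        return sourceParallelism + flatMapSinkParallelism;
    }

    /**
     * Polls runningTasks until it reports `expected`, giving up after `timeout` instead of
     * waiting forever. Returns whether the expected count was observed.
     */
    static boolean waitForRunningTasks(IntSupplier runningTasks, int expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (runningTasks.getAsInt() == expected) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int parallelism = 4;  // hypothetical source parallelism
        int parallelism2 = 2; // hypothetical FlatMapper+Sink parallelism after rescaling
        // Simulates a job that has already finished rescaling.
        IntSupplier running = () -> parallelism + parallelism2;

        // Waiting for parallelism2 alone never matches (without a timeout this would hang).
        System.out.println(waitForRunningTasks(running, parallelism2, Duration.ofMillis(100)));
        // Waiting for parallelism + parallelism2 matches immediately.
        System.out.println(waitForRunningTasks(
                running, expectedRunningTasks(parallelism, parallelism2), Duration.ofMillis(100)));
    }
}
```

Running `main` prints `false` and then `true`. With the 30s cooldown the real test happens to poll before rescaling takes effect, which is why the wrong expectation does not usually surface.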



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:11 AM:
-

Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 

[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:10 AM:
-

Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> 

[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:10 AM:
-

Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> 

[jira] [Commented] (FLINK-34273) git fetch fails

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817334#comment-17817334
 ] 

Matthias Pohl commented on FLINK-34273:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57521=logs=60960eae-6f09-579e-371e-29814bdd1adc=1fe608a4-e773-5ca0-5336-1c37a61b9f8d

> git fetch fails
> ---
>
> Key: FLINK-34273
> URL: https://issues.apache.org/jira/browse/FLINK-34273
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We've seen multiple {{git fetch}} failures. I assume this is an 
> infrastructure issue. This Jira issue is for documentation purposes.
> {code:java}
> error: RPC failed; curl 18 transfer closed with outstanding read data 
> remaining
> error: 5211 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667





[jira] [Updated] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-14 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-22765:
--
Priority: Critical  (was: Major)

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> May 25 00:56:38 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817333#comment-17817333
 ] 

Matthias Pohl commented on FLINK-22765:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57521=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8997

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> 

[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817300#comment-17817300
 ] 

Matthias Pohl commented on FLINK-28440:
---

https://github.com/apache/flink/actions/runs/7895502334/job/21548198516#step:10:7557

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.19.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> 

[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817301#comment-17817301
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7895502334

> Disk space issues for Docker-ized GitHub Action jobs
> 
>
> Key: FLINK-34418
> URL: https://issues.apache.org/jira/browse/FLINK-34418
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
>
> [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
> {code:java}
> [...]
> Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
> Feb 09 03:00:13  at java.io.FileOutputStream.writeBytes(Native Method)
> Feb 09 03:00:13  at 
> java.io.FileOutputStream.write(FileOutputStream.java:326)
> Feb 09 03:00:13  at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
> Feb 09 03:00:13  ... 39 more
> [...] {code}
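For "No space left on device" failures like the one above, a test or CI step could log the usable disk space up front so the cause is visible before writes start failing. The sketch below is hypothetical and not part of Flink's build: the class name `DiskSpaceCheck`, its methods, and the 1 GiB threshold are assumptions. It only relies on the JDK's `Files.getFileStore` and `FileStore#getUsableSpace`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic, not part of Flink: checks usable space on a directory's file store. */
public final class DiskSpaceCheck {

    private DiskSpaceCheck() {}

    /** Returns the usable bytes on the file store that holds {@code dir}. */
    static long usableBytes(Path dir) {
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if at least {@code minFreeBytes} are usable on the file store of {@code dir}. */
    static boolean hasUsableSpace(Path dir, long minFreeBytes) {
        return usableBytes(dir) >= minFreeBytes;
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        long minFree = 1024L * 1024 * 1024; // hypothetical 1 GiB threshold
        System.out.printf("usable space under %s: %d MiB%n", tmp, usableBytes(tmp) / (1024 * 1024));
        if (!hasUsableSpace(tmp, minFree)) {
            // Warn early instead of failing later inside a log appender or a file write.
            System.err.println("WARNING: less than 1 GiB usable; writes may fail with 'No space left on device'");
        }
    }
}
```

A warning like this does not free any space. It only makes a full disk show up as an explicit message near the start of the job log rather than deep inside an unrelated stack trace.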





[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 9:59 AM:


Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exception:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> 

[jira] [Comment Edited] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817298#comment-17817298
 ] 

Matthias Pohl edited comment on FLINK-34336 at 2/14/24 9:59 AM:


* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193
* 
https://github.com/apache/flink/actions/runs/7895502334/job/21548208160#step:10:11190


was (Author: mapohl):
https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193

> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang sometimes
> -
>
> Key: FLINK-34336
> URL: https://issues.apache.org/jira/browse/FLINK-34336
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang in 
> waitForRunningTasks(restClusterClient, jobID, parallelism2);
> h2. Reason:
> The job has 2 tasks (vertices). After calling updateJobResourceRequirements, 
> the source parallelism isn't changed (it stays at parallelism), and the 
> FlatMapper+Sink is changed from parallelism to parallelism2.
> So we expect the number of tasks to be parallelism + parallelism2 instead of 
> parallelism2.
>
> h2. Why does it pass for now?
> Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by 
> default. This means the Flink job will be rescaled 30 seconds after 
> updateJobResourceRequirements is called.
>
> So the running tasks still have the old parallelism when we call 
> waitForRunningTasks(restClusterClient, jobID, parallelism2);.
> IIUC, this is not guaranteed, and it's unexpected.
>
> h2. How to reproduce this bug?
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6]
>  * Disable the cooldown
>  * Sleep for a while before waitForRunningTasks
> If so, the job runs with the new parallelism, so `waitForRunningTasks` will hang 
> forever.
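The task-count argument in the Reason section can be made concrete with a small standalone sketch. It is not Flink's test code: the class `ExpectedRunningTasks`, its method, and the parallelism values 2 and 4 are hypothetical. It assumes each vertex runs exactly as many subtasks as its parallelism and that `waitForRunningTasks` waits for an exact count.

```java
/**
 * Hypothetical sketch (not Flink's test code): counts the subtasks a job runs,
 * assuming every vertex contributes exactly its parallelism in subtasks.
 */
public final class ExpectedRunningTasks {

    private ExpectedRunningTasks() {}

    /** Returns the total number of subtasks across all job vertices. */
    static int expectedRunningTasks(int... vertexParallelisms) {
        int total = 0;
        for (int p : vertexParallelisms) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values; the real test derives them from its cluster setup.
        int parallelism = 2;  // source vertex, unchanged by the rescale
        int parallelism2 = 4; // FlatMapper+Sink vertex after updateJobResourceRequirements

        // During the cooldown, both vertices still run with the old parallelism.
        int beforeRescale = expectedRunningTasks(parallelism, parallelism);
        // Once the rescale is applied, only the FlatMapper+Sink vertex changes.
        int afterRescale = expectedRunningTasks(parallelism, parallelism2);

        System.out.println("before rescale: " + beforeRescale); // prints "before rescale: 4"
        System.out.println("after rescale: " + afterRescale);   // prints "after rescale: 6"
        System.out.println("waited for: " + parallelism2);      // prints "waited for: 4"
    }
}
```

With these made-up values, the pre-rescale count happens to equal parallelism2. This matches the issue's observation that waiting for parallelism2 tasks can succeed only while the cooldown still delays the rescale. Once the rescale lands, the job runs parallelism + parallelism2 tasks, so a wait for exactly parallelism2 never completes.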





[jira] [Updated] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes

2024-02-14 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34336:
--
Labels: pull-request-available test-stability  (was: pull-request-available)

> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang sometimes
> -
>
> Key: FLINK-34336
> URL: https://issues.apache.org/jira/browse/FLINK-34336
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.19.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang in 
> waitForRunningTasks(restClusterClient, jobID, parallelism2);
> h2. Reason:
> The job has 2 tasks (vertices). After calling updateJobResourceRequirements, 
> the source parallelism isn't changed (it stays at parallelism), and the 
> FlatMapper+Sink is changed from parallelism to parallelism2.
> So we expect the number of tasks to be parallelism + parallelism2 instead of 
> parallelism2.
>
> h2. Why does it pass for now?
> Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by 
> default. This means the Flink job will be rescaled 30 seconds after 
> updateJobResourceRequirements is called.
>
> So the running tasks still have the old parallelism when we call 
> waitForRunningTasks(restClusterClient, jobID, parallelism2);.
> IIUC, this is not guaranteed, and it's unexpected.
>
> h2. How to reproduce this bug?
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6]
>  * Disable the cooldown
>  * Sleep for a while before waitForRunningTasks
> If so, the job runs with the new parallelism, so `waitForRunningTasks` will hang 
> forever.





[jira] [Updated] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes

2024-02-14 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34336:
--
Affects Version/s: 1.20.0






[jira] [Commented] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817298#comment-17817298
 ] 

Matthias Pohl commented on FLINK-34336:
---

https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193

> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang sometimes
> -
>
> Key: FLINK-34336
> URL: https://issues.apache.org/jira/browse/FLINK-34336
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.19.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang in 
> waitForRunningTasks(restClusterClient, jobID, parallelism2);
> h2. Reason:
> The job has 2 tasks (vertices). After calling updateJobResourceRequirements, 
> the source parallelism isn't changed (it stays at parallelism), and the 
> FlatMapper+Sink is changed from parallelism to parallelism2.
> So we expect the number of tasks to be parallelism + parallelism2 instead of 
> parallelism2.
>
> h2. Why does it pass for now?
> Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by 
> default. This means the Flink job will be rescaled 30 seconds after 
> updateJobResourceRequirements is called.
>
> So the running tasks still have the old parallelism when we call 
> waitForRunningTasks(restClusterClient, jobID, parallelism2);.
> IIUC, this is not guaranteed, and it's unexpected.
>
> h2. How to reproduce this bug?
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6]
>  * Disable the cooldown
>  * Sleep for a while before waitForRunningTasks
> If so, the job runs with the new parallelism, so `waitForRunningTasks` will hang 
> forever.





[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817295#comment-17817295
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7895502322/job/21548178211






[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817292#comment-17817292
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7895502206/job/21548178104






[jira] [Commented] (FLINK-34443) YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817293#comment-17817293
 ] 

Matthias Pohl commented on FLINK-34443:
---

Maybe related to FLINK-34418

> YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed 
> when deploying job cluster
> ---
>
> Key: FLINK-34443
> URL: https://issues.apache.org/jira/browse/FLINK-34443
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Runtime / Coordination, Test 
> Infrastructure
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804
> {code}
> Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in 
> org.apache.flink.yarn.YARNFileReplicationITCase
> Error: 03:04:05 03:04:05.067 [ERROR] 
> org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication
>  -- Time elapsed: 1.982 s <<< ERROR!
> Feb 14 03:04:05 
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not 
> deploy Yarn job cluster.
> Feb 14 03:04:05   at 
> org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566)
> Feb 14 03:04:05   at 
> org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109)
> Feb 14 03:04:05   at 
> org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73)
> Feb 14 03:04:05   at 
> org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
> Feb 14 03:04:05   at 
> org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73)
> Feb 14 03:04:05   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 03:04:05   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Feb 14 03:04:05   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Feb 14 03:04:05   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Feb 14 03:04:05   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Feb 14 03:04:05   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Feb 14 03:04:05 Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could 
> only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) 
> running and 2 node(s) are excluded in this operation.
> Feb 14 03:04:05   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260)
> Feb 14 03:04:05   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> Feb 14 03:04:05   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813)
> Feb 14 03:04:05   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908)
> Feb 14 03:04:05   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
> Feb 14 03:04:05   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> Feb 14 03:04:05   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
> Feb 14 03:04:05   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
> Feb 14 03:04:05   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
> Feb 14 03:04:05   at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
> Feb 14 03:04:05   at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
> Feb 14 03:04:05   at java.security.AccessController.doPrivileged(Native 
> Method)
> Feb 14 03:04:05   at javax.security.auth.Subject.doAs(Subject.java:422)
> Feb 14 03:04:05   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> Feb 14 03:04:05   at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
> Feb 14 03:04:05 
> Feb 14 03:04:05   at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
> Feb 14 03:04:05   at org.apache.hadoop.ipc.Client.call(Client.java:1525)

[jira] [Created] (FLINK-34443) YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster

2024-02-14 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34443:
-

 Summary: 
YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed 
when deploying job cluster
 Key: FLINK-34443
 URL: https://issues.apache.org/jira/browse/FLINK-34443
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI, Runtime / Coordination, Test 
Infrastructure
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804

{code}
Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in 
org.apache.flink.yarn.YARNFileReplicationITCase
Error: 03:04:05 03:04:05.067 [ERROR] 
org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication
 -- Time elapsed: 1.982 s <<< ERROR!
Feb 14 03:04:05 org.apache.flink.client.deployment.ClusterDeploymentException: 
Could not deploy Yarn job cluster.
Feb 14 03:04:05 at 
org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
Feb 14 03:04:05 at 
org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73)
Feb 14 03:04:05 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 14 03:04:05 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Feb 14 03:04:05 Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could 
only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) 
running and 2 node(s) are excluded in this operation.
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
Feb 14 03:04:05 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
Feb 14 03:04:05 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
Feb 14 03:04:05 at java.security.AccessController.doPrivileged(Native 
Method)
Feb 14 03:04:05 at javax.security.auth.Subject.doAs(Subject.java:422)
Feb 14 03:04:05 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
Feb 14 03:04:05 
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579)
Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1525)
Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1422)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
Feb 14 03:04:05 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
Feb 14 03:04:05 at com.sun.proxy.$Proxy113.addBlock(Unknown Source)

[jira] [Commented] (FLINK-30629) ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817283#comment-17817283
 ] 

Matthias Pohl commented on FLINK-30629:
---

1.17: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57519=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9747

> ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable
> -
>
> Key: FLINK-30629
> URL: https://issues.apache.org/jira/browse/FLINK-30629
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Affects Versions: 1.17.0, 1.18.0
>Reporter: Xintong Song
>Assignee: Liu
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.19.0
>
> Attachments: ClientHeartbeatTestLog.txt, 
> logs-cron_azure-test_cron_azure_core-1685497478.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44690=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=10819
> {code:java}
> Jan 11 04:32:39 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 21.02 s <<< FAILURE! - in 
> org.apache.flink.client.ClientHeartbeatTest
> Jan 11 04:32:39 [ERROR] 
> org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat
>   Time elapsed: 9.157 s  <<< ERROR!
> Jan 11 04:32:39 java.lang.IllegalStateException: MiniCluster is not yet 
> running or has already been shut down.
> Jan 11 04:32:39   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
> Jan 11 04:32:39   at 
> org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:1044)
> Jan 11 04:32:39   at 
> org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:917)
> Jan 11 04:32:39   at 
> org.apache.flink.runtime.minicluster.MiniCluster.getJobStatus(MiniCluster.java:841)
> Jan 11 04:32:39   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.getJobStatus(MiniClusterJobClient.java:91)
> Jan 11 04:32:39   at 
> org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat(ClientHeartbeatTest.java:79)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-21834) org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset fail

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817281#comment-17817281
 ] 

Matthias Pohl commented on FLINK-21834:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57504=logs=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819=2dd510a3-5041-5201-6dc3-54d310f68906=10519

{code}
Feb 13 12:33:39 12:33:39.888 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 116.9 s <<< FAILURE! -- in 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest
Feb 13 12:33:39 12:33:39.888 [ERROR] 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest.testResumeWithWrongOffset
 -- Time elapsed: 100.7 s <<< FAILURE!
Feb 13 12:33:39 java.lang.AssertionError
Feb 13 12:33:39 at 
org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset(AbstractRecoverableWriterTest.java:381)
Feb 13 12:33:39 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 13 12:33:39 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Feb 13 12:33:39 
{code}

> org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset
>  fail
> -
>
> Key: FLINK-21834
> URL: https://issues.apache.org/jira/browse/FLINK-21834
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Affects Versions: 1.12.2, 1.13.2, 1.15.0
>Reporter: Guowei Ma
>Priority: Not a Priority
>  Labels: auto-deprioritized-critical, auto-deprioritized-major, 
> auto-deprioritized-minor, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14847=logs=3d12d40f-c62d-5ec4-6acc-0efe94cc3e89=5d6e4255-0ea8-5e2a-f52c-c881b7872361=10893
> Maybe we need to print what the exception is when `recover` is called.
> {code:java}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.fail(Assert.java:95)
>   at 
> org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset(AbstractRecoverableWriterTest.java:381)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> 
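The issue description suggests printing the exception raised by `recover`. Here is a minimal sketch of that idea. It assumes the bare `fail()` at `AbstractRecoverableWriterTest.java:381` sits in a catch block that discards the exception from `recover`, which the trace alone does not confirm. The approach is to rethrow an `AssertionError` that carries the caught exception as its cause, so the exception's message and stack trace end up in the test report. `AssertWithCauseDemo`, `Recovery` and `assertRecovers` are hypothetical names, not Flink APIs.

```java
import java.io.IOException;

// Hypothetical sketch, not Flink's test code: keeps the exception thrown by a
// recover()-style call instead of failing with a bare fail().
class AssertWithCauseDemo {

    // Hypothetical stand-in for a call such as a writer's recover(...).
    interface Recovery {
        void recover() throws IOException;
    }

    // Runs the recovery; any IOException becomes an AssertionError whose cause
    // is the original exception, so its message and stack trace are reported.
    static void assertRecovers(Recovery recovery) {
        try {
            recovery.recover();
        } catch (IOException e) {
            throw new AssertionError("recover() failed unexpectedly", e);
        }
    }

    public static void main(String[] args) {
        try {
            assertRecovers(() -> {
                throw new IOException("simulated: offset mismatch");
            });
        } catch (AssertionError e) {
            // Prints the assertion message followed by the preserved cause.
            System.out.println(e.getMessage() + " <- " + e.getCause());
        }
    }
}
```

In a JUnit test, the same effect comes from letting the `AssertionError` propagate. The runner then prints the "Caused by:" section that the bare `fail()` currently loses.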

[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/14/24 9:37 AM:


Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862

Reopening the issue.


was (Author: mapohl):
Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc tests on GHA can no longer pass due to an 
> out-of-memory error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)

[jira] [Reopened] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reopened FLINK-34403:
---

Argh, all this time I didn't notice that these are two separate tests (with very 
similar names):
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862

Reopening the issue.

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc tests on GHA can no longer pass due to an 
> out-of-memory error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
> Feb 07 05:43:21   ... 18 more
> Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: 
> Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.lang.Throwable.addSuppressed(Throwable.java:1072)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)

[jira] [Commented] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-14 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817271#comment-17817271
 ] 

Matthias Pohl commented on FLINK-34403:
---

No worries

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc tests on GHA can no longer pass due to an 
> out-of-memory error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
> Feb 07 05:43:21   ... 18 more
> Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: 
> Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.lang.Throwable.addSuppressed(Throwable.java:1072)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182)
> Feb 07 05:43:21   at 
> 
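The root cause at the bottom of this trace is `Throwable.addSuppressed` rejecting an attempt to add an exception to itself. This is called from `InstantiationUtil.serializeObject`. One plausible link to the OOM, stated here as an assumption: HotSpot may throw a reused, preallocated `OutOfMemoryError` instance. If the body of a try-with-resources and its `close()` both fail with that same instance, the cleanup step calls `addSuppressed` with the primary exception itself. The sketch replays that cleanup step by hand, following the translation the Java Language Specification gives for try-with-resources. `SelfSuppressionDemo` and `closeAfterFailure` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch, not Flink code: replays by hand the cleanup step that
// try-with-resources performs when both the body and close() fail.
class SelfSuppressionDemo {

    // Closes the resource after a primary failure and attaches any exception
    // from close() to the primary one via addSuppressed.
    static String closeAfterFailure(AutoCloseable resource, Throwable primary) {
        try {
            resource.close();
            return "closed cleanly";
        } catch (Throwable fromClose) {
            try {
                primary.addSuppressed(fromClose);
                return "suppressed " + Arrays.toString(primary.getSuppressed());
            } catch (IllegalArgumentException e) {
                // addSuppressed rejects the case fromClose == primary.
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) {
        // Stands in for a reused OutOfMemoryError instance (an assumption).
        OutOfMemoryError shared = new OutOfMemoryError("simulated");
        // close() fails with the very same instance as the body did.
        System.out.println(closeAfterFailure(() -> { throw shared; }, shared));
        // close() fails with a distinct instance, which is suppressed normally.
        System.out.println(closeAfterFailure(
                () -> { throw new IllegalStateException("close failed"); }, shared));
    }
}
```

The first call yields the same "Self-suppression not permitted" message as the trace. The second shows that a distinct exception is recorded normally.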

[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817082#comment-17817082
 ] 

Matthias Pohl commented on FLINK-34424:
---

Argh. Didn't we have this in the past? Sorry again: the auto-completion and 
the guy behind the screen are to blame here. Yes, you're right.

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times 
> out
> 
>
> Key: FLINK-34424
> URL: https://issues.apache.org/jira/browse/FLINK-34424
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 
> tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29   java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29   at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29   at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29   - locked <0xe2e019a8> (a 
> org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29   at 
> org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817069#comment-17817069
 ] 

Matthias Pohl commented on FLINK-34424:
---

[~piotr.nowicki] (because it's networking; feel free to delegate) 
[~yunfengzhou] (because you touched the code in FLINK-33743 recently): Could 
someone help investigate the cause of the issue?

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times 
> out
> 
>
> Key: FLINK-34424
> URL: https://issues.apache.org/jira/browse/FLINK-34424
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 
> tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29   java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29   at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29   at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29   - locked <0xe2e019a8> (a 
> org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29   at 
> org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817063#comment-17817063
 ] 

Matthias Pohl commented on FLINK-34424:
---

I'm wondering whether that has anything to do with the blocked reader thread:
{code}
Feb 11 13:55:29 "Thread-76" #476 daemon prio=5 os_prio=0 tid=0x7f190bbf1800 
nid=0x5a40 waiting for monitor entry [0x7f191bce4000]
Feb 11 13:55:29   java.lang.Thread.State: BLOCKED (on object monitor)
Feb 11 13:55:29 at net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast(Native 
Method)
Feb 11 13:55:29 at 
net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.compression.Lz4BlockDecompressor.decompress(Lz4BlockDecompressor.java:68)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.network.buffer.BufferDecompressor.decompress(BufferDecompressor.java:126)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.network.buffer.BufferDecompressor.decompressToIntermediateBuffer(BufferDecompressor.java:68)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.readLongs(BoundedBlockingSubpartitionWriteReadTest.java:206)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.access$000(BoundedBlockingSubpartitionWriteReadTest.java:55)
Feb 11 13:55:29 at 
org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader.go(BoundedBlockingSubpartitionWriteReadTest.java:323)
Feb 11 13:55:29 at 
org.apache.flink.core.testutils.CheckedThread.run(CheckedThread.java:67)
{code}

The test started at 13:32:18.152 and timed out at 13:55:39.
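To check the blocked-reader theory without waiting for the CI timeout, a test could bound the join on its consumer threads and report which monitor a stuck thread is blocked on. In the issue description, the test's main thread sits in an unbounded `Thread.join` called from `CheckedThread#trySync`. The minimal sketch below uses only JDK APIs (`Thread#join(long)`, `ThreadMXBean#getThreadInfo`); the class and method names and the 500 ms timeout are hypothetical, and this is not how Flink's `CheckedThread` is implemented.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class HungThreadReport {

    /**
     * Joins every thread against one shared deadline (rather than an unbounded Thread#join)
     * and describes each thread that is still alive afterwards: its state, the monitor it
     * waits for and the thread currently owning that monitor.
     */
    static List<String> joinOrReport(List<Thread> threads, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            for (Thread t : threads) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining > 0) {
                    t.join(remaining);
                }
            }
        } catch (InterruptedException e) {
            // Keep the interrupt flag and report whatever is still running.
            Thread.currentThread().interrupt();
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (Thread t : threads) {
            if (!t.isAlive()) {
                continue; // finished within the deadline
            }
            ThreadInfo info = mx.getThreadInfo(t.getId());
            if (info == null) {
                continue; // exited between the two checks
            }
            report.add(t.getName() + " " + info.getThreadState()
                    + " on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    /** Demo: "holder" keeps a monitor while "reader" blocks trying to enter it. */
    static List<String> demo() {
        Object monitor = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown();
                awaitQuietly(release);
            }
        }, "holder");
        Thread reader = new Thread(() -> {
            synchronized (monitor) {
                // Only reached once "holder" has released the monitor.
            }
        }, "reader");
        holder.setDaemon(true);
        reader.setDaemon(true);
        holder.start();
        awaitQuietly(held);
        reader.start();
        List<String> report = joinOrReport(Collections.singletonList(reader), 500);
        release.countDown(); // unblock both threads
        return report;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Reporting the lock owner is the useful part here: for the LZ4 frame above, it would show which thread holds the monitor that `LZ4JNI.LZ4_decompress_fast` is waiting on.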

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times 
> out
> 
>
> Key: FLINK-34424
> URL: https://issues.apache.org/jira/browse/FLINK-34424
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 
> tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29   java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29   at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29   at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29   - locked <0xe2e019a8> (a 
> org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29   at 
> org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29   at 
> org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Updated] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34333:
--
Release Note: Fixes a bug where the leader election wasn't able to pick up 
leadership again after a lease-token renewal had caused a leadership loss. The 
fix required upgrading fabric8io:kubernetes-client from v6.6.2 to v6.9.0.

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
> Key: FLINK-34333
> URL: https://issues.apache.org/jira/browse/FLINK-34333
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.18.2
>
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using 
> since Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19, which 
> required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same 
> problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
> unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in 
> v6.6.2 which might prevent the leadership lost event from being forwarded to 
> the client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). 
> An initial proposal where the release call was handled in Flink's 
> {{KubernetesLeaderElector}} didn't work because the leadership lost event 
> was triggered twice (see [FLINK-34007 PR 
> comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902]).
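The description mentions the leadership lost event being triggered twice. As an illustration only (this is neither Flink's `KubernetesLeaderElector` nor the fabric8 client API, and the fix that actually shipped was the client upgrade), a leadership callback can be made idempotent with a compare-and-set, so that a duplicated event revokes leadership once while leadership can still be picked up again afterwards. All names below are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical listener that tolerates a duplicated "leadership lost" notification. */
public class IdempotentLeadershipListener {
    private final AtomicBoolean leader = new AtomicBoolean(false);
    private final AtomicInteger revocations = new AtomicInteger();

    public void onLeadershipGranted() {
        leader.set(true);
    }

    public void onLeadershipLost() {
        // compareAndSet lets the revocation logic run once per leadership term,
        // even if the underlying client fires the event twice.
        if (leader.compareAndSet(true, false)) {
            revocations.incrementAndGet();
        }
    }

    public boolean isLeader() {
        return leader.get();
    }

    public int revocationCount() {
        return revocations.get();
    }

    public static void main(String[] args) {
        IdempotentLeadershipListener listener = new IdempotentLeadershipListener();
        listener.onLeadershipGranted();
        listener.onLeadershipLost();
        listener.onLeadershipLost(); // duplicate event is ignored
        listener.onLeadershipGranted(); // leadership can be picked up again
        System.out.println(listener.isLeader() + " " + listener.revocationCount()); // prints "true 1"
    }
}
```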





[jira] [Resolved] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-34333.
---
Fix Version/s: 1.18.2
   Resolution: Fixed

* 1.18
** 
[35c560312efc91dafd1b4674ce1e10acc9320ab1|https://github.com/apache/flink/commit/35c560312efc91dafd1b4674ce1e10acc9320ab1]
** 
[87560b7cedd6c857612a24b83485f5000b9edbd6|https://github.com/apache/flink/commit/87560b7cedd6c857612a24b83485f5000b9edbd6]

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
> Key: FLINK-34333
> URL: https://issues.apache.org/jira/browse/FLINK-34333
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.18.2
>
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using 
> since Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19, which 
> required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same 
> problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
> unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in 
> v6.6.2 which might prevent the leadership lost event from being forwarded to 
> the client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). 
> An initial proposal where the release call was handled in Flink's 
> {{KubernetesLeaderElector}} didn't work because the leadership lost event 
> was triggered twice (see [FLINK-34007 PR 
> comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902]).





[jira] [Assigned] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-34425:
-

Assignee: (was: Matthias Pohl)

> TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure
>  times out
> ---
>
> Key: FLINK-34425
> URL: https://issues.apache.org/jira/browse/FLINK-34425
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844
> {code}
> Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms 
> elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition  
> [0x7fbd2b9f3000]
> Feb 10 03:21:45   java.lang.Thread.State: WAITING (parking)
> Feb 10 03:21:45   at 
> jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
> Feb 10 03:21:45   - parking to wait for  <0xae6199f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
> Feb 10 03:21:45   at 
> java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
> Feb 10 03:21:45   at 
> org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
> [...]
> {code}





[jira] [Commented] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816973#comment-17816973
 ] 

Matthias Pohl commented on FLINK-34427:
---

Thanks for the clarification. This issue also exists in 1.18, so I won't raise 
its priority to Blocker for 1.19. We should still fix it, though.

> FineGrainedSlotManagerTest fails fatally (exit code 239)
> 
>
> Key: FLINK-34427
> URL: https://issues.apache.org/jira/browse/FLINK-34427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959
> {code}
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
> '/root/flink/flink-runtime' && 
> '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' 
> '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.lang=ALL-UNNAMED' 
> '--add-opens=java.base/java.net=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' 
> '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
> '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
>  '/root/flink/flink-runtime/target/surefire' 
> '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 
> 'surefire_26-20240212022332296_91tmp'
> Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
> output in log
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.221 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.221 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> [...]
> {code}
> The fatal error is most likely triggered within the 
> {{FineGrainedSlotManagerTest}}:
> {code}
> 02:26:39,362 [   pool-643-thread-1] ERROR 
> org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the 
> process...
> java.util.concurrent.CompletionException: 
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 
> rejected from 
> java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool 
> size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) 
> ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178)
>  ~[?:1.8.0_392]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603)
>  ~[classes/:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_392]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_392]
> at 
> 

[jira] [Updated] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34427:
--
Affects Version/s: 1.18.1

> FineGrainedSlotManagerTest fails fatally (exit code 239)
> 
>
> Key: FLINK-34427
> URL: https://issues.apache.org/jira/browse/FLINK-34427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959
> {code}
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
> '/root/flink/flink-runtime' && 
> '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' 
> '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.lang=ALL-UNNAMED' 
> '--add-opens=java.base/java.net=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' 
> '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
> '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
>  '/root/flink/flink-runtime/target/surefire' 
> '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 
> 'surefire_26-20240212022332296_91tmp'
> Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
> output in log
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.221 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.221 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> [...]
> {code}
> The fatal error is most likely triggered within the 
> {{FineGrainedSlotManagerTest}}:
> {code}
> 02:26:39,362 [   pool-643-thread-1] ERROR 
> org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the 
> process...
> java.util.concurrent.CompletionException: 
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 
> rejected from 
> java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool 
> size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) 
> ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178)
>  ~[?:1.8.0_392]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603)
>  ~[classes/:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_392]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_392]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_392]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  

[jira] [Updated] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34425:
--
Priority: Major  (was: Critical)

> TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure
>  times out
> ---
>
> Key: FLINK-34425
> URL: https://issues.apache.org/jira/browse/FLINK-34425
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844
> {code}
> Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms 
> elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition  
> [0x7fbd2b9f3000]
> Feb 10 03:21:45   java.lang.Thread.State: WAITING (parking)
> Feb 10 03:21:45   at 
> jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
> Feb 10 03:21:45   - parking to wait for  <0xae6199f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
> Feb 10 03:21:45   at 
> java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
> Feb 10 03:21:45   at 
> org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
> [...]
> {code}





[jira] [Comment Edited] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816971#comment-17816971
 ] 

Matthias Pohl edited comment on FLINK-34425 at 2/13/24 11:48 AM:
-

This looks like a test issue. The TaskManager process is destroyed in 
[TaskManagerRunnerITCase:124|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L124]
 but doesn't exit properly, causing the timeout in {{#waitFor()}} in 
[TaskManagerRunnerITCase:126|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L126].

I'm going to lower this issue's priority to {{Major}}. I don't consider it 
problematic in any way for the upcoming 1.19 release.
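A bounded alternative to the unbounded {{Process#waitFor()}} call at line 126 could look like the minimal sketch below. It only uses JDK methods available since Java 8 ({{Process#destroy}}, {{destroyForcibly}}, {{waitFor(long, TimeUnit)}}); the class and method names and the escalation policy are assumptions rather than the test's actual code, and the sketch does not explain why the destroyed process fails to exit in the first place.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class BoundedProcessKill {

    /**
     * Destroys the process and waits at most timeoutMillis for it to exit, escalating to
     * destroyForcibly() once before giving up. Returns true if the process has exited.
     */
    static boolean destroyAndAwait(Process process, long timeoutMillis) throws InterruptedException {
        process.destroy();
        if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        // Graceful termination did not finish in time: escalate once, then give up.
        process.destroyForcibly();
        return process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes a Unix-like system with a `sleep` binary on the PATH.
        Process p = new ProcessBuilder("sleep", "60").start();
        System.out.println("exited=" + destroyAndAwait(p, 5_000) + ", alive=" + p.isAlive());
    }
}
```

Returning a boolean instead of blocking lets the test fail with a clear assertion message rather than run into the CI-level timeout.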


was (Author: mapohl):
This looks like a test issue. The TaskManager process is destroyed in 
[TaskManagerRunnerITCase:124|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L124]
 but doesn't exit properly, causing the timeout in {{#waitFor()}} in 
[TaskManagerRunnerITCase:126|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L126].

> TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure
>  times out
> ---
>
> Key: FLINK-34425
> URL: https://issues.apache.org/jira/browse/FLINK-34425
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844
> {code}
> Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms 
> elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition  
> [0x7fbd2b9f3000]
> Feb 10 03:21:45   java.lang.Thread.State: WAITING (parking)
> Feb 10 03:21:45   at 
> jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
> Feb 10 03:21:45   - parking to wait for  <0xae6199f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
> Feb 10 03:21:45   at 
> java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
> Feb 10 03:21:45   at 
> org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
> [...]
> {code}





[jira] [Commented] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816971#comment-17816971
 ] 

Matthias Pohl commented on FLINK-34425:
---

This looks like a test issue. The TaskManager process is destroyed in 
[TaskManagerRunnerITCase:124|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L124]
 but doesn't exit properly, causing the timeout in {{#waitFor()}} in 
[TaskManagerRunnerITCase:126|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L126].

> TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure
>  times out
> ---
>
> Key: FLINK-34425
> URL: https://issues.apache.org/jira/browse/FLINK-34425
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844
> {code}
> Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms 
> elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition  
> [0x7fbd2b9f3000]
> Feb 10 03:21:45   java.lang.Thread.State: WAITING (parking)
> Feb 10 03:21:45   at 
> jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
> Feb 10 03:21:45   - parking to wait for  <0xae6199f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
> Feb 10 03:21:45   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
> Feb 10 03:21:45   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
> Feb 10 03:21:45   at 
> java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
> Feb 10 03:21:45   at 
> org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
> Feb 10 03:21:45   at 
> java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
> [...]
> {code}





[jira] [Commented] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816963#comment-17816963
 ] 

Matthias Pohl commented on FLINK-34427:
---

[~chesnay] the upstream future {{requestFuture}} comes from the 
{{TaskManagerGateway#requestSlot}} RPC call. Considering that the [handleAsync 
callback|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L138]
 should be chained and run wherever the RPC call is executed, I would conclude 
that the RPCEndpoint is shut down while a scheduled task is still queued, which 
causes the {{RejectedExecutionException}}. WDYT?
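The mechanism described in this comment can be reproduced with plain JDK classes. In the minimal sketch below (class and method names are hypothetical; no Flink RPC classes are involved), {{handleAsync}} is chained onto an already-completed future using an executor that has been shut down. The executor rejects the callback, and the {{RejectedExecutionException}} ends up, wrapped in a {{CompletionException}}, in the future returned by {{handleAsync}} instead of being thrown to the caller. This is consistent with the {{CompletableFuture.encodeThrowable}} frames in the issue's stack trace.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class HandleAsyncOnShutdownExecutor {

    /** Chains handleAsync onto an already-completed future using an executor that was shut down. */
    static CompletableFuture<String> chainAfterShutdown() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdown();
        CompletableFuture<String> requestFuture = CompletableFuture.completedFuture("slot");
        // The executor rejects the callback; the rejection is stored in the returned
        // future rather than thrown from handleAsync itself.
        return requestFuture.handleAsync((value, error) -> value, executor);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = chainAfterShutdown();
        try {
            result.join();
        } catch (CompletionException e) {
            System.out.println(e.getCause().getClass().getSimpleName()); // prints "RejectedExecutionException"
        }
    }
}
```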

> FineGrainedSlotManagerTest fails fatally (exit code 239)
> 
>
> Key: FLINK-34427
> URL: https://issues.apache.org/jira/browse/FLINK-34427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959
> {code}
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
> '/root/flink/flink-runtime' && 
> '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' 
> '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.lang=ALL-UNNAMED' 
> '--add-opens=java.base/java.net=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' 
> '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
> '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
>  '/root/flink/flink-runtime/target/surefire' 
> '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 
> 'surefire_26-20240212022332296_91tmp'
> Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
> output in log
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.221 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.221 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> [...]
> {code}
> The fatal error is most likely triggered within the 
> {{FineGrainedSlotManagerTest}}:
> {code}
> 02:26:39,362 [   pool-643-thread-1] ERROR 
> org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the 
> process...
> java.util.concurrent.CompletionException: 
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 
> rejected from 
> java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool 
> size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) 
> ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178)
>  ~[?:1.8.0_392]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645)
>  ~[classes/:?]
> at 
> 

[jira] [Commented] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816961#comment-17816961
 ] 

Matthias Pohl commented on FLINK-34434:
---

[~guoyangze] can you have a look at this?

> DefaultSlotStatusSyncer doesn't complete the returned future
> 
>
> Key: FLINK-34434
> URL: https://issues.apache.org/jira/browse/FLINK-34434
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>
> When looking into FLINK-34427 (unrelated), I noticed an odd line in 
> [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155]
>  where we complete a future that should already be completed (because the 
> callback is triggered after the {{requestFuture}} has already been completed in 
> some way). Shouldn't we complete the {{returnedFuture}} instead?
> I'm keeping the priority at {{Major}} because it doesn't seem to have been an 
> issue in the past.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816961#comment-17816961
 ] 

Matthias Pohl edited comment on FLINK-34434 at 2/13/24 11:11 AM:
-

[~guoyangze] can you have a look at this? Maybe I'm missing something here.


was (Author: mapohl):
[~guoyangze] can you have a look at this?

> DefaultSlotStatusSyncer doesn't complete the returned future
> 
>
> Key: FLINK-34434
> URL: https://issues.apache.org/jira/browse/FLINK-34434
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>
> When looking into FLINK-34427 (unrelated), I noticed an odd line in 
> [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155]
>  where we complete a future that should already be completed (because the 
> callback is triggered after the {{requestFuture}} has already been completed in 
> some way). Shouldn't we complete the {{returnedFuture}} instead?
> I'm keeping the priority at {{Major}} because it doesn't seem to have been an 
> issue in the past.





[jira] [Created] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future

2024-02-13 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34434:
-

 Summary: DefaultSlotStatusSyncer doesn't complete the returned future
 Key: FLINK-34434
 URL: https://issues.apache.org/jira/browse/FLINK-34434
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.17.2, 1.19.0, 1.20.0
Reporter: Matthias Pohl


When looking into FLINK-34427 (unrelated), I noticed an odd line in 
[DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155]
 where we complete a future that should already be completed (because the 
callback is triggered after the {{requestFuture}} has already been completed in some 
way). Shouldn't we complete the {{returnedFuture}} instead?

I'm keeping the priority at {{Major}} because it doesn't seem to have been an 
issue in the past.
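
Below is a minimal JDK-only sketch of the pattern described above. It assumes that the callback at line 155 is attached to {{requestFuture}} via {{handle}}. The class, the {{allocate}} method and the {{completeRequestFuture}} flag are hypothetical illustrations, not Flink's DefaultSlotStatusSyncer code. Completing {{requestFuture}} from inside its own callback is a no-op, so in the suspicious variant the future handed back to the caller never completes:

```java
import java.util.concurrent.CompletableFuture;

public class ReturnedFutureSketch {

    // Hypothetical stand-in for the callback pattern discussed in FLINK-34434.
    // completeRequestFuture == true models the suspicious line; false models the suggested fix.
    static CompletableFuture<Void> allocate(
            CompletableFuture<String> requestFuture, boolean completeRequestFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.handle(
                (ack, failure) -> {
                    if (failure != null) {
                        returnedFuture.completeExceptionally(failure);
                    } else if (completeRequestFuture) {
                        // requestFuture is already done when this callback runs,
                        // so complete() returns false and changes nothing.
                        requestFuture.complete(ack);
                    } else {
                        returnedFuture.complete(null);
                    }
                    return null;
                });
        return returnedFuture;
    }

    public static void main(String[] args) {
        CompletableFuture<String> request1 = new CompletableFuture<>();
        CompletableFuture<Void> suspicious = allocate(request1, true);
        request1.complete("ack"); // runs the callback synchronously on this thread
        System.out.println("suspicious variant done: " + suspicious.isDone()); // false

        CompletableFuture<String> request2 = new CompletableFuture<>();
        CompletableFuture<Void> fixed = allocate(request2, false);
        request2.complete("ack");
        System.out.println("fixed variant done: " + fixed.isDone()); // true
    }
}
```

In this sketch, a caller waiting on the suspicious variant would wait forever. Whether Flink's real code path has another route that completes {{returnedFuture}} is not established here.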





[jira] [Updated] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34427:
--
Description: 
https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959

{code}
Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
Error: 02:28:53 02:28:53.220 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
Error: 02:28:53 02:28:53.220 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd '/root/flink/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.net=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' '/root/flink/flink-runtime/target/surefire' '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 'surefire_26-20240212022332296_91tmp'
Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check output in log
Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
Error: 02:28:53 02:28:53.221 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
Error: 02:28:53 02:28:53.221 [ERROR]  at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
[...]
{code}
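
Exit code 239 is consistent with the JVM being stopped by Flink's {{FatalExitExceptionHandler}}, whose "FATAL: ... Stopping the process..." log line appears in the trace below. The sketch assumes that this handler exits with status -17; that value is recalled from the Flink sources and is worth double-checking. On Linux, the process that launched the JVM only sees the low 8 bits of the exit status, which turns -17 into 239:

```java
public class ExitCodeSketch {

    // Assumption: the status passed to the JVM exit by Flink's FatalExitExceptionHandler.
    static final int ASSUMED_FATAL_EXIT_CODE = -17;

    // A POSIX parent process only observes the low 8 bits of a child's exit status.
    static int observedStatus(int exitCode) {
        return exitCode & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(observedStatus(ASSUMED_FATAL_EXIT_CODE)); // 239
    }
}
```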

The fatal error is most likely triggered within 
{{FineGrainedSlotManagerTest}}:
{code}
02:26:39,362 [   pool-643-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the process...
java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_392]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_392]
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) ~[?:1.8.0_392]
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_392]
at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) ~[?:1.8.0_392]
at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) ~[?:1.8.0_392]
at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603) ~[classes/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_392]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_392]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_392]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_392]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_392]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_392]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392]
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
at 

[jira] [Updated] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34427:
--
Summary: FineGrainedSlotManagerTest fails fatally (exit code 239)  (was: 
ResourceManagerTaskExecutorTest fails fatally (exit code 239))
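
In the trace quoted below, the {{RejectedExecutionException}} passes through {{CompletableFuture.handleAsync}}, {{uniHandle}} and {{completeThrowable}}. This means it was stored, wrapped in a {{CompletionException}}, in the future returned by {{handleAsync}}. It was not thrown by {{DefaultSlotStatusSyncer.allocateSlot}} itself. The executor state ({{Shutting down}}) suggests that the executor was shut down while {{checkResourceRequirements}} was still running. Below is a minimal JDK-only sketch of that mechanism. It assumes a single-threaded scheduled executor that has already been shut down, and the class and method names are hypothetical. The sketch does not show how the failed future eventually reached {{FatalExitExceptionHandler}}:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class RejectedHandleAsyncSketch {

    // Hypothetical setup: handleAsync is chained onto an already-completed future,
    // using an executor that was shut down before the callback could be submitted.
    static CompletableFuture<String> handleOnShutDownExecutor() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdown(); // from now on, submissions are rejected (default AbortPolicy)
        return CompletableFuture.completedFuture("ack")
                .handleAsync((value, failure) -> value, executor);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = handleOnShutDownExecutor();
        // handleAsync itself did not throw; the rejection is stored in the returned future.
        System.out.println("completed exceptionally: " + result.isCompletedExceptionally());
        try {
            result.join();
        } catch (CompletionException e) {
            System.out.println("cause: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Any code that later rethrows the exceptional result, for example on a pool thread without its own handler, would then surface the same {{CompletionException}} shape as the log.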

> FineGrainedSlotManagerTest fails fatally (exit code 239)
> 
>
> Key: FLINK-34427
> URL: https://issues.apache.org/jira/browse/FLINK-34427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959
> {code}
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
> '/root/flink/flink-runtime' && 
> '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' 
> '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.lang=ALL-UNNAMED' 
> '--add-opens=java.base/java.net=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' 
> '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
> '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
>  '/root/flink/flink-runtime/target/surefire' 
> '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 
> 'surefire_26-20240212022332296_91tmp'
> Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
> output in log
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.221 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.221 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> [...]
> {code}
> The fatal error is most likely triggered within 
> {{FineGrainedSlotManagerTest}}:
> {code}
> 02:26:39,362 [   pool-643-thread-1] ERROR 
> org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the 
> process...
> java.util.concurrent.CompletionException: 
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 
> rejected from 
> java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool 
> size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) 
> ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851)
>  ~[?:1.8.0_392]
> at 
> java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178)
>  ~[?:1.8.0_392]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645)
>  ~[classes/:?]
> at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603)
>  ~[classes/:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_392]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_392]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_392]
> at 
> 

[jira] [Resolved] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-34403.
---
Fix Version/s: 1.20.0
   Resolution: Fixed

master: 
[9a316a5bcc47da7f69e76e0c25ed257adc4298ce|https://github.com/apache/flink/commit/9a316a5bcc47da7f69e76e0c25ed257adc4298ce]
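
The quoted log ends in {{IllegalArgumentException: Self-suppression not permitted}}, thrown from {{Throwable.addSuppressed}} inside {{InstantiationUtil.serializeObject}}. The JDK throws this when an exception is asked to suppress itself. One way that happens is in try-with-resources, when {{close()}} rethrows the very same instance that the body already threw, for example a shared error object. In that case the original failure is only visible as the cause. Below is a minimal sketch of that mechanism with a hypothetical resource class. Whether the OOM in this test took exactly this path is an assumption, not something the log proves:

```java
public class SelfSuppressionSketch {

    // Hypothetical resource: write() and close() both throw the same exception
    // instance, e.g. a shared error object that is reused instead of re-created.
    static final class SharedFailureResource implements AutoCloseable {
        private final RuntimeException shared;

        SharedFailureResource(RuntimeException shared) {
            this.shared = shared;
        }

        void write() {
            throw shared;
        }

        @Override
        public void close() {
            throw shared;
        }
    }

    static Throwable run() {
        RuntimeException shared = new RuntimeException("shared failure");
        try (SharedFailureResource resource = new SharedFailureResource(shared)) {
            resource.write();
        } catch (Throwable t) {
            // When close() fails, try-with-resources calls shared.addSuppressed(shared),
            // which Throwable rejects with an IllegalArgumentException.
            return t;
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable t = run();
        System.out.println(t); // java.lang.IllegalArgumentException: Self-suppression not permitted
        System.out.println("original failure kept as cause: " + t.getCause());
    }
}
```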

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> After FLINK-33611 was merged, the misc tests on GHA no longer pass due to an 
> out-of-memory error, throwing the following exception:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
> Feb 07 05:43:21   ... 18 more
> Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: 
> Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.lang.Throwable.addSuppressed(Throwable.java:1072)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486)
> Feb 07 05:43:21   at 
> 

[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816937#comment-17816937
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7880739758

> Disk space issues for Docker-ized GitHub Action jobs
> 
>
> Key: FLINK-34418
> URL: https://issues.apache.org/jira/browse/FLINK-34418
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
>
> [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
> {code:java}
> [...]
> Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
> Feb 09 03:00:13  at java.io.FileOutputStream.writeBytes(Native Method)
> Feb 09 03:00:13  at java.io.FileOutputStream.write(FileOutputStream.java:326)
> Feb 09 03:00:13  at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
> Feb 09 03:00:13  ... 39 more
> [...] {code}





[jira] [Commented] (FLINK-32006) AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry times out on Azure

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816935#comment-17816935
 ] 

Matthias Pohl commented on FLINK-32006:
---

1.18: 
https://github.com/apache/flink/actions/runs/7880739758/job/21503455883#step:10:9621

> AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry 
> times out on Azure
> --
>
> Key: FLINK-32006
> URL: https://issues.apache.org/jira/browse/FLINK-32006
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.18.0, 1.17.2, 1.19.0
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Critical
>  Labels: pull-request-available, stale-assigned, test-stability
>
> {code:java}
> May 04 13:52:18 [ERROR] 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry
>   Time elapsed: 100.009 s  <<< ERROR!
> May 04 13:52:18 org.junit.runners.model.TestTimedOutException: test timed out 
> after 100 seconds
> May 04 13:52:18   at java.lang.Thread.sleep(Native Method)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeAlwaysTimeoutFunctionWithRetry(AsyncWaitOperatorTest.java:1313)
> May 04 13:52:18   at 
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry(AsyncWaitOperatorTest.java:1277)
> May 04 13:52:18   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 04 13:52:18   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 04 13:52:18   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 04 13:52:18   at java.lang.reflect.Method.invoke(Method.java:498)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> May 04 13:52:18   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 04 13:52:18   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 04 13:52:18   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> May 04 13:52:18   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> May 04 13:52:18   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> May 04 13:52:18   at java.lang.Thread.run(Thread.java:748)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48671=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9288





[jira] [Created] (FLINK-34433) CollectionFunctionsITCase.test failed due to job restart

2024-02-13 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34433:
-

 Summary: CollectionFunctionsITCase.test failed due to job restart
 Key: FLINK-34433
 URL: https://issues.apache.org/jira/browse/FLINK-34433
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7880739697/job/21503460772#step:10:11312

{code}
Error: 02:33:24 02:33:24.955 [ERROR] Tests run: 439, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 56.57 s <<< FAILURE! -- in org.apache.flink.table.planner.functions.CollectionFunctionsITCase
Error: 02:33:24 02:33:24.956 [ERROR] org.apache.flink.table.planner.functions.CollectionFunctionsITCase.test(TestCase)[81] -- Time elapsed: 1.141 s <<< ERROR!
Feb 13 02:33:24 java.lang.RuntimeException: Job restarted
Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:42)
Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87)
Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:124)
Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126)
Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100)
Feb 13 02:33:24 at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247)
Feb 13 02:33:24 at org.assertj.core.internal.Iterators.assertHasNext(Iterators.java:49)
Feb 13 02:33:24 at org.assertj.core.api.AbstractIteratorAssert.hasNext(AbstractIteratorAssert.java:60)
Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$ResultTestItem.test(BuiltInFunctionTestBase.java:383)
Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestSetSpec.lambda$getTestCase$4(BuiltInFunctionTestBase.java:341)
Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119)
Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase.test(BuiltInFunctionTestBase.java:99)
Feb 13 02:33:24 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 13 02:33:24 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 13 02:33:24 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 13 02:33:24 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 13 02:33:24 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 13 02:33:24 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}





[jira] [Assigned] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-34403:
-

Assignee: Matthias Pohl

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> After FLINK-33611 was merged, the misc tests on GHA no longer pass due to an 
> out-of-memory error, throwing the following exception:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
> Feb 07 05:43:21   ... 18 more
> Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: 
> Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.lang.Throwable.addSuppressed(Throwable.java:1072)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)

[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816928#comment-17816928
 ] 

Matthias Pohl commented on FLINK-28440:
---

https://github.com/apache/flink/actions/runs/7880739609/job/21503465125#step:10:7557

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.19.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> 

[jira] [Updated] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)

2024-02-12 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34427:
--
Component/s: Runtime / Coordination

> ResourceManagerTaskExecutorTest fails fatally (exit code 239)
> -
>
> Key: FLINK-34427
> URL: https://issues.apache.org/jira/browse/FLINK-34427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959
> {code}
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
> '/root/flink/flink-runtime' && 
> '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' 
> '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.lang=ALL-UNNAMED' 
> '--add-opens=java.base/java.net=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' 
> '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
> '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
>  '/root/flink/flink-runtime/target/surefire' 
> '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 
> 'surefire_26-20240212022332296_91tmp'
> Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
> output in log
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.221 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.221 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> [...]
> {code}
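Exit code 239 is most likely a negative Java exit code seen through the 8-bit process status that the OS reports: {{System.exit(-17)}} surfaces as 256 - 17 = 239. As far as we know, -17 is the code used by Flink's {{FatalExitExceptionHandler}}. If so, this points at a fatal error inside the forked test JVM. Treat that attribution as an assumption to check against the Flink sources. A minimal sketch of the arithmetic (the constant name is hypothetical):

```java
public class ExitCodeDemo {

    // Assumption: the fork exited via Flink's FatalExitExceptionHandler,
    // which we believe calls System.exit(-17). Verify against the Flink sources.
    static final int ASSUMED_FLINK_FATAL_EXIT_CODE = -17;

    /** The OS keeps only the low 8 bits of the status passed to exit(). */
    static int observedStatus(int javaExitCode) {
        return javaExitCode & 0xFF;
    }

    public static void main(String[] args) {
        // -17 is 0xFFFFFFEF in two's complement; the low byte 0xEF is 239.
        System.out.println(observedStatus(ASSUMED_FLINK_FATAL_EXIT_CODE)); // 239
    }
}
```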



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816876#comment-17816876
 ] 

Matthias Pohl commented on FLINK-34403:
---

* 
https://dev.azure.com/apache-flink/web/build.aspx?pcguid=2d3c0ac8-fecf-45be-8407-6d87302181a9=vstfs%3a%2f%2f%2fBuild%2fBuild%2f57469_data=ew0KICAic291cmNlIjogIlNsYWNrUGlwZWxpbmVzQXBwIiwNCiAgInNvdXJjZV9ldmVudF9uYW1lIjogImJ1aWxkLmNvbXBsZXRlIg0KfQ%3d%3d
* 
https://dev.azure.com/apache-flink/web/build.aspx?pcguid=2d3c0ac8-fecf-45be-8407-6d87302181a9=vstfs%3a%2f%2f%2fBuild%2fBuild%2f57489_data=ew0KICAic291cmNlIjogIlNsYWNrUGlwZWxpbmVzQXBwIiwNCiAgInNvdXJjZV9ldmVudF9uYW1lIjogImJ1aWxkLmNvbXBsZXRlIg0KfQ%3d%3d
* 
https://dev.azure.com/apache-flink/web/build.aspx?pcguid=2d3c0ac8-fecf-45be-8407-6d87302181a9=vstfs%3a%2f%2f%2fBuild%2fBuild%2f57491_data=ew0KICAic291cmNlIjogIlNsYWNrUGlwZWxpbmVzQXBwIiwNCiAgInNvdXJjZV9ldmVudF9uYW1lIjogImJ1aWxkLmNvbXBsZXRlIg0KfQ%3d%3d

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> After FLINK-33611 was merged, the misc test on GHA fails with an 
> out-of-memory error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> 

[jira] [Commented] (FLINK-34273) git fetch fails

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816875#comment-17816875
 ] 

Matthias Pohl commented on FLINK-34273:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57492=logs=245e1f2e-ba5b-5570-d689-25ae21e5302f=a47dd1b5-aa0a-596a-799b-05a053059d14

> git fetch fails
> ---
>
> Key: FLINK-34273
> URL: https://issues.apache.org/jira/browse/FLINK-34273
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> We've seen multiple {{git fetch}} failures. I assume this to be an 
> infrastructure issue. This Jira issue is for documentation purposes.
> {code:java}
> error: RPC failed; curl 18 transfer closed with outstanding read data 
> remaining
> error: 5211 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667



[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816602#comment-17816602
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7869012663/job/21467582110

> Disk space issues for Docker-ized GitHub Action jobs
> 
>
> Key: FLINK-34418
> URL: https://issues.apache.org/jira/browse/FLINK-34418
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
>
> [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
> {code:java}
> [...]
> Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
> Feb 09 03:00:13  at java.io.FileOutputStream.writeBytes(Native Method)
> Feb 09 03:00:13  at 
> java.io.FileOutputStream.write(FileOutputStream.java:326)
> Feb 09 03:00:13  at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
> Feb 09 03:00:13  ... 39 more
> [...] {code}



[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816603#comment-17816603
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7870763675

> Disk space issues for Docker-ized GitHub Action jobs
> 
>
> Key: FLINK-34418
> URL: https://issues.apache.org/jira/browse/FLINK-34418
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
>
> [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
> {code:java}
> [...]
> Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
> Feb 09 03:00:13  at java.io.FileOutputStream.writeBytes(Native Method)
> Feb 09 03:00:13  at 
> java.io.FileOutputStream.write(FileOutputStream.java:326)
> Feb 09 03:00:13  at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
> Feb 09 03:00:13  ... 39 more
> [...] {code}



[jira] [Commented] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816591#comment-17816591
 ] 

Matthias Pohl commented on FLINK-34403:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57464=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23485

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Priority: Critical
>  Labels: test-stability
>
> After FLINK-33611 was merged, the misc test on GHA fails with an 
> out-of-memory error, throwing the following exceptions:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> Feb 07 05:43:21   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
> Feb 07 05:43:21   ... 18 more
> Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: 
> Self-suppression not permitted
> Feb 07 05:43:21   at 
> java.lang.Throwable.addSuppressed(Throwable.java:1072)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)
> Feb 07 05:43:21   at 
> org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182)
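
A note on the "Self-suppression not permitted" frames: {{Throwable.addSuppressed}} rejects an exception that tries to suppress itself. The {{addSuppressed}} call directly under {{InstantiationUtil.serializeObject}} looks like compiler-generated try-with-resources code. One plausible reading, which this ticket does not confirm, is that the same {{OutOfMemoryError}} instance was thrown both by the serialization body and by the stream's {{close()}}. The JVM can do this when it reuses a preallocated error object. The minimal sketch below reproduces only that mechanism. It uses a hypothetical {{FailingResource}} and a plain {{RuntimeException}}, so it is not Flink code:

```java
import java.io.Closeable;

public class SelfSuppressionDemo {

    // Hypothetical resource whose close() rethrows the very same Throwable
    // instance that the try block already threw (standing in for a reused OOM error).
    static final class FailingResource implements Closeable {
        private final RuntimeException failure;

        FailingResource(RuntimeException failure) {
            this.failure = failure;
        }

        @Override
        public void close() {
            throw failure;
        }
    }

    /** Returns the message of the exception that escapes the try-with-resources block. */
    static String reproduce() {
        RuntimeException shared = new RuntimeException("boom");
        try (FailingResource resource = new FailingResource(shared)) {
            throw shared; // the body fails with the same instance that close() will throw
        } catch (IllegalArgumentException e) {
            // try-with-resources calls shared.addSuppressed(shared), which the JDK rejects
            return e.getMessage();
        } catch (RuntimeException e) {
            return "unexpected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce()); // Self-suppression not permitted
    }
}
```

In this failure mode, the original error is replaced by the {{IllegalArgumentException}}. That matches the stack trace above, where the OOM itself never surfaces.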

[jira] [Resolved] (FLINK-34411) "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up

2024-02-12 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-34411.
---
  Assignee: Matthias Pohl
Resolution: Fixed

I rebased {{dev-1.19}} to {{dev-master}} and provided a fix for the snapshot CI 
in {{master}}:

apache/flink-docker@master: 2c169b6a83bf83bbe997ed35aaf548de10050b58

> "Wordcount on Docker test (custom fs plugin)" timed out with some strange 
> issue while setting the test up
> -
>
> Key: FLINK-34411
> URL: https://issues.apache.org/jira/browse/FLINK-34411
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57380=logs=bea52777-eaf8-5663-8482-18fbc3630e81=43ba8ce7-ebbf-57cd-9163-444305d74117=5802
> {code}
> Feb 07 15:22:39 
> ==
> Feb 07 15:22:39 Running 'Wordcount on Docker test (custom fs plugin)'
> Feb 07 15:22:39 
> ==
> Feb 07 15:22:39 TEST_DATA_DIR: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
> Feb 07 15:22:40 Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
> Feb 07 15:22:40 Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
> Feb 07 15:22:41 Docker version 24.0.7, build afdd53b
> Feb 07 15:22:44 docker-compose version 1.29.2, build 5becea4c
> Feb 07 15:22:44 Starting fileserver for Flink distribution
> Feb 07 15:22:44 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin 
> ~/work/1/s
> Feb 07 15:23:07 ~/work/1/s
> Feb 07 15:23:07 
> ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
>  ~/work/1/s
> Feb 07 15:23:07 Preparing Dockeriles
> Feb 07 15:23:07 Executing command: git clone 
> https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch
> Cloning into 'flink-docker'...
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: 
> line 65: ./add-custom.sh: No such file or directory
> Feb 07 15:23:07 Building images
> ERROR: unable to prepare context: path "dev/test_docker_embedded_job-ubuntu" 
> not found
> Feb 07 15:23:09 ~/work/1/s
> Feb 07 15:23:09 Command: build_image test_docker_embedded_job failed. 
> Retrying...
> Feb 07 15:23:14 Starting fileserver for Flink distribution
> Feb 07 15:23:14 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin 
> ~/work/1/s
> Feb 07 15:23:36 ~/work/1/s
> Feb 07 15:23:36 
> ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
>  ~/work/1/s
> Feb 07 15:23:36 Preparing Dockeriles
> Feb 07 15:23:36 Executing command: git clone 
> https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch
> fatal: destination path 'flink-docker' already exists and is not an empty 
> directory.
> Feb 07 15:23:36 Retry 1/5 exited 128, retrying in 1 seconds...
> Traceback (most recent call last):
>   File 
> "/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/python3_fileserver.py",
>  line 26, in 
> httpd = socketserver.TCPServer(("", ), handler)
>   File "/usr/lib/python3.8/socketserver.py", line 452, in __init__
> self.server_bind()
>   File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind
> self.socket.bind(self.server_address)
> OSError: [Errno 98] Address already in use
> [...]
> {code}
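Two of the later errors in this log appear to come from the retry loop rather than from the original problem. First, the second {{git clone}} fails because {{flink-docker}} already exists from the first attempt. Second, the second fileserver fails with {{Errno 98}}, presumably because the first fileserver process is still listening on the same port. The sketch below reproduces the port clash with two {{ServerSocket}}s. It is a Java analogue of {{python3_fileserver.py}}, not the script itself, and {{secondBindFails}} is a hypothetical helper:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortInUseDemo {

    /** Returns true if binding a second listener to an already-bound port fails. */
    static boolean secondBindFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port for the "first fileserver".
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            // The "retried fileserver" binds the same port while the first is still listening.
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false;
            } catch (BindException e) {
                return true; // typically reported as "Address already in use"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(secondBindFails()); // true
    }
}
```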



[jira] [Commented] (FLINK-34411) "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816538#comment-17816538
 ] 

Matthias Pohl commented on FLINK-34411:
---

* https://github.com/apache/flink/actions/runs/7838691836/job/21390782645

> "Wordcount on Docker test (custom fs plugin)" timed out with some strange 
> issue while setting the test up
> -
>
> Key: FLINK-34411
> URL: https://issues.apache.org/jira/browse/FLINK-34411
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57380=logs=bea52777-eaf8-5663-8482-18fbc3630e81=43ba8ce7-ebbf-57cd-9163-444305d74117=5802
> {code}
> Feb 07 15:22:39 
> ==
> Feb 07 15:22:39 Running 'Wordcount on Docker test (custom fs plugin)'
> Feb 07 15:22:39 
> ==
> Feb 07 15:22:39 TEST_DATA_DIR: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
> Feb 07 15:22:40 Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
> Feb 07 15:22:40 Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
> Feb 07 15:22:41 Docker version 24.0.7, build afdd53b
> Feb 07 15:22:44 docker-compose version 1.29.2, build 5becea4c
> Feb 07 15:22:44 Starting fileserver for Flink distribution
> Feb 07 15:22:44 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin 
> ~/work/1/s
> Feb 07 15:23:07 ~/work/1/s
> Feb 07 15:23:07 
> ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
>  ~/work/1/s
> Feb 07 15:23:07 Preparing Dockeriles
> Feb 07 15:23:07 Executing command: git clone 
> https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch
> Cloning into 'flink-docker'...
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: 
> line 65: ./add-custom.sh: No such file or directory
> Feb 07 15:23:07 Building images
> ERROR: unable to prepare context: path "dev/test_docker_embedded_job-ubuntu" 
> not found
> Feb 07 15:23:09 ~/work/1/s
> Feb 07 15:23:09 Command: build_image test_docker_embedded_job failed. 
> Retrying...
> Feb 07 15:23:14 Starting fileserver for Flink distribution
> Feb 07 15:23:14 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin 
> ~/work/1/s
> Feb 07 15:23:36 ~/work/1/s
> Feb 07 15:23:36 
> ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853
>  ~/work/1/s
> Feb 07 15:23:36 Preparing Dockeriles
> Feb 07 15:23:36 Executing command: git clone 
> https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch
> fatal: destination path 'flink-docker' already exists and is not an empty 
> directory.
> Feb 07 15:23:36 Retry 1/5 exited 128, retrying in 1 seconds...
> Traceback (most recent call last):
>   File 
> "/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/python3_fileserver.py",
>  line 26, in 
> httpd = socketserver.TCPServer(("", ), handler)
>   File "/usr/lib/python3.8/socketserver.py", line 452, in __init__
> self.server_bind()
>   File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind
> self.socket.bind(self.server_address)
> OSError: [Errno 98] Address already in use
> [...]
> {code}



[jira] [Created] (FLINK-34428) WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out

2024-02-12 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34428:
-

 Summary: WindowAggregateITCase#testEventTimeHopWindow_GroupingSets 
times out
 Key: FLINK-34428
 URL: https://issues.apache.org/jira/browse/FLINK-34428
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7866453368/job/21460921339#step:10:15127

{code}
"main" #1 prio=5 os_prio=0 tid=0x7f1770cb7000 nid=0x4ad4d waiting on 
condition [0x7f17711f6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xab48e3a0> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
at 
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
at 
org.apache.flink.table.planner.runtime.stream.sql.WindowAggregateITCase.testTumbleWindowWithoutOutputWindowColumns(WindowAggregateITCase.scala:477)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
{code}
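The dump shows "main" parked in the no-argument {{CompletableFuture.get()}} called from {{StreamExecutionEnvironment.execute}}. That is an unbounded wait for the job result, so a job that never finishes keeps the test hanging until the CI timeout hits. The generic sketch below contrasts this with a bounded {{get(timeout, unit)}}. {{completesWithin}} is a hypothetical helper for illustration, not a Flink API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitDemo {

    /** Waits at most timeoutMillis; true if the future completed in time, false on timeout. */
    static boolean completesWithin(CompletableFuture<?> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a job result that never arrives (hypothetical).
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        // A plain neverCompleted.get() would park the thread forever, like "main" above.
        System.out.println(completesWithin(neverCompleted, 100)); // false
        System.out.println(completesWithin(CompletableFuture.completedFuture("done"), 100)); // true
    }
}
```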



[jira] [Updated] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs

2024-02-12 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34418:
--
Summary: Disk space issues for Docker-ized GitHub Action jobs  (was: 
YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
 failed due to disk space)

> Disk space issues for Docker-ized GitHub Action jobs
> 
>
> Key: FLINK-34418
> URL: https://issues.apache.org/jira/browse/FLINK-34418
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, test-stability
>
> [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
> {code:java}
> [...]
> Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
> Feb 09 03:00:13  at java.io.FileOutputStream.writeBytes(Native Method)
> Feb 09 03:00:13  at 
> java.io.FileOutputStream.write(FileOutputStream.java:326)
> Feb 09 03:00:13  at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
> Feb 09 03:00:13  ... 39 more
> [...] {code}



[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816537#comment-17816537
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7866453368



[jira] [Updated] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)

2024-02-12 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34427:
--
Affects Version/s: 1.19.0
   1.20.0

> ResourceManagerTaskExecutorTest fails fatally (exit code 239)
> -
>
> Key: FLINK-34427
> URL: https://issues.apache.org/jira/browse/FLINK-34427
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959
> {code}
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.220 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
> '/root/flink/flink-runtime' && 
> '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' 
> '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.lang=ALL-UNNAMED' 
> '--add-opens=java.base/java.net=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' 
> '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
> '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
>  '/root/flink/flink-runtime/target/surefire' 
> '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 
> 'surefire_26-20240212022332296_91tmp'
> Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
> output in log
> Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
> Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
> Error: 02:28:53 02:28:53.221 [ERROR] 
> org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
> Error: 02:28:53 02:28:53.221 [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> [...]
> {code}





[jira] [Created] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)

2024-02-12 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34427:
-

 Summary: ResourceManagerTaskExecutorTest fails fatally (exit code 
239)
 Key: FLINK-34427
 URL: https://issues.apache.org/jira/browse/FLINK-34427
 Project: Flink
  Issue Type: Bug
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959

{code}
Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
Error: 02:28:53 02:28:53.220 [ERROR] 
org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
Error: 02:28:53 02:28:53.220 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd 
'/root/flink/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
'-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.lang=ALL-UNNAMED' 
'--add-opens=java.base/java.net=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' 
'--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
'/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar'
 '/root/flink/flink-runtime/target/surefire' '2024-02-12T02-21-39_495-jvmRun3' 
'surefire-20240212022332296_88tmp' 'surefire_26-20240212022332296_91tmp'
Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check 
output in log
Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239
Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests:
Error: 02:28:53 02:28:53.221 [ERROR] 
org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest
Error: 02:28:53 02:28:53.221 [ERROR]  at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
[...]
{code}
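A note on the exit code, offered as an assumption and not verified against this build: on Linux a process's exit status keeps only the low 8 bits of the value passed to System.exit, so 239 is exactly what Surefire reports after a forked JVM calls System.exit(-17). As far as I know, -17 is the EXIT_CODE used by Flink's FatalExitExceptionHandler, which would point at an uncaught exception in a Flink-managed thread rather than a native JVM crash. The arithmetic as a small self-contained sketch (class and method names are illustrative only):

{code:java}
public class ExitCodeDemo {
    // Assumption: the forked test JVM terminated via System.exit(-17).
    static final int ASSUMED_EXIT_ARG = -17;

    // The OS exposes only the low 8 bits of the exit argument to the parent process.
    static int observedExitStatus(int exitArg) {
        return exitArg & 0xFF;
    }

    public static void main(String[] args) {
        // -17 is 0xFFFFFFEF in two's complement; the low byte 0xEF is 239.
        System.out.println(observedExitStatus(ASSUMED_EXIT_ARG)); // prints 239
    }
}
{code}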





[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816534#comment-17816534
 ] 

Matthias Pohl commented on FLINK-33186:
---

https://github.com/apache/flink/actions/runs/7866453155/job/21460933108#step:10:7710

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Sergey Nuyanzin
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762
> fails as
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816532#comment-17816532
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7861970334



[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816519#comment-17816519
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/12/24 8:26 AM:


* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results
* https://github.com/apache/flink/actions/runs/7831121355/job/21367169878#step:10:23505
* https://github.com/apache/flink/actions/runs/7823924194/job/21345848746#step:10:23507
* https://github.com/apache/flink/actions/runs/7823895861
* https://github.com/apache/flink/actions/runs/7838691422
* https://github.com/apache/flink/actions/runs/7851900601
* https://github.com/apache/flink/actions/runs/7859002096/job/21444979868#step:10:23510


was (Author: mapohl):
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results
* https://github.com/apache/flink/actions/runs/7831121355/job/21367169878#step:10:23505
* https://github.com/apache/flink/actions/runs/7823924194/job/21345848746#step:10:23507
* https://github.com/apache/flink/actions/runs/7823895861
* https://github.com/apache/flink/actions/runs/7838691422
* https://github.com/apache/flink/actions/runs/7851900601

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Priority: Critical
>  Labels: test-stability
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exception:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21  

[jira] [Commented] (FLINK-33958) Implement restore tests for IntervalJoin node

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816530#comment-17816530
 ] 

Matthias Pohl commented on FLINK-33958:
---

* https://github.com/apache/flink/actions/runs/7831121355/job/21367168844#step:10:11257

> Implement restore tests for IntervalJoin node
> -
>
> Key: FLINK-33958
> URL: https://issues.apache.org/jira/browse/FLINK-33958
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Bonnie Varghese
>Assignee: Bonnie Varghese
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>






[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816519#comment-17816519
 ] 

Matthias Pohl edited comment on FLINK-34403 at 2/12/24 8:25 AM:


* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results
* https://github.com/apache/flink/actions/runs/7831121355/job/21367169878#step:10:23505
* https://github.com/apache/flink/actions/runs/7823924194/job/21345848746#step:10:23507
* https://github.com/apache/flink/actions/runs/7823895861
* https://github.com/apache/flink/actions/runs/7838691422
* https://github.com/apache/flink/actions/runs/7851900601


was (Author: mapohl):
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results
* https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results

> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
> -
>
> Key: FLINK-34403
> URL: https://issues.apache.org/jira/browse/FLINK-34403
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.20.0
>Reporter: Benchao Li
>Priority: Critical
>  Labels: test-stability
>
> After FLINK-33611 was merged, the misc test on GHA fails with an out-of-memory 
> error, throwing the following exception:
> {code:java}
> Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
> Error: 05:43:21 05:43:21.773 [ERROR] 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
> elapsed: 40.97 s <<< ERROR!
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
> serialization.
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
> Feb 07 05:43:21   at 
> org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
> Feb 07 05:43:21   at 
> org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
> Feb 07 05:43:21   at 
> org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
> Feb 07 05:43:21   at 
> org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
> Feb 07 05:43:21   at 
> org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
> Feb 07 05:43:21   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: Self-suppression 

[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816531#comment-17816531
 ] 

Matthias Pohl commented on FLINK-22765:
---

https://github.com/apache/flink/actions/runs/7859001687/job/21444942424#step:10:8685

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> May 25 00:56:38 
> {code}




[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816529#comment-17816529
 ] 

Matthias Pohl commented on FLINK-34418:
---

This one still succeeded, but the job emitted a warning that disk space was reaching its limit: 
https://github.com/apache/flink/actions/runs/7859001687/job/21445027923#step:1:46
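Since this run only produced the warning, one option is to check the usable space up front so that a test fails fast with an explicit message instead of an IOException from the log4j appender. A minimal sketch, assuming the check runs against the test's working directory; the class name DiskSpaceGuard and the 1 GiB threshold are hypothetical choices for illustration and are not part of the current Flink test setup as far as this ticket shows:

{code:java}
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DiskSpaceGuard {
    // Hypothetical threshold; the limit behind the CI warning is not known here.
    static final long MIN_FREE_BYTES = 1L << 30; // 1 GiB

    /** Returns the bytes usable by this JVM on the file store holding the given path. */
    static long usableBytes(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return store.getUsableSpace();
    }

    /** Fails early with a descriptive message instead of a later "No space left on device". */
    static void requireFreeSpace(Path path, long minFreeBytes) throws IOException {
        long usable = usableBytes(path);
        if (usable < minFreeBytes) {
            throw new IOException(String.format(
                    "Only %d bytes usable under %s, need at least %d", usable, path, minFreeBytes));
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Usable bytes: " + usableBytes(workDir));
        requireFreeSpace(workDir, MIN_FREE_BYTES);
    }
}
{code}

This only improves the failure message; it does not free any space on the Docker-ized runner.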



[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816528#comment-17816528
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7859001632/job/21444955041



[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816527#comment-17816527
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7851900779



[jira] [Created] (FLINK-34426) HybridShuffleITCase.testHybridSelectiveExchangesRestart times out

2024-02-12 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34426:
-

 Summary: HybridShuffleITCase.testHybridSelectiveExchangesRestart 
times out
 Key: FLINK-34426
 URL: https://issues.apache.org/jira/browse/FLINK-34426
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.18.1
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7851900779/job/21429781783#step:10:9052

{code}
"ForkJoinPool-1-worker-3" #16 daemon prio=5 os_prio=0 cpu=3397.79ms 
elapsed=11462.88s tid=0x7f48966b3800 nid=0x7a303 waiting on condition  
[0x7f486e97a000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.19/Native Method)
- parking to wait for  <0xa2faa230> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.19/LockSupport.java:194)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.19/CompletableFuture.java:1796)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.19/ForkJoinPool.java:3118)
at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.19/CompletableFuture.java:1823)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.19/CompletableFuture.java:1998)
at 
org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36)
at 
org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:61)
at 
org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:117)
at 
org.apache.flink.test.runtime.HybridShuffleITCase.testHybridSelectiveExchangesRestart(HybridShuffleITCase.java:79)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.19/Native 
Method)
[...]
{code}
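The dump shows the test thread parked in the untimed CompletableFuture.get() (note the waitingGet frame) called from AutoCloseableAsync.close(), so a shutdown that never completes blocks the test until the CI job itself times out. One option for test code is to bound that wait so the failure surfaces close to the hang. A minimal sketch; closeWithTimeout is a hypothetical helper, and the local AsyncCloseable interface only stands in for any closeAsync() method returning CompletableFuture<Void>, it is not Flink's interface:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedClose {
    /** Local stand-in for a resource with an asynchronous close. */
    interface AsyncCloseable {
        CompletableFuture<Void> closeAsync();
    }

    /** Waits at most the given time for closeAsync() instead of an untimed get(). */
    static void closeWithTimeout(AsyncCloseable resource, long timeout, TimeUnit unit)
            throws ExecutionException, InterruptedException, TimeoutException {
        resource.closeAsync().get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // A resource whose shutdown never completes, mimicking the hang in the dump.
        AsyncCloseable stuck = () -> new CompletableFuture<>();
        try {
            closeWithTimeout(stuck, 100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("close() did not finish within 100 ms");
        }
    }
}
{code}

The timeout value is a per-test judgment call; it only turns a silent hang into a TimeoutException and does not address why the shutdown stalls.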





[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816526#comment-17816526
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7851900616

> YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
>  failed due to disk space
> -
>
> Key: FLINK-34418
> URL: https://issues.apache.org/jira/browse/FLINK-34418
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, test-stability
>
> [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746]
> {code:java}
> [...]
> Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device
> Feb 09 03:00:13  at java.io.FileOutputStream.writeBytes(Native Method)
> Feb 09 03:00:13  at java.io.FileOutputStream.write(FileOutputStream.java:326)
> Feb 09 03:00:13  at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250)
> Feb 09 03:00:13  ... 39 more
> [...] {code}





[jira] [Created] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out

2024-02-12 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34425:
-

 Summary: TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
 Key: FLINK-34425
 URL: https://issues.apache.org/jira/browse/FLINK-34425
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844

{code}
Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition  [0x7fbd2b9f3000]
Feb 10 03:21:45 java.lang.Thread.State: WAITING (parking)
Feb 10 03:21:45 at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
Feb 10 03:21:45 - parking to wait for  <0xae6199f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
Feb 10 03:21:45 at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
Feb 10 03:21:45 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
Feb 10 03:21:45 at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
Feb 10 03:21:45 at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
Feb 10 03:21:45 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
Feb 10 03:21:45 at java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
Feb 10 03:21:45 at org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
Feb 10 03:21:45 at java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
Feb 10 03:21:45 at java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
[...]
{code}
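Here the main thread is parked in ProcessImpl.waitFor through an untimed ConditionObject.await(), i.e. the test appears to wait for the child process with no upper bound at TaskManagerRunnerITCase.java:126. A minimal JDK-only sketch of a bounded wait using Process.waitFor(long, TimeUnit) plus destroyForcibly(); the waitForOrKill helper is hypothetical (not Flink code), and the demo assumes a POSIX environment where the sleep and true commands exist:

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class BoundedProcessWaitSketch {

    // Hypothetical helper (not Flink code): waits up to the given timeout for the child
    // process to exit. Returns true if it exited in time; otherwise kills it and returns false.
    static boolean waitForOrKill(Process process, long timeout, TimeUnit unit)
            throws InterruptedException {
        // Unlike an untimed waitFor(), waitFor(long, TimeUnit) returns false on timeout.
        if (process.waitFor(timeout, unit)) {
            return true;
        }
        // Forcibly terminate the stuck child and wait until the kill has taken effect.
        process.destroyForcibly().waitFor();
        return false;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes a POSIX environment (such as the Linux CI runners) where `sleep` exists.
        Process longRunning = new ProcessBuilder("sleep", "30").start();
        boolean exited = waitForOrKill(longRunning, 200, TimeUnit.MILLISECONDS);
        System.out.println("exited in time: " + exited + ", still alive: " + longRunning.isAlive());
    }
}
```

Killing the child on timeout also avoids leaking a stray process into later tests on the same runner.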





[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816523#comment-17816523
 ] 

Matthias Pohl commented on FLINK-34418:
---

https://github.com/apache/flink/actions/runs/7851900601/job/21429775024






[jira] (FLINK-26515) RetryingExecutorTest. testDiscardOnTimeout failed on azure

2024-02-12 Thread Matthias Pohl (Jira)


[ https://issues.apache.org/jira/browse/FLINK-26515 ]


Matthias Pohl deleted comment on FLINK-26515:
---

was (Author: mapohl):
1.18: 
https://github.com/apache/flink/actions/runs/7838691874/job/21390763726#step:10:10503

> RetryingExecutorTest. testDiscardOnTimeout failed on azure
> --
>
> Key: FLINK-26515
> URL: https://issues.apache.org/jira/browse/FLINK-26515
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.14.3, 1.17.0, 1.16.1, 1.18.0, 1.19.0
>Reporter: Yun Gao
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available, 
> test-stability
>
> {code:java}
> Mar 06 01:20:29 [ERROR] Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.941 s <<< FAILURE! - in org.apache.flink.changelog.fs.RetryingExecutorTest
> Mar 06 01:20:29 [ERROR] testTimeout  Time elapsed: 1.934 s  <<< FAILURE!
> Mar 06 01:20:29 java.lang.AssertionError: expected:<500.0> but was:<1922.869766>
> Mar 06 01:20:29   at org.junit.Assert.fail(Assert.java:89)
> Mar 06 01:20:29   at org.junit.Assert.failNotEquals(Assert.java:835)
> Mar 06 01:20:29   at org.junit.Assert.assertEquals(Assert.java:555)
> Mar 06 01:20:29   at org.junit.Assert.assertEquals(Assert.java:685)
> Mar 06 01:20:29   at org.apache.flink.changelog.fs.RetryingExecutorTest.testTimeout(RetryingExecutorTest.java:145)
> Mar 06 01:20:29   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Mar 06 01:20:29   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Mar 06 01:20:29   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Mar 06 01:20:29   at java.lang.reflect.Method.invoke(Method.java:498)
> Mar 06 01:20:29   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Mar 06 01:20:29   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Mar 06 01:20:29   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Mar 06 01:20:29   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Mar 06 01:20:29   at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Mar 06 01:20:29   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Mar 06 01:20:29   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Mar 06 01:20:29   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Mar 06 01:20:29   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> Mar 06 01:20:29   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> Mar 06 01:20:29   at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
> Mar 06 01:20:29   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> Mar 06 01:20:29   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> Mar 06 01:20:29   at java.util.Iterator.forEachRemaining(Iterator.java:116)
> Mar 06 01:20:29   at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> Mar 06 01:20:29   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> Mar 06 01:20:29   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32569=logs=f450c1a5-64b1-5955-e215-49cb1ad5ec88=cc452273-9efa-565d-9db8-ef62a38a0c10=22554





[jira] [Commented] (FLINK-26515) RetryingExecutorTest. testDiscardOnTimeout failed on azure

2024-02-12 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816522#comment-17816522
 ] 

Matthias Pohl commented on FLINK-26515:
---

1.18: 
https://github.com/apache/flink/actions/runs/7838691874/job/21390763726#step:10:10503





