[jira] [Updated] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
[ https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-33555:
--
    Priority: Critical  (was: Major)

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> ---
>
>                 Key: FLINK-33555
>                 URL: https://issues.apache.org/jira/browse/FLINK-33555
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Critical
>              Labels: test-stability
>
> https://github.com/XComp/flink/actions/runs/6868936761/job/18680977238#step:12:13492
> {code}
> Error: 21:44:15 21:44:15.144 [ERROR]
> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:119
> [The task was deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) but
> it should have been deployed to
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) for local recovery., The task
> was deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) but it should
> have been deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) for
> local recovery., The task was deployed to
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) but it should have been
> deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) for local
> recovery.] ==> expected: <true> but was: <false>
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
[ https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-33555:
--
    Affects Version/s: 1.19.0
                       1.20.0

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> ---
>
>                 Key: FLINK-33555
>                 URL: https://issues.apache.org/jira/browse/FLINK-33555
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/6868936761/job/18680977238#step:12:13492
> {code}
> Error: 21:44:15 21:44:15.144 [ERROR]
> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:119
> [The task was deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) but
> it should have been deployed to
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) for local recovery., The task
> was deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) but it should
> have been deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) for
> local recovery., The task was deployed to
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) but it should have been
> deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) for local
> recovery.] ==> expected: <true> but was: <false>
> {code}
[jira] [Commented] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
[ https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817860#comment-17817860 ]

Matthias Pohl commented on FLINK-33555:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=15485
{code}
Feb 16 01:14:56 01:14:56.299 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.33 s <<< FAILURE! -- in org.apache.flink.test.recovery.LocalRecoveryITCase
Feb 16 01:14:56 01:14:56.299 [ERROR] org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory -- Time elapsed: 39.27 s <<< FAILURE!
Feb 16 01:14:56 org.opentest4j.AssertionFailedError: [The task was deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) but it should have been deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) for local recovery., The task was deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) but it should have been deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) for local recovery.] ==> expected: <true> but was: <false>
Feb 16 01:14:56     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
Feb 16 01:14:56     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
Feb 16 01:14:56     at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
Feb 16 01:14:56     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
Feb 16 01:14:56     at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
Feb 16 01:14:56     at org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:119)
Feb 16 01:14:56     at java.lang.reflect.Method.invoke(Method.java:498)
Feb 16 01:14:56     at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
> ---
>
>                 Key: FLINK-33555
>                 URL: https://issues.apache.org/jira/browse/FLINK-33555
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/6868936761/job/18680977238#step:12:13492
> {code}
> Error: 21:44:15 21:44:15.144 [ERROR]
> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:119
> [The task was deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) but
> it should have been deployed to
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) for local recovery., The task
> was deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) but it should
> have been deployed to AllocationID(fcf411eadbae8beed895a78ea1653046) for
> local recovery., The task was deployed to
> AllocationID(dec337d82b9d960004ffd73be8a2c5d5) but it should have been
> deployed to AllocationID(a61fd8a6bc5ef9d467f32f918bdfb385) for local
> recovery.] ==> expected: <true> but was: <false>
> {code}
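The failure message above lists, per task, the AllocationID it was deployed to after the crash and the one it should have reused for local recovery. The following is only a minimal sketch of a check that would produce such messages; it is not the code in LocalRecoveryITCase, and the class name, the method name, and the mapping of subtask indices to allocations are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not taken from LocalRecoveryITCase. */
final class LocalRecoveryCheckSketch {

    /**
     * Compares, per subtask index, the allocation used before the crash with
     * the allocation used after recovery and returns one message per mismatch.
     */
    static List<String> findMismatches(
            Map<Integer, String> allocationBeforeCrash,
            Map<Integer, String> allocationAfterRecovery) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : allocationBeforeCrash.entrySet()) {
            String expected = entry.getValue();
            String actual = allocationAfterRecovery.get(entry.getKey());
            if (!expected.equals(actual)) {
                errors.add(
                        "The task was deployed to AllocationID(" + actual
                                + ") but it should have been deployed to AllocationID("
                                + expected + ") for local recovery.");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Assumed assignment of subtask indices; only the IDs come from the report.
        Map<Integer, String> before = new LinkedHashMap<>();
        before.put(0, "ee1115e87892e59107adfa6b7bfbfd13");
        before.put(1, "34c031bb72931f33a70b6a55fe30501c");

        // After recovery the two tasks ended up in each other's slot.
        Map<Integer, String> after = new LinkedHashMap<>();
        after.put(0, "34c031bb72931f33a70b6a55fe30501c");
        after.put(1, "ee1115e87892e59107adfa6b7bfbfd13");

        findMismatches(before, after).forEach(System.out::println);
    }
}
```

In both reported runs the allocations form a permutation of each other (every slot is reused, just by a different task), which the sketch reproduces with two swapped entries.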
[jira] [Comment Edited] (FLINK-33555) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
[ https://issues.apache.org/jira/browse/FLINK-33555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817860#comment-17817860 ]

Matthias Pohl edited comment on FLINK-33555 at 2/16/24 7:57 AM:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=15485
{code}
Feb 16 01:14:56 01:14:56.299 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.33 s <<< FAILURE! -- in org.apache.flink.test.recovery.LocalRecoveryITCase
Feb 16 01:14:56 01:14:56.299 [ERROR] org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory -- Time elapsed: 39.27 s <<< FAILURE!
Feb 16 01:14:56 org.opentest4j.AssertionFailedError: [The task was deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) but it should have been deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) for local recovery., The task was deployed to AllocationID(ee1115e87892e59107adfa6b7bfbfd13) but it should have been deployed to AllocationID(34c031bb72931f33a70b6a55fe30501c) for local recovery.] ==> expected: <true> but was: <false>
Feb 16 01:14:56     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
Feb 16 01:14:56     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
Feb 16 01:14:56     at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
Feb 16 01:14:56     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
Feb 16 01:14:56     at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
Feb 16 01:14:56     at org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:119)
Feb 16 01:14:56     at java.lang.reflect.Method.invoke(Method.java:498)
Feb 16 01:14:56     at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 16 01:14:56     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}
I'm moving this one out of FLINK-27075 as it appeared in Azure Pipelines as well.

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory:
>
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817857#comment-17817857 ]

Matthias Pohl commented on FLINK-22765:
---

JDK21: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=7806

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
>
>
>                 Key: FLINK-22765
>                 URL: https://issues.apache.org/jira/browse/FLINK-22765
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>            Priority: Critical
>              Labels: pull-request-available, stale-assigned, test-stability
>             Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError:
> May 25 00:56:38
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38 but: was "The system is out of resources.\nConsult the
> following stack trace for details."
> May 25 00:56:38 at
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38 at
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38 at
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> May 25 00:56:38 at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38 at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38 at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38 at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38 at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38 at
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38 at
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38 at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38 at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38 at
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38 at
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38 at
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38 at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38 at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38 at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38 at
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38 at
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38 at
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38 at
>
[jira] [Commented] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch
[ https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817858#comment-17817858 ]

Matthias Pohl commented on FLINK-29114:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=11539

> TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with
> result mismatch
> --
>
>                 Key: FLINK-29114
>                 URL: https://issues.apache.org/jira/browse/FLINK-29114
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner, Tests
>    Affects Versions: 1.15.0, 1.19.0, 1.20.0
>            Reporter: Sergey Nuyanzin
>            Priority: Minor
>              Labels: auto-deprioritized-major, test-stability
>
> It could be reproduced locally by repeating tests. Usually about 100
> iterations are enough to have several failed tests
> {noformat}
> [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
> 1.664 s <<< FAILURE! - in
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase
> [ERROR]
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse
> Time elapsed: 0.108 s <<< FAILURE!
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:120)
> at org.junit.Assert.assertEquals(Assert.java:146)
> at
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
> at
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
> at
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
> at
>
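The issue description says the failure shows up when the test is repeated about 100 times locally. A minimal sketch of such a repetition loop follows. The helper name `countFailures` and the stand-in test body are hypothetical; in practice one would more likely use the IDE's repeat option or a test-framework feature, and the stand-in fails on a fixed schedule only so that the example is deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a "repeat until it fails" harness. */
final class RepeatTestSketch {

    /** Runs the given test body repeatedly and counts the runs that threw. */
    static int countFailures(int iterations, Runnable testBody) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                testBody.run();
            } catch (AssertionError | RuntimeException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in for a flaky test: fails on every 25th invocation.
        AtomicInteger calls = new AtomicInteger();
        Runnable flakyStandIn = () -> {
            if (calls.incrementAndGet() % 25 == 0) {
                throw new AssertionError("result mismatch (simulated)");
            }
        };
        int failures = countFailures(100, flakyStandIn);
        System.out.println(failures + " of 100 iterations failed");
    }
}
```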
[jira] [Updated] (FLINK-34448) ChangelogLocalRecoveryITCase failed fatally with 127 exit code
[ https://issues.apache.org/jira/browse/FLINK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-34448:
--
    Priority: Critical  (was: Major)

> ChangelogLocalRecoveryITCase failed fatally with 127 exit code
> --
>
>                 Key: FLINK-34448
>                 URL: https://issues.apache.org/jira/browse/FLINK-34448
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Critical
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8897
> \
> {code}
> Feb 16 02:43:47 02:43:47.142 [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests)
> on project flink-tests:
> Feb 16 02:43:47 02:43:47.142 [ERROR]
> Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to
> /__w/1/s/flink-tests/target/surefire-reports for the individual test results.
> Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to dump files (if any
> exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> Feb 16 02:43:47 02:43:47.142 [ERROR] ExecutionException The forked VM
> terminated without properly saying goodbye. VM crash or System.exit called?
> Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd
> '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java'
> '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions'
> '--add-opens=java.base/java.util=ALL-UNNAMED'
> '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar'
> '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar'
> '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4'
> 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
> Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check
> output in log
> Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
> Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
> Feb 16 02:43:47 02:43:47.142 [ERROR]
> org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
> Feb 16 02:43:47 02:43:47.142 [ERROR]
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> ExecutionException The forked VM terminated without properly saying goodbye.
> VM crash or System.exit called?
> Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd
> '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java'
> '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions'
> '--add-opens=java.base/java.util=ALL-UNNAMED'
> '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar'
> '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar'
> '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4'
> 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
> Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check
> output in log
> Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
> Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
> Feb 16 02:43:47 02:43:47.142 [ERROR]
> org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
> Feb 16 02:43:47 02:43:47.142 [ERROR]     at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
> {code}
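The log shows that Surefire started the forked JVM through `/bin/sh -c` and that the process ended with exit code 127. By shell convention, 127 is the status reported when a command cannot be found; whether that is what happened in this build is not established by the log. The sketch below only demonstrates the convention, assuming a POSIX `/bin/sh` is available; the class name, method name, and the command name are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch showing how a shell reports exit code 127. */
final class ExitCode127Sketch {

    /** Runs the command via /bin/sh -c, discards its output, returns the exit code. */
    static int runInShell(String command) {
        try {
            Process process =
                    new ProcessBuilder("/bin/sh", "-c", command)
                            .redirectErrorStream(true)
                            .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                            .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the shell", e);
        }
    }

    public static void main(String[] args) {
        // The command name is invented and is assumed not to exist on the PATH.
        System.out.println("missing command: " + runInShell("no-such-binary-flink-sketch"));
        System.out.println("explicit status: " + runInShell("exit 3"));
    }
}
```

A JVM that starts and then crashes would normally end with a different status, which is why the exit code is a useful first hint when reading such a Surefire report.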
[jira] [Created] (FLINK-34448) ChangelogLocalRecoveryITCase failed fatally with 127 exit code
Matthias Pohl created FLINK-34448:
-

             Summary: ChangelogLocalRecoveryITCase failed fatally with 127 exit code
                 Key: FLINK-34448
                 URL: https://issues.apache.org/jira/browse/FLINK-34448
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.20.0
            Reporter: Matthias Pohl

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8897
\
{code}
Feb 16 02:43:47 02:43:47.142 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) on project flink-tests:
Feb 16 02:43:47 02:43:47.142 [ERROR]
Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to /__w/1/s/flink-tests/target/surefire-reports for the individual test results.
Feb 16 02:43:47 02:43:47.142 [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Feb 16 02:43:47 02:43:47.142 [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check output in log
Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
Feb 16 02:43:47 02:43:47.142 [ERROR] org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
Feb 16 02:43:47 02:43:47.142 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Feb 16 02:43:47 02:43:47.142 [ERROR] Command was /bin/sh -c cd '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-11.0.19+7/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240216015747138_560.jar' '/__w/1/s/flink-tests/target/surefire' '2024-02-16T01-57-43_286-jvmRun4' 'surefire-20240216015747138_558tmp' 'surefire_185-20240216015747138_559tmp'
Feb 16 02:43:47 02:43:47.142 [ERROR] Error occurred in starting fork, check output in log
Feb 16 02:43:47 02:43:47.142 [ERROR] Process Exit Code: 127
Feb 16 02:43:47 02:43:47.142 [ERROR] Crashed tests:
Feb 16 02:43:47 02:43:47.142 [ERROR] org.apache.flink.test.checkpointing.ChangelogLocalRecoveryITCase
Feb 16 02:43:47 02:43:47.142 [ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
{code}
[jira] [Commented] (FLINK-34202) python tests take suspiciously long in some of the cases
[ https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817856#comment-17817856 ]

Matthias Pohl commented on FLINK-34202:
---

1.20 (master): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57550=logs=bf5e383b-9fd3-5f02-ca1c-8f788e2e76d3=85189c57-d8a0-5c9c-b61d-fc05cfac62cf

> python tests take suspiciously long in some of the cases
>
>
>                 Key: FLINK-34202
>                 URL: https://issues.apache.org/jira/browse/FLINK-34202
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>    Affects Versions: 1.17.2, 1.19.0, 1.18.1
>            Reporter: Matthias Pohl
>            Priority: Critical
>              Labels: test-stability
>
> [This release-1.18
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=3e4dd1a2-fe2f-5e5d-a581-48087e718d53=b4612f28-e3b5-5853-8a8b-610ae894217a]
> has the python stage running into a timeout without any obvious reason. The
> [python stage run for
> JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=b53e1644-5cb4-5a3b-5d48-f523f39bcf06]
> was also getting close to the 4h timeout.
> I'm creating this issue for documentation purposes.
[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817683#comment-17817683 ] Matthias Pohl edited comment on FLINK-22765 at 2/15/24 2:06 PM: The following error is reported: {code} The system is out of resources. Consult the following stack trace for details. java.lang.OutOfMemoryError: Metaspace at jdk.compiler/com.sun.tools.javac.comp.Flow.analyzeTree(Flow.java:233) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1419) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1393) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:976) at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:319) at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:178) at jdk.compiler/com.sun.tools.javac.Main.compile(Main.java:82) at jdk.compiler/com.sun.tools.javac.api.JavacTool.run(JavacTool.java:214) at org.apache.flink.testutils.ClassLoaderUtils.compileClass(ClassLoaderUtils.java:83) at org.apache.flink.testutils.ClassLoaderUtils.writeAndCompile(ClassLoaderUtils.java:62) at org.apache.flink.testutils.ClassLoaderUtils.access$100(ClassLoaderUtils.java:46) at org.apache.flink.testutils.ClassLoaderUtils$ClassLoaderBuilder.build(ClassLoaderUtils.java:163) at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.loadDummyClass(ExceptionUtilsITCase.java:180) at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.main(ExceptionUtilsITCase.java:159) {code} That explains why other sources (e.g. [this SO post|https://stackoverflow.com/a/39509720]) state that the error message "The system is out of resources." indicates an error while compiling the classes. That happens in the ClassloaderUtils. The heap size is just not sufficient for compiling the classes. was (Author: mapohl): The following error is reported: {code} The system is out of resources. 
Consult the following stack trace for details. java.lang.OutOfMemoryError: Metaspace at jdk.compiler/com.sun.tools.javac.comp.Flow.analyzeTree(Flow.java:233) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1419) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1393) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:976) at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:319) at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:178) at jdk.compiler/com.sun.tools.javac.Main.compile(Main.java:82) at jdk.compiler/com.sun.tools.javac.api.JavacTool.run(JavacTool.java:214) at org.apache.flink.testutils.ClassLoaderUtils.compileClass(ClassLoaderUtils.java:83) at org.apache.flink.testutils.ClassLoaderUtils.writeAndCompile(ClassLoaderUtils.java:62) at org.apache.flink.testutils.ClassLoaderUtils.access$100(ClassLoaderUtils.java:46) at org.apache.flink.testutils.ClassLoaderUtils$ClassLoaderBuilder.build(ClassLoaderUtils.java:163) at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.loadDummyClass(ExceptionUtilsITCase.java:180) at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.main(ExceptionUtilsITCase.java:159) {code} > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: 
is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." > May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at >
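To make the distinction from the comment above concrete, here is a minimal sketch of how the captured output of the forked test program could be triaged. The class `OomTriage`, its enum, and the method name are hypothetical and not part of Flink; the only facts taken from this thread are the banner text "The system is out of resources." and the `java.lang.OutOfMemoryError: Metaspace` line from the reported stack trace. The sketch assumes the banner is only ever printed by the compiler, as the linked SO post suggests.

```java
/** Hypothetical triage helper for illustration only; not part of Flink. */
final class OomTriage {

    /** Banner seen in the reported output when compilation runs out of memory. */
    static final String COMPILER_BANNER = "The system is out of resources.";

    /** Exception line taken from the stack trace reported in this issue. */
    static final String METASPACE_OOM = "java.lang.OutOfMemoryError: Metaspace";

    enum Cause { COMPILER_OOM, METASPACE_OOM, OTHER }

    /** Classifies the captured output of the forked test program. */
    static Cause classify(String output) {
        if (output == null) {
            return Cause.OTHER;
        }
        // Check the banner first: if it is present, the error was raised while
        // compiling the dummy classes, even if the trace also mentions Metaspace.
        if (output.contains(COMPILER_BANNER)) {
            return Cause.COMPILER_OOM;
        }
        if (output.contains(METASPACE_OOM)) {
            return Cause.METASPACE_OOM;
        }
        return Cause.OTHER;
    }

    public static void main(String[] args) {
        String reported = COMPILER_BANNER
                + "\nConsult the following stack trace for details.\n"
                + METASPACE_OOM;
        System.out.println(classify(reported));
    }
}
```

Checking the banner before the exception name matters here because the reported output contains both, and only the banner indicates that the failure occurred during compilation rather than in the class loading the test intends to exercise.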
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817683#comment-17817683 ] Matthias Pohl commented on FLINK-22765: --- The following error is reported: {code} The system is out of resources. Consult the following stack trace for details. java.lang.OutOfMemoryError: Metaspace at jdk.compiler/com.sun.tools.javac.comp.Flow.analyzeTree(Flow.java:233) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1419) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1393) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:976) at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:319) at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:178) at jdk.compiler/com.sun.tools.javac.Main.compile(Main.java:82) at jdk.compiler/com.sun.tools.javac.api.JavacTool.run(JavacTool.java:214) at org.apache.flink.testutils.ClassLoaderUtils.compileClass(ClassLoaderUtils.java:83) at org.apache.flink.testutils.ClassLoaderUtils.writeAndCompile(ClassLoaderUtils.java:62) at org.apache.flink.testutils.ClassLoaderUtils.access$100(ClassLoaderUtils.java:46) at org.apache.flink.testutils.ClassLoaderUtils$ClassLoaderBuilder.build(ClassLoaderUtils.java:163) at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.loadDummyClass(ExceptionUtilsITCase.java:180) at org.apache.flink.runtime.util.ExceptionUtilsITCase$DummyClassLoadingProgram.main(ExceptionUtilsITCase.java:159) {code} > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 
1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." > May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at >
[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676 ] Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:50 PM: The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). All the other failures happened with JDK21. Indeed, it's reproducible locally with JDK21. That would also explain why we're seeing this error more often recently with the introduction of JDK21. was (Author: mapohl): The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). All the other failures happened with JDK21. Indeed, it's reproducible locally with JDK21. 
> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." > May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at >
[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676 ] Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:49 PM: The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). All the other failures happened with JDK21. Indeed, it's reproducible locally with JDK21. was (Author: mapohl): The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). All the other failures happened with JDK21. Indeed, it's reproducible locally. 
> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." > May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May
[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676 ] Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:49 PM: The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). All the other failures happened with JDK21. Indeed, it's reproducible locally. was (Author: mapohl): The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). 
All the other failures happened with JDK21. > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." > May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676 ] Matthias Pohl commented on FLINK-22765: --- The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." 
> May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > May 25 00:56:38 at >
[jira] [Comment Edited] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817676#comment-17817676 ] Matthias Pohl edited comment on FLINK-22765 at 2/15/24 1:34 PM: The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). All the other failures happened with JDK21. was (Author: mapohl): The issue seems to be related to the JDK version. We've seen one failure with JDK11 ([20231014.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53728=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=11038]) and two with JDK17 ([GHA build #46|https://github.com/XComp/flink/actions/runs/7057414894/job/19211346164#step:12:7308], [GHA build #61|https://github.com/XComp/flink/actions/runs/7095339465/job/19312311325#step:12:8783]). 
> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." > May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at >
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817644#comment-17817644 ] Matthias Pohl commented on FLINK-22765: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57535=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8999 > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." 
> May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > May 25 00:56:38 at >
[jira] [Assigned] (FLINK-34447) ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines
[ https://issues.apache.org/jira/browse/FLINK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl reassigned FLINK-34447:
-------------------------------------

    Assignee: Matthias Pohl

> ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines
> -------------------------------------
>
> Key: FLINK-34447
> URL: https://issues.apache.org/jira/browse/FLINK-34447
> Project: Flink
> Issue Type: Technical Debt
> Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Assignee: Matthias Pohl
> Priority: Major
> Labels: pull-request-available
>
> This appeared in this [PR CI run|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57529=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=7997] of FLINK-34427.
> {code}
> Feb 14 18:50:01 18:50:01.283 [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.665 s <<< FAILURE! -- in org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest
> Feb 14 18:50:01 18:50:01.283 [ERROR] org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime -- Time elapsed: 0.197 s <<< FAILURE!
> Feb 14 18:50:01 java.lang.AssertionError:
> Feb 14 18:50:01
> Feb 14 18:50:01 Expecting
> Feb 14 18:50:0170e6587e5e4ba9f310031a96bdda2971]>
> Feb 14 18:50:01 not to be done.
> Feb 14 18:50:01 Be aware that the state of the future in this message might not reflect the one at the time when the assertion was performed as it is evaluated later on
> Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.lambda$new$3(ActiveResourceManagerTest.java:982)
> Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$Context.runTest(ActiveResourceManagerTest.java:1133)
> Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.<init>(ActiveResourceManagerTest.java:963)
> Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime(ActiveResourceManagerTest.java:946)
> Feb 14 18:50:01     at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 14 18:50:01     at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Feb 14 18:50:01     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Feb 14 18:50:01     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Feb 14 18:50:01     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Feb 14 18:50:01     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> But I was able to reproduce it locally as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
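The assertion in the FLINK-34447 trace expects a future "not to be done", and the issue title says the test fails on slow machines. A check of that kind depends on how much time passes between arming a timeout and running the assertion. The sketch below illustrates the dependency with a hand-driven clock. It is an illustration only and is not taken from ActiveResourceManagerTest or from any fix for this issue. `ManualClock`, `startTimeout`, `advance`, and `timeoutFiredAfter` are hypothetical names introduced here.

```java
import java.util.concurrent.CompletableFuture;

/** Sketch only: a hand-driven clock, not one of Flink's test utilities. */
final class TimeoutRaceSketch {

    /** Hypothetical clock that only moves when the caller advances it. */
    static final class ManualClock {
        private long nowMillis = 0;
        private long deadlineMillis = Long.MAX_VALUE;
        private final CompletableFuture<Void> timeoutFuture = new CompletableFuture<>();

        /** Arms a single timeout relative to the current manual time. */
        CompletableFuture<Void> startTimeout(long timeoutMillis) {
            deadlineMillis = nowMillis + timeoutMillis;
            return timeoutFuture;
        }

        /** Moves time forward and completes the future once the deadline is reached. */
        void advance(long millis) {
            nowMillis += millis;
            if (nowMillis >= deadlineMillis) {
                timeoutFuture.complete(null);
            }
        }
    }

    /** Returns whether the timeout has fired after the given amount of setup time. */
    static boolean timeoutFiredAfter(long setupMillis, long timeoutMillis) {
        ManualClock clock = new ManualClock();
        CompletableFuture<Void> timeout = clock.startTimeout(timeoutMillis);
        clock.advance(setupMillis); // stands in for work done before the assertion runs
        return timeout.isDone();
    }

    public static void main(String[] args) {
        System.out.println(timeoutFiredAfter(10, 50)); // short setup: prints false
        System.out.println(timeoutFiredAfter(80, 50)); // long setup: prints true
    }
}
```

With a 50 ms timeout, a "not done" assertion holds when the preceding work takes 10 ms and fails when it takes 80 ms. Driving time explicitly, as the sketch does, removes the dependency on machine speed.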
[jira] [Created] (FLINK-34447) ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines
Matthias Pohl created FLINK-34447:
-------------------------------------

Summary: ActiveResourceManagerTest#testWorkerRegistrationTimeoutNotCountingAllocationTime still fails on slow machines
Key: FLINK-34447
URL: https://issues.apache.org/jira/browse/FLINK-34447
Project: Flink
Issue Type: Technical Debt
Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl

This appeared in this [PR CI run|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57529=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=7997] of FLINK-34427.
{code}
Feb 14 18:50:01 18:50:01.283 [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.665 s <<< FAILURE! -- in org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest
Feb 14 18:50:01 18:50:01.283 [ERROR] org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime -- Time elapsed: 0.197 s <<< FAILURE!
Feb 14 18:50:01 java.lang.AssertionError:
Feb 14 18:50:01
Feb 14 18:50:01 Expecting
Feb 14 18:50:01
Feb 14 18:50:01 not to be done.
Feb 14 18:50:01 Be aware that the state of the future in this message might not reflect the one at the time when the assertion was performed as it is evaluated later on
Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.lambda$new$3(ActiveResourceManagerTest.java:982)
Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$Context.runTest(ActiveResourceManagerTest.java:1133)
Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest$15.<init>(ActiveResourceManagerTest.java:963)
Feb 14 18:50:01     at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime(ActiveResourceManagerTest.java:946)
Feb 14 18:50:01     at java.lang.reflect.Method.invoke(Method.java:498)
Feb 14 18:50:01     at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 14 18:50:01     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 14 18:50:01     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 14 18:50:01     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 14 18:50:01     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}
But I was able to reproduce it locally as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-32523) NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted fails with timeout on AZP
[ https://issues.apache.org/jira/browse/FLINK-32523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817604#comment-17817604 ]

Matthias Pohl commented on FLINK-32523:
---------------------------------------

1.17: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57534=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=7946

> NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted fails with timeout on AZP
> ---------------------------------------
>
> Key: FLINK-32523
> URL: https://issues.apache.org/jira/browse/FLINK-32523
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.16.2, 1.18.0, 1.17.1, 1.19.0
> Reporter: Sergey Nuyanzin
> Assignee: Hangxiang Yu
> Priority: Critical
> Labels: pull-request-available, stale-assigned, test-stability
> Attachments: failure.log
>
>
> This build https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50795=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8638 fails with timeout
> {noformat}
> Jul 03 01:26:35 org.junit.runners.model.TestTimedOutException: test timed out after 10 milliseconds
> Jul 03 01:26:35     at java.lang.Object.wait(Native Method)
> Jul 03 01:26:35     at java.lang.Object.wait(Object.java:502)
> Jul 03 01:26:35     at org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:61)
> Jul 03 01:26:35     at org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase.verifyAllOperatorsNotifyAborted(NotifyCheckpointAbortedITCase.java:198)
> Jul 03 01:26:35     at org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted(NotifyCheckpointAbortedITCase.java:189)
> Jul 03 01:26:35     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 03 01:26:35     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 03 01:26:35     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 03 01:26:35     at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 03 01:26:35     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 03 01:26:35     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 03 01:26:35     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 03 01:26:35     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 03 01:26:35     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 03 01:26:35     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 03 01:26:35     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 03 01:26:35     at java.lang.Thread.run(Thread.java:748)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch
[ https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-29114: -- Affects Version/s: 1.20.0 > TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with > result mismatch > -- > > Key: FLINK-29114 > URL: https://issues.apache.org/jira/browse/FLINK-29114 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Tests >Affects Versions: 1.15.0, 1.19.0, 1.20.0 >Reporter: Sergey Nuyanzin >Priority: Minor > Labels: auto-deprioritized-major, test-stability > > It could be reproduced locally by repeating tests. Usually about 100 > iterations are enough to have several failed tests > {noformat} > [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 1.664 s <<< FAILURE! - in > org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase > [ERROR] > org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse > Time elapsed: 0.108 s <<< FAILURE! 
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:120) > at org.junit.Assert.assertEquals(Assert.java:146) > at > org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67) > at >
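The FLINK-29114 description says the failure can be reproduced locally by repeating the test, with about 100 iterations usually producing several failures. A minimal sketch of such a repetition loop in plain Java follows. It does not use JUnit and is not the command or tooling the reporter used. `RepeatRunSketch` and `countFailures` are hypothetical names, and the lambda in `main` is a stand-in for a real test body.

```java
import java.util.function.BooleanSupplier;

/** Sketch only: repeats a check and counts failures, independent of any test framework. */
final class RepeatRunSketch {

    /** Runs the body the given number of times and returns how many runs failed. */
    static int countFailures(int iterations, BooleanSupplier body) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            boolean passed;
            try {
                passed = body.getAsBoolean();
            } catch (RuntimeException | AssertionError e) {
                // A thrown assertion or runtime error counts as a failed run.
                passed = false;
            }
            if (!passed) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical stand-in for a flaky test: fails on every 25th call.
        int failures = countFailures(100, () -> ++calls[0] % 25 != 0);
        System.out.println(failures + " of 100 runs failed"); // prints "4 of 100 runs failed"
    }
}
```

Counting failures over a fixed number of runs, instead of stopping at the first one, gives a rough failure rate that can be compared before and after a candidate fix.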
[jira] [Commented] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch
[ https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817603#comment-17817603 ] Matthias Pohl commented on FLINK-29114: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57533=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=11541 > TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with > result mismatch > -- > > Key: FLINK-29114 > URL: https://issues.apache.org/jira/browse/FLINK-29114 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Tests >Affects Versions: 1.15.0, 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Minor > Labels: auto-deprioritized-major, test-stability > > It could be reproduced locally by repeating tests. Usually about 100 > iterations are enough to have several failed tests > {noformat} > [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 1.664 s <<< FAILURE! - in > org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase > [ERROR] > org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse > Time elapsed: 0.108 s <<< FAILURE! 
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:120) > at org.junit.Assert.assertEquals(Assert.java:146) > at > org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > at >
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817602#comment-17817602 ] Matthias Pohl commented on FLINK-22765: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57533=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8998 > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." 
> May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > May 25 00:56:38 at >
[jira] [Resolved] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34403. --- Resolution: Fixed * master ** [2298e53f35121602c56845ac8040439fbd1a9ff4|https://github.com/apache/flink/commit/2298e53f35121602c56845ac8040439fbd1a9ff4] ** [9a316a5bcc47da7f69e76e0c25ed257adc4298ce|https://github.com/apache/flink/commit/9a316a5bcc47da7f69e76e0c25ed257adc4298ce] > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) > Feb 07 05:43:21 ... 18 more > Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: > Self-suppression not permitted > Feb 07 05:43:21 at > java.lang.Throwable.addSuppressed(Throwable.java:1072) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556) > Feb 07 05:43:21 at >
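The innermost cause in the trace above is `java.lang.IllegalArgumentException: Self-suppression not permitted`, thrown from `Throwable.addSuppressed`. The JDK raises this whenever an exception is added to its own suppressed list. One general way to hit it is a try-with-resources block whose `close()` rethrows the same exception instance that the body threw, which replaces the original failure with the `IllegalArgumentException`. The sketch below shows that general mechanism only. It makes no claim about what `InstantiationUtil.serializeObject` does internally, and `RethrowingResource` is a hypothetical class introduced for illustration.

```java
/** Sketch only: shows how try-with-resources can surface "Self-suppression not permitted". */
final class SelfSuppressionSketch {

    /** Hypothetical resource that throws the same exception instance from use() and close(). */
    static final class RethrowingResource implements AutoCloseable {
        private final RuntimeException failure;

        RethrowingResource(RuntimeException failure) {
            this.failure = failure;
        }

        void use() {
            throw failure;
        }

        @Override
        public void close() {
            throw failure; // same instance as the primary exception from use()
        }
    }

    /** Returns the message of the exception that escapes the try-with-resources block. */
    static String run() {
        RuntimeException shared = new RuntimeException("original failure");
        try (RethrowingResource resource = new RethrowingResource(shared)) {
            resource.use();
            return "no failure";
        } catch (IllegalArgumentException e) {
            // The runtime called shared.addSuppressed(shared), which is rejected.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Self-suppression not permitted"
    }
}
```

In this situation the original failure is still reachable as the cause of the `IllegalArgumentException`, so inspecting `getCause()` is one way to find the underlying error.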
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:12 AM: - Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 Reopening the issue. 
was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 Reopening the issue. > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:12 AM: - Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 Reopening the issue. 
was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 Reopening the issue. > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! 
> Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07
[jira] [Comment Edited] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes
[ https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817298#comment-17817298 ] Matthias Pohl edited comment on FLINK-34336 at 2/14/24 11:11 AM: - * https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193 * https://github.com/apache/flink/actions/runs/7895502334/job/21548208160#step:10:11190 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=15356 was (Author: mapohl): * https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193 * https://github.com/apache/flink/actions/runs/7895502334/job/21548208160#step:10:11190 > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang sometimes > - > > Key: FLINK-34336 > URL: https://issues.apache.org/jira/browse/FLINK-34336 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.19.0, 1.20.0 >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.19.0 > > > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang in > waitForRunningTasks(restClusterClient, jobID, parallelism2); > h2. Reason: > The job has 2 tasks (vertices). After calling updateJobResourceRequirements, > the source parallelism is unchanged (it stays at parallelism), and the > FlatMapper+Sink is changed from parallelism to parallelism2. > So the expected number of running tasks is parallelism + parallelism2, not > parallelism2. > > h2. Why does it pass for now? > Flink 1.19 supports a scaling cooldown, and the cooldown time is 30s by > default. This means the Flink job rescales 30 seconds after > updateJobResourceRequirements is called. 
> > So the running tasks are still at the old parallelism when we call > waitForRunningTasks(restClusterClient, jobID, parallelism2);. > IIUC, this cannot be guaranteed, and it is unexpected. > > h2. How to reproduce this bug? > [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6] > * Disable the cooldown > * Sleep for a while before waitForRunningTasks > With these changes, the job is already running with the new parallelism, so `waitForRunningTasks` will hang > forever. -- This message was sent by Atlassian Jira (v8.20.10#820010)
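The task-count reasoning in the FLINK-34336 description above (the source keeps parallelism, FlatMapper+Sink moves to parallelism2, so the total is parallelism + parallelism2) can be sketched as plain arithmetic. This is only an illustration: the class ExpectedTaskCount, its method, and the concrete values 2 and 4 are hypothetical and are not taken from the Flink code base or from the test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Flink. */
final class ExpectedTaskCount {

    /** Sums the parallelism of every vertex to get the total number of running subtasks. */
    static int expectedRunningTasks(Map<String, Integer> parallelismPerVertex) {
        int total = 0;
        for (int p : parallelismPerVertex.values()) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        int parallelism = 2;  // assumed value, chosen only for the example
        int parallelism2 = 4; // assumed value, chosen only for the example

        Map<String, Integer> afterRescale = new LinkedHashMap<>();
        afterRescale.put("Source", parallelism);           // not touched by the update
        afterRescale.put("FlatMapper+Sink", parallelism2); // rescaled vertex

        // Waiting for only parallelism2 tasks would undercount by the source's subtasks.
        System.out.println(expectedRunningTasks(afterRescale)); // prints 6
    }
}
```

With these assumed values, a wait condition that expects exactly parallelism2 (4) running tasks would never match a job that actually runs 6 subtasks, which matches the hang described in the issue.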
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:11 AM: - Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 Reopening the issue. 
was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 Reopening the issue. > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:10 AM: - Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=23375 Reopening the issue. was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 Reopening the issue. 
> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at >
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 11:10 AM: - Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57518=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=23068 Reopening the issue. was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 Reopening the issue. 
> VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at >
[jira] [Commented] (FLINK-34273) git fetch fails
[ https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817334#comment-17817334 ] Matthias Pohl commented on FLINK-34273: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57521=logs=60960eae-6f09-579e-371e-29814bdd1adc=1fe608a4-e773-5ca0-5336-1c37a61b9f8d > git fetch fails > --- > > Key: FLINK-34273 > URL: https://issues.apache.org/jira/browse/FLINK-34273 > Project: Flink > Issue Type: Bug > Components: Build System / CI, Test Infrastructure >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > We've seen multiple {{git fetch}} failures. I assume this to be an > infrastructure issue. This Jira issue is for documentation purposes. > {code:java} > error: RPC failed; curl 18 transfer closed with outstanding read data > remaining > error: 5211 bytes of body are still expected > fetch-pack: unexpected disconnect while reading sideband packet > fatal: early EOF > fatal: fetch-pack: invalid index-pack output {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-22765: -- Priority: Critical (was: Major) > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." 
> May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > May 25 00:56:38 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817333#comment-17817333 ] Matthias Pohl commented on FLINK-22765: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57521=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8997 > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." 
> May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > May 25 00:56:38 at >
[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore
[ https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817300#comment-17817300 ] Matthias Pohl commented on FLINK-28440: --- https://github.com/apache/flink/actions/runs/7895502334/job/21548198516#step:10:7557 > EventTimeWindowCheckpointingITCase failed with restore > -- > > Key: FLINK-28440 > URL: https://issues.apache.org/jira/browse/FLINK-28440 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / State Backends >Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0 >Reporter: Huang Xingbo >Assignee: Yanfei Lei >Priority: Critical > Labels: auto-deprioritized-critical, pull-request-available, > stale-assigned, test-stability > Fix For: 1.19.0 > > Attachments: image-2023-02-01-00-51-54-506.png, > image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, > image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, > image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, > image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png > > > {code:java} > Caused by: java.lang.Exception: Exception while creating > StreamOperatorStateContext. 
> at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268) > at > org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665) > at > org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935) > at > org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from > any of the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165) > ... 
11 more > Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: > /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced > (No such file or directory) > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321) > at > org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87) > at > org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69) > at > org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96) > at > org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75) > at > org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92) > at > org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > ... 13 more > Caused by: java.io.FileNotFoundException: >
[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817301#comment-17817301 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7895502334 > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 9:59 AM: Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://github.com/apache/flink/actions/runs/7895502334/job/21548207280#step:10:23089 Reopening the issue. was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 Reopening the issue. > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! 
-- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > 
org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: >
[jira] [Comment Edited] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes
[ https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817298#comment-17817298 ] Matthias Pohl edited comment on FLINK-34336 at 2/14/24 9:59 AM: * https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193 * https://github.com/apache/flink/actions/runs/7895502334/job/21548208160#step:10:11190 was (Author: mapohl): https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193 > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang sometimes > - > > Key: FLINK-34336 > URL: https://issues.apache.org/jira/browse/FLINK-34336 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.19.0, 1.20.0 >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.19.0 > > > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang in > waitForRunningTasks(restClusterClient, jobID, parallelism2); > h2. Reason: > The job has 2 tasks (vertices). After calling updateJobResourceRequirements, > the source parallelism is unchanged (it stays at parallelism), and the > FlatMapper+Sink changes from parallelism to parallelism2. > So the expected number of running tasks is parallelism + parallelism2, not > parallelism2. > > h2. Why does the test pass for now? > Flink 1.19 supports a scaling cooldown, and the cooldown time is 30s by > default. This means the Flink job rescales 30 seconds after > updateJobResourceRequirements is called. > > So the running tasks still have the old parallelism when we call > waitForRunningTasks(restClusterClient, jobID, parallelism2). > IIUC, this cannot be guaranteed, and it's unexpected. > > h2. How to reproduce this bug? 
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6] > * Disable the cooldown > * Sleep for a while before waitForRunningTasks > In that case the job already runs with the new parallelism, so `waitForRunningTasks` will hang > forever. -- This message was sent by Atlassian Jira (v8.20.10#820010)
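The task-count reasoning in the FLINK-34336 description can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from the Flink test: the class `ExpectedTaskCount`, the helper `expectedRunningTasks`, the vertex names, and the concrete values chosen for `parallelism` and `parallelism2` are all hypothetical. It shows that once only the FlatMapper+Sink vertex is rescaled, the number of running subtasks is the sum over both vertices, so waiting for `parallelism2` tasks alone would not match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpectedTaskCount {

    // Hypothetical helper (not part of Flink): sums the parallelism of every
    // job vertex to get the number of subtasks expected to be running.
    static int expectedRunningTasks(Map<String, Integer> parallelismPerVertex) {
        int total = 0;
        for (int vertexParallelism : parallelismPerVertex.values()) {
            total += vertexParallelism;
        }
        return total;
    }

    public static void main(String[] args) {
        int parallelism = 2;   // assumed value, for illustration only
        int parallelism2 = 3;  // assumed value, for illustration only

        // State after the resource requirements update has taken effect:
        // the source keeps its parallelism, only FlatMapper+Sink is rescaled.
        Map<String, Integer> afterRescale = new LinkedHashMap<>();
        afterRescale.put("Source", parallelism);
        afterRescale.put("FlatMapper+Sink", parallelism2);

        System.out.println("expected running tasks: " + expectedRunningTasks(afterRescale));
        System.out.println("count the test waits for: " + parallelism2);
    }
}
```

With these assumed values the two printed numbers differ, which is the mismatch the issue description points at: a wait condition keyed to `parallelism2` only lines up with the job state by coincidence.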
[jira] [Updated] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes
[ https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34336: -- Labels: pull-request-available test-stability (was: pull-request-available) > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang sometimes > - > > Key: FLINK-34336 > URL: https://issues.apache.org/jira/browse/FLINK-34336 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.19.0 >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.19.0 > > > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang in > waitForRunningTasks(restClusterClient, jobID, parallelism2); > h2. Reason: > The job has 2 tasks (vertices). After calling updateJobResourceRequirements, > the source parallelism is unchanged (it stays at parallelism), and the > FlatMapper+Sink changes from parallelism to parallelism2. > So the expected number of running tasks is parallelism + parallelism2, not > parallelism2. > > h2. Why does the test pass for now? > Flink 1.19 supports a scaling cooldown, and the cooldown time is 30s by > default. This means the Flink job rescales 30 seconds after > updateJobResourceRequirements is called. > > So the running tasks still have the old parallelism when we call > waitForRunningTasks(restClusterClient, jobID, parallelism2). > IIUC, this cannot be guaranteed, and it's unexpected. > > h2. How to reproduce this bug? > [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6] > * Disable the cooldown > * Sleep for a while before waitForRunningTasks > In that case the job already runs with the new parallelism, so `waitForRunningTasks` will hang > forever. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes
[ https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34336: -- Affects Version/s: 1.20.0 > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang sometimes > - > > Key: FLINK-34336 > URL: https://issues.apache.org/jira/browse/FLINK-34336 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.19.0, 1.20.0 >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.19.0 > > > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang in > waitForRunningTasks(restClusterClient, jobID, parallelism2); > h2. Reason: > The job has 2 tasks (vertices). After calling updateJobResourceRequirements, > the source parallelism is unchanged (it stays at parallelism), and the > FlatMapper+Sink changes from parallelism to parallelism2. > So the expected number of running tasks is parallelism + parallelism2, not > parallelism2. > > h2. Why does the test pass for now? > Flink 1.19 supports a scaling cooldown, and the cooldown time is 30s by > default. This means the Flink job rescales 30 seconds after > updateJobResourceRequirements is called. > > So the running tasks still have the old parallelism when we call > waitForRunningTasks(restClusterClient, jobID, parallelism2). > IIUC, this cannot be guaranteed, and it's unexpected. > > h2. How to reproduce this bug? > [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6] > * Disable the cooldown > * Sleep for a while before waitForRunningTasks > In that case the job already runs with the new parallelism, so `waitForRunningTasks` will hang > forever. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34336) AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes
[ https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817298#comment-17817298 ] Matthias Pohl commented on FLINK-34336: --- https://github.com/apache/flink/actions/runs/7895502334/job/21548185872#step:10:10193 > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang sometimes > - > > Key: FLINK-34336 > URL: https://issues.apache.org/jira/browse/FLINK-34336 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.19.0 >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang in > waitForRunningTasks(restClusterClient, jobID, parallelism2); > h2. Reason: > The job has 2 tasks (vertices). After calling updateJobResourceRequirements, > the source parallelism is unchanged (it stays at parallelism), and the > FlatMapper+Sink changes from parallelism to parallelism2. > So the expected number of running tasks is parallelism + parallelism2, not > parallelism2. > > h2. Why does the test pass for now? > Flink 1.19 supports a scaling cooldown, and the cooldown time is 30s by > default. This means the Flink job rescales 30 seconds after > updateJobResourceRequirements is called. > > So the running tasks still have the old parallelism when we call > waitForRunningTasks(restClusterClient, jobID, parallelism2). > IIUC, this cannot be guaranteed, and it's unexpected. > > h2. How to reproduce this bug? > [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6] > * Disable the cooldown > * Sleep for a while before waitForRunningTasks > In that case the job already runs with the new parallelism, so `waitForRunningTasks` will hang > forever. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817295#comment-17817295 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7895502322/job/21548178211 > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] 
{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817292#comment-17817292 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7895502206/job/21548178104 > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34443) YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster
[ https://issues.apache.org/jira/browse/FLINK-34443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817293#comment-17817293 ] Matthias Pohl commented on FLINK-34443: --- Maybe related to FLINK-34418 > YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed > when deploying job cluster > --- > > Key: FLINK-34443 > URL: https://issues.apache.org/jira/browse/FLINK-34443 > Project: Flink > Issue Type: Bug > Components: Build System / CI, Runtime / Coordination, Test > Infrastructure >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > Labels: github-actions, test-stability > > https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804 > {code} > Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in > org.apache.flink.yarn.YARNFileReplicationITCase > Error: 03:04:05 03:04:05.067 [ERROR] > org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication > -- Time elapsed: 1.982 s <<< ERROR! > Feb 14 03:04:05 > org.apache.flink.client.deployment.ClusterDeploymentException: Could not > deploy Yarn job cluster. 
> Feb 14 03:04:05 at > org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566) > Feb 14 03:04:05 at > org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109) > Feb 14 03:04:05 at > org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73) > Feb 14 03:04:05 at > org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303) > Feb 14 03:04:05 at > org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73) > Feb 14 03:04:05 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 14 03:04:05 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Feb 14 03:04:05 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Feb 14 03:04:05 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Feb 14 03:04:05 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Feb 14 03:04:05 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Feb 14 03:04:05 Caused by: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could > only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) > running and 2 node(s) are excluded in this operation. 
> Feb 14 03:04:05 at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260) > Feb 14 03:04:05 at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) > Feb 14 03:04:05 at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813) > Feb 14 03:04:05 at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908) > Feb 14 03:04:05 at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577) > Feb 14 03:04:05 at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > Feb 14 03:04:05 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549) > Feb 14 03:04:05 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518) > Feb 14 03:04:05 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > Feb 14 03:04:05 at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > Feb 14 03:04:05 at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > Feb 14 03:04:05 at java.security.AccessController.doPrivileged(Native > Method) > Feb 14 03:04:05 at javax.security.auth.Subject.doAs(Subject.java:422) > Feb 14 03:04:05 at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > Feb 14 03:04:05 at > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > Feb 14 03:04:05 > Feb 14 03:04:05 at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579) > Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1525) > Feb 14
[jira] [Created] (FLINK-34443) YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster
Matthias Pohl created FLINK-34443: - Summary: YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication failed when deploying job cluster Key: FLINK-34443 URL: https://issues.apache.org/jira/browse/FLINK-34443 Project: Flink Issue Type: Bug Components: Build System / CI, Runtime / Coordination, Test Infrastructure Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7895502206/job/21548246199#step:10:28804 {code} Error: 03:04:05 03:04:05.066 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 68.10 s <<< FAILURE! -- in org.apache.flink.yarn.YARNFileReplicationITCase Error: 03:04:05 03:04:05.067 [ERROR] org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication -- Time elapsed: 1.982 s <<< ERROR! Feb 14 03:04:05 org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster. Feb 14 03:04:05 at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:566) Feb 14 03:04:05 at org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:109) Feb 14 03:04:05 at org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithCustomizedFileReplication$0(YARNFileReplicationITCase.java:73) Feb 14 03:04:05 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303) Feb 14 03:04:05 at org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithCustomizedFileReplication(YARNFileReplicationITCase.java:73) Feb 14 03:04:05 at java.lang.reflect.Method.invoke(Method.java:498) Feb 14 03:04:05 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Feb 14 03:04:05 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Feb 14 03:04:05 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Feb 14 03:04:05 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Feb 14 03:04:05 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Feb 14 03:04:05 Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/.flink/application_1707879779446_0002/log4j-api-2.17.1.jar could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation. Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2260) Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2813) Feb 14 03:04:05 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:908) Feb 14 03:04:05 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577) Feb 14 03:04:05 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518) Feb 14 03:04:05 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) Feb 14 03:04:05 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) Feb 14 03:04:05 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) Feb 14 03:04:05 at java.security.AccessController.doPrivileged(Native Method) Feb 14 03:04:05 at javax.security.auth.Subject.doAs(Subject.java:422) Feb 14 03:04:05 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) Feb 14 03:04:05 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) Feb 14 03:04:05 Feb 14 
03:04:05 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1579) Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1525) Feb 14 03:04:05 at org.apache.hadoop.ipc.Client.call(Client.java:1422) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231) Feb 14 03:04:05 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) Feb 14 03:04:05 at com.sun.proxy.$Proxy113.addBlock(Unknown Source) Feb 14 03:04:05 at
[jira] [Commented] (FLINK-30629) ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable
[ https://issues.apache.org/jira/browse/FLINK-30629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817283#comment-17817283 ] Matthias Pohl commented on FLINK-30629: --- 1.17: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57519=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9747 > ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable > - > > Key: FLINK-30629 > URL: https://issues.apache.org/jira/browse/FLINK-30629 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Affects Versions: 1.17.0, 1.18.0 >Reporter: Xintong Song >Assignee: Liu >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.19.0 > > Attachments: ClientHeartbeatTestLog.txt, > logs-cron_azure-test_cron_azure_core-1685497478.zip > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44690=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=10819 > {code:java} > Jan 11 04:32:39 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 21.02 s <<< FAILURE! - in > org.apache.flink.client.ClientHeartbeatTest > Jan 11 04:32:39 [ERROR] > org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat > Time elapsed: 9.157 s <<< ERROR! > Jan 11 04:32:39 java.lang.IllegalStateException: MiniCluster is not yet > running or has already been shut down. 
> Jan 11 04:32:39 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:1044) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:917) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniCluster.getJobStatus(MiniCluster.java:841) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniClusterJobClient.getJobStatus(MiniClusterJobClient.java:91) > Jan 11 04:32:39 at > org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat(ClientHeartbeatTest.java:79) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-21834) org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset fail
[ https://issues.apache.org/jira/browse/FLINK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817281#comment-17817281 ] Matthias Pohl commented on FLINK-21834: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57504=logs=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819=2dd510a3-5041-5201-6dc3-54d310f68906=10519 {code} Feb 13 12:33:39 12:33:39.888 [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 116.9 s <<< FAILURE! -- in org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest Feb 13 12:33:39 12:33:39.888 [ERROR] org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriterTest.testResumeWithWrongOffset -- Time elapsed: 100.7 s <<< FAILURE! Feb 13 12:33:39 java.lang.AssertionError Feb 13 12:33:39 at org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset(AbstractRecoverableWriterTest.java:381) Feb 13 12:33:39 at java.lang.reflect.Method.invoke(Method.java:498) Feb 13 12:33:39 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) Feb 13 12:33:39 {code} > org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset > fail > - > > Key: FLINK-21834 > URL: https://issues.apache.org/jira/browse/FLINK-21834 > Project: Flink > Issue Type: Bug > Components: FileSystems >Affects Versions: 1.12.2, 1.13.2, 1.15.0 >Reporter: Guowei Ma >Priority: Not a Priority > Labels: auto-deprioritized-critical, auto-deprioritized-major, > auto-deprioritized-minor, test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14847=logs=3d12d40f-c62d-5ec4-6acc-0efe94cc3e89=5d6e4255-0ea8-5e2a-f52c-c881b7872361=10893 > Maybe we need to print what the exception is when `recover` is called. 
> {code:java} > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.fail(Assert.java:95) > at > org.apache.flink.core.fs.AbstractRecoverableWriterTest.testResumeWithWrongOffset(AbstractRecoverableWriterTest.java:381) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at >
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817273#comment-17817273 ] Matthias Pohl edited comment on FLINK-34403 at 2/14/24 9:37 AM: Args, all the time I didn't notice that they are two separate tests (with very similar names): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 Reopening the issue. was (Author: mapohl): Args, all the time I didn't notice that they are two separate tests (with very similar names): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 Reopening the issue. > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07
[jira] [Reopened] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reopened FLINK-34403: --- Args, all the time I didn't notice that they are two separate tests (with very similar names): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57499=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23862 Reopening the issue. > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) > Feb 07 05:43:21 ... 18 more > Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: > Self-suppression not permitted > Feb 07 05:43:21 at > java.lang.Throwable.addSuppressed(Throwable.java:1072) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556) > Feb 07 05:43:21 at >
[jira] [Commented] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817271#comment-17817271 ] Matthias Pohl commented on FLINK-34403: --- No worries > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) > Feb 07 05:43:21 ... 18 more > Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: > Self-suppression not permitted > Feb 07 05:43:21 at > java.lang.Throwable.addSuppressed(Throwable.java:1072) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182) > Feb 07 05:43:21 at >
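The "Self-suppression not permitted" message in the trace above is raised by `Throwable.addSuppressed`, which try-with-resources calls when closing a resource fails after the body has already failed. The following is a minimal sketch of how that can mask the real failure, assuming the very same exception instance surfaces from both the body and `close()`. The trace does not confirm that this is what happens inside `InstantiationUtil.serializeObject`, and all class and method names below are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical demo, not Flink code: reproduces "Self-suppression not permitted". */
final class SelfSuppressionDemo {

    /** A resource whose close() rethrows the exception instance it was handed. */
    static final class RethrowingResource implements Closeable {
        private final IOException failure;

        RethrowingResource(IOException failure) {
            this.failure = failure;
        }

        @Override
        public void close() throws IOException {
            throw failure;
        }
    }

    /** Returns the message of the exception that escapes the try-with-resources block. */
    static String provoke() {
        IOException failure = new IOException("original failure");
        try {
            try (RethrowingResource resource = new RethrowingResource(failure)) {
                // The body fails with the same instance that close() throws afterwards,
                // so the generated cleanup code calls failure.addSuppressed(failure).
                throw failure;
            }
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } catch (IOException e) {
            return "unexpected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(provoke());
    }
}
```

In this sketch the original `IOException` is only reachable as the cause of the `IllegalArgumentException`, which is why the top of such a trace can hide the underlying error.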
[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
[ https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817082#comment-17817082 ]

Matthias Pohl commented on FLINK-34424:
---
Argh. Didn't we have this in the past? Sorry again; the auto-completion and the guy behind the screen are to blame here. Yes, you're right.

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
>
>              Key: FLINK-34424
>              URL: https://issues.apache.org/jira/browse/FLINK-34424
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / Network
> Affects Versions: 1.19.0, 1.20.0
>         Reporter: Matthias Pohl
>         Priority: Critical
>           Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29     at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29     at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29     - locked <0xe2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
[ https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817069#comment-17817069 ]

Matthias Pohl commented on FLINK-34424:
---
[~piotr.nowicki] (because it's networking; feel free to delegate) [~yunfengzhou] (because you touched the code in FLINK-33743 recently): Can someone help with investigating the cause of the issue?

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
>
>              Key: FLINK-34424
>              URL: https://issues.apache.org/jira/browse/FLINK-34424
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / Network
> Affects Versions: 1.19.0, 1.20.0
>         Reporter: Matthias Pohl
>         Priority: Critical
>           Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29     at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29     at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29     - locked <0xe2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
[ https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817063#comment-17817063 ]

Matthias Pohl commented on FLINK-34424:
---
I'm wondering whether that has anything to do with the blocked reader thread:
{code}
Feb 11 13:55:29 "Thread-76" #476 daemon prio=5 os_prio=0 tid=0x7f190bbf1800 nid=0x5a40 waiting for monitor entry [0x7f191bce4000]
Feb 11 13:55:29    java.lang.Thread.State: BLOCKED (on object monitor)
Feb 11 13:55:29     at net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast(Native Method)
Feb 11 13:55:29     at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)
Feb 11 13:55:29     at org.apache.flink.runtime.io.compression.Lz4BlockDecompressor.decompress(Lz4BlockDecompressor.java:68)
Feb 11 13:55:29     at org.apache.flink.runtime.io.network.buffer.BufferDecompressor.decompress(BufferDecompressor.java:126)
Feb 11 13:55:29     at org.apache.flink.runtime.io.network.buffer.BufferDecompressor.decompressToIntermediateBuffer(BufferDecompressor.java:68)
Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.readLongs(BoundedBlockingSubpartitionWriteReadTest.java:206)
Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.access$000(BoundedBlockingSubpartitionWriteReadTest.java:55)
Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader.go(BoundedBlockingSubpartitionWriteReadTest.java:323)
Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.run(CheckedThread.java:67)
{code}
The test was started at 13:32:18.152 and timed out at 13:55:39.

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
>
>              Key: FLINK-34424
>              URL: https://issues.apache.org/jira/browse/FLINK-34424
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / Network
> Affects Versions: 1.19.0, 1.20.0
>         Reporter: Matthias Pohl
>         Priority: Critical
>           Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29     at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29     at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29     - locked <0xe2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
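The dumps in this thread show the test thread waiting in `Thread.join` while a reader thread is BLOCKED on a monitor, so the join only ends when the surrounding build times out. As a rough illustration of that interaction, the sketch below joins with an upper bound and reports whether the thread actually finished. It is not Flink's `CheckedThread` implementation; `BoundedJoinDemo`, `joinWithin` and `stuckReaderFinishes` are hypothetical names, and the 200 ms bound is an arbitrary value chosen for the demo.

```java
/** Hypothetical demo, not Flink code: a bounded join that reports a stuck thread. */
final class BoundedJoinDemo {

    /**
     * Waits at most timeoutMillis for the thread and reports whether it ended.
     * The timeout must be greater than 0, because join(0) waits forever.
     */
    static boolean joinWithin(Thread thread, long timeoutMillis) {
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return !thread.isAlive();
    }

    /** Starts a reader that needs a monitor the caller still holds, then joins it with a bound. */
    static boolean stuckReaderFinishes() {
        final Object monitor = new Object();
        Thread reader = new Thread(() -> {
            synchronized (monitor) {
                // the reader would do its work here once the monitor is free
            }
        }, "reader");
        reader.setDaemon(true);
        synchronized (monitor) {
            reader.start();
            // The reader cannot enter the monitor while this thread holds it.
            return joinWithin(reader, 200L);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader finished in time: " + stuckReaderFinishes());
    }
}
```

A bounded join like this turns a silent hang into an immediate, attributable failure, at the cost of having to pick a timeout that is generous enough for slow CI machines.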
[jira] [Updated] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18
[ https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-34333:
--
    Release Note: Fixes a bug where the leader election wasn't able to pick up leadership again after renewing the lease token caused a leadership loss. This required fabric8io:kubernetes-client to be upgraded from v6.6.2 to v6.9.0.

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
>              Key: FLINK-34333
>              URL: https://issues.apache.org/jira/browse/FLINK-34333
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / Coordination
> Affects Versions: 1.18.1
>         Reporter: Matthias Pohl
>         Assignee: Matthias Pohl
>         Priority: Blocker
>           Labels: pull-request-available
>          Fix For: 1.18.2
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using since Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19, which required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same problem FLINK-34007 covered. It's a dedicated Jira issue because we want to unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in v6.6.2 which might prevent the leadership-lost event from being forwarded to the client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). An initial proposal where the release call was handled in Flink's {{KubernetesLeaderElector}} didn't work because the leadership-lost event was triggered twice (see [FLINK-34007 PR comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902]).
[jira] [Resolved] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18
[ https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl resolved FLINK-34333.
---
    Fix Version/s: 1.18.2
       Resolution: Fixed

* 1.18
** [35c560312efc91dafd1b4674ce1e10acc9320ab1|https://github.com/apache/flink/commit/35c560312efc91dafd1b4674ce1e10acc9320ab1]
** [87560b7cedd6c857612a24b83485f5000b9edbd6|https://github.com/apache/flink/commit/87560b7cedd6c857612a24b83485f5000b9edbd6]

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
>              Key: FLINK-34333
>              URL: https://issues.apache.org/jira/browse/FLINK-34333
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / Coordination
> Affects Versions: 1.18.1
>         Reporter: Matthias Pohl
>         Assignee: Matthias Pohl
>         Priority: Blocker
>           Labels: pull-request-available
>          Fix For: 1.18.2
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using since Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19, which required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same problem FLINK-34007 covered. It's a dedicated Jira issue because we want to unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in v6.6.2 which might prevent the leadership-lost event from being forwarded to the client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). An initial proposal where the release call was handled in Flink's {{KubernetesLeaderElector}} didn't work because the leadership-lost event was triggered twice (see [FLINK-34007 PR comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902]).
[jira] [Assigned] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
[ https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl reassigned FLINK-34425:
-
    Assignee:     (was: Matthias Pohl)

> TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
> ---
>
>              Key: FLINK-34425
>              URL: https://issues.apache.org/jira/browse/FLINK-34425
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.20.0
>         Reporter: Matthias Pohl
>         Priority: Major
>           Labels: test-stability
>
> https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844
> {code}
> Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition [0x7fbd2b9f3000]
> Feb 10 03:21:45    java.lang.Thread.State: WAITING (parking)
> Feb 10 03:21:45     at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
> Feb 10 03:21:45     - parking to wait for <0xae6199f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> Feb 10 03:21:45     at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371)
> Feb 10 03:21:45     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519)
> Feb 10 03:21:45     at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780)
> Feb 10 03:21:45     at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725)
> Feb 10 03:21:45     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707)
> Feb 10 03:21:45     at java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425)
> Feb 10 03:21:45     at org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126)
> Feb 10 03:21:45     at java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH)
> Feb 10 03:21:45     at java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH)
> [...]
> {code}
[jira] [Commented] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816973#comment-17816973 ] Matthias Pohl commented on FLINK-34427: --- Thanks for the clarification. This is an issue that also exists in 1.18. I won't increase the priority to blocker for 1.19 because of that. But we should fix this. > FineGrainedSlotManagerTest fails fatally (exit code 239) > > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code} > The fatal error is triggered most likely within the > {{FineGrainedSlotManagerTest}}: > {code} > 02:26:39,362 [ pool-643-thread-1] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the > process... 
> java.util.concurrent.CompletionException: > java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 > rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) > ~[?:1.8.0_392] > at > org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603) > ~[classes/:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_392] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_392] > at >
[jira] [Updated] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34427: -- Affects Version/s: 1.18.1 > FineGrainedSlotManagerTest fails fatally (exit code 239) > > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code} > The fatal error is triggered most likely within the > {{FineGrainedSlotManagerTest}}: > {code} > 02:26:39,362 [ pool-643-thread-1] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the > process... 
> java.util.concurrent.CompletionException: > java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 > rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) > ~[?:1.8.0_392] > at > org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603) > ~[classes/:?] 
> at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_392] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_392] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [?:1.8.0_392] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) >
[jira] [Updated] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
[ https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34425: -- Priority: Major (was: Critical) > TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure > times out > --- > > Key: FLINK-34425 > URL: https://issues.apache.org/jira/browse/FLINK-34425 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844 > {code} > Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms > elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition > [0x7fbd2b9f3000] > Feb 10 03:21:45java.lang.Thread.State: WAITING (parking) > Feb 10 03:21:45 at > jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method) > Feb 10 03:21:45 - parking to wait for <0xae6199f0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > Feb 10 03:21:45 at > java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371) > Feb 10 03:21:45 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519) > Feb 10 03:21:45 at > java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780) > Feb 10 03:21:45 at > java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725) > Feb 10 03:21:45 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707) > Feb 10 03:21:45 at > java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425) > Feb 10 03:21:45 at > org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126) 
> Feb 10 03:21:45 at > java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH) > Feb 10 03:21:45 at > java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH) > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
[ https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816971#comment-17816971 ] Matthias Pohl edited comment on FLINK-34425 at 2/13/24 11:48 AM: - This looks like a test issue. The TaskManager process is destroyed in [TaskManagerRunnerITCase:124|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L124] but doesn't get back properly causing the timeout in {{#waitFor()}} in [TaskManagerRunnerITCase:126|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L126]. I'm gonna lower this issue's priority to {{Major}}. I don't consider it in any way problematic for the upcoming 1.19 release. was (Author: mapohl): This looks like a test issue. The TaskManager process is destroyed in [TaskManagerRunnerITCase:124|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L124] but doesn't get back properly causing the timeout in {{#waitFor()}} in [TaskManagerRunnerITCase:126|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L126]. 
> TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure > times out > --- > > Key: FLINK-34425 > URL: https://issues.apache.org/jira/browse/FLINK-34425 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844 > {code} > Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms > elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition > [0x7fbd2b9f3000] > Feb 10 03:21:45java.lang.Thread.State: WAITING (parking) > Feb 10 03:21:45 at > jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method) > Feb 10 03:21:45 - parking to wait for <0xae6199f0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > Feb 10 03:21:45 at > java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371) > Feb 10 03:21:45 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519) > Feb 10 03:21:45 at > java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780) > Feb 10 03:21:45 at > java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725) > Feb 10 03:21:45 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707) > Feb 10 03:21:45 at > java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425) > Feb 10 03:21:45 at > org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126) > Feb 10 03:21:45 at > java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH) > Feb 10 03:21:45 at > 
java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH) > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
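The hang described above sits in an unbounded {{Process#waitFor()}} call that follows the destroy call. The following is a minimal sketch of a bounded alternative, not the code in {{TaskManagerRunnerITCase}}: the class name {{BoundedProcessStopSketch}}, the helper names, the 10-second timeout and the {{sleep 30}} child process (which assumes a POSIX environment) are all made up for illustration. It only relies on {{Process#destroy()}}, {{Process#waitFor(long, TimeUnit)}} and {{Process#destroyForcibly()}} from the JDK.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Flink.
final class BoundedProcessStopSketch {

    /**
     * Asks the process to terminate and waits at most {@code timeout} for it.
     * If it is still alive afterwards, kills it forcibly and waits once more.
     * Returns true if the process has exited by the time this method returns.
     */
    static boolean stop(Process process, long timeout, TimeUnit unit)
            throws InterruptedException {
        process.destroy();
        if (process.waitFor(timeout, unit)) {
            return true;
        }
        // Graceful termination did not finish in time; escalate.
        process.destroyForcibly();
        return process.waitFor(timeout, unit);
    }

    /** Assumption: a POSIX `sleep` binary is on the PATH. */
    static boolean demo() {
        try {
            Process child = new ProcessBuilder("sleep", "30").start();
            return stop(child, 10, TimeUnit.SECONDS);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("child exited: " + demo());
    }
}
```

With a bounded wait, a child that never exits would show up as a failed assertion in the test rather than as a CI-level timeout. Whether that is appropriate for this particular test is a separate question.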
[jira] [Commented] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
[ https://issues.apache.org/jira/browse/FLINK-34425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816971#comment-17816971 ] Matthias Pohl commented on FLINK-34425: --- This looks like a test issue. The TaskManager process is destroyed in [TaskManagerRunnerITCase:124|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L124] but doesn't get back properly causing the timeout in {{#waitFor()}} in [TaskManagerRunnerITCase:126|https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerRunnerITCase.java#L126]. > TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure > times out > --- > > Key: FLINK-34425 > URL: https://issues.apache.org/jira/browse/FLINK-34425 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844 > {code} > Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms > elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition > [0x7fbd2b9f3000] > Feb 10 03:21:45java.lang.Thread.State: WAITING (parking) > Feb 10 03:21:45 at > jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method) > Feb 10 03:21:45 - parking to wait for <0xae6199f0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > Feb 10 03:21:45 at > java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371) > Feb 10 03:21:45 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519) > Feb 10 03:21:45 at > 
java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780) > Feb 10 03:21:45 at > java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725) > Feb 10 03:21:45 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707) > Feb 10 03:21:45 at > java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425) > Feb 10 03:21:45 at > org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126) > Feb 10 03:21:45 at > java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH) > Feb 10 03:21:45 at > java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH) > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816963#comment-17816963 ] Matthias Pohl commented on FLINK-34427: --- [~chesnay] the upstream future {{requestFuture}} comes from the {{TaskManagerGateway#requestSlot}} RPC call. I would conclude that the RPCEndpoint (considering that the [handleAsync callback|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L138] should be chained and run wherever the RPC call is executed) is shut down while there's still a scheduled task queued up, causing the {{RejectedExecutionException}}. WDYT? > FineGrainedSlotManagerTest fails fatally (exit code 239) > > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code} > The fatal error is triggered most likely within the > {{FineGrainedSlotManagerTest}}: > {code} > 02:26:39,362 [ pool-643-thread-1] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the > process... 
> java.util.concurrent.CompletionException: > java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 > rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) > ~[?:1.8.0_392] > at > org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) > ~[classes/:?] > at >
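The combination of {{CompletionException}} and {{RejectedExecutionException}} in the trace above can be reproduced with plain JDK classes. The following is a minimal sketch, not Flink code: it assumes, as suggested in the comment, that the executor passed to {{handleAsync}} has already been shut down, and the class name {{RejectedCallbackSketch}} and the {{"ack"}} value are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Hypothetical reproduction, not part of Flink.
final class RejectedCallbackSketch {

    /**
     * Chains a handleAsync callback onto an already completed future, using an
     * executor that was shut down beforehand. Returns the cause of the failure
     * of the chained future, or null if it completed normally.
     */
    static Throwable callbackFailureAfterShutdown() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        executor.shutdown(); // from here on, new tasks are rejected

        // Stands in for the RPC response future.
        CompletableFuture<String> requestFuture = CompletableFuture.completedFuture("ack");

        // The callback cannot be submitted; handleAsync does not throw, but the
        // future it returns is completed exceptionally instead.
        CompletableFuture<String> chained =
                requestFuture.handleAsync((ack, failure) -> ack, executor);

        try {
            chained.join();
            return null;
        } catch (CompletionException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(callbackFailureAfterShutdown());
    }
}
```

The point of the sketch is that the rejection does not surface at the {{handleAsync}} call site. It ends up in the returned future, so any stage chained after it sees a {{CompletionException}} wrapping the {{RejectedExecutionException}}.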
[jira] [Commented] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future
[ https://issues.apache.org/jira/browse/FLINK-34434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816961#comment-17816961 ] Matthias Pohl commented on FLINK-34434: --- [~guoyangze] can you have a look at this? > DefaultSlotStatusSyncer doesn't complete the returned future > > > Key: FLINK-34434 > URL: https://issues.apache.org/jira/browse/FLINK-34434 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > When looking into FLINK-34427 (unrelated), I noticed an odd line in > [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155] > where we complete a future that should already be completed (because the > callback is triggered after the {{requestFuture}} is already completed in > some way). Shouldn't we complete the {{returnedFuture}} instead? > I'm keeping the priority at {{Major}} because it doesn't seem to have been an > issue in the past. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future
[ https://issues.apache.org/jira/browse/FLINK-34434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816961#comment-17816961 ] Matthias Pohl edited comment on FLINK-34434 at 2/13/24 11:11 AM: - [~guoyangze] can you have a look at this? Maybe I'm missing something here. was (Author: mapohl): [~guoyangze] can you have a look at this? > DefaultSlotStatusSyncer doesn't complete the returned future > > > Key: FLINK-34434 > URL: https://issues.apache.org/jira/browse/FLINK-34434 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > When looking into FLINK-34427 (unrelated), I noticed an odd line in > [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155] > where we complete a future that should already be completed (because the > callback is triggered after the {{requestFuture}} is already completed in > some way). Shouldn't we complete the {{returnedFuture}} instead? > I'm keeping the priority at {{Major}} because it doesn't seem to have been an > issue in the past. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future
Matthias Pohl created FLINK-34434: - Summary: DefaultSlotStatusSyncer doesn't complete the returned future Key: FLINK-34434 URL: https://issues.apache.org/jira/browse/FLINK-34434 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.17.2, 1.19.0, 1.20.0 Reporter: Matthias Pohl When looking into FLINK-34427 (unrelated), I noticed an odd line in [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155] where we complete a future that should already be completed (because the callback is triggered after the {{requestFuture}} is already completed in some way). Shouldn't we complete the {{returnedFuture}} instead? I'm keeping the priority at {{Major}} because it doesn't seem to have been an issue in the past. -- This message was sent by Atlassian Jira (v8.20.10#820010)
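The concern raised in this description can be illustrated with plain {{CompletableFuture}} semantics. The following is a minimal sketch and not the code in {{DefaultSlotStatusSyncer}}: the class {{ReturnedFutureSketch}}, the method {{allocate}} and its boolean switch are hypothetical, and only the names {{requestFuture}} and {{returnedFuture}} are borrowed from the issue text. It shows that completing a future from inside its own callback has no effect, so a separately returned future stays incomplete unless the callback completes it explicitly.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration, not part of Flink.
final class ReturnedFutureSketch {

    /**
     * Returns a new future that is meant to signal the outcome of the request.
     * With completeReturnedFuture == false the callback "completes" the request
     * future instead, which is already completed at that point.
     */
    static CompletableFuture<Void> allocate(
            CompletableFuture<String> requestFuture, boolean completeReturnedFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.whenComplete(
                (ack, failure) -> {
                    if (completeReturnedFuture) {
                        returnedFuture.complete(null);
                    } else {
                        // No-op: the callback only runs once requestFuture is done,
                        // so complete(...) returns false and changes nothing.
                        requestFuture.complete("ignored");
                    }
                });
        return returnedFuture;
    }

    public static void main(String[] args) {
        CompletableFuture<String> request = new CompletableFuture<>();
        CompletableFuture<Void> returned = allocate(request, false);
        request.complete("ack");
        System.out.println("returned future done: " + returned.isDone());
    }
}
```

In the variant that completes the wrong future, a caller waiting on the returned future would never be notified. Whether the actual line 155 has this effect in practice is exactly the question raised in the issue.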
[jira] [Updated] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34427: -- Description: https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 {code} Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: Error: 02:28:53 02:28:53.220 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest Error: 02:28:53 02:28:53.220 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd '/root/flink/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.net=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' '/root/flink/flink-runtime/target/surefire' '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 'surefire_26-20240212022332296_91tmp' Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check output in log Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: Error: 02:28:53 02:28:53.221 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest Error: 02:28:53 02:28:53.221 [ERROR]at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) [...] 
{code} The fatal error is triggered most likely within the {{FineGrainedSlotManagerTest}}: {code} 02:26:39,362 [ pool-643-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the process... java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_392] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_392] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) ~[?:1.8.0_392] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_392] at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) ~[?:1.8.0_392] at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) ~[?:1.8.0_392] at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?] at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?] at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?] at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603) ~[classes/:?] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_392] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_392] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_392] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_392] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_392] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_392] at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392] Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] at
[jira] [Updated] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34427: -- Summary: FineGrainedSlotManagerTest fails fatally (exit code 239) (was: ResourceManagerTaskExecutorTest fails fatally (exit code 239)) > FineGrainedSlotManagerTest fails fatally (exit code 239) > > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code} > The fatal error is triggered most likely within the > {{FineGrainedSlotManagerTest}}: > {code} > 02:26:39,362 [ pool-643-thread-1] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the > process... 
> java.util.concurrent.CompletionException: > java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 > rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) > ~[?:1.8.0_392] > at > org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603) > ~[classes/:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_392] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_392] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [?:1.8.0_392] > at >
[jira] [Resolved] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34403. --- Fix Version/s: 1.20.0 Resolution: Fixed master: [9a316a5bcc47da7f69e76e0c25ed257adc4298ce|https://github.com/apache/flink/commit/9a316a5bcc47da7f69e76e0c25ed257adc4298ce] > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) > Feb 07 05:43:21 ... 18 more > Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: > Self-suppression not permitted > Feb 07 05:43:21 at > java.lang.Throwable.addSuppressed(Throwable.java:1072) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486) > Feb 07 05:43:21 at >
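Note on "Self-suppression not permitted": Throwable.addSuppressed throws an IllegalArgumentException with this message when a throwable is passed to itself. With try-with-resources this happens when close() throws the very same exception instance that the body threw. Whether that is the exact path taken inside InstantiationUtil.serializeObject is an assumption here; the sketch below only reproduces the JDK behaviour, and its class and method names are hypothetical.

```java
// Hypothetical demo class; it only illustrates JDK behaviour and is not part of Flink.
public class SelfSuppressionDemo {

    // Returns the message of the exception that escapes the try-with-resources block.
    static String selfSuppressionMessage() {
        final RuntimeException shared = new RuntimeException("shared failure");
        try {
            // close() throws the same instance as the body, so the implicit
            // shared.addSuppressed(shared) call fails with IllegalArgumentException.
            try (AutoCloseable resource = () -> { throw shared; }) {
                throw shared;
            }
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } catch (Exception e) {
            return "unexpected: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(selfSuppressionMessage());
    }
}
```

The IllegalArgumentException replaces the original failure as the thrown exception, which is why the reported "Caused by" no longer names the underlying error directly.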
[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816937#comment-17816937 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7880739758 > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code}
[jira] [Commented] (FLINK-32006) AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry times out on Azure
[ https://issues.apache.org/jira/browse/FLINK-32006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816935#comment-17816935 ] Matthias Pohl commented on FLINK-32006: --- 1.18: https://github.com/apache/flink/actions/runs/7880739758/job/21503455883#step:10:9621 > AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry > times out on Azure > -- > > Key: FLINK-32006 > URL: https://issues.apache.org/jira/browse/FLINK-32006 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.18.0, 1.17.2, 1.19.0 >Reporter: David Morávek >Assignee: David Morávek >Priority: Critical > Labels: pull-request-available, stale-assigned, test-stability > > {code:java} > May 04 13:52:18 [ERROR] > org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry > Time elapsed: 100.009 s <<< ERROR! > May 04 13:52:18 org.junit.runners.model.TestTimedOutException: test timed out > after 100 seconds > May 04 13:52:18 at java.lang.Thread.sleep(Native Method) > May 04 13:52:18 at > org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeAlwaysTimeoutFunctionWithRetry(AsyncWaitOperatorTest.java:1313) > May 04 13:52:18 at > org.apache.flink.streaming.api.operators.async.AsyncWaitOperatorTest.testProcessingTimeWithTimeoutFunctionOrderedWithRetry(AsyncWaitOperatorTest.java:1277) > May 04 13:52:18 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 04 13:52:18 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 04 13:52:18 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 04 13:52:18 at java.lang.reflect.Method.invoke(Method.java:498) > May 04 13:52:18 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > May 04 13:52:18 at > 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 04 13:52:18 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > May 04 13:52:18 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 04 13:52:18 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > May 04 13:52:18 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > May 04 13:52:18 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > May 04 13:52:18 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > May 04 13:52:18 at java.lang.Thread.run(Thread.java:748) > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48671=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9288 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34433) CollectionFunctionsITCase.test failed due to job restart
Matthias Pohl created FLINK-34433: - Summary: CollectionFunctionsITCase.test failed due to job restart Key: FLINK-34433 URL: https://issues.apache.org/jira/browse/FLINK-34433 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7880739697/job/21503460772#step:10:11312 {code} Error: 02:33:24 02:33:24.955 [ERROR] Tests run: 439, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 56.57 s <<< FAILURE! -- in org.apache.flink.table.planner.functions.CollectionFunctionsITCase Error: 02:33:24 02:33:24.956 [ERROR] org.apache.flink.table.planner.functions.CollectionFunctionsITCase.test(TestCase)[81] -- Time elapsed: 1.141 s <<< ERROR! Feb 13 02:33:24 java.lang.RuntimeException: Job restarted Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:42) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:124) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126) Feb 13 02:33:24 at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100) Feb 13 02:33:24 at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247) Feb 13 02:33:24 at org.assertj.core.internal.Iterators.assertHasNext(Iterators.java:49) Feb 13 02:33:24 at org.assertj.core.api.AbstractIteratorAssert.hasNext(AbstractIteratorAssert.java:60) Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$ResultTestItem.test(BuiltInFunctionTestBase.java:383) Feb 13 02:33:24 
at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestSetSpec.lambda$getTestCase$4(BuiltInFunctionTestBase.java:341) Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119) Feb 13 02:33:24 at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase.test(BuiltInFunctionTestBase.java:99) Feb 13 02:33:24 at java.lang.reflect.Method.invoke(Method.java:498) Feb 13 02:33:24 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Feb 13 02:33:24 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Feb 13 02:33:24 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Feb 13 02:33:24 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Feb 13 02:33:24 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-34403: - Assignee: Matthias Pohl > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) > Feb 07 05:43:21 ... 18 more > Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: > Self-suppression not permitted > Feb 07 05:43:21 at > java.lang.Throwable.addSuppressed(Throwable.java:1072) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore
[ https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816928#comment-17816928 ] Matthias Pohl commented on FLINK-28440: --- https://github.com/apache/flink/actions/runs/7880739609/job/21503465125#step:10:7557 > EventTimeWindowCheckpointingITCase failed with restore > -- > > Key: FLINK-28440 > URL: https://issues.apache.org/jira/browse/FLINK-28440 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / State Backends >Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0 >Reporter: Huang Xingbo >Assignee: Yanfei Lei >Priority: Critical > Labels: auto-deprioritized-critical, pull-request-available, > stale-assigned, test-stability > Fix For: 1.19.0 > > Attachments: image-2023-02-01-00-51-54-506.png, > image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, > image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, > image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, > image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png > > > {code:java} > Caused by: java.lang.Exception: Exception while creating > StreamOperatorStateContext. 
> at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268) > at > org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665) > at > org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935) > at > org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from > any of the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165) > ... 
11 more > Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: > /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced > (No such file or directory) > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321) > at > org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87) > at > org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69) > at > org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96) > at > org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75) > at > org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92) > at > org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > ... 13 more > Caused by: java.io.FileNotFoundException: >
[jira] [Updated] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34427: -- Component/s: Runtime / Coordination > ResourceManagerTaskExecutorTest fails fatally (exit code 239) > - > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816876#comment-17816876 ] Matthias Pohl commented on FLINK-34403: --- * https://dev.azure.com/apache-flink/web/build.aspx?pcguid=2d3c0ac8-fecf-45be-8407-6d87302181a9=vstfs%3a%2f%2f%2fBuild%2fBuild%2f57469_data=ew0KICAic291cmNlIjogIlNsYWNrUGlwZWxpbmVzQXBwIiwNCiAgInNvdXJjZV9ldmVudF9uYW1lIjogImJ1aWxkLmNvbXBsZXRlIg0KfQ%3d%3d * https://dev.azure.com/apache-flink/web/build.aspx?pcguid=2d3c0ac8-fecf-45be-8407-6d87302181a9=vstfs%3a%2f%2f%2fBuild%2fBuild%2f57489_data=ew0KICAic291cmNlIjogIlNsYWNrUGlwZWxpbmVzQXBwIiwNCiAgInNvdXJjZV9ldmVudF9uYW1lIjogImJ1aWxkLmNvbXBsZXRlIg0KfQ%3d%3d * https://dev.azure.com/apache-flink/web/build.aspx?pcguid=2d3c0ac8-fecf-45be-8407-6d87302181a9=vstfs%3a%2f%2f%2fBuild%2fBuild%2f57491_data=ew0KICAic291cmNlIjogIlNsYWNrUGlwZWxpbmVzQXBwIiwNCiAgInNvdXJjZV9ldmVudF9uYW1lIjogImJ1aWxkLmNvbXBsZXRlIg0KfQ%3d%3d > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Priority: Critical > Labels: pull-request-available, test-stability > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at >
[jira] [Commented] (FLINK-34273) git fetch fails
[ https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816875#comment-17816875 ] Matthias Pohl commented on FLINK-34273: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57492=logs=245e1f2e-ba5b-5570-d689-25ae21e5302f=a47dd1b5-aa0a-596a-799b-05a053059d14 > git fetch fails > --- > > Key: FLINK-34273 > URL: https://issues.apache.org/jira/browse/FLINK-34273 > Project: Flink > Issue Type: Bug > Components: Build System / CI, Test Infrastructure >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > > We've seen multiple {{git fetch}} failures. I assume this to be an > infrastructure issue. This Jira issue is for documentation purposes. > {code:java} > error: RPC failed; curl 18 transfer closed with outstanding read data > remaining > error: 5211 bytes of body are still expected > fetch-pack: unexpected disconnect while reading sideband packet > fatal: early EOF > fatal: fetch-pack: invalid index-pack output {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667
[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816602#comment-17816602 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7869012663/job/21467582110 > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code}
[jira] [Commented] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816603#comment-17816603 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7870763675 > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code}
[jira] [Commented] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816591#comment-17816591 ] Matthias Pohl commented on FLINK-34403: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57464=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23485 > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Priority: Critical > Labels: test-stability > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. 
> Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 
05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression not permitted > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > Feb 07 05:43:21 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323) > Feb 07 05:43:21 ... 18 more > Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: > Self-suppression not permitted > Feb 07 05:43:21 at > java.lang.Throwable.addSuppressed(Throwable.java:1072) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556) > Feb 07 05:43:21 at > org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182)
[jira] [Resolved] (FLINK-34411) "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up
[ https://issues.apache.org/jira/browse/FLINK-34411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34411. --- Assignee: Matthias Pohl Resolution: Fixed I rebased {{dev-1.19}} onto {{dev-master}} and provided a fix for the snapshot CI in {{master}}. apache/flink-docker@master: 2c169b6a83bf83bbe997ed35aaf548de10050b58 > "Wordcount on Docker test (custom fs plugin)" timed out with some strange > issue while setting the test up > - > > Key: FLINK-34411 > URL: https://issues.apache.org/jira/browse/FLINK-34411 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57380=logs=bea52777-eaf8-5663-8482-18fbc3630e81=43ba8ce7-ebbf-57cd-9163-444305d74117=5802 > {code} > Feb 07 15:22:39 > == > Feb 07 15:22:39 Running 'Wordcount on Docker test (custom fs plugin)' > Feb 07 15:22:39 > == > Feb 07 15:22:39 TEST_DATA_DIR: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 > Feb 07 15:22:40 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Feb 07 15:22:40 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Feb 07 15:22:41 Docker version 24.0.7, build afdd53b > Feb 07 15:22:44 docker-compose version 1.29.2, build 5becea4c > Feb 07 15:22:44 Starting fileserver for Flink distribution > Feb 07 15:22:44 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin > ~/work/1/s > Feb 07 15:23:07 ~/work/1/s > Feb 07 15:23:07 > ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 > ~/work/1/s > Feb 07 15:23:07 Preparing Dockeriles > Feb 07 15:23:07 Executing command: git clone > https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch > Cloning into 
'flink-docker'... > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: > line 65: ./add-custom.sh: No such file or directory > Feb 07 15:23:07 Building images > ERROR: unable to prepare context: path "dev/test_docker_embedded_job-ubuntu" > not found > Feb 07 15:23:09 ~/work/1/s > Feb 07 15:23:09 Command: build_image test_docker_embedded_job failed. > Retrying... > Feb 07 15:23:14 Starting fileserver for Flink distribution > Feb 07 15:23:14 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin > ~/work/1/s > Feb 07 15:23:36 ~/work/1/s > Feb 07 15:23:36 > ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 > ~/work/1/s > Feb 07 15:23:36 Preparing Dockeriles > Feb 07 15:23:36 Executing command: git clone > https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch > fatal: destination path 'flink-docker' already exists and is not an empty > directory. > Feb 07 15:23:36 Retry 1/5 exited 128, retrying in 1 seconds... > Traceback (most recent call last): > File > "/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/python3_fileserver.py", > line 26, in > httpd = socketserver.TCPServer(("", ), handler) > File "/usr/lib/python3.8/socketserver.py", line 452, in __init__ > self.server_bind() > File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind > self.socket.bind(self.server_address) > OSError: [Errno 98] Address already in use > [...] > {code}
[jira] [Commented] (FLINK-34411) "Wordcount on Docker test (custom fs plugin)" timed out with some strange issue while setting the test up
[ https://issues.apache.org/jira/browse/FLINK-34411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816538#comment-17816538 ] Matthias Pohl commented on FLINK-34411: --- * https://github.com/apache/flink/actions/runs/7838691836/job/21390782645 > "Wordcount on Docker test (custom fs plugin)" timed out with some strange > issue while setting the test up > - > > Key: FLINK-34411 > URL: https://issues.apache.org/jira/browse/FLINK-34411 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57380=logs=bea52777-eaf8-5663-8482-18fbc3630e81=43ba8ce7-ebbf-57cd-9163-444305d74117=5802 > {code} > Feb 07 15:22:39 > == > Feb 07 15:22:39 Running 'Wordcount on Docker test (custom fs plugin)' > Feb 07 15:22:39 > == > Feb 07 15:22:39 TEST_DATA_DIR: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 > Feb 07 15:22:40 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Feb 07 15:22:40 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Feb 07 15:22:41 Docker version 24.0.7, build afdd53b > Feb 07 15:22:44 docker-compose version 1.29.2, build 5becea4c > Feb 07 15:22:44 Starting fileserver for Flink distribution > Feb 07 15:22:44 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin > ~/work/1/s > Feb 07 15:23:07 ~/work/1/s > Feb 07 15:23:07 > ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 > ~/work/1/s > Feb 07 15:23:07 Preparing Dockeriles > Feb 07 15:23:07 Executing command: git clone > https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch > Cloning into 'flink-docker'... 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: > line 65: ./add-custom.sh: No such file or directory > Feb 07 15:23:07 Building images > ERROR: unable to prepare context: path "dev/test_docker_embedded_job-ubuntu" > not found > Feb 07 15:23:09 ~/work/1/s > Feb 07 15:23:09 Command: build_image test_docker_embedded_job failed. > Retrying... > Feb 07 15:23:14 Starting fileserver for Flink distribution > Feb 07 15:23:14 ~/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin > ~/work/1/s > Feb 07 15:23:36 ~/work/1/s > Feb 07 15:23:36 > ~/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-39516987853 > ~/work/1/s > Feb 07 15:23:36 Preparing Dockeriles > Feb 07 15:23:36 Executing command: git clone > https://github.com/apache/flink-docker.git --branch dev-1.19 --single-branch > fatal: destination path 'flink-docker' already exists and is not an empty > directory. > Feb 07 15:23:36 Retry 1/5 exited 128, retrying in 1 seconds... > Traceback (most recent call last): > File > "/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/python3_fileserver.py", > line 26, in > httpd = socketserver.TCPServer(("", ), handler) > File "/usr/lib/python3.8/socketserver.py", line 452, in __init__ > self.server_bind() > File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind > self.socket.bind(self.server_address) > OSError: [Errno 98] Address already in use > [...] > {code}
[jira] [Created] (FLINK-34428) WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out
Matthias Pohl created FLINK-34428: - Summary: WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out Key: FLINK-34428 URL: https://issues.apache.org/jira/browse/FLINK-34428 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7866453368/job/21460921339#step:10:15127 {code} "main" #1 prio=5 os_prio=0 tid=0x7f1770cb7000 nid=0x4ad4d waiting on condition [0x7f17711f6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xab48e3a0> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077) at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876) at org.apache.flink.table.planner.runtime.stream.sql.WindowAggregateITCase.testTumbleWindowWithoutOutputWindowColumns(WindowAggregateITCase.scala:477) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [...] {code}
[jira] [Updated] (FLINK-34418) Disk space issues for Docker-ized GitHub Action jobs
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34418: -- Summary: Disk space issues for Docker-ized GitHub Action jobs (was: YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots failed due to disk space) > Disk space issues for Docker-ized GitHub Action jobs > > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code}
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816537#comment-17816537 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7866453368 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code}
[jira] [Updated] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34427: -- Affects Version/s: 1.19.0 1.20.0 > ResourceManagerTaskExecutorTest fails fatally (exit code 239) > - > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code}
[jira] [Created] (FLINK-34427) ResourceManagerTaskExecutorTest fails fatally (exit code 239)
Matthias Pohl created FLINK-34427: - Summary: ResourceManagerTaskExecutorTest fails fatally (exit code 239) Key: FLINK-34427 URL: https://issues.apache.org/jira/browse/FLINK-34427 Project: Flink Issue Type: Bug Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 {code} Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: Error: 02:28:53 02:28:53.220 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest Error: 02:28:53 02:28:53.220 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd '/root/flink/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.net=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' '/root/flink/flink-runtime/target/surefire' '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' 'surefire_26-20240212022332296_91tmp' Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check output in log Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: Error: 02:28:53 02:28:53.221 [ERROR] org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest Error: 02:28:53 02:28:53.221 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) [...] {code}
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816534#comment-17816534 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/7866453155/job/21460933108#step:10:7710 > CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished > fails on AZP > - > > Key: FLINK-33186 > URL: https://issues.apache.org/jira/browse/FLINK-33186 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0, 1.18.1 >Reporter: Sergey Nuyanzin >Assignee: Jiang Xin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762 > fails as > {noformat} > Sep 28 01:23:43 Caused by: > org.apache.flink.runtime.checkpoint.CheckpointException: Task local > checkpoint failure. > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817) > Sep 28 01:23:43 at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > Sep 28 01:23:43 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > Sep 28 01:23:43 at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Sep 28 01:23:43 at java.lang.Thread.run(Thread.java:748) > {noformat}
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816532#comment-17816532 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7861970334 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code}
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816519#comment-17816519 ] Matthias Pohl edited comment on FLINK-34403 at 2/12/24 8:26 AM: * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results * https://github.com/apache/flink/actions/runs/7831121355/job/21367169878#step:10:23505 * https://github.com/apache/flink/actions/runs/7823924194/job/21345848746#step:10:23507 * https://github.com/apache/flink/actions/runs/7823895861 * https://github.com/apache/flink/actions/runs/7838691422 * https://github.com/apache/flink/actions/runs/7851900601 * https://github.com/apache/flink/actions/runs/7859002096/job/21444979868#step:10:23510 was (Author: mapohl): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results * https://github.com/apache/flink/actions/runs/7831121355/job/21367169878#step:10:23505 * https://github.com/apache/flink/actions/runs/7823924194/job/21345848746#step:10:23507 * https://github.com/apache/flink/actions/runs/7823895861 * https://github.com/apache/flink/actions/runs/7838691422 * https://github.com/apache/flink/actions/runs/7851900601 > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: 
Benchao Li >Priority: Critical > Labels: test-stability > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 05:43:21
[jira] [Commented] (FLINK-33958) Implement restore tests for IntervalJoin node
[ https://issues.apache.org/jira/browse/FLINK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816530#comment-17816530 ] Matthias Pohl commented on FLINK-33958: --- * https://github.com/apache/flink/actions/runs/7831121355/job/21367168844#step:10:11257 > Implement restore tests for IntervalJoin node > - > > Key: FLINK-33958 > URL: https://issues.apache.org/jira/browse/FLINK-33958 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Reporter: Bonnie Varghese >Assignee: Bonnie Varghese >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0
[jira] [Comment Edited] (FLINK-34403) VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM
[ https://issues.apache.org/jira/browse/FLINK-34403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816519#comment-17816519 ] Matthias Pohl edited comment on FLINK-34403 at 2/12/24 8:25 AM: * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results * https://github.com/apache/flink/actions/runs/7831121355/job/21367169878#step:10:23505 * https://github.com/apache/flink/actions/runs/7823924194/job/21345848746#step:10:23507 * https://github.com/apache/flink/actions/runs/7823895861 * https://github.com/apache/flink/actions/runs/7838691422 * https://github.com/apache/flink/actions/runs/7851900601 was (Author: mapohl): * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57422=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57428=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57440=results * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57454=results > VeryBigPbProtoToRowTest#testSimple cannot pass due to OOM > - > > Key: FLINK-34403 > URL: https://issues.apache.org/jira/browse/FLINK-34403 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.20.0 >Reporter: Benchao Li >Priority: Critical > Labels: test-stability > > After FLINK-33611 merged, the misc test on GHA cannot pass due to out of > memory error, throwing following exceptions: > {code:java} > Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! 
-- in > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest > Error: 05:43:21 05:43:21.773 [ERROR] > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time > elapsed: 40.97 s <<< ERROR! > Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in > serialization. > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007) > Feb 07 05:43:21 at > org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) > Feb 07 05:43:21 at > org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) > Feb 07 05:43:21 at > org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382) > Feb 07 05:43:21 at > org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367) > Feb 07 05:43:21 at > 
org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71) > Feb 07 05:43:21 at > org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37) > Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalArgumentException: Self-suppression
[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
[ https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816531#comment-17816531 ] Matthias Pohl commented on FLINK-22765: --- https://github.com/apache/flink/actions/runs/7859001687/job/21444942424#step:10:8685 > ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable > > > Key: FLINK-22765 > URL: https://issues.apache.org/jira/browse/FLINK-22765 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0, 1.20.0 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Major > Labels: pull-request-available, stale-assigned, test-stability > Fix For: 1.14.0, 1.16.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > May 25 00:56:38 java.lang.AssertionError: > May 25 00:56:38 > May 25 00:56:38 Expected: is "" > May 25 00:56:38 but: was "The system is out of resources.\nConsult the > following stack trace for details." 
> May 25 00:56:38 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:956) > May 25 00:56:38 at org.junit.Assert.assertThat(Assert.java:923) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94) > May 25 00:56:38 at > org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70) > May 25 00:56:38 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > May 25 00:56:38 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > May 25 00:56:38 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > May 25 00:56:38 at java.lang.reflect.Method.invoke(Method.java:498) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > May 25 00:56:38 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > May 25 00:56:38 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > May 25 00:56:38 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > May 25 00:56:38 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > May 25 00:56:38 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > May 25 00:56:38 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > May 25 00:56:38 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > May 25 00:56:38 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > May 25 00:56:38 at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > May 25 00:56:38 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > May 25 00:56:38 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > May 25 00:56:38 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > May 25 00:56:38 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > May 25 00:56:38 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > May 25 00:56:38 at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > May 25 00:56:38 at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > May 25 00:56:38 > {code} -- This message was sent by
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816529#comment-17816529 ] Matthias Pohl commented on FLINK-34418: --- This run still succeeded but emitted a warning that disk space was reaching its limit: https://github.com/apache/flink/actions/runs/7859001687/job/21445027923#step:1:46 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816528#comment-17816528 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7859001632/job/21444955041 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816527#comment-17816527 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7851900779 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34426) HybridShuffleITCase.testHybridSelectiveExchangesRestart times out
Matthias Pohl created FLINK-34426: - Summary: HybridShuffleITCase.testHybridSelectiveExchangesRestart times out Key: FLINK-34426 URL: https://issues.apache.org/jira/browse/FLINK-34426 Project: Flink Issue Type: Bug Components: Runtime / Network Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7851900779/job/21429781783#step:10:9052 {code} "ForkJoinPool-1-worker-3" #16 daemon prio=5 os_prio=0 cpu=3397.79ms elapsed=11462.88s tid=0x7f48966b3800 nid=0x7a303 waiting on condition [0x7f486e97a000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.19/Native Method) - parking to wait for <0xa2faa230> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.19/LockSupport.java:194) at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.19/CompletableFuture.java:1796) at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.19/ForkJoinPool.java:3118) at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.19/CompletableFuture.java:1823) at java.util.concurrent.CompletableFuture.get(java.base@11.0.19/CompletableFuture.java:1998) at org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36) at org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:61) at org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:117) at org.apache.flink.test.runtime.HybridShuffleITCase.testHybridSelectiveExchangesRestart(HybridShuffleITCase.java:79) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.19/Native Method) [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
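The thread dump above shows the test thread parked in CompletableFuture.get with no timeout, reached through AutoCloseableAsync.close. As a minimal sketch only, and not the Flink code or a proposed fix, the general technique of bounding such a wait with the standard CompletableFuture.get(long, TimeUnit) overload could look like this. The class name BoundedFutureWait and the helper getWithin are hypothetical names introduced for illustration.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration class; not part of Flink.
public class BoundedFutureWait {

    /**
     * Waits for the future for at most the given timeout.
     * Returns the result if it completes in time, or an empty Optional on
     * timeout or interruption. Note: a future that completes with null is
     * also reported as empty in this simplified sketch.
     */
    public static <T> Optional<T> getWithin(
            CompletableFuture<T> future, long timeout, TimeUnit unit) {
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            // The wait is bounded: the caller gets control back instead of parking forever.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("future completed exceptionally", e.getCause());
        }
    }

    public static void main(String[] args) {
        // A future that nobody ever completes stands in for a close() that hangs.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        System.out.println(getWithin(neverCompletes, 100, TimeUnit.MILLISECONDS).isPresent());

        // An already completed future returns its value immediately.
        CompletableFuture<String> done = CompletableFuture.completedFuture("closed");
        System.out.println(getWithin(done, 100, TimeUnit.MILLISECONDS).get());
    }
}
```

A bounded wait like this turns a silent hang into an explicit timeout at the call site; whether that is appropriate for the test in question is not something the report above states.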
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816526#comment-17816526 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7851900616 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34425) TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out
Matthias Pohl created FLINK-34425: - Summary: TaskManagerRunnerITCase#testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure times out Key: FLINK-34425 URL: https://issues.apache.org/jira/browse/FLINK-34425 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.19.0, 1.20.0 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7851900616/job/21429757962#step:10:8844 {code} Feb 10 03:21:45 "main" #1 [498632] prio=5 os_prio=0 cpu=619.91ms elapsed=1653.40s tid=0x7fbd29695000 nid=498632 waiting on condition [0x7fbd2b9f3000] Feb 10 03:21:45java.lang.Thread.State: WAITING (parking) Feb 10 03:21:45 at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method) Feb 10 03:21:45 - parking to wait for <0xae6199f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) Feb 10 03:21:45 at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:371) Feb 10 03:21:45 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21.0.1/AbstractQueuedSynchronizer.java:519) Feb 10 03:21:45 at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.1/ForkJoinPool.java:3780) Feb 10 03:21:45 at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.1/ForkJoinPool.java:3725) Feb 10 03:21:45 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.1/AbstractQueuedSynchronizer.java:1707) Feb 10 03:21:45 at java.lang.ProcessImpl.waitFor(java.base@21.0.1/ProcessImpl.java:425) Feb 10 03:21:45 at org.apache.flink.test.recovery.TaskManagerRunnerITCase.testNondeterministicWorkingDirIsDeletedInCaseOfProcessFailure(TaskManagerRunnerITCase.java:126) Feb 10 03:21:45 at java.lang.invoke.LambdaForm$DMH/0x7fbccb1b8000.invokeVirtual(java.base@21.0.1/LambdaForm$DMH) Feb 10 03:21:45 at java.lang.invoke.LambdaForm$MH/0x7fbccb1b8800.invoke(java.base@21.0.1/LambdaForm$MH) [...] 
{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
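The dump above shows the main thread blocked in ProcessImpl.waitFor, which has no timeout. As a rough sketch under stated assumptions, and not the actual test code or an agreed fix, a bounded wait using the standard Process.waitFor(long, TimeUnit) and Process.destroyForcibly() could look like this. The class BoundedProcessWait and both helper methods are hypothetical; the demo child process is simply "java -version" started from the running JVM's java.home.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Flink.
public class BoundedProcessWait {

    /**
     * Waits up to the given timeout for the process to exit. If it is still
     * alive afterwards (or the wait is interrupted), the process is killed
     * forcibly so the caller cannot block indefinitely.
     * Returns true only if the process exited on its own within the timeout.
     */
    public static boolean waitOrKill(Process process, long timeout, TimeUnit unit) {
        try {
            if (process.waitFor(timeout, unit)) {
                return true;
            }
            process.destroyForcibly();
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and clean up the child process.
            Thread.currentThread().interrupt();
            process.destroyForcibly();
            return false;
        }
    }

    /** Hypothetical demo helper: starts a short-lived child JVM ("java -version"). */
    public static Process startShortLivedChild() {
        String javaBin =
                Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        try {
            return new ProcessBuilder(javaBin, "-version")
                    .redirectErrorStream(true)
                    // Discard output so the child never blocks on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean exited = waitOrKill(startShortLivedChild(), 60, TimeUnit.SECONDS);
        System.out.println("child exited within timeout: " + exited);
    }
}
```

With a bound in place, a child process that never terminates produces a false return value that a test can assert on, rather than a job-level timeout and a thread dump.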
[jira] [Commented] (FLINK-34418) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots fa
[ https://issues.apache.org/jira/browse/FLINK-34418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816523#comment-17816523 ] Matthias Pohl commented on FLINK-34418: --- https://github.com/apache/flink/actions/runs/7851900601/job/21429775024 > YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots > failed due to disk space > - > > Key: FLINK-34418 > URL: https://issues.apache.org/jira/browse/FLINK-34418 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: github-actions, test-stability > > [https://github.com/apache/flink/actions/runs/7838691874/job/21390739806#step:10:27746] > {code:java} > [...] > Feb 09 03:00:13 Caused by: java.io.IOException: No space left on device > Feb 09 03:00:13 at java.io.FileOutputStream.writeBytes(Native Method) > Feb 09 03:00:13 at > java.io.FileOutputStream.write(FileOutputStream.java:326) > Feb 09 03:00:13 at > org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:250) > Feb 09 03:00:13 ... 39 more > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (FLINK-26515) RetryingExecutorTest. testDiscardOnTimeout failed on azure
[ https://issues.apache.org/jira/browse/FLINK-26515 ] Matthias Pohl deleted comment on FLINK-26515: --- was (Author: mapohl): 1.18: https://github.com/apache/flink/actions/runs/7838691874/job/21390763726#step:10:10503 > RetryingExecutorTest. testDiscardOnTimeout failed on azure > -- > > Key: FLINK-26515 > URL: https://issues.apache.org/jira/browse/FLINK-26515 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.14.3, 1.17.0, 1.16.1, 1.18.0, 1.19.0 >Reporter: Yun Gao >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available, > test-stability > > {code:java} > Mar 06 01:20:29 [ERROR] Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, > Time elapsed: 1.941 s <<< FAILURE! - in > org.apache.flink.changelog.fs.RetryingExecutorTest > Mar 06 01:20:29 [ERROR] testTimeout Time elapsed: 1.934 s <<< FAILURE! > Mar 06 01:20:29 java.lang.AssertionError: expected:<500.0> but > was:<1922.869766> > Mar 06 01:20:29 at org.junit.Assert.fail(Assert.java:89) > Mar 06 01:20:29 at org.junit.Assert.failNotEquals(Assert.java:835) > Mar 06 01:20:29 at org.junit.Assert.assertEquals(Assert.java:555) > Mar 06 01:20:29 at org.junit.Assert.assertEquals(Assert.java:685) > Mar 06 01:20:29 at > org.apache.flink.changelog.fs.RetryingExecutorTest.testTimeout(RetryingExecutorTest.java:145) > Mar 06 01:20:29 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Mar 06 01:20:29 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Mar 06 01:20:29 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Mar 06 01:20:29 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 06 01:20:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Mar 06 01:20:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Mar 06 01:20:29 at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Mar 06 01:20:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Mar 06 01:20:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Mar 06 01:20:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Mar 06 01:20:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > Mar 06 01:20:29 at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > Mar 06 01:20:29 at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > Mar 06 01:20:29 at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > Mar 06 01:20:29 at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > Mar 06 01:20:29 at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > Mar 06 01:20:29 at > java.util.Iterator.forEachRemaining(Iterator.java:116) > Mar 06 01:20:29 at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > Mar 06 01:20:29 at > 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > Mar 06 01:20:29 at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32569=logs=f450c1a5-64b1-5955-e215-49cb1ad5ec88=cc452273-9efa-565d-9db8-ef62a38a0c10=22554 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-26515) RetryingExecutorTest. testDiscardOnTimeout failed on azure
[ https://issues.apache.org/jira/browse/FLINK-26515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816522#comment-17816522 ] Matthias Pohl commented on FLINK-26515: --- 1.18: https://github.com/apache/flink/actions/runs/7838691874/job/21390763726#step:10:10503 > RetryingExecutorTest. testDiscardOnTimeout failed on azure > -- > > Key: FLINK-26515 > URL: https://issues.apache.org/jira/browse/FLINK-26515 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.14.3, 1.17.0, 1.16.1, 1.18.0, 1.19.0 >Reporter: Yun Gao >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available, > test-stability > > {code:java} > Mar 06 01:20:29 [ERROR] Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, > Time elapsed: 1.941 s <<< FAILURE! - in > org.apache.flink.changelog.fs.RetryingExecutorTest > Mar 06 01:20:29 [ERROR] testTimeout Time elapsed: 1.934 s <<< FAILURE! > Mar 06 01:20:29 java.lang.AssertionError: expected:<500.0> but > was:<1922.869766> > Mar 06 01:20:29 at org.junit.Assert.fail(Assert.java:89) > Mar 06 01:20:29 at org.junit.Assert.failNotEquals(Assert.java:835) > Mar 06 01:20:29 at org.junit.Assert.assertEquals(Assert.java:555) > Mar 06 01:20:29 at org.junit.Assert.assertEquals(Assert.java:685) > Mar 06 01:20:29 at > org.apache.flink.changelog.fs.RetryingExecutorTest.testTimeout(RetryingExecutorTest.java:145) > Mar 06 01:20:29 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Mar 06 01:20:29 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Mar 06 01:20:29 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Mar 06 01:20:29 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 06 01:20:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Mar 06 01:20:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Mar 06 01:20:29 at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Mar 06 01:20:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Mar 06 01:20:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Mar 06 01:20:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Mar 06 01:20:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Mar 06 01:20:29 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > Mar 06 01:20:29 at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > Mar 06 01:20:29 at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > Mar 06 01:20:29 at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > Mar 06 01:20:29 at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > Mar 06 01:20:29 at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > Mar 06 01:20:29 at > java.util.Iterator.forEachRemaining(Iterator.java:116) > Mar 06 01:20:29 at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > Mar 06 01:20:29 at > 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > Mar 06 01:20:29 at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32569=logs=f450c1a5-64b1-5955-e215-49cb1ad5ec88=cc452273-9efa-565d-9db8-ef62a38a0c10=22554 -- This message was sent by Atlassian Jira (v8.20.10#820010)