[ https://issues.apache.org/jira/browse/FLINK-31119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690314#comment-17690314 ]
Matthias Pohl edited comment on FLINK-31119 at 2/17/23 12:24 PM:
-----------------------------------------------------------------

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46250&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8521

{code}
01:07:57,099 [ Receiver (1/6)#1] WARN org.apache.flink.runtime.taskmanager.Task [] - Receiver (1/6)#1 (e701d0caf3247ea7554acfb5dd8df541_cb0a5d4bcd60528ae7c4e8c99900a321_0_1) switched from RUNNING to FAILED with failure cause:
java.lang.NullPointerException: null
    at org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:82) ~[test-classes/:?]
    at org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126) ~[test-classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [classes/:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}

This one fails with a {{NullPointerException}} in the same method [TestingAbstractInvokables.Receiver#invoke:71ff|https://github.com/apache/flink/blob/026675a5cb8a3704c51802fb549d6b0bc4759835/flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/TestingAbstractInvokables.java#L71]. Essentially, the data that has been received seems to be corrupted.

Update: A "Wrong data received." exception was also thrown in this case. It appeared while the tasks were being cancelled, and that cancellation was triggered by the expected {{FlinkRuntimeException}}. It presumably had no impact because the job was already transitioning into CANCELLING.

was (Author: mapohl):
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46250&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8521

{code}
01:07:57,099 [ Receiver (1/6)#1] WARN org.apache.flink.runtime.taskmanager.Task [] - Receiver (1/6)#1 (e701d0caf3247ea7554acfb5dd8df541_cb0a5d4bcd60528ae7c4e8c99900a321_0_1) switched from RUNNING to FAILED with failure cause:
java.lang.NullPointerException: null
    at org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:82) ~[test-classes/:?]
    at org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126) ~[test-classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [classes/:?]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [classes/:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}

This one fails with a {{NullPointerException}} in the same method [TestingAbstractInvokables.Receiver#invoke:71ff|https://github.com/apache/flink/blob/026675a5cb8a3704c51802fb549d6b0bc4759835/flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/TestingAbstractInvokables.java#L71].
Essentially, the data that has been received seems to be corrupted

> JobRecoveryITCase.testTaskFailureRecovery failed due to the job not finishing
> successfully
> ------------------------------------------------------------------------------------------
>
> Key: FLINK-31119
> URL: https://issues.apache.org/jira/browse/FLINK-31119
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Priority: Blocker
> Labels: test-stability
> Attachments: FLINK-31119.20230217.1.log, FLINK-31119.20230217.4.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46247&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8523
> {code}
> Feb 17 02:24:35 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0,
> Time elapsed: 24.074 s <<< FAILURE! - in
> org.apache.flink.runtime.jobmaster.JobRecoveryITCase
> Feb 17 02:24:35 [ERROR]
> org.apache.flink.runtime.jobmaster.JobRecoveryITCase.testTaskFailureRecovery
> Time elapsed: 20.981 s <<< FAILURE!
> Feb 17 02:24:35 java.lang.AssertionError:
> Feb 17 02:24:35
> Feb 17 02:24:35 Expected: is <true>
> Feb 17 02:24:35 but: was <false>
> Feb 17 02:24:35 at
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> Feb 17 02:24:35 at org.junit.Assert.assertThat(Assert.java:964)
> Feb 17 02:24:35 at org.junit.Assert.assertThat(Assert.java:930)
> Feb 17 02:24:35 at
> org.apache.flink.runtime.jobmaster.JobRecoveryITCase.runTaskFailureRecoveryTest(JobRecoveryITCase.java:79)
> Feb 17 02:24:35 at
> org.apache.flink.runtime.jobmaster.JobRecoveryITCase.testTaskFailureRecovery(JobRecoveryITCase.java:63)
> Feb 17 02:24:35 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> [...]
> {code}
> The actual cause is that unexpected data was received:
> {code}
> 02:24:35,301 [ Receiver (5/5)#1] WARN
> org.apache.flink.runtime.taskmanager.Task [] - Receiver
> (5/5)#1
> (d88e16a5e3c6f2c08cf3924d93ea18e2_28065fbb1d26fe99e018d3b846860dd3_4_1)
> switched from RUNNING to FAILED with failure cause:
> java.lang.Exception: Wrong data received.
> at
> org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:83)
> ~[test-classes/:?]
> at
> org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126)
> ~[test-classes/:?]
> at
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
> ~[classes/:?]
> at
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
> [classes/:?]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
> [classes/:?]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
> [classes/:?]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)