[ 
https://issues.apache.org/jira/browse/FLINK-31119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690314#comment-17690314
 ] 

Matthias Pohl edited comment on FLINK-31119 at 2/17/23 12:24 PM:
-----------------------------------------------------------------

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46250&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8521

{code}
01:07:57,099 [    Receiver (1/6)#1] WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Receiver (1/6)#1 (e701d0caf3247ea7554acfb5dd8df541_cb0a5d4bcd60528ae7c4e8c99900a321_0_1) switched from RUNNING to FAILED with failure cause:
java.lang.NullPointerException: null
        at org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:82) ~[test-classes/:?]
        at org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126) ~[test-classes/:?]
        at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[classes/:?]
        at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [classes/:?]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [classes/:?]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [classes/:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}

This one fails with a {{NullPointerException}} in the same method, [TestingAbstractInvokables.Receiver#invoke:71ff|https://github.com/apache/flink/blob/026675a5cb8a3704c51802fb549d6b0bc4759835/flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/TestingAbstractInvokables.java#L71]. Essentially, the received data seems to be corrupted.
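To illustrate how a single validation step can surface either of the two failures seen in these builds, here is a minimal, hypothetical sketch. It assumes the receiver reads a fixed number of records and compares them against expected values (42, then 1337, then end of input). Those values, the class name {{ReceiverCheckSketch}} and its helper methods are assumptions for illustration only, not the actual code of {{TestingAbstractInvokables.Receiver}} at the linked revision. Under that assumption, a missing record surfaces as a {{NullPointerException}} (unboxing {{null}}) before the explicit check runs, while an unexpected value surfaces as "Wrong data received.".

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, simplified model of a receiver that validates the records it reads.
 * ASSUMPTION: the expected values and structure are illustrative and may differ
 * from the real TestingAbstractInvokables.Receiver.
 */
final class ReceiverCheckSketch {

    // Hypothetical expected payload: two records followed by end of input.
    static final int FIRST = 42;
    static final int SECOND = 1337;

    /** Mimics a reader's next(): returns null once the input is exhausted. */
    static Integer next(Iterator<Integer> input) {
        return input.hasNext() ? input.next() : null;
    }

    /** Throws the way the test receiver is assumed to when the data is unexpected. */
    static void validate(List<Integer> records) throws Exception {
        Iterator<Integer> input = records.iterator();
        Integer i1 = next(input);
        Integer i2 = next(input);
        Integer i3 = next(input);
        // Comparing an Integer with an int unboxes it: if i1 or i2 is null
        // (too few records), this line throws a NullPointerException before
        // the "Wrong data received." branch can be reached.
        if (i1 != FIRST || i2 != SECOND || i3 != null) {
            throw new Exception("Wrong data received.");
        }
    }

    /** Runs the check and reports which failure (if any) it produced. */
    static String classify(List<Integer> records) {
        try {
            validate(records);
            return "OK";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList(FIRST, SECOND)));     // OK
        System.out.println(classify(Arrays.asList(FIRST)));             // NullPointerException
        System.out.println(classify(Arrays.asList(FIRST, 7)));          // Wrong data received.
        System.out.println(classify(Arrays.asList(FIRST, SECOND, 99))); // Wrong data received.
    }
}
```

In this model, the failure type only tells you whether records were missing or had the wrong value; it does not tell you why the data was corrupted in the first place.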

Update:
A "Wrong data received." exception was also thrown in this case. It appeared during the cancellation of the tasks, which was triggered by the expected {{FlinkRuntimeException}}. I guess it had no impact because the job was already transitioning into CANCELLING.



> JobRecoveryITCase.testTaskFailureRecovery failed due to the job not finishing successfully
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31119
>                 URL: https://issues.apache.org/jira/browse/FLINK-31119
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Priority: Blocker
>              Labels: test-stability
>         Attachments: FLINK-31119.20230217.1.log, FLINK-31119.20230217.4.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46247&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8523
> {code}
> Feb 17 02:24:35 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 24.074 s <<< FAILURE! - in org.apache.flink.runtime.jobmaster.JobRecoveryITCase
> Feb 17 02:24:35 [ERROR] org.apache.flink.runtime.jobmaster.JobRecoveryITCase.testTaskFailureRecovery  Time elapsed: 20.981 s  <<< FAILURE!
> Feb 17 02:24:35 java.lang.AssertionError: 
> Feb 17 02:24:35 
> Feb 17 02:24:35 Expected: is <true>
> Feb 17 02:24:35      but: was <false>
> Feb 17 02:24:35       at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> Feb 17 02:24:35       at org.junit.Assert.assertThat(Assert.java:964)
> Feb 17 02:24:35       at org.junit.Assert.assertThat(Assert.java:930)
> Feb 17 02:24:35       at org.apache.flink.runtime.jobmaster.JobRecoveryITCase.runTaskFailureRecoveryTest(JobRecoveryITCase.java:79)
> Feb 17 02:24:35       at org.apache.flink.runtime.jobmaster.JobRecoveryITCase.testTaskFailureRecovery(JobRecoveryITCase.java:63)
> Feb 17 02:24:35       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> The actual cause is that unexpected data was received:
> {code}
> 02:24:35,301 [    Receiver (5/5)#1] WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Receiver (5/5)#1 (d88e16a5e3c6f2c08cf3924d93ea18e2_28065fbb1d26fe99e018d3b846860dd3_4_1) switched from RUNNING to FAILED with failure cause:
> java.lang.Exception: Wrong data received.
>         at org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:83) ~[test-classes/:?]
>         at org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126) ~[test-classes/:?]
>         at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[classes/:?]
>         at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [classes/:?]
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [classes/:?]
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [classes/:?]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
