[ 
https://issues.apache.org/jira/browse/FLINK-20816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317847#comment-17317847
 ] 

Arvid Heise commented on FLINK-20816:
-------------------------------------

That is actually by design of the test:
{noformat}
            if (context.getCheckpointId() == DECLINE_CHECKPOINT_ID) {
                DeclineSink.waitLatch.await();
            }
{noformat}
{{DeclineSink}} is not supposed to complete that checkpoint until the abort of 
the first checkpoint has been verified.
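
To make the gating pattern concrete, here is a minimal, self-contained sketch. It is not the actual test code: {{GatedSink}}, {{snapshot}}, {{release}} and the value of the constant are hypothetical stand-ins, and a plain {{CountDownLatch}} is used in place of Flink's {{OneShotLatch}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a sink whose snapshot blocks on one specific checkpoint. */
class GatedSink {
    // Assumed value, for illustration only; the real constant is defined in the test.
    static final long DECLINE_CHECKPOINT_ID = 2L;

    private final CountDownLatch waitLatch = new CountDownLatch(1);

    /** Called by the test once the abort of the earlier checkpoint has been verified. */
    void release() {
        waitLatch.countDown();
    }

    /**
     * Returns true if the snapshot finished within the timeout. Only the gated
     * checkpoint waits for the latch; every other checkpoint completes at once.
     */
    boolean snapshot(long checkpointId, long timeoutMillis) {
        if (checkpointId != DECLINE_CHECKPOINT_ID) {
            return true;
        }
        try {
            return waitLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the snapshot as unfinished.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch, if the verification step never finishes, {{release()}} is never called and the gated snapshot stays blocked until its timeout.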

However, no log statement indicates that an abort call reaches 
{{DeclineSink}} at all.
In the attached success.log we have

{noformat}
21:04:11,624 [Source: NormalSource (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 1 for task Source: NormalSource (1/1)#0
21:04:11,624 [ DeclineSink (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 1 for task DeclineSink (1/1)#0
21:04:11,625 [   NormalMap (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 1 for task NormalMap (1/1)#0
21:04:11,739 [Source: NormalSource (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 2 for task Source: NormalSource (1/1)#0
21:04:11,739 [   NormalMap (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 2 for task NormalMap (1/1)#0
21:04:11,739 [ DeclineSink (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 2 for task DeclineSink (1/1)#0
{noformat}

while in failure.log, I can only find

{noformat}
21:04:19,260 [Source: NormalSource (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 1 for task Source: NormalSource (1/1)#0
21:04:19,268 [   NormalMap (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 1 for task NormalMap (1/1)#0
21:05:58,297 [Source: NormalSource (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 2 for task Source: NormalSource (1/1)#0
21:05:58,297 [   NormalMap (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Notification of aborted checkpoint 2 for task NormalMap (1/1)#0
{noformat}
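
The comparison of the two logs can also be done mechanically. The following is only a sketch of a small helper, not part of Flink or the test: the class and method names are made up, and it assumes the message text "Notification of aborted checkpoint N for task X" appears on a single line, as in the excerpts above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that compares abort notifications between two log excerpts. */
class AbortNotificationDiff {
    // Captures the checkpoint id and the task name up to the end of the line.
    private static final Pattern ABORT_LINE =
            Pattern.compile("Notification of aborted checkpoint (\\d+) for task (.+)");

    /** Groups the tasks that logged an abort notification by checkpoint id. */
    static Map<Long, Set<String>> abortedTasksByCheckpoint(String logText) {
        Map<Long, Set<String>> result = new TreeMap<>();
        Matcher m = ABORT_LINE.matcher(logText);
        while (m.find()) {
            long checkpointId = Long.parseLong(m.group(1));
            result.computeIfAbsent(checkpointId, id -> new TreeSet<>())
                    .add(m.group(2).trim());
        }
        return result;
    }

    /** Tasks notified for the checkpoint in the success log but not in the failure log. */
    static Set<String> missingTasks(String successLog, String failureLog, long checkpointId) {
        Set<String> inSuccess = abortedTasksByCheckpoint(successLog)
                .getOrDefault(checkpointId, Collections.emptySet());
        Set<String> inFailure = abortedTasksByCheckpoint(failureLog)
                .getOrDefault(checkpointId, Collections.emptySet());
        Set<String> missing = new TreeSet<>(inSuccess);
        missing.removeAll(inFailure);
        return missing;
    }
}
```

Applied to the two excerpts above, {{missingTasks}} would report {{DeclineSink (1/1)#0}} for both checkpoint 1 and checkpoint 2.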

It might be a race condition in which the {{SubtaskCheckpointCoordinatorImpl}} 
does not yet know that the {{DeclineSink}} is already running, even though the 
switch to {{RUNNING}} was logged before the first abort notification:

{noformat}
21:04:19,037 [flink-akka.actor.default-dispatcher-2] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - DeclineSink (1/1) (7457bf515844f409738c9929fffc54f7) switched from DEPLOYING to RUNNING.
{noformat}
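
To illustrate the kind of race suspected here, the sketch below shows how a notification can be lost when it is dispatched before the receiver is known to be running. It is purely illustrative: {{AbortNotifier}} and its methods do not exist in Flink, and nothing here is claimed to be the actual cause.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical notifier that only reaches tasks it already knows to be running. */
class AbortNotifier {
    private final Set<String> runningTasks = new HashSet<>();
    private final List<String> delivered = new ArrayList<>();

    /** Records that a task has reached the running state. */
    void markRunning(String task) {
        runningTasks.add(task);
    }

    /** Sends an abort notification for the checkpoint to each listed task. */
    void notifyAborted(long checkpointId, String... tasks) {
        for (String task : tasks) {
            if (runningTasks.contains(task)) {
                delivered.add(task + "@" + checkpointId);
            }
            // A task not yet known as running is skipped; the notification is not retried.
        }
    }

    /** Notifications that were actually delivered, as "task@checkpointId". */
    List<String> delivered() {
        return delivered;
    }
}
```

If {{markRunning("DeclineSink")}} is only called after {{notifyAborted(1, ...)}}, the sink never sees the abort of checkpoint 1 in this model, while the other tasks do.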



> NotifyCheckpointAbortedITCase failed due to timeout
> ---------------------------------------------------
>
>                 Key: FLINK-20816
>                 URL: https://issues.apache.org/jira/browse/FLINK-20816
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.2, 1.13.0
>            Reporter: Matthias
>            Assignee: Arvid Heise
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.13.0
>
>         Attachments: flink-20816-failure.log, flink-20816-success.log
>
>
> [This 
> build|https://dev.azure.com/mapohl/flink/_build/results?buildId=152&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=634cd701-c189-5dff-24cb-606ed884db87&l=4245]
>  failed because {{NotifyCheckpointAbortedITCase}} timed out.
> {code}
> 2020-12-29T21:48:40.9430511Z [INFO] Running org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase
> 2020-12-29T21:50:28.0087043Z [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 107.062 s <<< FAILURE! - in org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase
> 2020-12-29T21:50:28.0087961Z [ERROR] testNotifyCheckpointAborted[unalignedCheckpointEnabled =true](org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase)  Time elapsed: 104.044 s  <<< ERROR!
> 2020-12-29T21:50:28.0088619Z org.junit.runners.model.TestTimedOutException: test timed out after 100000 milliseconds
> 2020-12-29T21:50:28.0088972Z  at java.lang.Object.wait(Native Method)
> 2020-12-29T21:50:28.0089267Z  at java.lang.Object.wait(Object.java:502)
> 2020-12-29T21:50:28.0089633Z  at org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:61)
> 2020-12-29T21:50:28.0090458Z  at org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase.verifyAllOperatorsNotifyAborted(NotifyCheckpointAbortedITCase.java:200)
> 2020-12-29T21:50:28.0091313Z  at org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase.testNotifyCheckpointAborted(NotifyCheckpointAbortedITCase.java:183)
> 2020-12-29T21:50:28.0091819Z  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-12-29T21:50:28.0092199Z  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-12-29T21:50:28.0092675Z  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-12-29T21:50:28.0093095Z  at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-12-29T21:50:28.0093495Z  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-12-29T21:50:28.0093980Z  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-12-29T21:50:28.0094444Z  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-12-29T21:50:28.0094917Z  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-12-29T21:50:28.0095663Z  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-12-29T21:50:28.0096221Z  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-12-29T21:50:28.0096675Z  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-12-29T21:50:28.0097022Z  at java.lang.Thread.run(Thread.java:748)
> {code}
> The branch contained changes from FLINK-20594 and FLINK-20595. These issues 
> remove code that is no longer used and should have affected only unit 
> tests. [The previous 
> build|https://dev.azure.com/mapohl/flink/_build/results?buildId=151&view=results]
>  containing all the changes except for 
> [9c57c37|https://github.com/XComp/flink/commit/9c57c37c50733a1f592a4fc5e492b22be80d8279]
>  passed.



