[GitHub] [flink] zentol commented on a change in pull request #8820: [FLINK-12916][tests] Retry cancelWithSavepoint on cancellation barrier in AbstractOperatorRestoreTestBase

GitBox Mon, 15 Jul 2019 06:06:00 -0700

zentol commented on a change in pull request #8820: [FLINK-12916][tests] Retry cancelWithSavepoint on cancellation barrier in AbstractOperatorRestoreTestBase
URL: https://github.com/apache/flink/pull/8820#discussion_r303423389


 ##########
 File path: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/AbstractOperatorRestoreTestBase.java
 ##########
 @@ -66,12 +72,16 @@
        private static final int NUM_TMS = 1;
        private static final int NUM_SLOTS_PER_TM = 4;
        private static final Duration TEST_TIMEOUT = Duration.ofSeconds(10000L);
-       private static final Pattern PATTERN_CANCEL_WITH_SAVEPOINT_TOLERATED_EXCEPTIONS = Pattern
-               .compile(
-                       "(was not running)" +
-                               "|(Not all required tasks are currently running)" +
-                               "|(Checkpoint was declined \\(tasks not ready\\))"
-               );
+
+       private static final Pattern PATTERN_CANCEL_WITH_SAVEPOINT_TOLERATED_EXCEPTIONS = Pattern.compile(
+               Stream.of(
+                       TRIGGER_SAVEPOINT_FAILURE.message(),
+                       NOT_ALL_REQUIRED_TASKS_RUNNING.message(),
+                       CHECKPOINT_DECLINED_TASK_NOT_READY.message(),
+                       // If task already in state RUNNING while stream task not running, stream task would then broadcast barrier.
+                       CHECKPOINT_DECLINED_ON_CANCELLATION_BARRIER.message())
 
 Review comment:
   Here's the thing: this case shouldn't be possible in the first place. For a cancel-with-savepoint, we
   * disable the checkpoint coordinator,
   * trigger a savepoint, and
   * once the savepoint completes (successfully!), cancel all tasks.
   
   Given that we only cancel tasks once the savepoint has completed, and the savepoint can only complete if all tasks are running, I don't see how we could ever end up trying to cancel while not all tasks are in a running state.
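
   To make the ordering argument concrete, here is a minimal sketch of the three steps. It is not Flink code: `FakeJob`, its methods, and the `"savepoint-path"` value are hypothetical stand-ins made up for illustration, and the only thing it tries to show is that the cancel step is chained onto a *successful* savepoint, so it never runs if the savepoint fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CancelWithSavepointSketch {

    /** Hypothetical stand-in for a running job; NOT a Flink class. */
    static final class FakeJob {
        final List<String> events = new ArrayList<>();
        final boolean allTasksRunning;

        FakeJob(boolean allTasksRunning) {
            this.allTasksRunning = allTasksRunning;
        }

        // Step 1: no further periodic checkpoints are triggered.
        void disableCheckpointCoordinator() {
            events.add("checkpoint coordinator disabled");
        }

        // Step 2: the savepoint only succeeds if every task is running.
        CompletableFuture<String> triggerSavepoint() {
            events.add("savepoint triggered");
            CompletableFuture<String> result = new CompletableFuture<>();
            if (allTasksRunning) {
                result.complete("savepoint-path"); // hypothetical path
            } else {
                result.completeExceptionally(
                    new IllegalStateException("Not all required tasks are currently running"));
            }
            return result;
        }

        // Step 3: only reached after a successful savepoint.
        void cancelAllTasks() {
            events.add("tasks cancelled");
        }
    }

    static CompletableFuture<String> cancelWithSavepoint(FakeJob job) {
        job.disableCheckpointCoordinator();
        // thenApply is skipped when the savepoint future completes exceptionally,
        // so cancelAllTasks() cannot run after a failed savepoint.
        return job.triggerSavepoint().thenApply(path -> {
            job.cancelAllTasks();
            return path;
        });
    }

    public static void main(String[] args) {
        FakeJob healthy = new FakeJob(true);
        cancelWithSavepoint(healthy).join();
        System.out.println(healthy.events);

        FakeJob notReady = new FakeJob(false);
        CompletableFuture<String> failed = cancelWithSavepoint(notReady);
        System.out.println(failed.isCompletedExceptionally() + " " + notReady.events);
    }
}
```

   In this toy model a job whose tasks are not all running fails at the savepoint step and is never cancelled, which is the invariant described above. Whether the real code path can violate it is exactly the open question.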

