[GitHub] flink pull request #2964: [backport] [FLINK-5285] Abort checkpoint only once...

2016-12-08 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/2964

[backport] [FLINK-5285] Abort checkpoint only once in BarrierTracker

Backport of #2963 for the release-1.1 branch.

Prevent an interleaved sequence of cancellation markers for two consecutive
checkpoints from triggering a flood of cancellation markers for downstream
operators. This is done by aborting each checkpoint only once and by not
re-creating checkpoint barrier counts for already aborted checkpoints.
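
As a rough illustration of the "abort only once" idea described above, the following is a minimal sketch and not the actual `BarrierTracker` code. The class `AbortOnceSketch`, its field `latestAbortedCheckpointId`, and the method `onCancellationMarker` are hypothetical names chosen for this example. It models only the monotonic guard and leaves out the queue of pending barrier counts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the abort handling; not Flink's BarrierTracker. */
class AbortOnceSketch {

    // Highest checkpoint ID for which an abort was already forwarded (assumed name).
    private long latestAbortedCheckpointId = -1L;

    // Records every abort notification that would be sent downstream.
    final List<Long> notifiedAborts = new ArrayList<>();

    /** Called once per cancellation marker arriving on any input channel. */
    void onCancellationMarker(long checkpointId) {
        if (checkpointId > latestAbortedCheckpointId) {
            // First marker for a newer checkpoint: remember it and notify once.
            latestAbortedCheckpointId = checkpointId;
            notifiedAborts.add(checkpointId);
        }
        // Markers for this or an older checkpoint are ignored, so no
        // bookkeeping is re-created and no second abort is forwarded.
    }

    public static void main(String[] args) {
        AbortOnceSketch tracker = new AbortOnceSketch();
        // Interleaved markers for two consecutive checkpoints from several channels.
        long[] interleaved = {1, 2, 1, 2, 1, 2};
        for (long id : interleaved) {
            tracker.onCancellationMarker(id);
        }
        System.out.println(tracker.notifiedAborts); // prints [1, 2]
    }
}
```

Without the guard, each late marker for checkpoint 1 would be treated as a new checkpoint and forwarded again, which is the flood the description refers to.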

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink backportFixCheckpointBarrierCancellation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2964


commit debc59177c6e7a32266a12775d3379a36d23f7f6
Author: Till Rohrmann 
Date:   2016-12-07T18:05:47Z

[FLINK-5285] Abort checkpoint only once in BarrierTracker

Prevent an interleaved sequence of cancellation markers for two consecutive
checkpoints from triggering a flood of cancellation markers for downstream
operators. This is done by aborting each checkpoint only once and by not
re-creating checkpoint barrier counts for already aborted checkpoints.

Add test case




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---



2016-12-08 Thread StephanEwen
GitHub user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/2964#discussion_r91547796
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/BarrierTracker.java ---
@@ -225,17 +230,19 @@ private void processCheckpointAbortBarrier(CancelCheckpointMarker barrier, int c
                 pendingCheckpoints.removeFirst();
             }
         }
-        else {
+        else if (checkpointId > latestPendingCheckpointID) {
             notifyAbort(checkpointId);
 
-            // first barrier for this checkpoint - remember it as aborted
-            // since we polled away all entries with lower checkpoint IDs
-            // this entry will become the new first entry
-            if (pendingCheckpoints.size() < MAX_CHECKPOINTS_TO_TRACK) {
-                CheckpointBarrierCount abortedMarker = new CheckpointBarrierCount(checkpointId);
-                abortedMarker.markAborted();
-                pendingCheckpoints.addFirst(abortedMarker);
-            }
+            latestPendingCheckpointID = checkpointId;
+
+            CheckpointBarrierCount abortedMarker = new CheckpointBarrierCount(checkpointId);
+            abortedMarker.markAborted();
+            pendingCheckpoints.addLast(abortedMarker);
--- End diff --

Small comment here: I would
  - either keep the `addFirst()` statement here (we can be sure the new entry
    becomes the first one, given that we pulled out all older checkpoints)
  - or add a sanity check that `pendingCheckpoints` is empty at that point.

That way we explicitly guard the assumption that `pendingCheckpoints`
contains entries in an ordered sequence (which is currently only guarded
implicitly by the `checkpointId > latestPendingCheckpointID` condition).
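
To make the two suggestions concrete, here is a small sketch under simplifying assumptions. It is not the code from the pull request. `OrderedPendingSketch`, `trackBarrier`, and `recordAborted` are hypothetical names, and plain checkpoint IDs stand in for `CheckpointBarrierCount` entries. It combines `addFirst()` with an explicit emptiness check, so the ordering assumption fails loudly instead of being silently violated.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of guarding the ordering of pending checkpoints. */
class OrderedPendingSketch {

    // Pending checkpoint IDs, assumed to be kept in ascending order.
    final ArrayDeque<Long> pending = new ArrayDeque<>();

    // Highest checkpoint ID ever added to the queue (assumed name).
    private long latestPendingCheckpointId = -1L;

    /** Tracks the first barrier of a new, not yet seen checkpoint. */
    void trackBarrier(long checkpointId) {
        if (checkpointId > latestPendingCheckpointId) {
            latestPendingCheckpointId = checkpointId;
            pending.addLast(checkpointId);
        }
    }

    /** Records an aborted checkpoint after discarding all older entries. */
    void recordAborted(long checkpointId) {
        // Poll away all entries with lower checkpoint IDs.
        while (!pending.isEmpty() && pending.peekFirst() < checkpointId) {
            pending.removeFirst();
        }
        if (checkpointId > latestPendingCheckpointId) {
            // Every queued ID was at most latestPendingCheckpointId, so all
            // of them were removed above; make that assumption explicit.
            if (!pending.isEmpty()) {
                throw new IllegalStateException("pending checkpoints are not ordered");
            }
            latestPendingCheckpointId = checkpointId;
            pending.addFirst(checkpointId);
        }
    }

    public static void main(String[] args) {
        OrderedPendingSketch sketch = new OrderedPendingSketch();
        sketch.trackBarrier(1);
        sketch.trackBarrier(2);
        sketch.recordAborted(3); // drops 1 and 2, then adds 3 as the only entry
        sketch.recordAborted(3); // repeated marker changes nothing
        System.out.println(sketch.pending); // prints [3]
    }
}
```

In this simplified model the queue is always empty when the new entry is added, so `addFirst()` and `addLast()` behave the same; the check only documents and enforces why that is the case.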





2016-12-08 Thread tillrohrmann
GitHub user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2964#discussion_r91556985
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/BarrierTracker.java ---
@@ -225,17 +230,19 @@ private void processCheckpointAbortBarrier(CancelCheckpointMarker barrier, int c
                 pendingCheckpoints.removeFirst();
             }
         }
-        else {
+        else if (checkpointId > latestPendingCheckpointID) {
             notifyAbort(checkpointId);
 
-            // first barrier for this checkpoint - remember it as aborted
-            // since we polled away all entries with lower checkpoint IDs
-            // this entry will become the new first entry
-            if (pendingCheckpoints.size() < MAX_CHECKPOINTS_TO_TRACK) {
-                CheckpointBarrierCount abortedMarker = new CheckpointBarrierCount(checkpointId);
-                abortedMarker.markAborted();
-                pendingCheckpoints.addFirst(abortedMarker);
-            }
+            latestPendingCheckpointID = checkpointId;
+
+            CheckpointBarrierCount abortedMarker = new CheckpointBarrierCount(checkpointId);
+            abortedMarker.markAborted();
+            pendingCheckpoints.addLast(abortedMarker);
--- End diff --

True, will add `addFirst` again.





2016-12-09 Thread tillrohrmann
GitHub user tillrohrmann closed the pull request at:

https://github.com/apache/flink/pull/2964

