[ https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674179#comment-15674179 ]
ASF GitHub Bot commented on FLINK-5085:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/2826

    [FLINK-5085] Execute CheckpointCoordinator's state discard calls asynchronously

    This PR is a backport of #2825 for the release 1.1 branch. It is based on #2816, so only commit a70097d is relevant.

    The `CheckpointCoordinator` is now given an `Executor` which is used to execute the state discard calls asynchronously. This prevents blocking operations from being executed in the calling thread. The provided `Executor` is the same executor as the one used for the cleanup in the `ZooKeeperStateHandleStore`.

    The executors are now shut down gracefully after the `JobManager` has terminated. If they do not terminate within the given time (the Akka ask timeout), they are shut down forcibly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink backportMakeCheckpointCoordinatorNotBlocking

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2826.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2826

----
commit 357690b359a2890ec1842a20d345675b79d61cd1
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-11-15T21:45:04Z

    [FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore

    Use a dedicated Executor to run ZooKeeper callbacks in the ZooKeeperStateHandleStore instead of running them in the ZooKeeper client's thread. A callback can block because it discards state, which might entail deleting files from disk.

    Add TestExecutors

commit 640bfef9a176d57fa70d8ac21b8675897fae11ec
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-11-16T17:33:54Z

    [FLINK-5082] Pull ExecutorService lifecycle management out of the JobManager

    The provided ExecutorService will no longer be closed by the JobManager. Instead, its lifecycle is managed outside of the JobManager, where it was created. This gives nicer behaviour because it better separates responsibilities.

commit 9de05526e49158a5bde1342afe602f358cae993f
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-11-16T17:51:05Z

    Introduce dedicated Executor for blocking io operations

commit a70097d4ac619f9203604f6991d293a7b0f55b54
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-11-17T14:39:11Z

    [FLINK-5085] Execute CheckpointCoordinator's state discard calls asynchronously

    The CheckpointCoordinator is now given an Executor which is used to execute the state discard calls asynchronously. This prevents blocking operations from being executed in the calling thread.

    Shut down ExecutorServices gracefully

----

> Execute CheckpointCoordinator's state discard calls asynchronously
> ------------------------------------------------------------------
>
>                 Key: FLINK-5085
>                 URL: https://issues.apache.org/jira/browse/FLINK-5085
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.2.0, 1.1.4
>
>
> Under certain circumstances, the {{CheckpointCoordinator}} discards pending
> checkpoints or state handles. These discard operations can involve a blocking
> IO operation if the underlying state handle refers to a file which has to be
> deleted. In order to not block the calling thread, we should execute these
> calls in a dedicated IO executor.
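
For readers who want to see the pattern described above in isolation, here is a minimal sketch in plain Java. It is not the code from the pull request: `AsyncDiscardSketch`, `DiscardableState`, `Coordinator`, `discardAsync` and `shutdownGracefully` are hypothetical names chosen for illustration, and the timeout values are arbitrary stand-ins for the Akka ask timeout. It only assumes the standard `java.util.concurrent` API: discard calls are handed to an `Executor` instead of running in the calling thread, and the executor is first shut down gracefully and then forcibly if it does not terminate in time.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncDiscardSketch {

    /** Hypothetical stand-in for state whose discard may block, e.g. on file deletion. */
    interface DiscardableState {
        void discard() throws Exception;
    }

    /** Hypothetical coordinator-like class that never discards state in the caller's thread. */
    static final class Coordinator {
        private final Executor ioExecutor;

        Coordinator(Executor ioExecutor) {
            this.ioExecutor = ioExecutor;
        }

        /** Returns immediately; the potentially blocking discard runs on the executor. */
        void discardAsync(DiscardableState state) {
            ioExecutor.execute(() -> {
                try {
                    state.discard();
                } catch (Exception e) {
                    System.err.println("Could not discard state: " + e);
                }
            });
        }
    }

    /**
     * Tries a graceful shutdown first and falls back to a hard shutdown on timeout.
     * Returns true if the executor terminated gracefully within the timeout.
     */
    public static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // stop accepting new tasks, let submitted ones finish
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        executor.shutdownNow(); // hard shutdown: interrupt running tasks
        return false;
    }

    /** Submits three simulated blocking discards and returns how many completed. */
    public static int runDemo() {
        ExecutorService ioExecutor = Executors.newFixedThreadPool(2);
        Coordinator coordinator = new Coordinator(ioExecutor);
        AtomicInteger discarded = new AtomicInteger();

        for (int i = 0; i < 3; i++) {
            coordinator.discardAsync(() -> {
                Thread.sleep(10); // simulate a blocking file deletion
                discarded.incrementAndGet();
            });
        }

        shutdownGracefully(ioExecutor, 5, TimeUnit.SECONDS);
        return discarded.get();
    }

    public static void main(String[] args) {
        System.out.println("Discarded states: " + runDemo());
    }
}
```

Because the executor is passed in rather than created by the coordinator, whoever created it stays responsible for shutting it down, which matches the separation of responsibilities described for FLINK-5082.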