[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-22 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688042#comment-15688042]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2825
  
Build passed locally: https://travis-ci.org/tillrohrmann/flink/builds/178011045. Merging this PR.


> Execute CheckpointCoodinator's state discard calls asynchronously
> -
>
> Key: FLINK-5085
> URL: https://issues.apache.org/jira/browse/FLINK-5085
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
> Fix For: 1.2.0, 1.1.4
>
>
> Under certain circumstances, the {{CheckpointCoordinator}} discards pending 
> checkpoints or state handles. These discard operations can involve blocking 
> IO if the underlying state handle refers to a file that has to be deleted. 
> To avoid blocking the calling thread, these calls should be executed in a 
> dedicated IO executor.
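A minimal sketch of the idea in the issue description, assuming a hypothetical `DiscardableState` interface in place of Flink's actual state handle types (this is not the code from the PR): instead of calling the potentially blocking discard directly, the coordinator hands it to a dedicated IO `Executor` and returns immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for a state handle whose discard may block on file IO. */
interface DiscardableState {
    void discardState() throws Exception;
}

/** Hypothetical, simplified coordinator that never discards on the calling thread. */
class AsyncDiscardingCoordinator {
    private final Executor ioExecutor;

    AsyncDiscardingCoordinator(Executor ioExecutor) {
        this.ioExecutor = ioExecutor;
    }

    /** Schedules the discard on the IO executor; failures are logged, not rethrown. */
    CompletableFuture<Void> discardAsync(DiscardableState state) {
        return CompletableFuture.runAsync(() -> {
            try {
                state.discardState(); // may delete files, so keep it off the caller's thread
            } catch (Exception e) {
                System.err.println("Could not discard state: " + e);
            }
        }, ioExecutor);
    }
}
```

The caller only pays for enqueueing the task; the returned future can be ignored or used to wait for completion in tests.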



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-22 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687085#comment-15687085]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2825
  
Rebasing the PR.




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-22 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686312#comment-15686312]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2825
  
@StefanRRichter reviewed #2826, the backport of this PR that differs only in 
using a different state discarding method, and gave a +1. Since Travis passes 
as well, I'll merge the PR.




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-22 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686105#comment-15686105]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann closed the pull request at:

https://github.com/apache/flink/pull/2826




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-22 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686097#comment-15686097]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2826
  
Build passed locally: 
https://travis-ci.org/tillrohrmann/flink/builds/177725509. Merging the PR.




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-21 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684180#comment-15684180]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2826
  
Thanks for the review @StefanRRichter. Once Travis gives green light, I'll 
merge the PR.




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-21 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684170#comment-15684170]

ASF GitHub Bot commented on FLINK-5085:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/2826
  
+1 LGTM




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-21 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683927#comment-15683927]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2826
  
Review @StefanRRichter 




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-21 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683779#comment-15683779]

ASF GitHub Bot commented on FLINK-5085:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2826
  
Rebased the PR on the latest release-1.1 branch. 

Review @uce, @StephanEwen if you have time.




[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-17 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674179#comment-15674179]

ASF GitHub Bot commented on FLINK-5085:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/2826

[FLINK-5085] Execute CheckpointCoordinator's state discard calls asynchronously

This PR is a backport of #2825 for the release-1.1 branch. It is based on 
#2816, so only commit a70097d is relevant.

The `CheckpointCoordinator` is now given an `Executor`, which it uses to 
execute the state discard calls asynchronously. This prevents blocking 
operations from being executed in the calling thread. The provided 
`Executor` is the same one used for cleanup in the `ZooKeeperStateHandleStore`.

The executors are now shut down gracefully after the `JobManager` has 
terminated. If they do not shut down within the given time (the Akka ask 
timeout), they are shut down forcefully.
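The graceful-then-forceful shutdown described above corresponds to the standard `ExecutorService` pattern. A minimal sketch, assuming a plain `java.util.concurrent.ExecutorService` and a timeout supplied by the caller (in the PR the timeout is the Akka ask timeout); the helper name `shutdownGracefully` is hypothetical and not taken from the Flink code base:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class ExecutorShutdown {
    private ExecutorShutdown() {}

    /**
     * Hypothetical helper: stops accepting new tasks, waits up to the timeout for
     * submitted tasks to finish, and interrupts whatever is still running afterwards.
     * Returns true if the executor terminated without being forced.
     */
    static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // no new tasks; already submitted tasks keep running
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        executor.shutdownNow(); // hard shutdown: interrupt running tasks, drop queued ones
        return false;
    }
}
```

`shutdownNow()` only interrupts running tasks, so a discard that ignores interruption can still outlive the forced shutdown.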

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink backportMakeCheckpointCoordinatorNotBlocking

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2826.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2826


commit 357690b359a2890ec1842a20d345675b79d61cd1
Author: Till Rohrmann 
Date:   2016-11-15T21:45:04Z

[FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore

Use a dedicated Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
instead of running them in the ZooKeeper client's thread. The callbacks can
block because they discard state, which might entail deleting files from disk.

Add TestExecutors
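The pattern in the FLINK-5073 commit can be sketched generically: the thread that delivers the notification only enqueues the callback on an executor and returns at once. This sketch assumes a plain `java.util.function.Consumer` as the callback type; the real code works with ZooKeeper client callbacks, whose types are not reproduced here, and `ExecutorCallbacks.runOn` is a hypothetical name.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

final class ExecutorCallbacks {
    private ExecutorCallbacks() {}

    /**
     * Hypothetical helper: wraps a possibly blocking callback so that the notifying
     * thread (for example a ZooKeeper client's event thread) only enqueues the work.
     */
    static <T> Consumer<T> runOn(Executor executor, Consumer<T> blockingCallback) {
        return event -> executor.execute(() -> blockingCallback.accept(event));
    }
}
```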

commit 640bfef9a176d57fa70d8ac21b8675897fae11ec
Author: Till Rohrmann 
Date:   2016-11-16T17:33:54Z

[FLINK-5082] Pull ExecutorService lifecycle management out of the JobManager

The provided ExecutorService is no longer closed by the JobManager. Instead,
its lifecycle is managed outside the JobManager, by the code that created it.
This separates responsibilities more cleanly.
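A minimal sketch of the ownership rule in the FLINK-5082 commit, with hypothetical class names (`ExecutorUser` stands in for the `JobManager`, `ExecutorOwner` for the code that creates it): the component uses the `ExecutorService` it receives but never shuts it down; whoever created the executor is responsible for that.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical component that uses, but does not own, the executor it is given. */
class ExecutorUser {
    private final ExecutorService executor;

    ExecutorUser(ExecutorService executor) {
        this.executor = executor;
    }

    Future<?> submitWork(Runnable work) {
        return executor.submit(work);
    }

    /** Stops the component; deliberately leaves the borrowed executor running. */
    void stop() {
        // no executor.shutdown() here: the creator of the executor owns its lifecycle
    }
}

final class ExecutorOwner {
    private ExecutorOwner() {}

    /** Creates the executor, lends it to a component, and shuts it down itself. */
    static boolean executorSurvivesComponentStop() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            ExecutorUser user = new ExecutorUser(executor);
            user.submitWork(() -> { });
            user.stop();
            return !executor.isShutdown(); // still usable after the component stopped
        } finally {
            executor.shutdown(); // the owner, not the component, ends the lifecycle
        }
    }
}
```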

commit 9de05526e49158a5bde1342afe602f358cae993f
Author: Till Rohrmann 
Date:   2016-11-16T17:51:05Z

Introduce a dedicated Executor for blocking IO operations

commit a70097d4ac619f9203604f6991d293a7b0f55b54
Author: Till Rohrmann 
Date:   2016-11-17T14:39:11Z

[FLINK-5085] Execute CheckpointCoordinator's state discard calls asynchronously

The CheckpointCoordinator is now given an Executor, which it uses to execute
the state discard calls asynchronously. This prevents blocking operations from
being executed in the calling thread.

Shut down ExecutorServices gracefully






[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-17 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674176#comment-15674176]

ASF GitHub Bot commented on FLINK-5085:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/2825

[FLINK-5085] Execute CheckpointCoordinator's state discard calls asynchronously

This PR is based on #2820 and #2815, so only commit 77f618a is relevant.

The `CheckpointCoordinator` is now given an `Executor`, which it uses to 
execute the state discard calls asynchronously. This prevents blocking 
operations from being executed in the calling thread. The provided 
`Executor` is the same one used for cleanup in the `ZooKeeperStateHandleStore`.

The executors are now shut down gracefully after the `JobManager` has 
terminated. If they do not shut down within the given time (the Akka ask 
timeout), they are shut down forcefully.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink makeCheckpointCoordinatorNotBlocking

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2825


commit 50838531f305fb92b927ca51aaf4a635e0a07499
Author: Till Rohrmann 
Date:   2016-11-15T21:45:04Z

[FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore

Use a dedicated Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
instead of running them in the ZooKeeper client's thread. The callbacks can
block because they discard state, which might entail deleting files from disk.

commit 00d0722da276251a836b4417a249123c5d7b3947
Author: Till Rohrmann 
Date:   2016-11-16T17:33:54Z

[FLINK-5082] Pull ExecutorService lifecycle management out of the JobManager

The provided ExecutorService is no longer closed by the JobManager. Instead,
its lifecycle is managed outside the JobManager, by the code that created it.
This separates responsibilities more cleanly.

commit 6384b9b2cc3a327fc9638bfa2ac6a6a652a14f3c
Author: Till Rohrmann 
Date:   2016-11-16T17:51:05Z

Introduce a dedicated Executor for blocking IO operations

commit 77f618a57bcb45ec710cab6081a070fb02658482
Author: Till Rohrmann 
Date:   2016-11-17T14:39:11Z

[FLINK-5085] Execute CheckpointCoordinator's state discard calls asynchronously

The CheckpointCoordinator is now given an Executor, which it uses to execute
the state discard calls asynchronously. This prevents blocking operations from
being executed in the calling thread.

Shut down ExecutorServices gracefully






[jira] [Commented] (FLINK-5085) Execute CheckpointCoodinator's state discard calls asynchronously

2016-11-16 Thread Xiaogang Shi (JIRA)

[https://issues.apache.org/jira/browse/FLINK-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673026#comment-15673026]

Xiaogang Shi commented on FLINK-5085:
-

Great, this is what I have been thinking about recently. Our state consists of 
thousands of files on HDFS, and deleting them sequentially takes a long time. A 
dedicated executor will help improve performance.
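The point of this comment is that deleting thousands of files one after another is slow, while a pool of IO threads can issue the deletions concurrently. A minimal sketch, assuming local files and `java.nio.file.Files` rather than HDFS (which would go through Hadoop's `FileSystem` API instead); the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class ParallelDelete {
    private ParallelDelete() {}

    /**
     * Hypothetical helper: deletes all given files on the executor, one task per
     * file, and returns how many files existed and were removed.
     */
    static int deleteAll(List<Path> files, Executor ioExecutor) {
        List<CompletableFuture<Boolean>> deletions = new ArrayList<>();
        for (Path file : files) {
            deletions.add(CompletableFuture.supplyAsync(() -> {
                try {
                    return Files.deleteIfExists(file);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, ioExecutor));
        }
        int deleted = 0;
        for (CompletableFuture<Boolean> deletion : deletions) {
            if (deletion.join()) { // join() rethrows IO failures as CompletionException
                deleted++;
            }
        }
        return deleted;
    }
}
```

The size of the thread pool bounds how many deletions are in flight at once, which keeps the load on the file system predictable.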


