[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427941#comment-15427941
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2385


> Unify CheckpointCoordinator and SavepointCoordinator
> 
>
> Key: FLINK-4322
> URL: https://issues.apache.org/jira/browse/FLINK-4322
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.2.0
>
>
> The Checkpoint coordinator should have the functionality of both handling 
> checkpoints and savepoints.
> The difference between checkpoints and savepoints is minimal:
>   - savepoints always write the root metadata of the checkpoint
>   - savepoints are always full (never incremental)
> The commonalities are large
>   - jobs should be able to resume from checkpoint or savepoints
>   - jobs should fall back to the latest checkpoint or savepoint
> This subsumes issue https://issues.apache.org/jira/browse/FLINK-3397



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-19 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427942#comment-15427942
 ] 

Ufuk Celebi commented on FLINK-4322:


Addendum in 8854d75 and 5d7f880 (master).



[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427931#comment-15427931
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user uce commented on the issue:

https://github.com/apache/flink/pull/2385
  
I will merge this and add the following test for min pause. This fails with 
the current master, but works with your PR.

```java
/**
 * Tests that no minimum delay between savepoints is enforced.
 */
@Test
public void testMinDelayBetweenSavepoints() throws Exception {
    JobID jobId = new JobID();

    final ExecutionAttemptID attemptID1 = new ExecutionAttemptID();
    ExecutionVertex vertex1 = mockExecutionVertex(attemptID1);

    CheckpointCoordinator coord = new CheckpointCoordinator(
        jobId,
        10,
        20,
        1L, // very long min delay => should not affect savepoints
        1,
        42,
        new ExecutionVertex[] { vertex1 },
        new ExecutionVertex[] { vertex1 },
        new ExecutionVertex[] { vertex1 },
        cl,
        new StandaloneCheckpointIDCounter(),
        new StandaloneCompletedCheckpointStore(2, cl),
        new HeapSavepointStore(),
        new DisabledCheckpointStatsTracker());

    Future savepoint0 = coord.triggerSavepoint(0);
    assertFalse("Did not trigger savepoint", savepoint0.isCompleted());

    Future savepoint1 = coord.triggerSavepoint(1);
    assertFalse("Did not trigger savepoint", savepoint1.isCompleted());
}
```




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427909#comment-15427909
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2385
  
Looks good to me.
+1




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426317#comment-15426317
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

GitHub user ramkrish86 opened a pull request:

https://github.com/apache/flink/pull/2385

FLINK-4322 (addendum)Issue with savepoint considering the time interval 
like that of check…

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

…point

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ramkrish86/flink FLINK-4322_addendum

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2385


commit fd864d425dea920a0c49d8e4f9b433d219424252
Author: Ramkrishna 
Date:   2016-08-18T11:57:41Z

Issue with savepoint considering the time interval like that of checkpoint






[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426222#comment-15426222
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2366
  
A JIRA issue never hurts, but in this case it could piggyback on the 
previous issue. Your choice ;-)




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426219#comment-15426219
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/2366
  
Yes. I can do it. Thanks. 
Is it better to raise a JIRA for that for accounting or just raise a PR 
with the same issue ID?




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426212#comment-15426212
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2366
  
You are right, this was an oversight on my side. This check should also be 
within the `if (!isSavepoint())` block. Thanks for reviewing this.

Do you want to create a fix for that?




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426208#comment-15426208
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/2366
  
Yes. So maybe this code:
`if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {`
This if block should also go inside the previous 'if' condition that checks 
for !savePoint, right?




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426201#comment-15426201
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2366
  
Savepoints are exempt from the concurrent-checkpoints and 
time-between-checkpoints limitations. These checks run only if the triggered 
checkpoint is not a savepoint (in the `triggerCheckpoint(...)` method).
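
The gating described in this comment can be sketched roughly as follows. This is only an illustration, not the actual `CheckpointCoordinator` code: the class `TriggerGateSketch`, its fields, and the decision to count savepoints as pending and to let them update the last-trigger timestamp are all assumptions made for the example. The point it shows is that both the concurrency limit and the minimum pause are checked only when the trigger is not a savepoint.

```java
// Hypothetical sketch of the trigger gating; not the real Flink implementation.
final class TriggerGateSketch {

    private final long minPauseMillis; // assumed minimum pause between checkpoints
    private final int maxConcurrent;   // assumed limit on concurrent checkpoints

    private int pending = 0;
    private boolean triggeredBefore = false;
    private long lastTriggered = 0L;

    TriggerGateSketch(long minPauseMillis, int maxConcurrent) {
        this.minPauseMillis = minPauseMillis;
        this.maxConcurrent = maxConcurrent;
    }

    /** Returns true if the trigger is accepted, false if it is declined. */
    boolean trigger(long timestamp, boolean isSavepoint) {
        if (!isSavepoint) {
            // Both limits apply to periodic checkpoints only.
            if (pending >= maxConcurrent) {
                return false;
            }
            if (triggeredBefore && timestamp - lastTriggered < minPauseMillis) {
                return false;
            }
        }
        // Assumption: savepoints also count as pending and update the timestamp.
        pending++;
        triggeredBefore = true;
        lastTriggered = timestamp;
        return true;
    }

    /** Marks one pending checkpoint or savepoint as finished. */
    void complete() {
        if (pending > 0) {
            pending--;
        }
    }

    public static void main(String[] args) {
        TriggerGateSketch gate = new TriggerGateSketch(1_000L, 1);
        System.out.println("checkpoint at 0: " + gate.trigger(0L, false));
        gate.complete();
        System.out.println("checkpoint at 1: " + gate.trigger(1L, false));
        System.out.println("savepoint at 1:  " + gate.trigger(1L, true));
    }
}
```

With the minimum-pause check left outside the `if (!isSavepoint)` block, the savepoint call in `main` would be declined as well, which is the behaviour questioned earlier in this thread.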




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426167#comment-15426167
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/2366
  
Reading the code once again, I have a question:
If a savepoint is triggered within the 'MINIMUM_TIME_BETWEEN_CHECKPOINTS' 
duration, do we still return a decline result? Since savepoints are 
triggered externally, shouldn't they be isolated from the internal timing 
we maintain? 
Correct me if I am missing something here. Thanks. 




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425928#comment-15425928
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/2366
  
The unification looks great and now things are simple in terms of 
restoration. Thanks.




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424997#comment-15424997
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2366
  
Looked very good. Fixed a minor issue (test log level) and merged this.


> Unify CheckpointCoordinator and SavepointCoordinator
> 
>
> Key: FLINK-4322
> URL: https://issues.apache.org/jira/browse/FLINK-4322
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
> Fix For: 1.2.0
>
>
> The Checkpoint coordinator should have the functionality of both handling 
> checkpoints and savepoints.
> The difference between checkpoints and savepoints is minimal:
>   - savepoints always write the root metadata of the checkpoint
>   - savepoints are always full (never incremental)
> The commonalities are large
>   - jobs should be able to resume from checkpoint or savepoints
>   - jobs should fall back to the latest checkpoint or savepoint
> This subsumes issue https://issues.apache.org/jira/browse/FLINK-3397





[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424993#comment-15424993
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2366




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419941#comment-15419941
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2366
  
Thanks for joining this effort. I'll try to have a look at the tests in the 
next few days.




[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418952#comment-15418952
 ] 

ASF GitHub Bot commented on FLINK-4322:
---

GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/2366

[FLINK-4322] Unify CheckpointCoordinator and SavepointCoordinator

The CheckpointCoordinator now also takes over the role of the 
SavepointCoordinator. Savepoints are just like other checkpoints - they only 
store the metadata in addition. Restoring from a savepoint means loading it 
into the CheckpointStore at startup.

This simplifies the code quite a bit. We get rid of the savepoint 
coordinator and related classes and cumbersome restoring logic in the main 
code. For the tests, we can replace some integration tests with unit tests.

`PendingSavepoint` instances are finalized to become a 
`CompletedCheckpoint` like regular `PendingCheckpoint` instances, but in 
addition store the savepoint meta data and complete a Promise for callbacks. 
`PendingSavepoints` cannot be subsumed and a `CompletedCheckpoint` from a 
savepoint does not delete its associated state when being disposed.
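
The lifecycle described in the paragraph above can be illustrated with a small sketch. All class and member names here (`SavepointSemanticsSketch`, `Pending`, `PendingSavepointSketch`, `Completed`, `metadataPath`) are hypothetical stand-ins rather than the classes in this pull request, and `CompletableFuture` is used in place of the Promise mentioned above purely for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins; none of these are the real Flink classes.
final class SavepointSemanticsSketch {

    /** Result of finalizing a pending checkpoint or savepoint. */
    static final class Completed {
        final long id;
        final boolean fromSavepoint;
        boolean stateDeleted = false;

        Completed(long id, boolean fromSavepoint) {
            this.id = id;
            this.fromSavepoint = fromSavepoint;
        }

        /** Regular checkpoints delete their state on dispose; savepoints keep it. */
        void dispose() {
            if (!fromSavepoint) {
                stateDeleted = true;
            }
        }
    }

    /** A regular pending checkpoint: may be subsumed by a newer one. */
    static class Pending {
        final long id;

        Pending(long id) {
            this.id = id;
        }

        boolean canBeSubsumed() {
            return true;
        }

        Completed finalizeCheckpoint() {
            return new Completed(id, false);
        }
    }

    /** A pending savepoint: same lifecycle, plus a metadata path and a future. */
    static final class PendingSavepointSketch extends Pending {
        final String metadataPath;
        final CompletableFuture<String> result = new CompletableFuture<>();

        PendingSavepointSketch(long id, String metadataPath) {
            super(id);
            this.metadataPath = metadataPath;
        }

        @Override
        boolean canBeSubsumed() {
            return false;
        }

        @Override
        Completed finalizeCheckpoint() {
            Completed completed = new Completed(id, true);
            result.complete(metadataPath); // notify whoever triggered the savepoint
            return completed;
        }
    }

    public static void main(String[] args) {
        PendingSavepointSketch sp = new PendingSavepointSketch(7L, "/tmp/savepoint-7");
        Completed done = sp.finalizeCheckpoint();
        done.dispose();
        System.out.println(sp.result.join() + " deleted=" + done.stateDeleted);
    }
}
```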

@StephanEwen did most of the work and I added and fixed some tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/flink savepointunify

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2366.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2366


commit 9d13d3b9c78b5fe6ec436c476492a82b846338aa
Author: Stephan Ewen 
Date:   2016-08-08T17:18:44Z

[FLINK-4322] [checkpointing] Unify CheckpointCoordinator and 
SavepointCoordinator

The CheckpointCoordinator now also takes over the role of the 
SavepointCoordinator.
Savepoints are just like other checkpoints - they only store the metadata 
in addition.
Restoring from a savepoint means loading it into the CheckpointStore at 
startup.

commit bcb6cf0b573314449437bc869febfc68f798b0f4
Author: Ufuk Celebi 
Date:   2016-08-11T17:40:07Z

[FLINK-4322] [checkpointing] Add and fix tests






[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-07 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411056#comment-15411056
 ] 

Ufuk Celebi commented on FLINK-4322:


+1 I had similar thoughts about this



[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410662#comment-15410662
 ] 

ramkrishna.s.vasudevan commented on FLINK-4322:
---

Ok, got it. I understand. Thanks. 



[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-05 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409961#comment-15409961
 ] 

Stephan Ewen commented on FLINK-4322:
-

I would like to make some more radical changes to the current checkpoint / 
savepoint design.
  - There should be no difference at all between checkpoint and savepoint any 
more.
  - A savepoint is merely a parameterized checkpoint (with a future to announce 
its result)
  - There is no SavepointCoordinator any more
  - a savepoint restore simply adds a completed checkpoint to the checkpoint 
coordinator
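
As a rough illustration of the last point in the list above, restoring from a savepoint can be modelled as nothing more than seeding the store of completed checkpoints, after which recovery falls back to whatever entry is latest. The names `RestoreSketch`, `Store`, `Completed`, and `restoreFromSavepoint` are hypothetical and chosen only for this sketch; they are not the Flink API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of "restore = add a completed checkpoint"; not Flink code.
final class RestoreSketch {

    /** A completed checkpoint; a savepoint is the same thing with a flag set. */
    static final class Completed {
        final long id;
        final boolean fromSavepoint;

        Completed(long id, boolean fromSavepoint) {
            this.id = id;
            this.fromSavepoint = fromSavepoint;
        }
    }

    /** Minimal store of completed checkpoints, oldest first. */
    static final class Store {
        private final Deque<Completed> completed = new ArrayDeque<>();

        void add(Completed checkpoint) {
            completed.addLast(checkpoint);
        }

        /** Returns the most recently added entry, or null if the store is empty. */
        Completed latest() {
            return completed.peekLast();
        }
    }

    /** Restoring from a savepoint only seeds the store with that savepoint. */
    static Store restoreFromSavepoint(Completed savepoint) {
        Store store = new Store();
        store.add(savepoint);
        return store;
    }

    public static void main(String[] args) {
        Store store = restoreFromSavepoint(new Completed(5L, true));
        System.out.println("recover from id " + store.latest().id);
        store.add(new Completed(6L, false)); // a later periodic checkpoint
        System.out.println("recover from id " + store.latest().id);
    }
}
```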

The execution graph code should become very simple that way.
Since that is a pretty deep change in the checkpointing mechanism, I would be 
happy to actually take this over.

> Unify CheckpointCoordinator and SavepointCoordinator
> 
>
> Key: FLINK-4322
> URL: https://issues.apache.org/jira/browse/FLINK-4322
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
> Fix For: 1.2.0
>
>
> The Checkpoint coordinator should have the functionality of both handling 
> checkpoints and savepoints.
> The difference between checkpoints and savepoints is minimal:
>   - savepoints always write the root metadata of the checkpoint
>   - savepoints are always full (never incremental)
> The commonalities are large
>   - jobs should be able to resume from checkpoint or savepoints
>   - jobs should fall back to the latest checkpoint or savepoint
> This subsumes issue https://issues.apache.org/jira/browse/FLINK-3397





[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-05 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409799#comment-15409799
 ] 

ramkrishna.s.vasudevan commented on FLINK-4322:
---

> jobs should be able to resume from checkpoint or savepoints
> jobs should fall back to the latest checkpoint or savepoint

Yes, this is what FLINK-3397 tries to achieve too. 
[~uce] had given some comments, and I have updated the document but not yet 
uploaded it to that JIRA. He also mentioned the job and savepoint mapping. I 
raised some comments and questions about that in the other JIRA and, as he is 
very busy, I am waiting for his input. Since this issue and FLINK-3397 are 
related, I am happy to work on both and will do so once we arrive at a 
consensus on the design. Thanks.
