[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5928


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-16 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r188548729
  
--- Diff: docs/_includes/generated/checkpointing_configuration.html ---
@@ -40,7 +40,7 @@
 
 state.checkpoints.dir
 (none)
-The default directory used for checkpoints. Used by the 
state backends that write checkpoints to file systems (MemoryStateBackend, 
FsStateBackend, RocksDBStateBackend).
+The default directory used for storing the data files and 
meta data of checkpoints in a Flink supported filesystem. Note: the storage 
path must be accessible from all participating processes/nodes(i.e. all 
TaskManagers and JobManagers).
--- End diff --

typo: "Note: The"


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-16 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r188549219
  
--- Diff: docs/ops/state/checkpoints.md ---
@@ -35,60 +35,62 @@ the same semantics as a failure-free execution.
 See [Checkpointing]({{ site.baseurl 
}}/dev/stream/state/checkpointing.html) for how to enable and
 configure checkpoints for your program.
 
-## Externalized Checkpoints
+## Retain The Checkpoints
 
 Checkpoints are by default not persisted externally and are only used to
 resume a job from failures. They are deleted when a program is cancelled.
 You can, however, configure periodic checkpoints to be persisted externally
-similarly to [savepoints](savepoints.html). These *externalized 
checkpoints*
-write their meta data out to persistent storage and are *not* automatically
-cleaned up when the job fails. This way, you will have a checkpoint around
-to resume from if your job fails.
+similarly to [savepoints](savepoints.html). This way, you will have a 
persisted 
+checkpoint around to resume from if your job fails.
 
 {% highlight java %}
 CheckpointConfig config = env.getCheckpointConfig();
 
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
 {% endhighlight %}
 
-The `ExternalizedCheckpointCleanup` mode configures what happens with 
externalized checkpoints when you cancel the job:
+The `ExternalizedCheckpointCleanup` mode configures what happens with 
checkpoints when you cancel the job:
 
-- **`ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION`**: Retain the 
externalized checkpoint when the job is cancelled. Note that you have to 
manually clean up the checkpoint state after cancellation in this case.
+- **`ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION`**: Retain the 
checkpoint when the job is cancelled. Note that you have to manually clean up 
the checkpoint state after cancellation in this case.
 
-- **`ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION`**: Delete the 
externalized checkpoint when the job is cancelled. The checkpoint state will 
only be available if the job fails.
+- **`ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION`**: Delete the 
checkpoint when the job is cancelled. The checkpoint state will only be 
available if the job fails.
 
 ### Directory Structure
 
-Similarly to [savepoints](savepoints.html), an externalized checkpoint 
consists
-of a meta data file and, depending on the state back-end, some additional 
data
-files. The **target directory** for the externalized checkpoint's meta 
data is
-determined from the configuration key `state.checkpoints.dir` which, 
currently,
-can only be set via the configuration files.
+Similarly to [savepoints](savepoints.html), an checkpoint consists
--- End diff --

typo: "a checkpoint"
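
For readers following the cleanup-mode wording in the diff above, here is a minimal sketch of the behaviour it describes. It is a toy model only: `CleanupModeSketch`, `JobEnd`, `CleanupMode` and `checkpointKept` are hypothetical names invented for illustration and are not Flink classes. It assumes, as the doc text states, that a checkpoint is always available after a failure and that the mode only matters on cancellation.

```java
public class CleanupModeSketch {

    /** How the job ended (hypothetical type, for illustration only). */
    public enum JobEnd { FAILED, CANCELLED }

    /** Toy stand-in for the two cleanup modes named in the docs; not Flink's enum. */
    public enum CleanupMode {
        RETAIN_ON_CANCELLATION,
        DELETE_ON_CANCELLATION;

        /** Returns true if the checkpoint is still around after the job ended this way. */
        public boolean checkpointKept(JobEnd end) {
            if (end == JobEnd.FAILED) {
                // Per the doc text, the checkpoint is available if the job fails.
                return true;
            }
            // On cancellation, only RETAIN_ON_CANCELLATION keeps the checkpoint.
            return this == RETAIN_ON_CANCELLATION;
        }
    }

    public static void main(String[] args) {
        // Print the full mode/outcome matrix.
        for (CleanupMode mode : CleanupMode.values()) {
            for (JobEnd end : JobEnd.values()) {
                System.out.println(mode + " / " + end + " -> kept=" + mode.checkpointKept(end));
            }
        }
    }
}
```

With `RETAIN_ON_CANCELLATION` the retained checkpoint has to be cleaned up manually, which is why the doc text calls that out.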


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-16 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r188548955
  
--- Diff: docs/ops/state/checkpoints.md ---
@@ -35,60 +35,62 @@ the same semantics as a failure-free execution.
 See [Checkpointing]({{ site.baseurl 
}}/dev/stream/state/checkpointing.html) for how to enable and
 configure checkpoints for your program.
 
-## Externalized Checkpoints
+## Retain The Checkpoints
--- End diff --

"Retained Checkpoints"


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-03 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r185737131
  
--- Diff: docs/ops/state/state_backends.md ---
@@ -152,7 +152,7 @@ Possible values for the config entry are *jobmanager* 
(MemoryStateBackend), *fil
 name of the class that implements the state backend factory 
[FsStateBackendFactory](https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsStateBackendFactory.java),
 such as 
`org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory` for 
RocksDBStateBackend.
 
-In the case where the default state backend is set to *filesystem*, the 
entry `state.backend.fs.checkpointdir` defines the directory where the 
checkpoint data will be stored.
+In the case where the default state backend is set to *filesystem*, the 
entry `state.checkpoints.dir` defines the directory where the checkpoint data 
will be stored.
--- End diff --

The option is used by all backends that eventually store data to a file 
system, including RocksDBStateBackend and MemoryStateBackend (the 
MemoryStateBackend optionally writes its single checkpoint metadata file there).


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-02 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r185569814
  
--- Diff: docs/dev/stream/state/checkpointing.md ---
@@ -137,11 +137,9 @@ Some more parameters and/or defaults may be set via 
`conf/flink-conf.yaml` (see
-  `jobmanager`: In-memory state, backup to JobManager's/ZooKeeper's 
memory. Should be used only for minimal state (Kafka offsets) or testing and 
local debugging.
-  `filesystem`: State is in-memory on the TaskManagers, and state 
snapshots are stored in a file system. Supported are all filesystems supported 
by Flink, for example HDFS, S3, ...
 
-- `state.backend.fs.checkpointdir`: Directory for storing checkpoints in a 
Flink supported filesystem. Note: State backend must be accessible from the 
JobManager, use `file://` only for local setups.
+- `state.checkpoints.dir`: The target directory for storing checkpoints 
data files and meta data of [externalized checkpoints]({{ site.baseurl 
}}/ops/state/checkpoints.html#externalized-checkpoints) in a Flink supported 
filesystem. Note: the storage path must be accessible from all participating 
processes/nodes(i.e. all TaskManagers and JobManagers).
--- End diff --

In fact, you could replace this entire section with 
`{% include generated/checkpointing_configuration.html %}`.


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-02 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r185569523
  
--- Diff: docs/dev/stream/state/checkpointing.md ---
@@ -137,11 +137,9 @@ Some more parameters and/or defaults may be set via 
`conf/flink-conf.yaml` (see
-  `jobmanager`: In-memory state, backup to JobManager's/ZooKeeper's 
memory. Should be used only for minimal state (Kafka offsets) or testing and 
local debugging.
-  `filesystem`: State is in-memory on the TaskManagers, and state 
snapshots are stored in a file system. Supported are all filesystems supported 
by Flink, for example HDFS, S3, ...
 
-- `state.backend.fs.checkpointdir`: Directory for storing checkpoints in a 
Flink supported filesystem. Note: State backend must be accessible from the 
JobManager, use `file://` only for local setups.
+- `state.checkpoints.dir`: The target directory for storing checkpoints 
data files and meta data of [externalized checkpoints]({{ site.baseurl 
}}/ops/state/checkpoints.html#externalized-checkpoints) in a Flink supported 
filesystem. Note: the storage path must be accessible from all participating 
processes/nodes(i.e. all TaskManagers and JobManagers).
--- End diff --

Please sync these entries with the descriptions in `CheckpointingOptions`, 
so that they are identical to the ones shown on the configuration page.


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-05-02 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r185568867
  
--- Diff: docs/ops/state/checkpoints.md ---
@@ -60,25 +60,29 @@ The `ExternalizedCheckpointCleanup` mode configures 
what happens with externaliz
 
 Similarly to [savepoints](savepoints.html), an externalized checkpoint 
consists
 of a meta data file and, depending on the state back-end, some additional 
data
-files. The **target directory** for the externalized checkpoint's meta 
data is
-determined from the configuration key `state.checkpoints.dir` which, 
currently,
-can only be set via the configuration files.
+files. The externalized checkpoint's meta data is stored in the same 
directory 
+as data files. So the **target directory** can be set via configuration 
key 
+`state.checkpoints.dir` in the configuration files, and also can be 
specified 
+for per job in the code.
 
+- Configure globally via configuration files
 ```
 state.checkpoints.dir: hdfs:///checkpoints/
 ```
 
+- Configure for per job via code 
+```java
--- End diff --

Please use `{% highlight java %}` syntax instead.


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-04-30 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r185021293
  
--- Diff: docs/dev/stream/state/checkpointing.md ---
@@ -137,11 +137,9 @@ Some more parameters and/or defaults may be set via 
`conf/flink-conf.yaml` (see
-  `jobmanager`: In-memory state, backup to JobManager's/ZooKeeper's 
memory. Should be used only for minimal state (Kafka offsets) or testing and 
local debugging.
-  `filesystem`: State is in-memory on the TaskManagers, and state 
snapshots are stored in a file system. Supported are all filesystems supported 
by Flink, for example HDFS, S3, ...
 
-- `state.backend.fs.checkpointdir`: Directory for storing checkpoints in a 
Flink supported filesystem. Note: State backend must be accessible from the 
JobManager, use `file://` only for local setups.
+- `state.checkpoints.dir`: The target directory for storing checkpoints 
data files and meta data of [externalized checkpoints]({{ site.baseurl 
}}/ops/state/checkpoints.html#externalized-checkpoints) in a Flink supported 
filesystem. Note: State backend must be accessible from the JobManager, use 
`file://` only for local setups.
--- End diff --

Yes, `file:///` is what you use for many NAS-style storage systems, so it 
is not local-only. Let's change this to say that the storage path must be 
accessible from all participating processes/nodes, i.e., all TaskManagers and 
JobManagers.
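
A small sketch of the point about URI schemes, using only `java.net.URI` from the JDK. The class `CheckpointDirCheck`, its methods, and the sample paths (such as `file:///mnt/nas/checkpoints`) are hypothetical and chosen for illustration. The scheme only names the file system; whether the path is reachable from every TaskManager and JobManager depends on the deployment and cannot be read from the URI.

```java
import java.net.URI;

public class CheckpointDirCheck {

    /** Returns the URI scheme of a checkpoint directory, or null if the path has none. */
    public static String scheme(String checkpointDir) {
        return URI.create(checkpointDir).getScheme();
    }

    /** True if the path names its file system explicitly (for example hdfs or file). */
    public static boolean hasExplicitScheme(String checkpointDir) {
        return scheme(checkpointDir) != null;
    }

    public static void main(String[] args) {
        // Hypothetical sample values; a file:// path may well point at a shared NAS mount.
        String[] candidates = {
            "hdfs:///checkpoints/",
            "file:///mnt/nas/checkpoints",
            "/tmp/checkpoints"
        };
        for (String dir : candidates) {
            System.out.println(dir + " -> scheme=" + scheme(dir));
        }
    }
}
```

A `file` scheme is therefore not evidence of a local-only setup, which is why the note is better phrased in terms of accessibility from all nodes.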


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-04-30 Thread alpinegizmo
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/5928#discussion_r184959202
  
--- Diff: docs/dev/stream/state/checkpointing.md ---
@@ -137,11 +137,9 @@ Some more parameters and/or defaults may be set via 
`conf/flink-conf.yaml` (see
-  `jobmanager`: In-memory state, backup to JobManager's/ZooKeeper's 
memory. Should be used only for minimal state (Kafka offsets) or testing and 
local debugging.
-  `filesystem`: State is in-memory on the TaskManagers, and state 
snapshots are stored in a file system. Supported are all filesystems supported 
by Flink, for example HDFS, S3, ...
 
-- `state.backend.fs.checkpointdir`: Directory for storing checkpoints in a 
Flink supported filesystem. Note: State backend must be accessible from the 
JobManager, use `file://` only for local setups.
+- `state.checkpoints.dir`: The target directory for storing checkpoints 
data files and meta data of [externalized checkpoints]({{ site.baseurl 
}}/ops/state/checkpoints.html#externalized-checkpoints) in a Flink supported 
filesystem. Note: State backend must be accessible from the JobManager, use 
`file://` only for local setups.
--- End diff --

This seems potentially misleading -- isn't it okay to use a `file://` URI 
in the case of a distributed filesystem that is mounted at the same mount point 
across the cluster?


---


[GitHub] flink pull request #5928: [hotfix][doc] fix doc of externalized checkpoint

2018-04-27 Thread sihuazhou
GitHub user sihuazhou opened a pull request:

https://github.com/apache/flink/pull/5928

[hotfix][doc] fix doc of externalized checkpoint

## What is the purpose of the change

This PR intends to fix the incorrect documentation of externalized checkpoints.

## Brief change log

- fix doc of externalized checkpoint

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

no

## Documentation

no

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sihuazhou/flink fix_doc_of_externalized_checkpoint

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5928.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5928


commit e2d02c06433243987561fdf9da95d11a61059c51
Author: sihuazhou 
Date:   2018-04-27T08:15:04Z

[hotfix] fix doc of externalized checkpoint.




---