Re: [EXTERNAL] difference between checkpoints & savepoints

2017-11-30 Thread Hao Sun
Hi team, I have one follow-up question on this.

There is a discussion about resuming jobs from *a saved external checkpoint*,
and I feel there are two aspects to that topic.
*1. I have no changes to the job and just want to resume it after a
failure.*
I can see this happen automatically with ZK enabled. I do not have to
do anything manually.
==
2017-12-01 05:02:26,603 DEBUG
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore -
Recovering job graph f824eabe58d180d79416d9637ac6aa32 from
fraud_prevention_service/flink/jobgraphs/f824eabe58d180d79416d9637ac6aa32.
==

*2. I want to submit a new job and resume the previous processing for
whatever reason, e.g. the JobGraph changed or I need to change the parallelism.*
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html#faq
I am wondering: for Flink 1.3.2, 1.4 and 1.5, is an externalized checkpoint
identical to a savepoint? Does everything in the FAQ section also apply to
externalized checkpoints? *What about allowNonRestoredState: is there
something like it for externalized checkpoints?*

I am running Flink 1.3.2 on K8S, so I am wondering what the best
practice is for deploying new code releases. And FLIP-6 is awesome;
I can't wait to use it.

Thanks as always.




Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-16 Thread Raja . Aravapalli

Thanks very much for the detailed explanation, Stefan.


Regards,
Raja.





Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter
I just noticed that I forgot to also include a reference to the documentation
about externalized checkpoints:
https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html
 





Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter

Hi,

> 
> Also, along the same lines, can someone detail the difference between a state
> backend and an externalized checkpoint?
>  

Those are two very different things. When we talk about state backends in Flink,
we mean the entity that is responsible for storing and managing the state
inside an operator. This could, for example, be something like the FsStateBackend,
which is based on hash maps and keeps state on the heap, or the
RocksDBStateBackend, which uses RocksDB as its internal store and operates
on native memory and disk.

An externalized checkpoint, like a normal checkpoint, is the collection of all
state in a job, persisted to stable storage for recovery. More concretely,
this typically means writing out the contents of the state backends
to a safe place so that we can restore them from there.
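
To make the distinction concrete, here is a toy sketch in plain Java. It is
not Flink code and does not reflect how Flink implements either concept: the
names ToyStateDemo, HeapBackend, snapshotTo and restoreFrom are hypothetical
and exist only for this illustration. The hash map plays the role of the state
backend (where working state lives), and the file plays the role of the
checkpoint (a copy of that state on stable storage).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names are Flink classes.
class ToyStateDemo {

    /** Plays the role of a state backend: working state lives in a heap hash map. */
    static final class HeapBackend {
        private final Map<String, Long> counts = new HashMap<>();

        void increment(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        long get(String key) {
            return counts.getOrDefault(key, 0L);
        }

        /** Plays the role of a checkpoint: copy the backend's contents to a file. */
        void snapshotTo(Path file) throws IOException {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                lines.add(e.getKey() + "=" + e.getValue());
            }
            Files.write(file, lines);
        }

        /** Recovery: build a fresh backend from what the checkpoint file holds. */
        static HeapBackend restoreFrom(Path file) throws IOException {
            HeapBackend restored = new HeapBackend();
            for (String line : Files.readAllLines(file)) {
                int sep = line.lastIndexOf('=');
                restored.counts.put(line.substring(0, sep),
                        Long.parseLong(line.substring(sep + 1)));
            }
            return restored;
        }
    }

    /** Runs the scenario and returns the restored counts as text. */
    static String demo() {
        try {
            HeapBackend backend = new HeapBackend();
            backend.increment("a");
            backend.increment("a");
            backend.increment("b");

            Path checkpoint = Files.createTempFile("toy-checkpoint", ".txt");
            backend.snapshotTo(checkpoint);

            // This update happens after the checkpoint, so it is lost on restore.
            backend.increment("a");

            HeapBackend restored = HeapBackend.restoreFrom(checkpoint);
            Files.delete(checkpoint);
            return "a=" + restored.get("a") + ", b=" + restored.get("b");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "a=2, b=1"
    }
}
```

The third increment of "a" is made after the snapshot, so the restored backend
only sees the two earlier ones. In this picture, swapping the hash map for
another store would change the backend, while the snapshot file would still be
the checkpoint.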

> Also, through which methods of the programmatic API can we configure those?

This explains how to set the backend programmatically:

https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
 


To activate externalized checkpoints, enable normal checkpointing and add the
following line:

env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

where env is your StreamExecutionEnvironment.
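
Put together, a minimal setup might look like the sketch below. It assumes the
Flink streaming dependency is on the classpath; the class name, the checkpoint
path, the 10-second interval and the tiny pipeline are placeholders chosen for
illustration, not recommendations. On the 1.3-era releases, the directory for
the externalized checkpoint metadata is additionally taken from the
state.checkpoints.dir key in flink-conf.yaml, as far as I know, so the job may
be rejected if that key is not set.

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; only the Flink calls themselves are real API.
public class ExternalizedCheckpointSetup {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // State backend: where operator state lives while the job runs.
        // The path is an assumed placeholder for the checkpoint data location.
        env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));

        // Normal checkpointing: take a checkpoint every 10 seconds (assumed interval).
        env.enableCheckpointing(10_000L);

        // Externalized checkpoints: keep the latest checkpoint when the job is cancelled.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Placeholder pipeline so that there is something to execute.
        env.fromElements(1, 2, 3).print();
        env.execute("externalized-checkpoint-sketch");
    }
}
```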

If you need an example, please take a look at
org.apache.flink.test.checkpointing.ExternalizedCheckpointITCase. This class
configures everything you asked about programmatically.

Best,
Stefan