Re: Check pointing for simple pipeline

2020-07-09 Thread Dawid Wysakowicz
Hi Prasanna,

I'd like to add my two cents here. I would not say that using incremental
checkpoints is always the best choice. They can have downsides when
restoring from a checkpoint, as all the deltas have to be applied.
Restoring from a non-incremental checkpoint might therefore be faster.
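
The restore-time trade-off above can be illustrated with a toy model. This
is only a sketch of the idea of "a base snapshot plus deltas"; it is not how
Flink or RocksDB store checkpoints internally, and the names
restore_full and restore_incremental are made up for this example.

```python
from typing import Dict, List, Optional

State = Dict[str, int]
# A delta maps a key to its new value, or to None if the key was deleted.
Delta = Dict[str, Optional[int]]


def restore_full(snapshot: State) -> State:
    """Restore from a full checkpoint: copy one complete snapshot."""
    return dict(snapshot)


def restore_incremental(base: State, deltas: List[Delta]) -> State:
    """Restore from a base snapshot by applying every later delta in order."""
    state = dict(base)
    for delta in deltas:
        for key, value in delta.items():
            if value is None:
                state.pop(key, None)  # key was removed in this delta
            else:
                state[key] = value    # key was added or updated
    return state


# Hypothetical state history: one base snapshot followed by three deltas.
base = {"a": 1, "b": 2}
deltas = [{"a": 5}, {"c": 7, "b": None}, {"a": 6}]

restored = restore_incremental(base, deltas)
full = restore_full({"a": 6, "c": 7})

# Both paths reach the same state; the incremental path had to walk all deltas.
print(restored == full, len(deltas))
```

In this model the work done on an incremental restore grows with the number
of deltas, while the full restore is a single copy. The flip side is that each
incremental checkpoint only has to write its delta.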


As Yun Tang mentioned, incremental checkpoints are supported by the
RocksDB state backend only. You don't necessarily need the RocksDB state
backend in all cases. If you are sure that the state will fit into
memory (which is probably the case for such a simple job, especially if
the map function is stateless), you should be fine with the Filesystem
state backend[1]. This state backend should be faster, as it does not
need to spill anything to disk and keeps everything in deserialized form
at runtime.
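
As a rough sketch, the two options could look like this in flink-conf.yaml.
This assumes the Flink 1.11 configuration keys; the S3 path is a placeholder,
and checkpointing itself still has to be enabled in the job, for example with
env.enableCheckpointing(...).

```yaml
# Option A: heap-based state, full checkpoints written to a file system.
state.backend: filesystem
state.checkpoints.dir: s3://my-bucket/flink/checkpoints   # placeholder path

# Option B: RocksDB with incremental checkpoints (use instead of option A).
# state.backend: rocksdb
# state.backend.incremental: true
# state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

For a job whose only state is the Kafka source offsets, option A is likely
sufficient.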


You might also find this short post[2] helpful.


Best,

Dawid


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/state_backends.html#the-fsstatebackend

[2]
https://www.ververica.com/blog/stateful-stream-processing-apache-flink-state-backends




Re: Check pointing for simple pipeline

2020-07-07 Thread Yun Tang
Hi Prasanna

Using incremental checkpoints is always better than not, as they are faster and
consume less memory.
However, incremental checkpoints are only supported by the RocksDB state backend.


Best
Yun Tang


Check pointing for simple pipeline

2020-07-07 Thread Prasanna kumar
Hi,

I have a pipeline: Source -> Map (JSON transform) -> Sink.

Both the source and the sink are Kafka.

What is the best checkpointing mechanism?

Is enabling incremental checkpoints a good option? What should I be
careful of?

I am running it on AWS EMR.

Will checkpointing slow the pipeline down?

Thanks,
Prasanna.