[ 
https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105816#comment-17105816
 ] 

Guozhang Wang commented on KAFKA-3184:
--------------------------------------

Hello [~nizhikov], for many state-light applications it is not worth having 
persistent stores. With in-memory stores, however, we do not have any 
persistent checkpoints, so upon rolling upgrades or scaling events we always 
have to re-bootstrap the whole state from the beginning, and that limits the 
usefulness of in-memory stores. So when I created this ticket about 4 years 
ago, my main motivation was to make in-memory stores more attractive for 
scenarios where your state is relatively small. Now, with the many rebalance 
improvements we've made, including KIP-441, I think just allowing local 
checkpointing for in-memory state stores may no longer be that interesting.

Instead, I think what [~vvcephei] was considering is to provide a general 
checkpointing API for state stores in Streams (not only for in-memory but also 
for persistent stores), where the checkpoint location can be either local disk 
or remote storage. The design scope is primarily 1) the API design for both 
checkpointing and loading checkpoints into the local state stores, and 2) the 
mechanism of the checkpointing, e.g. whether it should be async, whether it 
should be executed on separate threads, etc. I think this is, as of today, a 
more appealing feature to add, and if you are interested, we should just 
create a new JIRA for it rather than piggy-backing on 3184.
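
To make the two design questions a bit more concrete, here is a rough sketch 
of what such an API could look like. This is only an illustration, not a 
proposal: CheckpointStorage, LocalDiskCheckpointStorage and 
CheckpointingInMemoryStore are hypothetical names and do not exist in Kafka 
Streams, and the byte layout (offset, entry count, then key/value pairs) is an 
assumption made for the example. It assumes a single writer thread, takes the 
snapshot on the caller's thread, and does the serialization and I/O on a 
separate thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical: abstracts where checkpoints live (local disk, remote storage, ...).
interface CheckpointStorage {
    void write(String storeName, byte[] checkpoint) throws IOException;
    byte[] readLatest(String storeName) throws IOException; // null if none exists
}

// One possible backend: a single checkpoint file per store in a local directory.
class LocalDiskCheckpointStorage implements CheckpointStorage {
    private final Path dir;
    LocalDiskCheckpointStorage(Path dir) { this.dir = dir; }

    public void write(String storeName, byte[] checkpoint) throws IOException {
        Files.createDirectories(dir);
        Path tmp = dir.resolve(storeName + ".checkpoint.tmp");
        Files.write(tmp, checkpoint);
        // Write to a temp file first, then move it over the previous checkpoint.
        Files.move(tmp, dir.resolve(storeName + ".checkpoint"),
                   StandardCopyOption.REPLACE_EXISTING);
    }

    public byte[] readLatest(String storeName) throws IOException {
        Path file = dir.resolve(storeName + ".checkpoint");
        return Files.exists(file) ? Files.readAllBytes(file) : null;
    }
}

// Toy in-memory store that checkpoints asynchronously on its own thread.
class CheckpointingInMemoryStore implements AutoCloseable {
    private final String name;
    private final CheckpointStorage storage;
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final ExecutorService checkpointThread = Executors.newSingleThreadExecutor();

    CheckpointingInMemoryStore(String name, CheckpointStorage storage) {
        this.name = name;
        this.storage = storage;
    }

    void put(String key, String value) { data.put(key, value); }
    String get(String key) { return data.get(key); }

    // Copies the map on the caller's thread (assumed to be the only writer),
    // then serializes and writes the copy on the checkpoint thread.
    CompletableFuture<Void> checkpointAsync(long changelogOffset) {
        Map<String, String> snapshot = new HashMap<>(data);
        return CompletableFuture.runAsync(() -> {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                out.writeLong(changelogOffset);
                out.writeInt(snapshot.size());
                for (Map.Entry<String, String> e : snapshot.entrySet()) {
                    out.writeUTF(e.getKey());
                    out.writeUTF(e.getValue());
                }
                out.flush();
                storage.write(name, bytes.toByteArray());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, checkpointThread);
    }

    // Loads the latest checkpoint into the store and returns the changelog
    // offset it covers, or -1 if no checkpoint exists.
    long restore() {
        try {
            byte[] checkpoint = storage.readLatest(name);
            if (checkpoint == null) return -1L;
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(checkpoint));
            long offset = in.readLong();
            int size = in.readInt();
            data.clear();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                data.put(key, in.readUTF());
            }
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() { checkpointThread.shutdown(); }
}
```

With something along these lines, a restarted instance would call restore() 
and only replay the changelog from the returned offset onwards instead of from 
the beginning. Swapping the local-disk backend for remote storage would only 
mean another CheckpointStorage implementation.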

> Add Checkpoint for In-memory State Store
> ----------------------------------------
>
>                 Key: KAFKA-3184
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3184
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Nikolay Izhikov
>            Priority: Major
>              Labels: user-experience
>
> Currently Kafka Streams does not make a checkpoint of the persistent state 
> store upon committing, which would be expensive since it means "stopping the 
> world" and writing to disk: for example, RocksDB would require you to copy 
> the file directory to make a copy naively. 
> However, for in-memory stores checkpointing may be doable in an asynchronous 
> manner, hence it can be done quickly. And the benefit of having intermediate 
> checkpoints is to avoid restoring from scratch if standby tasks are not 
> present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
