[jira] [Commented] (KAFKA-4317) RocksDB checkpoint files lost on kill -9

ASF GitHub Bot (JIRA) Thu, 11 May 2017 04:54:49 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006290#comment-16006290 ]


ASF GitHub Bot commented on KAFKA-4317:
---------------------------------------

GitHub user dguy opened a pull request:

    https://github.com/apache/kafka/pull/3024

    KAFKA-4317: Checkpoint state stores on commit interval

    This is a backport of https://github.com/apache/kafka/pull/2471

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dguy/kafka k4881-bp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3024
    
----
commit b3440c4a1c8328c9de3447c7830b3d2e59176628
Author: Damian Guy <[email protected]>
Date:   2017-02-17T22:41:28Z

    backport from trunk

----


> RocksDB checkpoint files lost on kill -9
> ----------------------------------------
>
>                 Key: KAFKA-4317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4317
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.1
>            Reporter: Greg Fodor
>            Assignee: Damian Guy
>            Priority: Critical
>              Labels: architecture, needs-kip, user-experience
>             Fix For: 0.11.0.0
>
>
> Right now, the checkpoint files for logged RocksDB stores are written during 
> a graceful shutdown, and removed upon restoration. Unfortunately this means 
> that in a scenario where the process is forcibly killed, the checkpoint files 
> are not there, so all RocksDB stores are rematerialized from scratch on the 
> next launch.
> In a way, this is good, because it simulates bootstrapping a new node (for 
> example, it's a good way to see how much I/O is used to rematerialize the 
> stores). However, it leads to longer recovery times when a non-graceful 
> shutdown occurs and we want to get the job up and running again.
> There seem to be two possible options to consider:
> - Simply do not remove checkpoint files on restoring. This way a kill -9 will 
> result in only repeating the restoration of all the data generated in the 
> source topics since the last graceful shutdown.
> - Continually update the checkpoint files (perhaps on commit) -- this would 
> result in the least amount of overhead/latency in restarting, but the 
> additional complexity may not be worth it.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-116%3A+Add+State+Store+Checkpoint+Interval+Configuration
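
The second option in the description above (updating the checkpoint on every commit) can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the code in the pull request or Kafka's actual checkpoint implementation. The function names, the JSON file format, and the partition key "store-changelog-0" are all hypothetical. The idea shown is only that the checkpoint is rewritten atomically (temp file, fsync, rename), so a kill -9 leaves either the previous checkpoint or the new one on disk, and restoration can resume from the recorded offset instead of from the beginning of the changelog.

```python
import json
import os
import tempfile


def write_checkpoint(path, offsets):
    """Atomically replace the checkpoint file at `path` with `offsets`.

    `offsets` maps a changelog partition name to the last flushed offset.
    JSON is an assumed format chosen for this sketch only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the rename stays on
    # one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(offsets, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic swap of old and new checkpoint
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path):
    """Return the checkpointed offsets, or {} if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def restore_start_offset(checkpoint, partition):
    """Offset to resume restoration from; 0 means rematerialize fully."""
    return checkpoint.get(partition, 0)
```

In this sketch, a commit would call write_checkpoint(path, {"store-changelog-0": offset}) after flushing the store, and startup would call restore_start_offset(read_checkpoint(path), "store-changelog-0"). With no checkpoint file present, the result is 0, which corresponds to the full rematerialization described in the issue.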



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
