[ 
https://issues.apache.org/jira/browse/KAFKA-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836717#comment-17836717
 ] 

Jun Rao commented on KAFKA-16541:
---------------------------------

Thanks for filing the jira, [~ocadaruma]! Since this is a regression, it 
would be useful to have this fixed in 3.8.0 and 3.7.1.

One way to fix it is to (1) change 
LeaderEpochFileCache.truncateFromEnd and LeaderEpochFileCache.truncateFromStart 
to only write to memory without writing to the checkpoint file, and (2) change 
the implementation of 
[renameDir|https://github.com/apache/kafka/blob/3.6.0/core/src/main/scala/kafka/log/UnifiedLog.scala#L681]
 so that it doesn't reinitialize from the file and just changes the Path of the 
backing CheckpointFile.
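
The two changes above could look roughly like the sketch below. It is only an illustration under simplifying assumptions, not Kafka's actual code: EpochCacheSketch, assign, entries, flush, and updatePath are hypothetical names, and the truncation semantics are reduced to the essentials. The explicit flush (write to a temp file, fsync, then atomic rename) is likewise an assumption about how a checkpoint might be persisted safely.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for LeaderEpochFileCache (not Kafka's class). */
class EpochCacheSketch {
    // epoch -> start offset, held in memory only until flush() is called
    private final TreeMap<Integer, Long> epochs = new TreeMap<>();
    private Path checkpointPath;

    EpochCacheSketch(Path checkpointPath) {
        this.checkpointPath = checkpointPath;
    }

    void assign(int epoch, long startOffset) {
        epochs.put(epoch, startOffset);
    }

    /** (1) Drops epochs starting at or after endOffset; memory only, no file write. */
    void truncateFromEnd(long endOffset) {
        epochs.values().removeIf(start -> start >= endOffset);
    }

    /** (1) Drops epochs starting at or before startOffset and re-anchors the
     *  last dropped epoch at startOffset; memory only, no file write. */
    void truncateFromStart(long startOffset) {
        Map.Entry<Integer, Long> last = null;
        while (!epochs.isEmpty() && epochs.firstEntry().getValue() <= startOffset) {
            last = epochs.pollFirstEntry();
        }
        if (last != null) {
            epochs.put(last.getKey(), startOffset);
        }
    }

    /** (2) On directory rename, only repoint the backing file; do not reload from disk. */
    void updatePath(Path newCheckpointPath) {
        this.checkpointPath = newCheckpointPath;
    }

    /** Returns a copy of the in-memory entries. */
    TreeMap<Integer, Long> entries() {
        return new TreeMap<>(epochs);
    }

    /** Explicit persistence: write a temp file, fsync it, then atomically rename it. */
    void flush() throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Long> e : epochs.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        Path tmp = checkpointPath.resolveSibling(checkpointPath.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // fsync before the rename makes the new content visible
        }
        Files.move(tmp, checkpointPath, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this shape, truncation never touches the file, so a crash cannot leave a half-written checkpoint behind from those paths, and a directory rename keeps the in-memory state as the source of truth.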

> Potential leader epoch checkpoint file corruption on OS crash
> -------------------------------------------------------------
>
>                 Key: KAFKA-16541
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16541
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Haruki Okada
>            Assignee: Haruki Okada
>            Priority: Minor
>
> Pointed out by [~junrao] on 
> [GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125].
> [A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid 
> of the fsync of the leader-epoch checkpoint file in some paths for performance 
> reasons.
> However, since the checkpoint file is now flushed to the device asynchronously 
> by the OS, its content could be corrupted if the OS suddenly crashes (e.g. by 
> power failure or kernel panic) in the middle of a flush.
> A corrupted checkpoint file could prevent the Kafka broker from starting up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
