[jira] [Commented] (KAFKA-17076) logEndOffset could be lost due to log cleaning

2024-07-14 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17865823#comment-17865823
 ] 

Haruki Okada commented on KAFKA-17076:
--

[~junrao] Is that possible?

At step 2 in your scenario, I believe truncation doesn't happen unless at 
least one record is returned in the Fetch response, because of 
https://github.com/apache/kafka/pull/9382, so in my understanding an empty 
active segment is not possible.
refs: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-Fetch

> logEndOffset could be lost due to log cleaning
> --
>
> Key: KAFKA-17076
> URL: https://issues.apache.org/jira/browse/KAFKA-17076
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jun Rao
> Priority: Major
>
> It's possible for the log cleaner to remove all records in the suffix of the 
> log. If the partition is then reassigned, the new replica won't be able to 
> see the true logEndOffset since there is no record batch associated with it. 
> If this replica becomes the leader, it will assign an already used offset to 
> a newly produced record, which is incorrect.
>  
> It's relatively rare to trigger this issue since the active segment is never 
> cleaned and typically is not empty. However, the following is one possibility.
>  # Records with offsets 100-110 are produced and fully replicated to all ISR 
> members. All of those records are delete records for certain keys.
>  # A record with offset 111 is produced. It forces the roll of a new segment 
> in broker b1 and is added to the log. The record is not committed and is 
> later truncated from the log, leaving an empty active segment in this log. 
> b1 at some point becomes the leader.
>  # The log cleaner kicks in and removes records 100-110.
>  # The partition is reassigned to another broker b2. b2 replicates all 
> records from b1 up to offset 100 and marks its logEndOffset at 100. Since 
> there is no record to replicate after offset 100 in b1, b2's logEndOffset 
> stays at 100 and b2 can join the ISR.
>  # b2 becomes the leader and assigns offset 100 to a new record.
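
The five steps in the quoted scenario can be walked through with a toy model. This is only a sketch that takes step 2 as described (an empty active segment on b1); ToyLog, replicate and run_scenario are hypothetical names made up for illustration and are not Kafka APIs. The batch covering offsets 0-99 is also an assumption, standing in for whatever earlier data b2 replicates "up to offset 100".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyLog:
    # Each batch is (base_offset, last_offset); this is not Kafka's on-disk format.
    batches: List[Tuple[int, int]] = field(default_factory=list)
    log_end_offset: int = 0

    def append(self, count: int) -> int:
        """Append one batch of `count` records; return the first assigned offset."""
        base = self.log_end_offset
        self.batches.append((base, base + count - 1))
        self.log_end_offset = base + count
        return base

    def truncate_to(self, offset: int) -> None:
        """Drop every batch that ends at or after `offset`."""
        self.batches = [b for b in self.batches if b[1] < offset]
        self.log_end_offset = offset

    def clean(self, first: int, last: int) -> None:
        """Remove batches lying entirely in [first, last]; the local end offset is kept."""
        self.batches = [b for b in self.batches if not (first <= b[0] and b[1] <= last)]


def replicate(leader: ToyLog) -> ToyLog:
    """A new replica only learns offsets from the batches it actually fetches."""
    follower = ToyLog()
    for base, last in leader.batches:
        follower.batches.append((base, last))
        follower.log_end_offset = last + 1
    return follower


def run_scenario() -> Tuple[int, int, int]:
    b1 = ToyLog()
    b1.append(100)          # assumed earlier data at offsets 0-99
    b1.append(11)           # step 1: offsets 100-110, all delete records
    b1.append(1)            # step 2: offset 111 is appended ...
    b1.truncate_to(111)     # ... and later truncated, nothing remains after 110
    b1.clean(100, 110)      # step 3: the cleaner removes 100-110
    b2 = replicate(b1)      # step 4: b2 fetches batches only up to offset 99
    b2_end = b2.log_end_offset
    reused = b2.append(1)   # step 5: b2, as leader, assigns the next offset
    return b1.log_end_offset, b2_end, reused
```

In this model b1 still knows its end offset is 111, while b2 derives 100 from the last batch it fetched, so the offset b2 hands out in step 5 collides with one that was already used.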



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17076) logEndOffset could be lost due to log cleaning

2024-07-03 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862868#comment-17862868
 ] 

Jun Rao commented on KAFKA-17076:
-

One potential solution is to adjust the log cleaning logic such that it always 
preserves the last batch during each round of cleaning. If all records in the 
last batch are removed, we can just retain the empty batch to preserve the last 
offset. The empty batch will then be replicated to all replicas to preserve the 
true logEndOffset.
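
A minimal sketch of that idea, under stated assumptions: Batch, clean and end_offset_seen_by_new_replica are hypothetical names for illustration, not the actual LogCleaner code, and the "keep" predicate stands in for whatever decides that a record survives cleaning.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Batch:
    base_offset: int
    last_offset: int
    records: List[int]  # offsets of the records still present in the batch


def clean(batches: List[Batch], keep: Callable[[int], bool]) -> List[Batch]:
    """Drop records rejected by `keep`, but always retain the last batch,
    even if it ends up empty, so that its last offset is preserved."""
    cleaned = []
    for i, batch in enumerate(batches):
        survivors = [offset for offset in batch.records if keep(offset)]
        is_last = i == len(batches) - 1
        if survivors or is_last:
            cleaned.append(Batch(batch.base_offset, batch.last_offset, survivors))
    return cleaned


def end_offset_seen_by_new_replica(batches: List[Batch]) -> int:
    """A new replica derives its end offset from the last batch it fetches."""
    return batches[-1].last_offset + 1 if batches else 0
```

For example, with one batch covering offsets 0-99 and one covering 100-110, cleaning away every record at offset 100 or above leaves the second batch empty but present, so a newly assigned replica in this model would still arrive at an end offset of 111 rather than 100.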



