drawxy opened a new pull request, #14553:
URL: https://github.com/apache/kafka/pull/14553

   Recently, I encountered an issue where one partition always had only 1 ISR 
(there was no produce traffic on this topic). The bug is related to altering 
the log dir. When replacing the current log with the future log, the broker 
doesn't copy the leader epoch checkpoint cache, which records the leader 
epochs and their start offsets. The cache for each partition is updated only 
when new messages are appended or the replica becomes leader. If there is no 
traffic and the replica is already the leader, the cache will never be updated 
again. However, when handling a fetch request, the partition leader looks up 
its leader epoch in the cache and compares it with the leader epoch sent by 
the follower. If the former is missing or less than the latter, the leader 
aborts processing and returns an OffsetOutOfRangeException to the follower. 
The follower may fall out of sync over time.
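   The root cause can be sketched with a toy model (class and method names here are hypothetical simplifications, not Kafka's actual code): the per-log epoch cache maps each leader epoch to its start offset, and swapping in a future log whose cache was never populated makes every subsequent lookup miss.

   ```java
   import java.util.Optional;
   import java.util.TreeMap;

   // Hypothetical stand-in for a per-log leader-epoch checkpoint cache:
   // maps each leader epoch to the start offset of that epoch.
   class EpochCache {
       private final TreeMap<Integer, Long> epochToStartOffset = new TreeMap<>();

       void assign(int epoch, long startOffset) {
           epochToStartOffset.put(epoch, startOffset);
       }

       // The leader consults this when validating a follower's fetch request.
       Optional<Long> startOffsetFor(int requestedEpoch) {
           return Optional.ofNullable(epochToStartOffset.get(requestedEpoch));
       }
   }

   public class LogDirSwapSketch {
       public static void main(String[] args) {
           EpochCache currentLog = new EpochCache();
           currentLog.assign(5, 231196L);     // populated when the replica became leader

           // Altering the log dir swaps in the future log, but its cache was
           // never copied from the current log (no appends, no leader change):
           EpochCache futureLog = new EpochCache();
           EpochCache afterSwap = futureLog;  // epoch entries are lost here

           System.out.println(currentLog.startOffsetFor(5).isPresent());  // true
           System.out.println(afterSwap.startOffsetFor(5).isPresent());   // false
       }
   }
   ```

   With no traffic, nothing ever repopulates `afterSwap`, so the miss is permanent until a new leader epoch begins.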
   
   Take the following case as an example; the key points are listed in 
chronological order:
   
   1. Reassigner submitted a partition reassignment for partition foo-1.
   ```
   {
       "topic": "foo",
       "partition": 1,
       "replicas": [
         5002,
         3003,
         4001
       ],
       "logDirs": [
         "\\data\\kafka-logs-1",
         "any",
         "any"
       ]
   }
   ```
   2. The reassignment completed immediately because there was no traffic on 
this topic.
   3. The controller sent LeaderAndISR requests to all the replicas.
   4. The newly added replica 5002 became the new leader, and the current log 
updated the leader epoch offset cache. Replica 5002 successfully handled the 
LeaderAndISR request.
   5. Altering the log dir completed, and the newly promoted current log had 
no leader epoch offset information.
   6. Replica 5002 handled fetch requests (which include the fetch offset and 
current leader epoch) from the followers and returned 
OffsetOutOfRangeException because its leader epoch offset cache had not been 
updated. As a result, replica 5002 could not update the fetch state for each 
follower and later reported an ISR shrink. Followers 3003 and 4001 repeatedly 
printed the following log:
   ```
   WARN [ReplicaFetcher replicaId=4001, leaderId=5002, fetcherId=2] Reset fetch 
offset for partition foo-1 from 231196 to current leader's start offset 231196 
(kafka.server.ReplicaFetcherThread)
   INFO [ReplicaFetcher replicaId=4001, leaderId=5002, fetcherId=2] Current 
offset 231196 for partition foo-1 is out of range, which typically implies a 
leader change. Reset fetch offset to 231196 (kafka.server.ReplicaFetcherThread)
   ```
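   The leader-side check behind step 6 can be sketched roughly as follows (a hypothetical simplification, not Kafka's actual fetch-handling code): when the follower's epoch is missing from the leader's cache, or the cached epoch is behind it, the fetch is rejected, so the follower keeps resetting to the leader's start offset and retrying.

   ```java
   import java.util.Optional;

   public class FetchValidationSketch {
       // Hypothetical result codes mirroring the error the followers saw.
       enum FetchResult { OK, OFFSET_OUT_OF_RANGE }

       // cachedEpoch is whatever the leader finds in its leader-epoch cache;
       // it is empty after the log-dir swap dropped the cache entries.
       static FetchResult validateFetch(Optional<Integer> cachedEpoch, int followerEpoch) {
           if (cachedEpoch.isEmpty() || cachedEpoch.get() < followerEpoch) {
               // Cache is missing or behind: abort and signal the follower.
               return FetchResult.OFFSET_OUT_OF_RANGE;
           }
           return FetchResult.OK;
       }

       public static void main(String[] args) {
           // Empty cache after the swap: every fetch fails, the follower
           // resets to the leader's start offset (231196) and loops forever.
           System.out.println(validateFetch(Optional.empty(), 5));  // OFFSET_OUT_OF_RANGE
           System.out.println(validateFetch(Optional.of(5), 5));    // OK
       }
   }
   ```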
   
   This issue arises only when all three of the following conditions are met:
   
   1. There is no produce traffic on the partition.
   2. A newly added replica becomes the new leader.
   3. The LeaderAndISR request is handled successfully before altering the log 
dir completes.
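   For illustration only, the trigger can be written as a single predicate over the three conditions (the names below are made up for this sketch):

   ```java
   public class TriggerConditions {
       // Hypothetical predicate combining the three conditions listed above.
       static boolean bugCanTrigger(boolean hasProduceTraffic,
                                    boolean newReplicaBecameLeader,
                                    boolean leaderAndIsrHandledBeforeAlterDone) {
           return !hasProduceTraffic
                   && newReplicaBecameLeader
                   && leaderAndIsrHandledBeforeAlterDone;
       }

       public static void main(String[] args) {
           System.out.println(bugCanTrigger(false, true, true)); // true: the reported scenario
           System.out.println(bugCanTrigger(true, true, true));  // false: appends refresh the cache
       }
   }
   ```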


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
