[ 
https://issues.apache.org/jira/browse/KAFKA-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512809#comment-16512809
 ] 

ASF GitHub Bot commented on KAFKA-6975:
---------------------------------------

apovzner opened a new pull request #5229: KAFKA-6975; Fix replica fetching from 
non-batch-aligned log start offset
URL: https://github.com/apache/kafka/pull/5229
 
 
   After AdminClient#deleteRecords(), the log start offset may fall in the 
middle of a batch. A follower that starts fetching from the log start offset 
will then fail to fetch any records. A follower starts fetching from the log 
start offset in these cases: 1) a new replica created by partition 
re-assignment; 2) a new local replica created as a result of 
AdminClient#alterReplicaLogDirs(); 3) a broker that was down for some time while 
AdminClient#deleteRecords() moved the log start offset beyond its HW.
   
   Added two integration tests:
   1) Produce, then call AdminClient#deleteRecords() while one of the 
followers is down, so that the restarted follower has to fetch from the log 
start offset;
   2) Call AdminClient#alterReplicaLogDirs() after AdminClient#deleteRecords().
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



> AdminClient.deleteRecords() may cause replicas unable to fetch from beginning
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-6975
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6975
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 1.0.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> AdminClient.deleteRecords(beforeOffset(offset)) sets the log start offset to 
> the requested offset. If the requested offset is in the middle of a batch, 
> a replica will not be able to fetch from that offset. 
> One use case where this causes problems is replica re-assignment. 
> Suppose we have a topic partition with 3 initial replicas, and at some point 
> the user issues AdminClient.deleteRecords() for an offset that falls in the 
> middle of a batch. That offset becomes the log start offset (LSO) for this 
> topic partition. Suppose that at some later time the user starts partition 
> re-assignment to 3 new replicas. The new replicas (followers) will start with 
> HW = 0 and try to fetch from 0, then get "out of order offset" because 0 < 
> LSO. Each follower will be able to reset its offset to the leader's LSO and 
> fetch from LSO, but the leader will respond with a batch whose base offset 
> is < LSO, which causes "out of order offset" on the follower and stops the 
> fetcher thread. The end result is that the new replicas will not be able to 
> start fetching until LSO moves to an offset that is not in the middle of a 
> batch, and the re-assignment will be stuck for a possibly very long time. 
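
The failure mode described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `MidBatchFetchSketch`, its `Batch` type, and the methods `leaderFetch` and `followerAccepts` are hypothetical names invented for illustration and are not Kafka's actual classes or the code changed in the pull request. It assumes the leader always returns whole batches and that the follower rejects any batch whose base offset is below the offset it asked for.

```java
import java.util.List;

// Toy model only: none of these types are Kafka classes.
public class MidBatchFetchSketch {

    // A record batch covering offsets [baseOffset, lastOffset].
    public static final class Batch {
        public final long baseOffset;
        public final long lastOffset;

        public Batch(long baseOffset, long lastOffset) {
            this.baseOffset = baseOffset;
            this.lastOffset = lastOffset;
        }
    }

    // Leader side: batches are returned whole, so a fetch at an offset in the
    // middle of a batch yields a batch whose base offset is below that offset.
    public static Batch leaderFetch(List<Batch> log, long logStartOffset, long fetchOffset) {
        if (fetchOffset < logStartOffset) {
            throw new IllegalArgumentException(
                "offset " + fetchOffset + " is below log start offset " + logStartOffset);
        }
        for (Batch b : log) {
            if (b.baseOffset <= fetchOffset && fetchOffset <= b.lastOffset) {
                return b;
            }
        }
        return null; // fetch offset is past the end of the log
    }

    // Follower side (assumed strict check): reject a batch that starts before
    // the offset the follower expects next.
    public static boolean followerAccepts(Batch batch, long nextExpectedOffset) {
        return batch.baseOffset >= nextExpectedOffset;
    }

    public static void main(String[] args) {
        List<Batch> log = List.of(new Batch(0, 9), new Batch(10, 19), new Batch(20, 29));
        long logStartOffset = 15; // made-up value that lands inside the batch [10, 19]

        Batch fetched = leaderFetch(log, logStartOffset, logStartOffset);
        System.out.println("fetched base offset: " + fetched.baseOffset);
        System.out.println("follower accepts: " + followerAccepts(fetched, logStartOffset));
    }
}
```

With these made-up offsets, the leader returns the batch with base offset 10 for a fetch at 15, and the strict follower check rejects it. A fetch at a batch-aligned offset such as 20 would pass the same check, which matches the observation that replicas recover once the log start offset moves to a batch boundary.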



