[ 
https://issues.apache.org/jira/browse/KAFKA-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-9393:
---------------------------
    Fix Version/s:     (was: 2.7.0)
                   2.8.0

> DeleteRecords may cause extreme lock contention for large partition 
> directories
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-9393
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9393
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Lucas Bradstreet
>            Assignee: Gardner Vickers
>            Priority: Major
>             Fix For: 2.8.0
>
>
> DeleteRecords, which is frequently used by KStreams, triggers a 
> Log.maybeIncrementLogStartOffset call. That call invokes 
> kafka.log.ProducerStateManager.listSnapshotFiles, which calls 
> java.io.File.listFiles on the partition dir. Listing this directory can be 
> extremely slow for partitions with many small segments (e.g. 20000), 
> taking multiple seconds to finish. This causes lock contention for the log, 
> and if produce requests are also occurring for the same log, it can cause a 
> majority of request handler threads to become blocked waiting for the 
> DeleteRecords call to finish.
> I believe this is a problem going back to the initial implementation of the 
> transactional producer, but I need to confirm how far back it goes.
> One possible solution is to maintain a producer state snapshot aligned to the 
> log segment, and simply delete it whenever we delete a segment. This would 
> ensure that we never have to perform a directory scan.
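
A minimal sketch of the segment-aligned idea described above, assuming
snapshots are tracked in memory by segment base offset. The class name
SegmentAlignedSnapshots and its methods are hypothetical and are not Kafka's
actual ProducerStateManager API. The sketch only illustrates how a segment
delete could become a single map lookup instead of a File.listFiles scan.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper for illustration only; not part of Kafka.
final class SegmentAlignedSnapshots {
    // Segment base offset -> producer state snapshot file for that segment.
    private final ConcurrentNavigableMap<Long, Path> snapshots =
            new ConcurrentSkipListMap<>();

    // Record the snapshot written when a segment with this base offset is created.
    void register(long segmentBaseOffset, Path snapshotFile) {
        snapshots.put(segmentBaseOffset, snapshotFile);
    }

    // Called when a segment is deleted: one map removal and one file delete,
    // with no listing of the partition directory.
    // Returns true if a snapshot was tracked for the segment.
    boolean deleteForSegment(long segmentBaseOffset) {
        Path file = snapshots.remove(segmentBaseOffset);
        if (file == null) {
            return false;
        }
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    // Number of snapshots currently tracked.
    int size() {
        return snapshots.size();
    }
}
```

With this layout the cost of removing a snapshot does not depend on the
number of files in the partition directory, so the time spent holding the
log lock would no longer grow with the segment count.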



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
