[
https://issues.apache.org/jira/browse/KAFKA-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jun Rao updated KAFKA-9393:
---------------------------
Fix Version/s: 2.8.0 (was: 2.7.0)
> DeleteRecords may cause extreme lock contention for large partition
> directories
> -------------------------------------------------------------------------------
>
> Key: KAFKA-9393
> URL: https://issues.apache.org/jira/browse/KAFKA-9393
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Lucas Bradstreet
> Assignee: Gardner Vickers
> Priority: Major
> Fix For: 2.8.0
>
>
> DeleteRecords, which is frequently used by KStreams, triggers a
> Log.maybeIncrementLogStartOffset call, which calls
> kafka.log.ProducerStateManager.listSnapshotFiles, which in turn calls
> java.io.File.listFiles on the partition dir. Listing this directory can
> take multiple seconds for partitions with many small segments (e.g. 20000).
> This causes lock contention for the log and, if produce requests are also
> occurring for the same log, can cause a majority of request handler threads
> to become blocked waiting for the DeleteRecords call to finish.
> I believe this is a problem going back to the initial implementation of the
> transactional producer, but I need to confirm how far back it goes.
> One possible solution is to maintain a producer state snapshot aligned to the
> log segment, and simply delete it whenever we delete a segment. This would
> ensure that we never have to perform a directory scan.
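
A minimal sketch of the proposed approach, for illustration only. It is not
Kafka's actual ProducerStateManager code: the class SnapshotIndex and its
methods are hypothetical names. It assumes each snapshot file is registered
in an in-memory map under the base offset of the segment it is aligned to,
so that deleting a segment removes its snapshot by key lookup instead of
calling java.io.File.listFiles on the partition dir.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch; not part of the Kafka code base. */
final class SnapshotIndex {
    // Snapshot files keyed by the base offset of the segment they are aligned to.
    private final ConcurrentSkipListMap<Long, Path> snapshots =
        new ConcurrentSkipListMap<>();

    /** Record a snapshot file when it is written for a segment. */
    void register(long segmentBaseOffset, Path snapshotFile) {
        snapshots.put(segmentBaseOffset, snapshotFile);
    }

    /**
     * Delete the snapshot aligned to a segment that is being deleted.
     * Returns true only if a file was registered and removed from disk.
     */
    boolean deleteForSegment(long segmentBaseOffset) throws IOException {
        Path file = snapshots.remove(segmentBaseOffset);
        return file != null && Files.deleteIfExists(file);
    }

    /** Number of snapshot files currently tracked. */
    int size() {
        return snapshots.size();
    }
}
```

With this shape, the cost of removing a snapshot depends on the number of
tracked snapshots rather than on the number of files in the partition dir,
so the work done while holding the log lock no longer grows with a directory
scan. The map would still need to be populated once when the log is loaded.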
--
This message was sent by Atlassian Jira
(v8.3.4#803005)