[ https://issues.apache.org/jira/browse/KAFKA-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272579#comment-14272579 ]
Jay Kreps commented on KAFKA-1853:
----------------------------------

The purpose of the async delete is to avoid locking for reads. A simple approach to ensuring deletes don't interrupt reads would be a read/write lock, but then writers would end up blocking concurrent reads. Instead, the approach we use is a timed delete: a delete involves renaming the file and removing it from the index that serves reads, and then deleting it after a period of time (say 30 seconds) in which all active reads have a chance to finish. So closing the file would break any in-progress reads, which would defeat the purpose of the async delete.

I guess there are two issues here:
1. What was the underlying cause of the rename failure? Did we ever figure that out? Is there any way to make the error more intelligible? (We are pretty hampered by the terrible Java API here.)
2. If the rename does fail, how should we handle it? Leaking the file is definitely wrong. I think the right thing to do is likely an immediate delete. This may give errors to fetch requests in progress, but they will retry.
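On the first question, one possible way to get a more intelligible error, assuming Java 7 or later is available to the broker, is java.nio.file.Files.move. It throws an IOException subtype describing the failure, whereas java.io.File.renameTo only returns a boolean. A minimal standalone sketch follows; SegmentRename and renameWithSuffix are hypothetical names used for illustration and are not part of Kafka:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Kafka code: renames a file and reports why a rename failed.
class SegmentRename {

    /**
     * Tries to rename `file` by appending `suffix` to its name.
     * Returns null on success, or a short description of the failure.
     */
    static String renameWithSuffix(Path file, String suffix) {
        Path target = file.resolveSibling(file.getFileName().toString() + suffix);
        try {
            Files.move(file, target);
            return null;
        } catch (IOException e) {
            // The exception subtype (for example NoSuchFileException or
            // AccessDeniedException) and its message carry the reason.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment-demo");
        Path log = Files.createFile(dir.resolve("00000000000000000000.log"));

        // First rename succeeds.
        String failure = renameWithSuffix(log, ".deleted");
        System.out.println(failure == null ? "renamed" : "rename failed: " + failure);

        // Second rename fails because the source file no longer exists.
        failure = renameWithSuffix(log, ".deleted");
        System.out.println(failure == null ? "renamed" : "rename failed: " + failure);
    }
}
```

The returned description could be included in the error message that is logged before falling back to an immediate delete.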
So for the rename-failure case, maybe this could look something like:
{code}
// Sketch: assumes changeFileSuffixes reports success as a boolean
val renamed = segment.changeFileSuffixes("", Log.DeletedFileSuffix)
if (renamed) {
  // Normal path: delete later so in-progress reads can finish
  scheduler.schedule("delete-file", deleteSeg, delay = config.fileDeleteDelayMs)
} else {
  // Rename failed: delete now rather than leak the file
  error("File rename failed, forcefully deleting file")
  deleteSeg()
}
{code}

> Unsuccessful suffix rename of expired LogSegment can leak open files and also leave the LogSegment in an invalid state
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1853
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1853
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1.1
>            Reporter: jaikiran pai
>             Fix For: 0.8.3
>
>
> As noted in this discussion in the user mailing list http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3C54AE3661.8080007%40gmail.com%3E an unsuccessful attempt at renaming the underlying files of a LogSegment can lead to file leaks and also leave the LogSegment in an invalid state.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)