[
https://issues.apache.org/jira/browse/KAFKA-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272579#comment-14272579
]
Jay Kreps commented on KAFKA-1853:
----------------------------------
The purpose of the async delete is to avoid locking for reads. A simple
approach to ensuring deletes don't interrupt reads would be a read/write lock,
but this would mean writes would end up blocking concurrent reads. Instead, the
approach we use is a timed delete: a delete renames the file and removes it
from the index that serves reads, and then deletes the file after a period of
time (say 30 seconds) in which all active reads have a chance to finish.
So closing the file would break any in-progress reads, which would defeat the
purpose of the async delete.
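As a rough illustration of the timed delete described above, here is a minimal
Java sketch using only the JDK (assuming Java 8+). It is not Kafka's actual
code: TimedDelete, asyncDelete, and the ".deleted" suffix are hypothetical
names chosen for the example, and the step of removing the segment from the
index that serves reads is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TimedDelete {
    // Hypothetical suffix for this sketch; it stands in for Log.DeletedFileSuffix.
    static final String DELETED_SUFFIX = ".deleted";

    /**
     * Renames the file immediately, then schedules the physical delete to run
     * after delayMs. The file is not closed or removed before the delay expires.
     */
    static ScheduledFuture<?> asyncDelete(Path file,
                                          ScheduledExecutorService scheduler,
                                          long delayMs) throws IOException {
        Path renamed = file.resolveSibling(file.getFileName() + DELETED_SUFFIX);
        Files.move(file, renamed); // throws IOException if the rename fails
        return scheduler.schedule(() -> {
            try {
                Files.deleteIfExists(renamed);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Path segment = Files.createTempFile("segment", ".log");
        ScheduledFuture<?> pending = asyncDelete(segment, scheduler, 200);
        System.out.println("original name still present: " + Files.exists(segment));
        pending.get(); // block until the delayed delete has run
        scheduler.shutdown();
    }
}
```

In this sketch the delay is the only thing protecting in-progress reads, so it
would need to comfortably exceed the longest read one expects to see.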
I guess there are two issues here:
1. What was the underlying cause of the rename failure? Did we ever figure that
out? Is there any way to make the error more intelligible? (We are pretty
hampered by the terrible Java API here.)
2. If the rename does fail, how should we handle it? Leaking the file is definitely
wrong. I think the right thing to do is likely just to do an immediate delete.
This may give errors to fetch requests in progress but they will retry.
So maybe this could look something like:
{code}
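val renamed = segment.changeFileSuffixes("", Log.DeletedFileSuffix)
if(renamed) {
  // rename succeeded: delete later so in-progress reads can finish
  scheduler.schedule("delete-file", deleteSeg, delay = config.fileDeleteDelayMs)
} else {
  // rename failed: delete right away rather than leak the file
  error("File rename failed, forcefully deleting file")
  deleteSeg()
}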
{code}
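For a self-contained version of the same idea, the sketch below uses plain Java
(assuming Java 7+ for java.nio.file and Java 8+ for the lambda). Files.move
throws an IOException that says why the rename failed, whereas File.renameTo
only returns false, so it may help with point 1. RenameOrDelete,
deleteSegmentFile, deleteQuietly, and the ".deleted" suffix are hypothetical
names for this example, not Kafka APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RenameOrDelete {
    // Hypothetical suffix for this sketch; it stands in for Log.DeletedFileSuffix.
    static final String DELETED_SUFFIX = ".deleted";

    /**
     * Tries to rename the file and defer its deletion. If the rename fails,
     * logs the reason and deletes the file immediately instead of leaking it.
     * Returns true if the delete was deferred, false if it happened right away.
     */
    static boolean deleteSegmentFile(Path file,
                                     ScheduledExecutorService scheduler,
                                     long delayMs) {
        Path renamed = file.resolveSibling(file.getFileName() + DELETED_SUFFIX);
        try {
            Files.move(file, renamed);
        } catch (IOException e) {
            // The exception carries the cause (e.g. FileAlreadyExistsException).
            System.err.println("File rename failed (" + e + "), forcefully deleting file");
            deleteQuietly(file);
            return false;
        }
        scheduler.schedule(() -> deleteQuietly(renamed), delayMs, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Deletes the path if present and logs, rather than propagates, any failure. */
    static void deleteQuietly(Path p) {
        try {
            Files.deleteIfExists(p);
        } catch (IOException e) {
            System.err.println("Could not delete " + p + ": " + e);
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Path dir = Files.createTempDirectory("segments");
        Path segment = Files.createFile(dir.resolve("00000000.log"));
        // Force a rename failure by occupying the target name.
        Files.createFile(dir.resolve("00000000.log" + DELETED_SUFFIX));
        boolean deferred = deleteSegmentFile(segment, scheduler, 1000);
        System.out.println("deferred=" + deferred + ", still exists=" + Files.exists(segment));
        scheduler.shutdown();
    }
}
```

As noted above, the immediate delete may give errors to fetch requests that are
in progress, which is acceptable here only because those requests retry.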
> Unsuccessful suffix rename of expired LogSegment can leak open files and also
> leave the LogSegment in an invalid state
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-1853
> URL: https://issues.apache.org/jira/browse/KAFKA-1853
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.1.1
> Reporter: jaikiran pai
> Fix For: 0.8.3
>
>
> As noted in this discussion in the user mailing list
> http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3C54AE3661.8080007%40gmail.com%3E
> an unsuccessful attempt at renaming the underlying files of a LogSegment can
> lead to file leaks and also leave the LogSegment in an invalid state.