[ 
https://issues.apache.org/jira/browse/KAFKA-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272579#comment-14272579
 ] 

Jay Kreps commented on KAFKA-1853:
----------------------------------

The purpose of the async delete is to avoid locking for reads. A simple 
approach to ensuring deletes don't interrupt reads would be a read/write lock, 
but this would mean writes would end up blocking concurrent reads. Instead the 
approach we use is just a timed delete--a delete involves renaming the file and 
removing it from the index that serves reads, and then deleting it after a 
period of time (say 30 seconds) in which all active reads have a chance to 
finish.
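
To make the sequence concrete, here is a minimal Java sketch of a timed delete.
It is an illustration only, not Kafka's code: TimedDeleteSketch, its segments
map, and the ".deleted" suffix are hypothetical names, and it assumes a
POSIX-style file system where a rename does not disturb readers that already
have the file open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the log's segment bookkeeping; not Kafka's Log class.
final class TimedDeleteSketch {
    // Index that serves reads: base offset -> segment file.
    final ConcurrentNavigableMap<Long, Path> segments = new ConcurrentSkipListMap<>();
    private final ScheduledExecutorService scheduler;
    private final long deleteDelayMs;

    TimedDeleteSketch(ScheduledExecutorService scheduler, long deleteDelayMs) {
        this.scheduler = scheduler;
        this.deleteDelayMs = deleteDelayMs;
    }

    // Renames the segment, unlinks it from the read index, and schedules the
    // physical delete. No lock is taken on the read path.
    ScheduledFuture<?> asyncDelete(long baseOffset) throws IOException {
        Path live = segments.get(baseOffset);
        if (live == null) {
            throw new IllegalArgumentException("no segment at offset " + baseOffset);
        }
        // 1. Rename the file to mark it as pending deletion.
        Path doomed = live.resolveSibling(live.getFileName() + ".deleted");
        Files.move(live, doomed);
        // 2. Remove it from the index so no new read can start on it.
        segments.remove(baseOffset);
        // 3. Delete only after the delay, giving in-progress reads time to finish.
        Runnable delete = () -> {
            try {
                Files.deleteIfExists(doomed);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
        return scheduler.schedule(delete, deleteDelayMs, TimeUnit.MILLISECONDS);
    }
}
```

The delay plays the role of the "say 30 seconds" above: it is a grace period
rather than a guarantee, so a read that outlives it can still fail.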

So closing the file would break any in-progress reads, which would defeat the 
purpose of the async delete.

I guess there are two issues here:
1. What was the underlying cause of the rename failure? Did we ever figure that 
out? Is there any way to make the error more intelligible? (We are pretty 
hampered by the terrible Java API here.)
2. If the rename does fail, how should we handle it? Leaking the file is 
definitely wrong. I think the right thing to do is likely just to do an 
immediate delete. This may give errors to fetch requests in progress, but they 
will retry.

So maybe this could look something like:
{code}
// Sketch: assumes changeFileSuffixes returns whether the rename succeeded.
val renamed = segment.changeFileSuffixes("", Log.DeletedFileSuffix)
if (renamed) {
  scheduler.schedule("delete-file", deleteSeg, delay = config.fileDeleteDelayMs)
} else {
  error("File rename failed, forcefully deleting file")
  deleteSeg()
}
{code}
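
On question 1, one possible way to get a more intelligible error, assuming the
java.nio.file API (Java 7 and later) is an option for the broker, which is not
established here: java.io.File.renameTo only returns false on failure, while
java.nio.file.Files.move throws an IOException subclass that names the cause.
RenameDiagnostics and tryRename below are hypothetical names used only for
illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of Kafka.
final class RenameDiagnostics {
    // Returns null if the rename succeeded, otherwise a description of the
    // failure built from the exception type and message.
    static String tryRename(Path from, Path to) {
        try {
            // Without REPLACE_EXISTING, an existing target is an error.
            Files.move(from, to);
            return null;
        } catch (IOException e) {
            // e.g. NoSuchFileException, FileAlreadyExistsException,
            // AccessDeniedException: the type alone says more than "false".
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

A caller could log the returned description before falling back to the
immediate delete shown above.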

> Unsuccessful suffix rename of expired LogSegment can leak open files and also 
> leave the LogSegment in an invalid state
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1853
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1853
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1.1
>            Reporter: jaikiran pai
>             Fix For: 0.8.3
>
>
> As noted in this discussion in the user mailing list 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3C54AE3661.8080007%40gmail.com%3E
>  an unsuccessful attempt at renaming the underlying files of a LogSegment can 
> lead to file leaks and also leave the LogSegment in an invalid state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
