[ 
https://issues.apache.org/jira/browse/LUCENE-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560851#comment-13560851
 ] 

Michael McCandless commented on LUCENE-4707:
--------------------------------------------

bq. we implemented Directory on top of Cassandra,

Ahh.. ok.

NFS has the same problem, and the usual answer is to create a custom
IndexDeletionPolicy, but you're right that the IndexDeletionPolicy is
only invoked for commits, not for the snapshot that each
near-real-time reader uses ...

One possible workaround would be to track the files referenced by the
NRT readers yourself.  I.e., in your Directory impl, after opening or
closing an NRT reader, you'd call
reader.getIndexCommit().getFileNames() on each still-open NRT reader
and accumulate all of those files into a set.  Then, when deleteFile
is called, if the name is still in use (in the set), throw an
IOException (IndexWriter will catch that to mean the file cannot be
deleted now and will retry later...).
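
A rough sketch of the bookkeeping, in case it helps.  It deliberately
uses only the JDK so it stands alone: NrtFileTracker, readerOpened,
readerClosed and ensureDeletable are made-up names, not Lucene API.
The assumption is that your Directory impl would pass in the result of
reader.getIndexCommit().getFileNames() when an NRT reader is opened,
and call ensureDeletable(name) at the top of its deleteFile(name).
Keeping one set per reader is equivalent to rebuilding a single set
after every open / close.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): tracks files used by open NRT readers. */
class NrtFileTracker {
    // Files referenced by each still-open NRT reader, keyed by a caller-chosen reader key.
    private final Map<Object, Set<String>> filesByReader = new HashMap<>();

    /** Call after opening an NRT reader, passing the file names its snapshot references. */
    synchronized void readerOpened(Object readerKey, Collection<String> fileNames) {
        filesByReader.put(readerKey, new HashSet<>(fileNames));
    }

    /** Call after closing an NRT reader; its files stop being protected. */
    synchronized void readerClosed(Object readerKey) {
        filesByReader.remove(readerKey);
    }

    /** True if any still-open reader references the named file. */
    synchronized boolean isInUse(String name) {
        for (Set<String> files : filesByReader.values()) {
            if (files.contains(name)) {
                return true;
            }
        }
        return false;
    }

    /** Intended to be called first thing in the Directory's deleteFile(name). */
    synchronized void ensureDeletable(String name) throws IOException {
        if (isInUse(name)) {
            throw new IOException(
                "file \"" + name + "\" is still referenced by an open NRT reader");
        }
    }
}
```

The readerKey can be the reader instance itself; the only requirement
is that the same key is passed to readerOpened and readerClosed.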

                
> Track file reference kept by readers that are opened through the writer
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-4707
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4707
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0
>         Environment: Mac OS X 10.8.2 and Linux 2.6.32
>            Reporter: Jessica Cheng
>
> We ran into a bug where files (mostly CFS) that are still referred to by our 
> NRT reader/searcher are deleted by IndexFileDeleter. As far as I can see from 
> the verbose logging and reading the code, it seems that the problem is the 
> creation and merging of these CFS files between hard commits. The files 
> referred to by hard commits are incRef’ed at commit checkpoints, so these 
> files won’t be deleted until they are decRef’ed when the commit is deleted 
> according to the DeletionPolicy (good). However, intermediate files that are 
> created and merged between the hard commits only have refs through the 
> regular checkpoints, so as soon as a new checkpoint no longer includes those 
> files, they are immediately deleted by the deleter. See the abridged verbose 
> log lines that illustrate this behavior:
> IW 11 [Mon Jan 21 17:30:35 PST 2013; commitScheduler]: create compound file 
> _8.cfs
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3 _1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6" [9 segments ; 
> isCommit = false]
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 0
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3 _1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6 _9(4.0.0.2):c6" 
> [10 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   DecRef "_8.cfs": 
> pre-decr count is 2
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: now checkpoint 
> "_b(4.0.0.2):C81" [1 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]:   DecRef 
> "_8.cfs": pre-decr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: delete "_8.cfs"
> With this behavior, it seems no matter how frequently we refresh the reader 
> (unless we do it at every read), we’d run into the race where the reader 
> still holds a reference to the file that’s just been deleted by the deleter. 
> My proposal is to count the file reference handed out to the NRT 
> reader/searcher when writer.getReader(boolean) is called and decRef the files 
> only when the said reader is closed.
> Please take a look and evaluate if my observations are correct and if the 
> proposal makes sense. Thanks!
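
For illustration only, the reference counting described in the report
can be modelled in a few lines.  This is a toy, not IndexFileDeleter's
code: RefCountModel and deletedWhileReaderOpen are invented names, and
the sequence of calls simply replays the IncRef / DecRef lines for
"_8.cfs" from the log above.  With readerHoldsRef set, it adds the one
extra reference the proposal asks for on behalf of the NRT reader.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of per-file reference counts; not Lucene's IndexFileDeleter. */
class RefCountModel {
    private final Map<String, Integer> counts = new HashMap<>();
    final Set<String> deleted = new HashSet<>();

    void incRef(String file) {
        counts.merge(file, 1, Integer::sum);
    }

    void decRef(String file) {
        Integer current = counts.get(file);
        if (current == null) {
            throw new IllegalStateException("no reference held on " + file);
        }
        if (current == 1) {
            // Last reference released: the file is deleted immediately.
            counts.remove(file);
            deleted.add(file);
        } else {
            counts.put(file, current - 1);
        }
    }

    /** Replays the "_8.cfs" log sequence; returns true if the file gets deleted. */
    static boolean deletedWhileReaderOpen(boolean readerHoldsRef) {
        RefCountModel model = new RefCountModel();
        model.incRef("_8.cfs");      // first checkpoint (log: pre-incr count is 0)
        if (readerHoldsRef) {
            model.incRef("_8.cfs");  // proposed: reference held for the NRT reader
        }
        model.incRef("_8.cfs");      // second checkpoint also lists _8
        model.decRef("_8.cfs");      // first checkpoint is released
        model.decRef("_8.cfs");      // merge checkpoint no longer lists _8
        return model.deleted.contains("_8.cfs");
    }
}
```

In this model deletedWhileReaderOpen(false) reproduces the deletion
seen in the log, while deletedWhileReaderOpen(true) leaves the file in
place until the reader's own reference is released.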
