[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406520#comment-13406520
 ] 

Michael McCandless commented on LUCENE-4190:
--------------------------------------------

bq. Many of the Lucene+facet deployments that I know of store the taxonomy 
index as a sub-directory of the search index.

We won't delete directories, just files.

bq. Also, we've been storing other files in the index directory too ... this 
new feature will affect such existing deployments.

Yeah ... better to move them elsewhere or to a sub dir?

bq. I assume it's related to IW not knowing which files to delete when a 
segment is no longer needed, because Codecs can pick their own file names.

Right: it's easy to track the positive set (files referenced by current 
segments), what's harder is the negative set (files created in the past but no 
longer referenced).
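To make the positive/negative distinction concrete, here is a minimal, hypothetical Java sketch (class and method names are illustrative, not Lucene's actual deletion code). It assumes the only information available is the directory listing plus the set of files referenced by the current commit; everything left over becomes a deletion candidate, foreign files included.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Lucene's actual deletion code.
final class NegativeSetSketch {
  // Positive set: files referenced by the current commit (easy to compute).
  // Negative set: everything else in the directory listing. Without a record
  // of what the writer itself created, foreign files land here too.
  static Set<String> deletionCandidates(Set<String> dirListing, Set<String> referenced) {
    Set<String> candidates = new TreeSet<>(dirListing);
    candidates.removeAll(referenced);
    return candidates;
  }

  public static void main(String[] args) {
    Set<String> listing =
        Set.of("segments_2", "_0.cfs", "_0.si", "_1.cfs", "_1.si", "notes.txt");
    // Assume segments_2 is the current commit and references only segment _1.
    Set<String> referenced = Set.of("segments_2", "_1.cfs", "_1.si");
    System.out.println(deletionCandidates(listing, referenced));
    // prints [_0.cfs, _0.si, notes.txt]
  }
}
```

Here `notes.txt` is swept up alongside the stale `_0` files, which is the behavior the issue below describes.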

bq. If we had an instance which kept track of all files that were created, e.g. 
every Codec would register the files there (if it wants to protect them from 
deletion), would that make the decision of which files to delete easier?

In theory it would ... but this would add a fair amount of complexity (we'd 
have to save this list of files into segments_N).  In fact, long ago Lucene did 
this: it had a "deletable" file which stored the list of files previously 
created and now to be deleted.
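A minimal sketch of the registry idea from the question, assuming every codec reports each file it writes (the class and its methods are hypothetical, not a Lucene API). It shows why the approach protects foreign files, and also where the extra cost comes from: the registry would have to be persisted with each commit.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "register every created file" idea; not a Lucene API.
final class CreatedFileRegistry {
  private final Set<String> created = new TreeSet<>();

  // Assumption: a codec would call this for every file it writes.
  void register(String fileName) {
    created.add(fileName);
  }

  // Only files this writer registered, and which no commit still references,
  // are candidates; files that were never registered (foreign files) are
  // never returned.
  Set<String> deletable(Set<String> referenced) {
    Set<String> result = new TreeSet<>(created);
    result.removeAll(referenced);
    return result;
  }

  // The registry must survive restarts, so it would have to be written out
  // with each commit (e.g. alongside segments_N): the added complexity.
  Set<String> snapshotForCommit() {
    return Collections.unmodifiableSet(new TreeSet<>(created));
  }

  public static void main(String[] args) {
    CreatedFileRegistry reg = new CreatedFileRegistry();
    reg.register("_0.cfs");
    reg.register("_1.cfs");
    // "notes.txt" was never registered, so it can never be selected.
    System.out.println(reg.deletable(Set.of("_1.cfs"))); // prints [_0.cfs]
  }
}
```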
                
> IndexWriter deletes non-Lucene files
> ------------------------------------
>
>                 Key: LUCENE-4190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4190
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-4190.patch
>
>
> Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
> post: 
> http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
> IndexWriter will now (as of 4.0) delete all foreign files from the index 
> directory.  We made this change because Codecs are free to write to any files 
> now, so the space of filenames is hard to "bound".
> But if the user accidentally uses the wrong directory (e.g. c:/) then we will 
> in fact delete important stuff.
> I think we can at least use some simple criteria (must start with _, maybe 
> must fit a certain pattern, e.g. _<base36>(_X).Y), so we are much less likely 
> to delete a non-Lucene file...
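A minimal sketch of the proposed name-based criterion, using `java.util.regex`. The exact patterns are assumptions for illustration, not Lucene's actual rules: segment files as `_<base36>` (assumed lower-case), an optional `_<suffix>`, and an extension; commit files as `segments_<base36>` or `segments.gen`. Only names that match would be eligible for deletion.

```java
import java.util.regex.Pattern;

// Hypothetical filter in the spirit of the proposal; the patterns are
// assumptions, not Lucene's actual file-naming rules.
final class LuceneFileNameFilter {
  // _<base36 segment name>, optional _<suffix> (e.g. a format name), then .<extension>
  private static final Pattern SEGMENT_FILE =
      Pattern.compile("_[a-z0-9]+(_[A-Za-z0-9_]+)?\\.[A-Za-z0-9]+");
  // Commit points: segments_<base36 generation>, plus segments.gen (assumed).
  private static final Pattern COMMIT_FILE =
      Pattern.compile("segments(_[a-z0-9]+|\\.gen)");

  // True only if the whole name matches one of the assumed patterns.
  static boolean looksLikeIndexFile(String name) {
    return SEGMENT_FILE.matcher(name).matches() || COMMIT_FILE.matcher(name).matches();
  }

  public static void main(String[] args) {
    String[] names = {"_0.cfs", "_a_Lucene40_0.frq", "segments_2", "notes.txt", "_notes"};
    for (String name : names) {
      System.out.println(name + " -> " + looksLikeIndexFile(name));
    }
  }
}
```

Note that a stray file such as `_notes.txt` still matches, so a filter like this makes accidental deletion much less likely rather than impossible.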
