[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407157#comment-13407157 ]
Uwe Schindler commented on LUCENE-4190:
---------------------------------------

Hi,

I have been thinking about this whole issue for a while. My idea would limit us a bit more, but I really like Mike's proposal of fixed names.

I would change the Directory class so that every method that handles or deletes files takes two parameters: the segment name and one arbitrary codec-private file name. The directory is then responsible for creating the file name, prefixing it with _, and so on. A custom directory (like HBase) could use the segment name as the table name and the private file name as the identifier, so all segment files go into the same HBase table. The directory would then also be responsible for doing a "cleanup"/"list of files", where it would only return files matching the pattern. For index-wide metadata like the segments file, we would then unfortunately need a special method to get an IndexOutput :(

If we stay with the current single file name, I would make the format fixed, so that it throws IOException if the file name is invalid. An assert makes no sense here, as it does not prevent people from doing the wrong thing. Then really nothing can create invalid files, deleting by "_[0-9a-z_]+" works, and all would be happy.

Alternatively, we could switch to the following:
- If we create a *new* index, we enforce that listFiles returns an empty list (. and .. excluded, but that's done already); otherwise we throw IOException("directory not empty").
- If there is a segment file already there, we can delete everything not allowed in an index.
Uwe

> IndexWriter deletes non-Lucene files
> ------------------------------------
>
>                 Key: LUCENE-4190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4190
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Robert Muir
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4190.patch, LUCENE-4190.patch
>
>
> Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog
> post:
> http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
> IndexWriter will now (as of 4.0) delete all foreign files from the index
> directory. We made this change because Codecs are free to write to any files
> now, so the space of filenames is hard to "bound".
> But if the user accidentally uses the wrong directory (eg c:/) then we will
> in fact delete important stuff.
> I think we can at least use some simple criteria (must start with _, maybe
> must fit certain pattern eg _<base36>(_X).Y), so we are much less likely to
> delete a non-Lucene file....
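
For illustration only, the fixed-format idea from the comment (reject invalid names with an IOException instead of an assert) could be sketched roughly as below. The class SegmentFileNameCheck, its method names, and the exact regular expression are hypothetical and not part of Lucene; the pattern is one guess at the `_<base36>(_X).Y` shape mentioned in the issue and is stricter than what real codecs may need.

```java
import java.io.IOException;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: illustrates a fixed file-name format. */
final class SegmentFileNameCheck {

    // Assumed pattern: "_" + base-36 segment name, an optional "_suffix",
    // and a lowercase alphanumeric extension, e.g. "_a3.fdt" or "_a3_1.del".
    private static final Pattern SEGMENT_FILE =
        Pattern.compile("_[0-9a-z]+(_[0-9a-z]+)?\\.[0-9a-z]+");

    private SegmentFileNameCheck() {}

    /** True if the whole name matches the assumed per-segment file pattern. */
    static boolean isSegmentFile(String name) {
        return SEGMENT_FILE.matcher(name).matches();
    }

    /** Rejects invalid names with an IOException rather than an assert. */
    static String checkFileName(String name) throws IOException {
        if (!isSegmentFile(name)) {
            throw new IOException("invalid index file name: " + name);
        }
        return name;
    }
}
```

With a check like this applied when files are created, a cleanup pass that only deletes names accepted by isSegmentFile would leave unrelated files such as boot.ini alone. Index-wide files such as the segments file do not match this pattern and would need separate handling.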