[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407162#comment-13407162 ]

Robert Muir commented on LUCENE-4190:
-------------------------------------

{quote}
I was thinking about the whole thing for a long time. My idea would limit us a
bit more, but I really like Mike's proposal of fixed names. I would change the
Directory class so that every method that handles or deletes files gets 2
parameters: the segment name and one arbitrary codec-private file name. The
directory is then responsible for creating the file name, prefixing it with _ and
so on. A custom directory (like HBase) could use the segment name as the table
name and the private file name as the identifier, so all segment files go into
the same HBase table. The directory would then also be responsible for the
"cleanup"/"list of files", where it would only return files matching the pattern.
{quote}
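A rough sketch of what the quoted Directory proposal could look like. Everything here is hypothetical: SegmentScopedDirectory, its methods, the in-memory storage, and the "_" + segment + "_" + privateName naming scheme are assumptions for illustration, not Lucene's Directory API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the quoted proposal; this is NOT Lucene's Directory API.
// Callers only ever pass (segmentName, privateName); the directory owns the
// mapping to physical names and only lists names matching its own pattern.
class SegmentScopedDirectory {
  // In-memory stand-in for real storage: physical file name -> contents.
  private final Map<String, byte[]> storage = new TreeMap<>();

  // Assumed naming scheme: "_" + segment + "_" + codec-private name.
  static String physicalName(String segmentName, String privateName) {
    return "_" + segmentName + "_" + privateName;
  }

  void write(String segmentName, String privateName, byte[] data) {
    storage.put(physicalName(segmentName, privateName), data.clone());
  }

  void delete(String segmentName, String privateName) {
    storage.remove(physicalName(segmentName, privateName));
  }

  // Simulates an unrelated file that some other program put in the same place.
  void putForeignFile(String physicalName) {
    storage.put(physicalName, new byte[0]);
  }

  // "List of files" for cleanup: only names matching the directory's pattern
  // (here simply: starting with '_') are returned, so cleanup never sees others.
  List<String> listIndexFiles() {
    List<String> result = new ArrayList<>();
    for (String name : storage.keySet()) {
      if (name.startsWith("_")) {
        result.add(name);
      }
    }
    return result;
  }
}
```

The point of this design is that codecs never build physical file names themselves, so the listing used for cleanup can be limited to names the directory knows how to produce.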

I'm not sure matching _[0-9a-z_]+ is really that big of an improvement over
just the underscore. But I don't think we need to refactor Directory.java to
do this: we could just change the underscore check to a regular expression.
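To make the comparison concrete, here is a small sketch of both checks. It assumes the regular expression is matched against the start of the file name (Matcher.lookingAt), since index files also carry an extension; ForeignFileCheck and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper comparing the two candidate checks; not Lucene code.
final class ForeignFileCheck {
  private static final Pattern INDEX_PREFIX = Pattern.compile("_[0-9a-z_]+");

  private ForeignFileCheck() {}

  // Current rule: anything starting with an underscore is treated as an index file.
  static boolean underscoreRule(String fileName) {
    return fileName.startsWith("_");
  }

  // Regex rule: '_' followed by at least one lowercase base-36 digit or '_',
  // matched only at the start of the name (the extension is not checked).
  static boolean regexRule(String fileName) {
    return INDEX_PREFIX.matcher(fileName).lookingAt();
  }

  public static void main(String[] args) {
    String[] names = {"_0.cfs", "_1a_Lucene40_0.frq", "_notes.txt", "_Backup.zip", "report.doc"};
    for (String n : names) {
      System.out.printf("%-20s underscore=%-5b regex=%b%n", n, underscoreRule(n), regexRule(n));
    }
  }
}
```

Of these sample names, only _Backup.zip is classified differently by the two rules; a foreign file such as _notes.txt passes both, which is why the regular expression alone buys little.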

{quote}
Assert makes no sense here as it does not prevent people from doing the wrong 
thing.
{quote}

I don't agree: at first I thought to do a hard check, but this is only really
necessary for codec developers. So an assert is enough, because you catch it
when developing your codec (it's either gonna work, or not work at all here).
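A sketch of the assert-versus-hard-check trade-off. The naming rule (codec files start with the segment name) and the class CodecFileNameCheck are assumptions for illustration, not the code in the attached patch.

```java
import java.util.Collection;

// Hypothetical sketch; not the code in the attached patch.
final class CodecFileNameCheck {
  private CodecFileNameCheck() {}

  // Assumed naming rule: every file a codec writes starts with its segment's
  // name, e.g. names such as "_0.fnm" for segment "_0".
  static boolean isValid(String segmentName, String fileName) {
    return fileName.startsWith(segmentName);
  }

  // Assert-based variant: only evaluated when the JVM runs with -ea, so a codec
  // that breaks the rule fails during development and production pays nothing.
  static void assertValid(String segmentName, Collection<String> files) {
    for (String file : files) {
      assert isValid(segmentName, file)
          : "file '" + file + "' does not start with segment name '" + segmentName + "'";
    }
  }

  // Hard-check variant, for comparison: always on, throws even in production.
  static void requireValid(String segmentName, Collection<String> files) {
    for (String file : files) {
      if (!isValid(segmentName, file)) {
        throw new IllegalStateException(
            "file '" + file + "' does not start with segment name '" + segmentName + "'");
      }
    }
  }
}
```

Since assertions are typically enabled when running test suites, a codec that violates the rule would most likely trip the assert the first time its tests run; requireValid is the always-on alternative.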

{quote}
If we create a new index, we enforce that listFiles returns an empty list (. and ..
excluded, but that's done already); otherwise we throw IOException("directory
not empty").
{quote}

I thought about this, but I have concerns about things like .DS_Store and
.nfsXXXXX, or other files that some system could be creating behind the
scenes, etc.
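A sketch of the proposed emptiness check, assuming a local filesystem accessed through java.nio.file; NewIndexGuard and requireEmpty are hypothetical names, not Lucene code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical sketch of the quoted proposal; not Lucene code.
final class NewIndexGuard {
  private NewIndexGuard() {}

  // Refuse to create a new index unless the directory is completely empty.
  // Files.newDirectoryStream never returns "." or "..", so they need no filtering.
  static void requireEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      Iterator<Path> it = entries.iterator();
      if (it.hasNext()) {
        throw new IOException("directory not empty: " + dir + " contains " + it.next().getFileName());
      }
    }
  }
}
```

A single hidden entry is enough to make this throw: Finder on macOS may leave a .DS_Store in folders it has displayed, and NFS clients create .nfsXXXX files when a file that is still open gets deleted, so a user could see a directory as empty while this check rejects it.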

> IndexWriter deletes non-Lucene files
> ------------------------------------
>
>                 Key: LUCENE-4190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4190
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Robert Muir
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4190.patch, LUCENE-4190.patch
>
>
> Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
> post: 
> http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
> IndexWriter will now (as of 4.0) delete all foreign files from the index 
> directory.  We made this change because Codecs are free to write to any files 
> now, so the space of filenames is hard to "bound".
> But if the user accidentally uses the wrong directory (e.g. c:/), then we will
> in fact delete important stuff.
> I think we can at least use some simple criteria (must start with _, maybe
> must fit a certain pattern, e.g. _<base36>(_X).Y), so we are much less likely
> to delete a non-Lucene file....
