Re: [jira] Commented: (LUCENE-710) Implement

Robert Engels Wed, 24 Jan 2007 08:05:36 -0800

Curious, I guess I don't understand the BSD disclaimer. The application should 
not need to track any of this. The OS should be tracking the open FDs and locks 
for the process, and when it closes an FD on behalf of a process it should also 
remove the locks.


-----Original Message-----
>From: "Marvin Humphrey (JIRA)" <[EMAIL PROTECTED]>
>Sent: Jan 23, 2007 10:56 PM
>To: [email protected]
>Subject: [jira] Commented: (LUCENE-710) Implement "point in time" searching 
>without relying on filesystem semantics
>
>
>    [ https://issues.apache.org/jira/browse/LUCENE-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466911 ]
>
>Marvin Humphrey commented on LUCENE-710:
>----------------------------------------
>
>On Jan 23, 2007, at 2:19 PM, Michael McCandless (JIRA) wrote:
>
>> First do no harm.
>
>If that was really your guiding philosophy, you would never change anything.
>
>> And Sun's Javadocs on the equivalent Java method, File.createNewFile, has a
>> warning about not relying on this for locking:
>> 
>>   http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#createNewFile()
>
>That page recommends that you use FileLock instead, which maps to fcntl on
>some systems.  The FreeBSD manpage on fcntl uses less delicate language than
>Sun in pointing out the drawbacks:
>
>     This interface follows the completely stupid semantics of System V and
>     IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated
>     with a file for a given process are removed when any file descriptor for
>     that file is closed by that process.  This semantic means that
>     applications must be aware of any files that a subroutine library may
>     access.
>
>Trying to guarantee that kind of discipline from library code severely limits
>your options.
>
>> This warning is why we created the NativeFSLockFactory for Directory locking
>> in the first place.
>
>Take a look at this bug, which explains how that warning got added.
>
>http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4676183
>
>Read the comment below -- the problem with the "protocol" they warn you
>against using is with deleteOnExit(), not createNewFile().  I think you're
>better off with dot-locks.
>
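For anyone following along, here is a minimal sketch of the dot-lock idea mentioned above. It assumes only that File.createNewFile() creates the file atomically, as its Javadoc states. The class name DotLock and its methods are hypothetical, not anything in Lucene or KinoSearch. It deliberately avoids deleteOnExit(), since that is the part the quoted message says the warning is really about. One known drawback: a process that crashes while holding the lock leaves a stale lock file behind.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical dot-lock: the lock is held by whoever created the lock file. */
final class DotLock {
    private final File lockFile;
    private boolean held = false;

    DotLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Tries once, without blocking; true only if this call created the file. */
    synchronized boolean tryAcquire() throws IOException {
        if (held) {
            return false; // this object already holds the lock
        }
        // createNewFile() returns false when the file already exists.
        held = lockFile.createNewFile();
        return held;
    }

    /** Releases the lock by removing the lock file; no-op if not held. */
    synchronized void release() throws IOException {
        if (!held) {
            return;
        }
        if (!lockFile.delete()) {
            throw new IOException("could not remove lock file " + lockFile);
        }
        held = false;
    }
}
```

A caller would wrap the protected work in try/finally and call release() in the finally block, so the lock file is removed on every normal exit path.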
>> OK.  You could implement this in Lucene as a custom deletion policy once we
>> get this committed (I think this is 6 proposals now for "deletion policy"
>> for NFS), plus a wrapper around IndexReader.
>
>This was the response I got on the KinoSearch list:
>
>    We do not enable NFS writes, only reads (which is why Slashdot is able to
>    reliably use NFS for its heavy load :-).  So I don't think that will work,
>    if I understand you correctly.
>
>Lack of bulletproof support for NFS ain't gonna hold up my next release any
>longer.  What a freakin' nightmare...
>
>> Implement "point in time" searching without relying on filesystem semantics
>> ---------------------------------------------------------------------------
>>
>>                 Key: LUCENE-710
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-710
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Index
>>    Affects Versions: 2.1
>>            Reporter: Michael McCandless
>>         Assigned To: Michael McCandless
>>            Priority: Minor
>>
>> This was touched on in recent discussion on dev list:
>>   http://www.gossamer-threads.com/lists/lucene/java-dev/41700#41700
>> and then more recently on the user list:
>>   http://www.gossamer-threads.com/lists/lucene/java-user/42088
>> Lucene's "point in time" searching currently relies on how the
>> underlying storage handles deletion of files that are held open for
>> reading.
>> This is highly variable across filesystems.  For example, UNIX-like
>> filesystems usually do "delete on last close", and Windows filesystems
>> typically refuse to delete a file open for reading (so Lucene retries
>> later).  But NFS just removes the file out from under the reader, and
>> for that reason "point in time" searching doesn't work on NFS
>> (see LUCENE-673).
>> With the lockless commits changes (LUCENE-701), it's quite simple to
>> re-implement "point in time searching" so as to not rely on filesystem
>> semantics: we can just keep more than the last segments_N file (as
>> well as all files they reference).
>> This is also in keeping with the design goal of "rely on as little as
>> possible from the filesystem".  E.g., with lockless we no longer re-use
>> filenames (don't rely on filesystem cache being coherent) and we no
>> longer use file renaming (because on Windows it can fail).  This
>> would be another step of not relying on semantics of "deleting open
>> files".  The less we require from filesystem the more portable Lucene
>> will be!
>> Where it gets interesting is what "policy" we would then use for
>> removing segments_N files.  The policy now is "remove all but the last
>> one".  I think we would keep this policy as the default.  Then you
>> could imagine other policies:
>>   * Keep the past N days' worth
>>   * Keep the last N
>>   * Keep only those in active use by a reader somewhere (note: tricky
>>     how to reliably figure this out when readers have crashed, etc.)
>>   * Keep those "marked" as rollback points by some transaction, or
>>     marked explicitly as a "snapshot".
>>   * Or, roll your own: the "policy" would be an interface or abstract
>>     class and you could make your own implementation.
>> I think for this issue we could just create the framework
>> (interface/abstract class for "policy" and invoke it from
>> IndexFileDeleter) and then implement the current policy (delete all
>> but most recent segments_N) as the default policy.
>> In separate issue(s) we could then create the above more interesting
>> policies.
>> I think there are some important advantages to doing this:
>>   * "Point in time" searching would work on NFS (it doesn't now
>>     because NFS doesn't do "delete on last close"; see LUCENE-673)
>>     and any other Directory implementations that don't work
>>     currently.
>>   * Transactional semantics become a possibility: you can set a
>>     snapshot, do a bunch of stuff to your index, and then rollback to
>>     the snapshot at a later time.
>>   * If a reader crashes or machine gets rebooted, etc, it could choose
>>     to re-open the snapshot it had previously been using, whereas now
>>     the reader must always switch to the last commit point.
>>   * Searchers could search the same snapshot for follow-on actions.
>>     Meaning, user does search, then next page, drill down (Solr),
>>     drill up, etc.  These are each separate trips to the server and if
>>     searcher has been re-opened, user can get inconsistent results (=
>>     lost trust).  But with this, one series of search interactions could
>>     explicitly stay on the snapshot it had started with.
>
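To make the "policy" framework in the issue description concrete, here is a rough sketch of what a pluggable policy might look like. Everything in it is an assumption for illustration: the interface name DeletionPolicy, the method selectForDeletion, and the use of plain segments_N file names as commit identifiers are hypothetical, not the API proposed in the patch. It covers only the default ("remove all but the last one") and the "keep the last N" variant from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy: given commits oldest-first, return those to delete. */
interface DeletionPolicy {
    List<String> selectForDeletion(List<String> commitsOldestFirst);
}

/** "Keep the last N": delete everything except the N newest commits. */
final class KeepLastN implements DeletionPolicy {
    private final int n;

    KeepLastN(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    @Override
    public List<String> selectForDeletion(List<String> commitsOldestFirst) {
        // Everything before the cut index is older than the N newest commits.
        int cut = Math.max(0, commitsOldestFirst.size() - n);
        return new ArrayList<>(commitsOldestFirst.subList(0, cut));
    }
}

/** The current default described in the issue: keep only the newest commit. */
final class KeepOnlyLastCommit implements DeletionPolicy {
    @Override
    public List<String> selectForDeletion(List<String> commitsOldestFirst) {
        return new KeepLastN(1).selectForDeletion(commitsOldestFirst);
    }
}
```

In this sketch the deleter would call the policy after each commit and remove the returned segments_N files along with any files only they reference. The reference counting needed for that second part is left out here.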
>-- 
>This message is automatically generated by JIRA.
>-
>You can reply to this email to add a comment to the issue online.
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>



