[ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]

Chuck Williams updated LUCENE-709:
----------------------------------

    Attachment: ramDirSizeManagement.patch

I've just attached my version of this patch.  It includes a multi-threaded test 
case.  I believe it is sound.

A few notes:

1.  Re. Yonik's comment about my synchronization scenario.  Synchronizing as 
described does resolve the issue.  No higher-level synchronization is required. 
 It doesn't matter how concurrent operations on the directory are ordered or 
interleaved, so long as any computation that does a loop sees a state of the 
directory that corresponds to its actual content at some point in time.  The 
result of the loop will then be accurate for that instant.

2.  Lucene has this same synchronization bug today in RAMDirectory.list().  It 
can return a list of files that never comprised the contents of the directory.  
This is fixed in the attached.

3.  Also, the long synchronization bug exists in RAMDirectory.fileModified() as 
well as RAMDirectory.fileLength() since both are public.  These are fixed in 
the attached.

4.  I moved the synchronization off of the Hashtable (replacing it with a 
HashMap) up to the RAMDirectory as there are some operations that require 
synchronization at the directory level.  Using just one lock seems better.  As 
all Hashtable operations were already synchronized, I don't believe any material 
additional synchronization is added.

5.  Lucene currently makes the assumption that if a file is being written by a 
stream then no other streams are simultaneously reading or writing it.  I've 
maintained this assumption as an optimization, allowing the streams to access 
fields directly without synchronization.  This is documented in the comments, 
as is the locking order.

6.  sizeInBytes is now maintained incrementally and efficiently.

7.  Yonik, your version (which I just now saw) has a bug in 
RAMDirectory.renameFile().  The destination file may already exist, in which 
case it is overwritten and its size must be subtracted.  I actually hit this in 
my test case for my implementation and fixed it (since Lucene renames a new 
version of the segments file).
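
To illustrate the notes above, here is a minimal sketch of the single-lock 
approach.  It is not the code from the attached patch: SizeTrackingDirectory 
and writeFile() are hypothetical names, files are modelled as plain byte 
arrays, and it uses generics that the Lucene code base of this era does not.  
It only shows a HashMap guarded by the directory's own lock, a list() 
snapshot taken under that lock, a size total maintained incrementally, and a 
rename that subtracts the size of an overwritten destination file.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for RAMDirectory; an illustration, not the patch. */
class SizeTrackingDirectory {
    // Plain HashMap: every access goes through a method synchronized on this directory.
    private final Map<String, byte[]> files = new HashMap<>();
    // Running total of all file lengths, guarded by the directory lock.
    private long sizeInBytes = 0;

    /** Creates or replaces a file and adjusts the running total. */
    synchronized void writeFile(String name, byte[] contents) {
        byte[] old = files.put(name, contents.clone());
        if (old != null) {
            sizeInBytes -= old.length;
        }
        sizeInBytes += contents.length;
    }

    /** Copies the names under the lock, so the result matches the contents at one instant. */
    synchronized String[] list() {
        return files.keySet().toArray(new String[0]);
    }

    /** Reads the length under the directory lock. */
    synchronized long fileLength(String name) {
        byte[] contents = files.get(name);
        if (contents == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return contents.length;
    }

    /** Returns the incrementally maintained total; no loop over the files is needed. */
    synchronized long sizeInBytes() {
        return sizeInBytes;
    }

    /** Renames a file; an existing destination is overwritten and its size subtracted. */
    synchronized void renameFile(String from, String to) {
        byte[] moved = files.remove(from);
        if (moved == null) {
            throw new IllegalArgumentException("no such file: " + from);
        }
        byte[] overwritten = files.put(to, moved);
        if (overwritten != null) {
            sizeInBytes -= overwritten.length;
        }
    }
}
```

Because every method takes the same lock, a caller of list() or sizeInBytes() 
sees the directory as it was at one instant, however other threads' 
operations are interleaved around that call.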

All Lucene tests, including the new test, pass.  Some contrib tests fail, but I 
believe none of these failures are in any way related to this patch.




> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramdir.patch, ramdir.patch, ramDirSizeManagement.patch, 
> ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache 
> using maxBufferedDocs, which limits it to a fixed number of documents.  When 
> document sizes vary substantially, especially when documents cannot be 
> truncated, this leads either to inefficiencies from a too-small value or 
> OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access 
> to size information about IndexWriter.ramDirectory so that an application can 
> manage this based on total number of bytes consumed by the in-memory cache, 
> thereby allowing a larger number of smaller documents or a smaller number of 
> larger documents.  This can lead to much better performance while eliminating 
> the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is 
> left up to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an 
> external call.  It has no significant effect on internal calls since they all 
> come from a synchronized caller.
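
As a rough sketch of the application-level management described in the issue, 
the code below flushes whenever the in-memory cache reaches a byte budget.  
Everything here is an assumption made for illustration: BufferedIndexer is a 
hypothetical interface standing in for the two capabilities the patch 
exposes (ramSizeInBytes() is an invented accessor name; only 
flushRamSegments() is named in the issue), documents are plain strings, and 
FakeIndexer is a toy used only to exercise the loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of what the patch exposes; not the real IndexWriter API. */
interface BufferedIndexer {
    void addDocument(String doc);
    long ramSizeInBytes();      // assumed accessor for the in-memory cache size
    void flushRamSegments();
}

/** Application-side policy: flush once the in-memory cache reaches a byte budget. */
class ByteBudgetIndexer {
    private final BufferedIndexer writer;
    private final long maxRamBytes;

    ByteBudgetIndexer(BufferedIndexer writer, long maxRamBytes) {
        this.writer = writer;
        this.maxRamBytes = maxRamBytes;
    }

    void add(String doc) {
        writer.addDocument(doc);
        // Bound by bytes rather than by document count, so many small documents
        // or a few large ones are buffered before each flush.
        if (writer.ramSizeInBytes() >= maxRamBytes) {
            writer.flushRamSegments();
        }
    }
}

/** Toy implementation that counts one byte per character; for illustration only. */
class FakeIndexer implements BufferedIndexer {
    private final List<String> buffered = new ArrayList<>();
    private long bytes = 0;
    int flushCount = 0;

    public void addDocument(String doc) {
        buffered.add(doc);
        bytes += doc.length();
    }

    public long ramSizeInBytes() {
        return bytes;
    }

    public void flushRamSegments() {
        buffered.clear();
        bytes = 0;
        flushCount++;
    }
}
```

The policy lives entirely in the application, which matches the issue's 
statement that managing to a size constraint is left to the caller.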
