RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

Steven Parkes Thu, 22 Mar 2007 11:52:01 -0800

  * Merge policy has problems when you "flush by RAM" (this is true
    even before my patch).  Not sure how to fix yet.


Do you mean the case where RAM usage is used to decide when to flush?

-----Original Message-----
From: Michael McCandless (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 22, 2007 10:09 AM
To: [email protected]
Subject: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM
to buffer added documents


     [ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-843:
--------------------------------------

    Attachment: LUCENE-843.patch

I'm attaching a patch with my current state.  NOTE: this is very rough
and very much a work in progress and nowhere near ready to commit!  I
wanted to get it out there sooner rather than later to get feedback,
maybe entice some daring early adopters, iterate, etc.

It passes all unit tests except the disk-full tests.

There are some big issues yet to resolve:

  * Merge policy has problems when you "flush by RAM" (this is true
    even before my patch).  Not sure how to fix yet.

  * Thread safety and thread concurrency aren't there yet.

  * Norms are not flushed (just use up RAM until you close the
    writer).

  * Many other things on my TODO list :)



> improve how IndexWriter uses RAM to buffer added documents
> ----------------------------------------------------------
>
>                 Key: LUCENE-843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-843.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents.  I haven't changed anything after that, e.g. how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
>     use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
>     in-RAM merges.  Once RAM is full, flush buffers to disk (and
>     merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.
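
To make the "flush by RAM" idea above concrete, here is a minimal sketch of a
buffer that flushes once its estimated memory use reaches a byte budget,
instead of after a fixed number of documents.  It is not taken from the
attached patch and does not use any Lucene API: the class RamBoundedBuffer,
its methods, and the estimate of 2 bytes per character are all hypothetical
and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene code): buffers documents in memory and
 * flushes when their estimated RAM use reaches a configured byte budget.
 */
final class RamBoundedBuffer {
    private final long ramBudgetBytes;
    private final List<String> buffered = new ArrayList<>();
    private long bytesUsed = 0;
    private int flushCount = 0;

    RamBoundedBuffer(long ramBudgetBytes) {
        if (ramBudgetBytes <= 0) {
            throw new IllegalArgumentException("ramBudgetBytes must be positive");
        }
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Buffers one document and flushes if the RAM budget has been reached. */
    void addDocument(String doc) {
        buffered.add(doc);
        // Assumption: a crude estimate of 2 bytes per char; a real writer
        // would track the size of its posting buffers instead.
        bytesUsed += 2L * doc.length();
        if (bytesUsed >= ramBudgetBytes) {
            flush();
        }
    }

    /** Discards the buffered documents; a real writer would write a segment here. */
    void flush() {
        if (buffered.isEmpty()) {
            return;
        }
        buffered.clear();
        bytesUsed = 0;
        flushCount++;
    }

    int flushCount() { return flushCount; }

    int bufferedDocs() { return buffered.size(); }

    long bytesUsed() { return bytesUsed; }

    public static void main(String[] args) {
        RamBoundedBuffer buffer = new RamBoundedBuffer(1024);
        String small = "a short document";
        String large = "a much longer document body, repeated to take up more room in the buffer";
        for (int i = 0; i < 50; i++) {
            buffer.addDocument(i % 2 == 0 ? small : large);
        }
        System.out.println("flushes: " + buffer.flushCount()
                + ", still buffered: " + buffer.bufferedDocs());
    }
}
```

With a byte budget, the number of documents per flush varies with document
size: many small documents or a few large ones trigger the same flush.  A
fixed document count gives the opposite behavior, with uniform document
counts and varying flush sizes.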

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

