"Peter Keegan" <[EMAIL PROTECTED]> wrote:
> I did some performance comparison testing of Lucene 2.0 vs. trunk (with
> LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
> DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
> yet, the total time to build the index is much shorter because I can now
> build the entire 3GB index (900K docs) in one segment in RAM (using
> FSDirectory) and flush it to disk at the end. Before, I had to build smaller
> segments (20K docs), merge after 20 segments and then optimize at the end.
Awesome :)
> The memory usage with LUCENE-843 is much lower, presumably because stored
> fields and term vectors no longer sit in RAM.
Right, not buffering the stored fields & term vectors in RAM is a big
win. In addition, storing the postings in RAM as a single shared
hash table backed by a pool of large byte[] arrays, instead of separate
1 KB buffers for the files of each single-document segment, also
improves RAM efficiency.
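To make the pooling idea concrete, here is a toy sketch of packing many
small byte slices into a few large shared byte[] blocks. This is not the
actual DocumentsWriter code: the class name BytePoolSketch, the 32 KB
block size and the append/read methods are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT Lucene's DocumentsWriter code.
// All names and the block size below are hypothetical.
class BytePoolSketch {
    static final int BLOCK_SIZE = 32 * 1024; // assumed size, picked for the example

    private final List<byte[]> blocks = new ArrayList<>();
    private int upto = 0; // next free position in the newest block

    // Copies data into the pool and returns its address
    // (block index * BLOCK_SIZE + position inside that block).
    long append(byte[] data) {
        if (data.length > BLOCK_SIZE) {
            throw new IllegalArgumentException("slice larger than one block");
        }
        // Allocate a new large block only when the current one cannot hold the slice.
        if (blocks.isEmpty() || upto + data.length > BLOCK_SIZE) {
            blocks.add(new byte[BLOCK_SIZE]);
            upto = 0;
        }
        int blockIndex = blocks.size() - 1;
        System.arraycopy(data, 0, blocks.get(blockIndex), upto, data.length);
        long address = (long) blockIndex * BLOCK_SIZE + upto;
        upto += data.length;
        return address;
    }

    // Reads length bytes back from a previously returned address.
    byte[] read(long address, int length) {
        byte[] block = blocks.get((int) (address / BLOCK_SIZE));
        byte[] out = new byte[length];
        System.arraycopy(block, (int) (address % BLOCK_SIZE), out, 0, length);
        return out;
    }

    int blockCount() {
        return blocks.size();
    }
}
```

The point of the sketch is only that thousands of small slices share a
handful of large arrays, so the per-object overhead of many tiny buffers
goes away.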
In my tests, using Europarl content with small docs (~100 terms = ~550
bytes per doc) and with stored fields & term vectors enabled, RAM
efficiency is 44X better than before.
> I also observed a 20-25% gain by reusing the Field objects. Implementing my
> own Fieldable class was too complicated, so I simply extended the Field
> class (after removing final) and added 2 setter methods:
>
> public void setValue(String value) {
>   this.fieldsData = value;
> }
> public void setValue(byte[] value) {
>   this.fieldsData = value;
> }
>
> Since this improved performance significantly, I would vote to either add
> setters to Field or make it extendable.
OK I've opened LUCENE-963 for this & attached a patch.
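For anyone following along, the reuse pattern those setters enable looks
roughly like the sketch below. To keep it self-contained it does not use
Lucene at all: ReusableField is a hypothetical stand-in for a Field with
the two setters above, and the "indexing" step just records the value
where a real loop would hand the document to the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Field with setters; NOT Lucene's Field class.
class ReusableField {
    private final String name;
    private Object fieldsData; // holds either a String or a byte[] value

    ReusableField(String name, String value) {
        this.name = name;
        this.fieldsData = value;
    }

    void setValue(String value) {
        this.fieldsData = value;
    }

    void setValue(byte[] value) {
        this.fieldsData = value;
    }

    String name() {
        return name;
    }

    // Returns the value if it is a String, otherwise null.
    String stringValue() {
        return fieldsData instanceof String ? (String) fieldsData : null;
    }
}

class FieldReuseSketch {
    // Simulated indexing loop: one field instance is allocated up front
    // and its value is overwritten for every document.
    static List<String> indexAll(String[] bodies) {
        List<String> indexed = new ArrayList<>();
        ReusableField body = new ReusableField("body", "");
        for (String text : bodies) {
            body.setValue(text); // overwrite instead of allocating a new field
            // Stand-in for passing the document to the index writer.
            indexed.add(body.name() + "=" + body.stringValue());
        }
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println(indexAll(new String[] {"first doc", "second doc"}));
    }
}
```

Reuse like this is only safe when the consumer is finished with the old
value before the next setValue call, which holds for a single-threaded
loop such as the one in the sketch.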
> Kudos to Mike for this huge improvement!
Thanks!
Mike