Well, first make sure that you set ramBufferSizeMB well below the maximum
Java heap size (-Xmx); otherwise you could run into OutOfMemoryErrors.
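
One way to keep the buffer well below the heap is to derive it from
`Runtime.getRuntime().maxMemory()` instead of hard-coding it. This is a
minimal sketch only: the 25% fraction and the `RamBufferSizing` helper are
hypothetical choices of mine, not a Lucene recommendation. The result would
then be passed to `IndexWriterConfig.setRAMBufferSizeMB(double)` in Lucene 3.6.

```java
/**
 * Hypothetical helper: derives a RAM buffer size (in MB) from the JVM's
 * maximum heap so the buffer stays well below -Xmx.
 * The 25% fraction is an assumption, not a Lucene recommendation.
 */
final class RamBufferSizing {
    /** Assumed fraction of the max heap the indexing buffer may use. */
    static final double HEAP_FRACTION = 0.25;

    private RamBufferSizing() {}

    /**
     * @param maxHeapBytes the JVM max heap, e.g. Runtime.getRuntime().maxMemory()
     * @param ceilingMB    an upper bound chosen by the caller (e.g. 128)
     * @return a size in MB, never above ceilingMB or HEAP_FRACTION of the heap
     */
    static double suggestRamBufferMB(long maxHeapBytes, double ceilingMB) {
        if (maxHeapBytes <= 0 || ceilingMB <= 0) {
            throw new IllegalArgumentException("heap and ceiling must be positive");
        }
        double heapMB = maxHeapBytes / (1024.0 * 1024.0);
        return Math.min(ceilingMB, heapMB * HEAP_FRACTION);
    }

    /** Convenience overload that reads the running JVM's max heap. */
    static double suggestRamBufferMB(double ceilingMB) {
        return suggestRamBufferMB(Runtime.getRuntime().maxMemory(), ceilingMB);
    }
}
```

With -Xmx512m, `suggestRamBufferMB(512L * 1024 * 1024, 256)` gives 128.0.
Note that `maxMemory()` may report somewhat less than -Xmx, so the
no-argument-heap overload can return a slightly smaller value.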

While a larger RAM buffer may speed up indexing (since it flushes less
often to disk), it's not the only factor that affects indexing speed.

For instance, if a big portion of your indexing work is reading the files
from a slow storage device (maybe an NFS share, a remote HTTP server, etc.),
then that could easily overshadow any benefit of using a large RAM buffer.
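
To find out whether reading dominates, you could time the read step and the
index step separately. A minimal sketch, assuming your loop can be split that
way; the `ReadVsIndexTimer` class is hypothetical, the suppliers stand in for
"fetch one file", and the consumer could wrap `IndexWriter.addDocument`.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical profiler: times the read phase and the index phase separately. */
final class ReadVsIndexTimer {
    private ReadVsIndexTimer() {}

    /**
     * Runs each reader, hands its result to the indexer, and returns the
     * fraction of the measured time spent reading (0.0 to 1.0).
     */
    static <T> double readShare(List<Supplier<T>> readers, Consumer<T> indexer) {
        long readNanos = 0;
        long indexNanos = 0;
        for (Supplier<T> reader : readers) {
            long t0 = System.nanoTime();
            T doc = reader.get();   // e.g. fetch a file from NFS or over HTTP
            long t1 = System.nanoTime();
            indexer.accept(doc);    // e.g. build a Document and add it
            long t2 = System.nanoTime();
            readNanos += t1 - t0;
            indexNanos += t2 - t1;
        }
        long total = readNanos + indexNanos;
        return total == 0 ? 0.0 : (double) readNanos / total;
    }
}
```

If the read share is high, a bigger RAM buffer is unlikely to help much.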

Also, do you index with a single thread or multiple threads? Lucene supports
multi-threaded indexing, and it's recommended whenever you can use it, e.g.
when you run on sufficiently strong hardware (4+ cores...).
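
In Lucene a single IndexWriter can be shared by several threads, each calling
addDocument. The sketch below shows only that threading pattern using the
standard library; `DocSink` and `ParallelIndexer` are hypothetical names, with
`DocSink` standing in for the shared writer, and the thread count is left to
the caller (e.g. the number of cores).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a thread-safe writer such as Lucene's IndexWriter. */
interface DocSink<T> {
    void add(T doc) throws Exception;
}

final class ParallelIndexer {
    private ParallelIndexer() {}

    /**
     * Splits docs into roughly equal slices, one task per slice; all tasks
     * share the same sink. Returns the number of documents handed to the sink.
     */
    static <T> int indexAll(List<T> docs, DocSink<T> sink, int threads) throws Exception {
        if (threads < 1) throw new IllegalArgumentException("threads must be >= 1");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            int slice = (docs.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < docs.size(); start += slice) {
                List<T> part = docs.subList(start, Math.min(start + slice, docs.size()));
                results.add(pool.submit(() -> {
                    for (T doc : part) {
                        sink.add(doc); // shared, thread-safe sink
                    }
                    return part.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // worker failures surface as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```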

Another thing: in the past I noticed that very large RAM buffers did not
improve indexing at all. For example, if your underlying I/O system is slow
(e.g. indexing to an NFS share, a distributed file system, etc.), then the
cost of flushing a big RAM buffer became significant, more than the cost of
indexing in RAM, and I did not observe improvements when using
ramBufferSizeMB=512 vs 128. Also, a big RAM buffer takes more space on the
heap and makes the GC's job harder. So I think a RAM buffer that is too big
may actually slow things down rather than speed them up.
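
Since the effect of a bigger buffer depends on your I/O and GC, measuring is
safer than guessing. A minimal timing harness, assuming a hypothetical
`IndexRun` callback that reindexes a fixed sample with the given size (for
example via `IndexWriterConfig.setRAMBufferSizeMB`); the harness only times
each run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical callback: rebuild a sample index using the given buffer size. */
interface IndexRun {
    void run(double ramBufferSizeMB) throws Exception;
}

final class RamBufferSweep {
    private RamBufferSweep() {}

    /** Times one full run per candidate size; returns size -> elapsed milliseconds. */
    static Map<Double, Long> sweep(double[] candidatesMB, IndexRun run) throws Exception {
        Map<Double, Long> elapsedMs = new LinkedHashMap<>(); // keeps candidate order
        for (double mb : candidatesMB) {
            long start = System.nanoTime();
            run.run(mb);
            elapsedMs.put(mb, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMs;
    }
}
```

With -Xmx512m, candidates such as {32, 64, 128} on the same sample data and
storage would be a reasonable start; a single run per size is noisy, so
repeating each a few times would be prudent.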

Indexing speed is affected by multiple parameters; the RAM buffer is only
one of them...

Shai


On Wed, May 14, 2014 at 4:33 PM, Gudrun Siedersleben <
siedersle...@mpdl.mpg.de> wrote:

> Hi all,
>
> we want to speed up building our Lucene index. We set ramBufferSize to
> some values between 32 and 128 MB, but that does not make any difference
> to the time used for reindexing. We did not set maxBufferedDocs, ..
> which could conflict.
> We start the JVM with the following JAVA_OPTS:
>
> -Xms128m -Xmx512m -XX:MaxPermSize=256m
>
> What is the recommended value for ramBufferSizeMB depending on JAVA_OPTS
> and perhaps other lucene parameters set? We use Lucene 3.6.0.
>
> Best regards
>
> Gudrun