Re: How to use concurrency efficiently

Adrien Grand Tue, 02 Apr 2013 07:07:40 -0700

On Tue, Apr 2, 2013 at 2:29 PM, Igor Shalyminov
<ishalymi...@yandex-team.ru> wrote:
> Hello!


Hi Igor,

> I have a ~20GB index and am trying to search it concurrently.
> The index has 16 segments, and I run SpanQuery.getSpans() on each segment
> concurrently.
> I see only a very small performance improvement from searching concurrently. I
> suppose the reason is that the segment sizes are very non-uniform (3
> segments have ~20 000 docs each, and the others have fewer than 1 000 each).
> How can I make the segments more uniformly sized (I currently just use
> writer.forceMerge(16)), and are multiple index segments the most important
> thing for concurrency in Lucene?

Segments have non-uniform sizes by design. A new segment is written
every time a flush happens (when the RAM buffer is full or when you
explicitly call commit). When there are too many segments, Lucene
merges some of them while new segments keep being generated as you add
data. So the "flush" segments will always be small, while segments
resulting from a merge will be much larger, since they contain data
from several other segments.
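
The flush/merge dynamic above can be illustrated with a toy simulation in plain Java. It is NOT Lucene's actual merge policy (Lucene's real policies are more elaborate); the class name and the `flushSize` and `mergeFactor` parameters are hypothetical, chosen only for illustration. A small segment is flushed every `flushSize` docs, and whenever the last `mergeFactor` segments share the same level they are merged into one segment of the next level.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of flush + merge (NOT Lucene's real merge policy).
 * A segment is flushed every `flushSize` docs; whenever the last
 * `mergeFactor` segments share the same level, they are merged into
 * one segment of the next level.
 */
public class SegmentSimulation {

    /** A segment: its doc count and merge level (0 = freshly flushed). */
    static final class Segment {
        final int docs;
        final int level;

        Segment(int docs, int level) {
            this.docs = docs;
            this.level = level;
        }
    }

    static List<Segment> simulate(int totalDocs, int flushSize, int mergeFactor) {
        if (flushSize < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need flushSize >= 1 and mergeFactor >= 2");
        }
        List<Segment> segments = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < totalDocs; i++) {
            buffered++;
            if (buffered == flushSize) {           // buffer full: flush a small segment
                segments.add(new Segment(buffered, 0));
                buffered = 0;
                cascadeMerges(segments, mergeFactor);
            }
        }
        if (buffered > 0) {                        // final commit flushes the remainder
            segments.add(new Segment(buffered, 0));
            cascadeMerges(segments, mergeFactor);
        }
        return segments;
    }

    /** Repeatedly merge the trailing mergeFactor segments while they share a level. */
    private static void cascadeMerges(List<Segment> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int level = segments.get(n - 1).level;
            int docs = 0;
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i).level != level) {
                    return;                        // mixed levels: nothing to merge
                }
                docs += segments.get(i).docs;
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(new Segment(docs, level + 1)); // one bigger, merged segment
        }
    }

    public static void main(String[] args) {
        for (Segment s : simulate(23_500, 1_000, 10)) {
            System.out.println("level " + s.level + ": " + s.docs + " docs");
        }
    }
}
```

With 23 500 docs, a flush every 1 000 docs and a merge factor of 10, this toy model ends with two merged segments of 10 000 docs next to three 1 000-doc segments and one 500-doc segment: the same kind of skew described in the question.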

Even if segments are collected concurrently, IndexSearcher needs to
merge the per-segment results at the end. Since your segments are very
small (20000 docs at most), the cost of initialization and of merging
the results may well not be negligible compared to the cost of
collecting a single segment.
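
The collect-then-merge pattern can be sketched without Lucene. In this stand-alone model (not IndexSearcher's code; all names are hypothetical), each segment is an array of precomputed scores indexed by doc id. Each segment is searched for its top k hits in its own task on a java.util.concurrent ExecutorService, and the calling thread then merges the per-segment lists into a global top k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Toy model of "collect each segment concurrently, then merge" (NOT Lucene code).
 * A segment is a hypothetical array of precomputed scores, indexed by doc id.
 */
public class ConcurrentTopK {

    static final class Hit {
        final int segment;
        final int doc;
        final float score;

        Hit(int segment, int doc, float score) {
            this.segment = segment;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Best hit first: higher score, then lower segment, then lower doc id. */
    static final Comparator<Hit> RANKING =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.segment)
                    .thenComparingInt(h -> h.doc);

    /** Per-segment collection: keep the k best hits using a bounded heap. */
    static List<Hit> collectSegment(int segment, float[] scores, int k) {
        // The head of this heap is the worst hit kept so far
        PriorityQueue<Hit> heap = new PriorityQueue<>(RANKING.reversed());
        for (int doc = 0; doc < scores.length; doc++) {
            heap.add(new Hit(segment, doc, scores[doc]));
            if (heap.size() > k) {
                heap.poll();                     // evict the current worst hit
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(RANKING);
        return top;
    }

    /** Search every segment in its own task, then merge on the calling thread. */
    static List<Hit> search(float[][] segments, int k, ExecutorService pool) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (int s = 0; s < segments.length; s++) {
            final int seg = s;
            futures.add(pool.submit(() -> collectSegment(seg, segments[seg], k)));
        }
        // Merge step: it cannot finish until the slowest (largest) segment is done
        List<Hit> all = new ArrayList<>();
        for (Future<List<Hit>> f : futures) {
            try {
                all.addAll(f.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        all.sort(RANKING);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        float[][] segments = {{0.5f, 2.0f, 1.0f}, {3.0f}, {0.1f, 2.5f}};
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (Hit h : search(segments, 3, pool)) {
                System.out.println("segment " + h.segment + ", doc " + h.doc + ": " + h.score);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the merge needs every task's result, wall-clock time is at least the time of the largest segment's task plus the merge. A few large segments next to many tiny ones therefore leave most threads idle for most of the search, while the per-task setup and the merge still cost something.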

--
Adrien
