On Tue, Apr 2, 2013 at 2:29 PM, Igor Shalyminov <ishalymi...@yandex-team.ru> wrote:
> Hello!
Hi Igor,

> I have a ~20GB index and try to make a concurrent search over it.
> The index has 16 segments, I run SpanQuery.getSpans() on each segment
> concurrently.
> I see really small performance improvement of searching concurrently. I
> suppose, the reason is that the sizes of the segments are very non-uniform (3
> segments have ~20 000 docs each, and the others have less than 1 000 each).
> How to make more uniformly sized segments (I now use just
> writer.forceMerge(16)), and are multiple index segments the most important
> thing in Lucene concurrency?

Segments have non-uniform sizes by design. A segment is generated every
time a flush happens (when the RAM buffer is full or if you explicitly
call commit). When there are too many segments, Lucene merges some of
them while new segments keep being generated as you add data. So the
"flush" segments will always be small, while segments resulting from a
merge will be much larger since they contain data from several other
segments.

Even if segments are collected concurrently, IndexSearcher needs to
merge the results of the collection of each segment in the end. Since
your segments are very small (20000 docs), maybe the cost of
initialization/merge is not negligible compared to single-segment
collection.

-- 
Adrien
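
P.S. To put a rough number on the non-uniform sizes discussed above, here is a small back-of-the-envelope sketch in plain Java (no Lucene calls). It assumes, purely for illustration, that the cost of searching a segment is proportional to its document count and that each segment is handled by exactly one thread; the class and method names are hypothetical, and the 1 000-doc figure is just the upper bound quoted for the small segments.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: estimates the best possible
// speedup of per-segment parallel search under a simple cost model.
final class SegmentSpeedup {

    /**
     * Upper bound on the speedup when every segment is searched by one
     * thread and cost is assumed proportional to the number of docs.
     * The largest segment is the critical path, so the bound is
     * (total docs) / (docs in largest segment).
     */
    static double maxSpeedup(int[] docsPerSegment) {
        long total = 0;
        long largest = 0;
        for (int docs : docsPerSegment) {
            total += docs;
            largest = Math.max(largest, docs);
        }
        if (largest == 0) {
            return 1.0; // empty index: nothing to parallelize
        }
        return (double) total / largest;
    }

    public static void main(String[] args) {
        // 16 segments as described in the question: 3 large ones and
        // 13 small ones (1 000 docs used as an assumed upper bound).
        int[] segments = new int[16];
        Arrays.fill(segments, 0, 3, 20_000);
        Arrays.fill(segments, 3, 16, 1_000);
        System.out.println("max speedup: " + maxSpeedup(segments));
    }
}
```

Under these assumptions the bound comes out at 73 000 / 20 000 = 3.65, however many threads are used across the 16 segments, and that is before counting any per-segment initialization or the final merge of results.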