Hello everyone,

We have had a performance issue ever since we stopped optimizing the index. We
are using Lucene 4.8 (32-bit JVM for searching, 64-bit for indexing) on Windows
2008 R2.
Now we are letting Lucene handle the merges using the default merge policy 
(TieredMergePolicy).

We have narrowed down the issue, and it seems that Lucene makes far more IO
calls when the index has more segments.
For instance, a search with (2 TermQuery + 1 SpanQuery) x 6 fields made 143 IO
calls on the optimized index. Now, with 16 segments, the same search makes 2432
IO calls and the search time is really bad.
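
As a rough sanity check on those two figures, here is a small plain-Java sketch
(no Lucene involved) that divides the call count by the segment count. It
assumes the 143-call figure comes from the optimized, single-segment index; the
class and method names are made up for illustration.

```java
public class IoCallsPerSegment {
    // Average number of IO calls spent on each segment for one search.
    static double perSegment(long totalCalls, int segments) {
        return (double) totalCalls / segments;
    }

    public static void main(String[] args) {
        long optimizedCalls = 143; // single-segment index (assumed), from this post
        long mergedCalls = 2432;   // 16-segment index, from this post
        int segments = 16;

        double avg = perSegment(mergedCalls, segments);
        // Compare the per-segment cost with the cost on the optimized index.
        System.out.println("IO calls per segment: " + avg);
        System.out.println("Ratio to optimized index: " + (avg / optimizedCalls));
    }
}
```

That works out to about 152 calls per segment, close to the 143 calls of the
optimized index, so the per-segment cost looks roughly constant and the total
appears to grow linearly with the number of segments.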

We tried removing all the deleted docs while keeping the same number of segments
(using a custom MergePolicy), but the number of calls did not change much, so
it's not related to the percentage of deleted documents. We also changed some
parameters on the TieredMergePolicy to make it more aggressive. But even with
fewer segments (3 or 7) the problem is still there: we have fewer IO calls, but
the count is still very high compared to the optimized version. The search time
seems to be driven by the number of segments.

Tests were run without parallelism (so it has nothing to do with the NIO bug on
Windows), and the behavior is the same for several Directory implementations. We
have tested with a 64k buffer (which gave better results on Windows) and with
the default one. The size of the index is ~24 GB (14 million documents). No
fields are stored, only indexed.

We were expecting some kind of overhead when searching across multiple 
segments, but not that much.

Is this normal?

Thank you,

Alessandro De Simone
