Hello everyone,

We have had a performance issue ever since we stopped optimizing the index. We are using Lucene 4.8 (32-bit JVM for searching, 64-bit JVM for indexing) on Windows Server 2008 R2. We now let Lucene handle merges with the default merge policy (TieredMergePolicy).

We have narrowed down the issue, and it seems that Lucene makes far more I/O calls when the index has more segments. For instance, a search with (2 TermQuery + 1 SpanQuery) x 6 fields made 143 I/O calls on the optimized index. With 16 segments, the same search makes 2432 I/O calls, and the search time is really bad.

We tried removing all the deleted docs while keeping the same number of segments (using a custom MergePolicy), but the number of calls did not change much, so the issue is not related to the percentage of deleted documents. We also changed some TieredMergePolicy parameters to make merging more aggressive. Even with fewer segments (3 or 7) the problem remains: there are fewer I/O calls, but the count is still very high compared to the optimized index. The search time seems to be driven by the number of segments.

Tests were run without parallelism (so this has nothing to do with the NIO bug on Windows), and the behavior is the same across several Directory implementations. We tested with a 64 KB buffer (which gave better results on Windows) and with the default one.

The index is ~24 GB (14 million documents). No fields are stored, only indexed.

We expected some overhead when searching across multiple segments, but not this much. Is this normal?

Thank you,
Alessandro De Simone
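
P.S. For reference, here is a quick back-of-the-envelope check on the two I/O counts quoted above. It is only a sketch of the arithmetic: the class and method names are made up for illustration, and it assumes the 143-call figure was measured on the optimized (single-segment) index.

```java
// Illustrative arithmetic only; names are hypothetical and not part of Lucene.
class SegmentIoEstimate {

    // Average number of I/O calls per segment for one search.
    static double callsPerSegment(long totalCalls, int segments) {
        if (segments <= 0) {
            throw new IllegalArgumentException("segments must be positive");
        }
        return (double) totalCalls / segments;
    }

    // How many times more I/O calls were observed compared to the baseline.
    static double increaseFactor(long baselineCalls, long observedCalls) {
        if (baselineCalls <= 0) {
            throw new IllegalArgumentException("baselineCalls must be positive");
        }
        return (double) observedCalls / baselineCalls;
    }

    public static void main(String[] args) {
        long optimizedCalls = 143;  // reported above; assumed to be a single segment
        long segmentedCalls = 2432; // reported above for 16 segments
        int segments = 16;

        System.out.println("calls per segment: "
                + callsPerSegment(segmentedCalls, segments));
        System.out.println("increase vs. optimized index: "
                + increaseFactor(optimizedCalls, segmentedCalls));
    }
}
```

With these numbers, each of the 16 segments accounts for about 152 I/O calls, close to the 143 calls seen on the optimized index, and the total is roughly 17 times higher. Under the assumption above, the per-segment cost is about the same as searching the whole optimized index once, which is consistent with the search time being driven by the number of segments.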