Thank you for your input.

> How much RAM does your search machine have?
We have 16GB of RAM, and there is at least 8GB free memory for the OS file
cache. The cache is working pretty well.

> That sounds right. Although each segment is 1/16 of the full index size, the
> number of seeks per segment is not 1/16: Larger indexes require relatively
> fewer seeks. Think binary search and log(values_in_field), although that is
> highly simplified.

The "IO calls" I was referring to is the number of times the
"BufferedIndexInput.refill()" function is called. So it means that we read 16
times more bytes when there are 16 segments, for the exact same result. I
would have agreed to blame seeks if Lucene was reading more or less the same
number of bytes but with worse performance; in fact, that's exactly what I was
expecting. But this is not the case here. It's almost as if extracting the
term stats (or whatever metadata the segment has) is more costly than the
search itself. And I'm not talking about queries with few results.

> I am guessing that you are using spinning drives and that there is not much
> RAM in the machine?

As you can see, we have a lot of RAM. Using the resource manager I see that
nothing is thrashing the system or swapping to disk. Lucene is just a lot
slower for every query. When the query is in the OS cache, the call takes a
few milliseconds as expected.

Alessandro De Simone

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Saturday, 17 May 2014 20:04
To: java-user@lucene.apache.org
Subject: RE: search time & number of segments

De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote:
> We have a performance issue ever since we stopped optimizing the index. We
> are using Lucene 4.8 (JVM 32 bits for searching, 64 bits for indexing) on
> Windows 2008R2.

How much RAM does your search machine have?

> For instance, a search with (2 termQuery + 1 spanquery) x 6 fields made 143
> IO calls. Now with 16 segments we have 2432 IO calls and the search time is
> really bad. [...]

That sounds right.
Although each segment is 1/16 of the full index size, the number of seeks per
segment is not 1/16: Larger indexes require relatively fewer seeks. Think
binary search and log(values_in_field), although that is highly simplified.

> The size of the index is ~24GB (14 million documents). No fields are stored,
> only indexed.

Normally the penalty of running un-optimized is not that great, so it sounds
like your machine cannot provide the I/O speed it needs (as opposed to having
a great logistics overhead from the multiple segments). I am guessing that
you are using spinning drives and that there is not much RAM in the machine?

The easy solution is either to throw RAM at the problem or switch to SSD.

- Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
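[Editor's note] The log(values_in_field) argument in the thread can be made concrete with back-of-the-envelope arithmetic. This is a minimal sketch assuming a binary-search-like term lookup over the ~14 million documents mentioned in the thread; a real Lucene term dictionary is more efficient, so the numbers are only illustrative:

```java
public class SeekEstimate {
    public static void main(String[] args) {
        double terms = 14_000_000; // ~14M documents, from the thread
        int segments = 16;

        // Lookups for one big segment vs. 16 small segments,
        // modelled as binary search: log2(number of entries).
        double seeksOneSegment = log2(terms);
        double seeksPerSmallSegment = log2(terms / segments);
        double seeksTotal = segments * seeksPerSmallSegment;

        System.out.printf("1 segment:   ~%.1f lookups%n", seeksOneSegment);
        System.out.printf("16 segments: ~%.1f lookups total (~%.1fx)%n",
                seeksTotal, seeksTotal / seeksOneSegment);
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }
}
```

Each small segment still needs roughly 20 of the ~24 lookups the single big segment needs, so 16 segments cost about 13x the lookups rather than 1x. That is in the same ballpark as the ~17x increase in refill() calls reported in the thread.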
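[Editor's note] To put a number on the refill() observation: the 143 and 2432 call counts come from the thread; the ratio computation below is plain illustrative arithmetic, not Lucene code. Since BufferedIndexInput reads a fixed-size chunk per refill, the byte volume scales with the call count:

```java
public class RefillRatio {
    public static void main(String[] args) {
        // BufferedIndexInput.refill() call counts reported in the thread
        int refillsOptimized = 143;   // fully optimized, 1 segment
        int refills16Segments = 2432; // same query, 16 segments

        double ratio = (double) refills16Segments / refillsOptimized;
        System.out.printf("~%.1fx more buffer refills (and bytes read)%n", ratio);
    }
}
```

The blow-up is ~17x, slightly more than the 16x segment count, which is what prompted the question about per-segment metadata overhead versus seek cost.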