First, I decided I wasn't comfortable closing the IndexReader directly, so I did what I hope is better: I create a singleton SearcherManager (available out of the box in the 4.1 release) and do an acquire/release around each search. I assume that's more or less equivalent anyway.
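Roughly, the search side now looks like this. This is only a sketch of the pattern, not our production code verbatim (the class name and the term query are made up for illustration):

    import java.io.File;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherFactory;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SearchSketch {
        // Created once at startup; shared by all search threads.
        private final SearcherManager searcherManager;

        public SearchSketch(String indexDirectory) throws Exception {
            searcherManager = new SearcherManager(
                    FSDirectory.open(new File(indexDirectory)), new SearcherFactory());
        }

        public TopDocs search(String field, String text) throws Exception {
            IndexSearcher searcher = searcherManager.acquire();
            try {
                return searcher.search(new TermQuery(new Term(field, text)), 50);
            } finally {
                searcherManager.release(searcher); // never touch the searcher after release
            }
        }

        // Called periodically (e.g., from a timer) so searches see new commits.
        public void refresh() throws Exception {
            searcherManager.maybeRefresh();
        }
    }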
Second, it doesn't really matter, as I am still seeing the same slow searches. I'm becoming convinced that the problem is in the indexer (see below for why).

Briefly, there are two parts to my use of Lucene (all running on Windows). The first part is a Windows service that does the indexing. It reads a directory which contains new items to be indexed. The indexing is completely serialized (no multiple threads); it finishes one document before moving on to the next. Even so, I'm averaging about 14 ms per document on a fairly old machine. Each document is an XML file averaging about 4 KB. The searching happens in a Tomcat web server, so there may be multiple simultaneous searches.

Here's what I did today. I did a full reindex (all the documents are in directories which I can walk on the local hard drive). There were roughly 600k documents. The reindex is a separate program which simply does the reindex and quits. It opens the index, indexes all of the files (no intermediate commits), does a forceMerge(2) as the very last step, and then closes the writer (which I assume forces a commit). Neither the web server nor the indexing service was running during the reindex (i.e., I don't think anything was touching the index other than the reindex program itself).

Here's what the index directory looked like after the reindex completed (the value in parentheses is the total bytes for those files):

    61 .cfe  (17.7 KB)
    61 .cfs  (2.09 GB)
    61 .si   (16.9 KB)
    42 .del  (23.1 KB)
    10 .fdt  (32.2 MB)
    10 .fdx  (12.8 KB)
    10 .fnm  (11.1 KB)
    10 .pos  (157 MB)
    10 .tim  (28.7 MB)
    10 .tip  (582 KB)
    10 .tvd  (254 KB)
    10 .tvf  (232 MB)
    10 .tvx  (2 MB)
    10 .doc  (62.5 MB)
     1 segments_1   (2 KB)
     1 segments.gen (1 KB)

So, 377 files for a total of 2.6 GB, with most of it in the .cfs files. I then restarted the Windows service. Since then (about 2 hours), there are now 82 .cfs files; 51 of them range from 29.8 to 51.2 MB each (2.09 GB total). So I'm pretty convinced the issue is in the indexing, since I still haven't done any searching yet.

The index writer is initialized as follows:

    FSDirectory dir = FSDirectory.open(new File(indexDirectory));
    IndexWriterConfig iwc = new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);
    LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
    lbsm.setMaxMergeDocs(10);
    lbsm.setUseCompoundFile(true);
    iwc.setMergePolicy(lbsm);
    _oWriter = new IndexWriter(dir, iwc);

But I also notice that I added the following. The intent was to have the writer flush its buffer once it had indexed enough documents to reach 50 MB (an arbitrary number I picked out of the air because it felt right :-) ). It seems odd to me that the maximum size of the .cfs files is also about 50 MB, so I'm wondering if this affects the writer's ability to merge files.

    // don't flush based on number of documents;
    // flush based on buffer size instead
    _oWriter.getConfig().setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
            .setRAMBufferSizeMB(50.0);

Any help in figuring out what is causing this problem would be appreciated. I now have an offline system that I can play with, so I can do some intrusive things if need be. (A cleaned-up version of the writer setup I plan to try there is sketched at the very bottom of this message, below the quoted thread.)

Scott

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com]
Sent: Saturday, March 16, 2013 1:28 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene slow performance

Thanks for the help. The reindex was done this morning and searches now take less than a second. I will make the change to the code.
Cheers
Scott

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, March 15, 2013 11:17 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene slow performance

Please forceMerge only once, not every time (only to clean up your index)! If you are doing a reindex already, just fix your close logic as discussed before.

Scott Smith <ssm...@mainstreamdata.com> wrote:
>Unfortunately, this is a production system which I can't touch (though
>I was able to get a full reindex scheduled for tomorrow morning).
>
>Are you suggesting that I do:
>
>writer.forceMerge(1);
>writer.close();
>
>instead of just doing the close()?
>
>-----Original Message-----
>From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>Sent: Friday, March 15, 2013 5:08 PM
>To: java-user@lucene.apache.org
>Subject: Re: Lucene slow performance
>
>On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith
><ssm...@mainstreamdata.com> wrote:
>> "Do you always close IndexWriter after adding a few documents and,
>>when closing, disable "wait for merge"? In that case, all merges are
>>interrupted and the merge policy never has a chance to merge at all
>>(because you are opening and closing IndexWriter all the time and
>>cancelling all merges)."
>>
>> Frankly, I don't quite understand what this means. When I "close" the
>>indexwriter, I simply call close(). Is that the wrong thing?
>
>that should be fine...
>
>this sounds very odd, though. Do you see files actually get removed /
>merged if you call IndexWriter#forceMerge(1)?
>
>simon
>>
>> Thanks
>>
>> Scott
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>> Sent: Friday, March 15, 2013 4:49 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Lucene slow performance
>>
>> Hi,
>>
>> With the standard configuration, this cannot happen. What merge
>>policy do you use? This looks to me like a misconfigured merge policy
>>or use of NoMergePolicy. With 3,000+ segments it will be slow; the
>>question is, why do you get those?
>>
>> Another thing could be: do you always close IndexWriter after adding
>>a few documents and, when closing, disable "wait for merge"? In that
>>case, all merges are interrupted and the merge policy never has a
>>chance to merge at all (because you are opening and closing
>>IndexWriter all the time and cancelling all merges).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>> -----Original Message-----
>>> From: Scott Smith [mailto:ssm...@mainstreamdata.com]
>>> Sent: Friday, March 15, 2013 11:15 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Lucene slow performance
>>>
>>> We have a system that is using Lucene, and the searches are very
>>> slow. The number of documents is fairly small (less than 30,000) and
>>> each document is typically only 2 to 10 kilo-characters. Yet,
>>> searches are taking 15-16 seconds.
>>>
>>> One of the things I noticed was that the index directory has several
>>> thousand (3,000+) .cfs files. We do optimize the index once per day.
>>> This is a system that probably gets several thousand document
>>> deletes and additions per day (spread out across the day).
>>>
>>> Any thoughts? We didn't really notice this until we went to 4.x.
>>>
>>> Scott

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
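As promised above, here is the cleaned-up writer setup I plan to try on the offline system. This is a sketch only (the class wrapper, the path, and StandardAnalyzer are stand-ins for our real code). It drops my LogByteSizeMergePolicy in favor of the 4.x default, TieredMergePolicy, because if I read the javadocs right, setMaxMergeDocs(10) makes any segment with more than 10 documents ineligible for merging, and every ~50 MB segment the RAM buffer flushes is far past that. It also follows Uwe's advice to forceMerge only once, during the full reindex:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ReindexSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical path; ours comes from the service configuration.
            String indexDirectory = "C:\\indexes\\main";

            FSDirectory dir = FSDirectory.open(new File(indexDirectory));
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41,
                    new StandardAnalyzer(Version.LUCENE_41)); // stand-in for our oAnalyzer

            // No explicit merge policy: the 4.x default TieredMergePolicy has no
            // doc-count cap, so every flushed segment stays eligible for merging.

            // Flush on RAM usage only, as before.
            iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
            iwc.setRAMBufferSizeMB(50.0);

            IndexWriter writer = new IndexWriter(dir, iwc);
            try {
                // ... walk the directories and writer.addDocument(...) each file ...

                // Per Uwe: forceMerge once, during the one-off full reindex only,
                // never on every close in the indexing service.
                writer.forceMerge(1); // blocks until the merge finishes
            } finally {
                writer.close(); // commits and waits for any running merges
            }
        }
    }

In the indexing service itself I would skip the forceMerge entirely and just call writer.close(), which commits and waits for running merges by default.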