Duh...it's supposed to be setMergeFactor().
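
Something like this, I assume (a sketch of the corrected setup; 10 is just
the default merge factor and still needs tuning):

    LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
    // merge once 10 segments accumulate at a level (by contrast,
    // setMaxMergeDocs(10) told Lucene never to merge any segment
    // holding more than 10 docs, so segments just piled up)
    lbsm.setMergeFactor(10);
    lbsm.setUseCompoundFile(true);
    iwc.setMergePolicy(lbsm);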

Thanks

Scott

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com] 
Sent: Wednesday, March 20, 2013 3:53 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene slow performance -- still broke

quick question,

why on earth do you set:   lbsm.setMaxMergeDocs(10);

So once a segment has 10 docs you don't want to merge it anymore? I don't
think you should set this at all.

simon


On Wed, Mar 20, 2013 at 10:48 PM, Scott Smith <ssm...@mainstreamdata.com> wrote:
> First, I decided I wasn't comfortable doing closes on the IndexReader, so I
> did what I hope is better: I created a singleton SearcherManager
> (out-of-the-box from the 4.1 release) and do acquires/releases.  I assume
> that's more or less equivalent anyway.
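>
> Roughly, the pattern looks like this (a sketch; the directory setup and
> the example query are stand-ins, not my exact code):
>
>     SearcherManager _searcherManager =
>         new SearcherManager(FSDirectory.open(new File(indexDirectory)),
>                             new SearcherFactory());
>
>     Query query = new TermQuery(new Term("body", "lucene"));  // example query
>     IndexSearcher searcher = _searcherManager.acquire();
>     try {
>         TopDocs hits = searcher.search(query, 50);  // run the search
>     } finally {
>         _searcherManager.release(searcher);        // never close it directly
>     }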
>
> Second, it doesn't really matter, as I am still seeing the same slow
> searches.  I'm becoming convinced that the problem is in the indexer (see
> below for why).
>
>  So, briefly, there are two parts to my use of Lucene (all running on
> Windows).  The first part is a Windows service that does the indexing.  It
> reads a directory which has new items to be indexed.  The indexing is
> totally serialized (meaning there are no multiple threads); it completely
> indexes one document before it moves on to the next.  Even at that, I'm
> averaging about 14 ms per document on a fairly old machine.  Each document
> is an XML file and averages about 4 KB.
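>
> In outline, the service's loop is just this (a sketch with hypothetical
> helper names, not the real code):
>
>     for (File f : newItemsDirectory.listFiles()) {
>         Document doc = parseXmlToDocument(f);  // hypothetical XML parser
>         _oWriter.addDocument(doc);             // one document at a time
>     }
>     _oWriter.commit();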
>
> The searching happens in a Tomcat web server.  Obviously, there may be
> multiple simultaneous searches.
>
> Here's what I did today.  I did a full reindex (all the documents are in
> directories which I can walk on the local hard drive).  There were roughly
> 600k documents.  The reindex is a separate program which simply does the
> reindex and quits.  It opens the index, indexes all of the files (no
> intermediate commits), does a forceMerge(2) as its last step, and then
> closes the writer (which I assume forces a commit).  Neither the web server
> nor the indexing service was running while the reindex was going on (i.e.,
> I don't think anything was touching the index other than the reindex
> program itself).  Here's what the index directory looked like after the
> reindex completed (the value in parentheses is the total size of those
> files).
>
> 61 .cfe (17.7 KB)
> 61 .cfs (2.09 GB)
> 61 .si (16.9 KB)
> 42 .del (23.1 KB)
> 10 .fdt (32.2 MB)
> 10 .fdx (12.8 KB)
> 10 .fnm (11.1 KB)
> 10 .pos (157 MB)
> 10 .tim (28.7 MB)
> 10 .tip (582 KB)
> 10 .tvd (254 KB)
> 10 .tvf (232 MB)
> 10 .tvx (2 MB)
> 10 .doc (62.5 MB)
> 1 segments_1 (2 KB)
> 1 segments.gen (1 KB)
>
> So, 377 files for a total of 2.6 GB, and most of it in the .cfs files.
>
> I then restarted the Windows service.  Since then (about 2 hours), there
> are now 82 .cfs files.  51 of them range from 29.8 to 51.2 MB each
> (2.09 GB total).
>
> So, I'm pretty convinced the issue is in the indexing since I still haven't 
> done any searching yet.
>
> The index writer is initialized as follows:
>             FSDirectory dir = FSDirectory.open(new File(indexDirectory));
>             IndexWriterConfig iwc =
>                 new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);
>
>             LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
>             lbsm.setMaxMergeDocs(10);
>             lbsm.setUseCompoundFile(true);
>             iwc.setMergePolicy(lbsm);
>
>             _oWriter = new IndexWriter(dir, iwc);
>
> But I also noticed that I had added the following.  The intent was to have
> the writer flush its buffer once it had indexed enough documents to reach
> 50 MB (an arbitrary number I picked out of the air because it felt right
> :-) ).  It seems odd to me that the maximum size of the .cfs files is also
> about 50 MB, so I'm wondering if this affects the writer's ability to
> merge files.
>
>         // don't flush based on number of documents; flush based on buffer size
>         _oWriter.getConfig()
>                 .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
>                 .setRAMBufferSizeMB(50.0);
>
> Any help in figuring out what is causing this problem would be appreciated.  
> I do now have an offline system that I can play with so I can do some 
> intrusive things if need be.
>
> Scott
>
>
>
>
> -----Original Message-----
> From: Scott Smith [mailto:ssm...@mainstreamdata.com]
> Sent: Saturday, March 16, 2013 1:28 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene slow performance
>
> Thanks for the help.
>
> The reindex was done this morning and searches now take less than a second.
>
> I will make the change to the code.
>
> Cheers
>
> Scott
>
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Friday, March 15, 2013 11:17 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene slow performance
>
> Please forceMerge only one time, not every time (only to clean up your
> index)!  If you are doing a reindex already, just fix your close logic as
> discussed before.
>
>
>
> Scott Smith <ssm...@mainstreamdata.com> schrieb:
>
>>Unfortunately, this is a production system which I can't touch (though 
>>I was able to get a full reindex scheduled for tomorrow morning).
>>
>>Are you suggesting that I do:
>>
>>writer.forceMerge(1);
>>writer.close();
>>
>>instead of just doing the close()?
>>
>>-----Original Message-----
>>From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>>Sent: Friday, March 15, 2013 5:08 PM
>>To: java-user@lucene.apache.org
>>Subject: Re: Lucene slow performance
>>
>>On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith 
>><ssm...@mainstreamdata.com> wrote:
>>> " Do you always close IndexWriter after adding few documents and 
>>> when
>>closing, disable "wait for merge"? In that case, all merges are 
>>interrupted and the merge policy never has a chance to merge at all 
>>(because you are opening and closing IndexWriter all the time with 
>>cancelling all merges)?"
>>>
>>> Frankly, I don't quite understand what this means.  When I "close" the
>>> IndexWriter, I simply call close().  Is that the wrong thing?
>>That should be fine...
>>
>>This sounds very odd, though: do you see files that actually get removed /
>>merged when you call IndexWriter#forceMerge(1)?
>>
>>simon
>>>
>>> Thanks
>>>
>>> Scott
>>>
>>> -----Original Message-----
>>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>>> Sent: Friday, March 15, 2013 4:49 PM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Lucene slow performance
>>>
>>> Hi,
>>>
>>> With the standard configuration, this cannot happen. What merge policy do
>>> you use? This looks to me like a misconfigured merge policy or use of
>>> NoMergePolicy. With 3,000 segments it will be slow; the question is why
>>> you have those at all.
>>>
>>> Another thing could be: do you always close IndexWriter after adding a
>>> few documents and, when closing, disable "wait for merges"? In that case,
>>> all merges are interrupted and the merge policy never has a chance to
>>> merge at all (because you are opening and closing IndexWriter all the
>>> time and cancelling all merges).
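>>>
>>> In code: close() is equivalent to close(true), which waits for running
>>> merges, while something like the following cancels them every time (a
>>> sketch, not necessarily your code):
>>>
>>>     writer.close(false);  // returns quickly, aborting running merges
>>>     // versus
>>>     writer.close();       // same as close(true): waits for merges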
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>> -----Original Message-----
>>>> From: Scott Smith [mailto:ssm...@mainstreamdata.com]
>>>> Sent: Friday, March 15, 2013 11:15 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Lucene slow performance
>>>>
>>>> We have a system that is using Lucene and the searches are very slow.
>>>> The number of documents is fairly small (less than 30,000) and each
>>>> document is typically only 2 to 10 kilo-characters.  Yet, searches are
>>>> taking 15-16 seconds.
>>>>
>>>> One of the things I noticed was that the index directory has several
>>>> thousand (3000+) .cfs files.  We do optimize the index once per day.
>>>> This is a system that probably gets several thousand document deletes
>>>> and additions per day (spread out across the day).
>>>>
>>>> Any thoughts?  We didn't really notice this until we went to 4.x.
>>>>
>>>> Scott
>>>>
>>>
>>>
>>>
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
