Duh... it's supposed to be setMergeFactor(). Thanks
Scott

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com]
Sent: Wednesday, March 20, 2013 3:53 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene slow performance -- still broke

Quick question: why on earth do you set

    lbsm.setMaxMergeDocs(10);

If you have 10 docs in a segment, you don't want to merge anymore? I don't
think you should set this at all.

simon

On Wed, Mar 20, 2013 at 10:48 PM, Scott Smith <ssm...@mainstreamdata.com> wrote:
> First, I decided I wasn't comfortable doing closes on the IndexReader, so I
> did what I hope is better: I create a singleton SearcherManager
> (out-of-the-box from the 4.1 release) and do acquire/releases. I assume
> that's more or less equivalent anyway.
>
> Second, it doesn't really matter, as I am still seeing the same slow
> searches. I'm becoming convinced that the problem is in the indexer (see
> below for why).
>
> So, briefly, there are two parts to my use of Lucene (all running on
> Windows). The first part is a Windows service that does the indexing. It
> reads a directory which has new items to be indexed. The indexing it does
> is totally serialized (no multiple threads); it completely indexes one
> document before it moves on to the next. Even at that, I'm averaging about
> 14 ms per document on a fairly old machine. Each document is an XML file
> and averages about 4 KB.
>
> The searching happens in a Tomcat web server. Obviously, there may be
> multiple simultaneous searches.
>
> Here's what I did today. I did a full reindex (all the documents are in
> directories which I can walk on the local hard drive). There were roughly
> 600k documents. The reindex is a separate program which simply does the
> reindex and quits. It opens the index, indexes all of the files (no
> commits), does a forceMerge, and then closes the writer (which I assume
> forces a commit).
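The singleton SearcherManager pattern Scott describes, acquire per search and release in a finally block, looks roughly like this against the Lucene 4.1 API (a sketch only; the class name, index path handling, and query logic are placeholders, not from the thread):

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.FSDirectory;

public class SearchService {
    // One SearcherManager shared by every search thread in the process.
    private final SearcherManager manager;

    public SearchService(File indexDir) throws IOException {
        manager = new SearcherManager(FSDirectory.open(indexDir),
                                      new SearcherFactory());
    }

    public void doSearch() throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            // ... run queries against searcher ...
        } finally {
            manager.release(searcher); // never touch searcher after release
        }
    }

    // Call periodically (e.g. after commits) so searches see new documents.
    public void refresh() throws IOException {
        manager.maybeRefresh();
    }
}
```

As Scott assumes, this is more or less equivalent to hand-managed reader reopen/close, with the reference counting done for you, so it is unlikely to be the cause of the slowdown.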
> Neither the web server nor the index service were running while the
> reindex was going on (i.e., I don't think there was anything touching the
> index other than the reindex program itself). The last thing the indexer
> does before closing the index is a forceMerge(2). Here's what the index
> directory looked like after the reindex completed (the value in
> parentheses is the total bytes for those files):
>
>     61 CFE          (17.7 KB)
>     61 CFS          (2.09 GB ****)
>     61 si           (16.9 KB)
>     42 DEL          (23.1 KB)
>     10 FDT          (32.2 MB)
>     10 FDX          (12.8 KB)
>     10 FRM          (11.1 KB)
>     10 pos          (157 MB)
>     10 tim          (28.7 MB)
>     10 tip          (582 KB)
>     10 tvd          (254 KB)
>     10 tvf          (232 MB)
>     10 tvx          (2 MB)
>     10 doc          (62.5 MB)
>      1 segment_1    (2 KB)
>      1 segments.gen (1 KB)
>
> So, 377 files for a total of 2.6 GB, most of it in the CFS files.
>
> I then restarted the Windows service. Since then (about 2 hours), there
> are now 82 CFS files. 51 of them range from 29.8 to 51.2 MB each (2.09 GB
> total).
>
> So, I'm pretty convinced the issue is in the indexing, since I still
> haven't done any searching yet.
>
> The index writer is initialized as follows:
>
>     FSDirectory dir = FSDirectory.open(new File(indexDirectory));
>     IndexWriterConfig iwc =
>         new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);
>
>     LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
>     lbsm.setMaxMergeDocs(10);
>     lbsm.setUseCompoundFile(true);
>     iwc.setMergePolicy(lbsm);
>
>     _oWriter = new IndexWriter(dir, iwc);
>
> But I also notice that I added the following. The intent was to have the
> writer flush the buffer when it had indexed enough documents to reach 50 MB
> (an arbitrary number I picked out of the air because it felt right :-) ).
> It seems odd to me that the maximum size of the CFS files is also about
> 50 MB, so I'm wondering if this affects the writer's ability to merge
> files.
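For reference, the fix Scott identifies at the top of the thread, setMergeFactor() where he had written setMaxMergeDocs(), would make the setup above look roughly like this (a sketch against the 4.1 API; Constants.LUCENE_VERSION, indexDirectory, and oAnalyzer are names taken from his code, not standard Lucene identifiers):

```java
import java.io.File;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;

FSDirectory dir = FSDirectory.open(new File(indexDirectory));
IndexWriterConfig iwc =
    new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);

LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
// setMaxMergeDocs(10) told the policy never to produce a merged segment
// with more than 10 documents, so merging effectively never happened.
// setMergeFactor(10) (the default) is what was intended: merge when
// roughly 10 segments of similar size have accumulated.
lbsm.setMergeFactor(10);
lbsm.setUseCompoundFile(true);
iwc.setMergePolicy(lbsm);

IndexWriter writer = new IndexWriter(dir, iwc);
```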
>
>     // don't flush based on number of documents;
>     // flush based on buffer size
>     _oWriter.getConfig()
>             .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
>             .setRAMBufferSizeMB(50.0);
>
> Any help in figuring out what is causing this problem would be
> appreciated. I do now have an offline system that I can play with, so I
> can do some intrusive things if need be.
>
> Scott
>
> -----Original Message-----
> From: Scott Smith [mailto:ssm...@mainstreamdata.com]
> Sent: Saturday, March 16, 2013 1:28 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene slow performance
>
> Thanks for the help.
>
> The reindex was done this morning and searches now take less than a
> second.
>
> I will make the change to the code.
>
> Cheers
>
> Scott
>
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Friday, March 15, 2013 11:17 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene slow performance
>
> Please forceMerge only one time, not every time (only to clean up your
> index)! If you are doing a reindex already, just fix your close logic as
> discussed before.
>
> Scott Smith <ssm...@mainstreamdata.com> schrieb:
>
>> Unfortunately, this is a production system which I can't touch (though
>> I was able to get a full reindex scheduled for tomorrow morning).
>>
>> Are you suggesting that I do
>>
>>     writer.forceMerge(1);
>>     writer.close();
>>
>> instead of just doing the close()?
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>> Sent: Friday, March 15, 2013 5:08 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Lucene slow performance
>>
>> On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith
>> <ssm...@mainstreamdata.com> wrote:
>>> "Do you always close IndexWriter after adding few documents and, when
>>> closing, disable "wait for merge"?
>>> In that case, all merges are interrupted and the merge policy never
>>> has a chance to merge at all (because you are opening and closing
>>> IndexWriter all the time, cancelling all merges)?"
>>>
>>> Frankly, I don't quite understand what this means. When I "close" the
>>> IndexWriter, I simply call close(). Is that the wrong thing?
>>
>> That should be fine...
>>
>> This sounds very odd, though. Do you see files that actually get
>> removed / merged if you call IndexWriter#forceMerge(1)?
>>
>> simon
>>
>>> Thanks
>>>
>>> Scott
>>>
>>> -----Original Message-----
>>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>>> Sent: Friday, March 15, 2013 4:49 PM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Lucene slow performance
>>>
>>> Hi,
>>>
>>> with standard configuration, this cannot happen. What merge policy do
>>> you use? This looks to me like a misconfigured merge policy or using
>>> the NoMergePolicy. With 3,000 segments, it will be slow; the question
>>> is, why do you get those?
>>>
>>> Another thing could be: Do you always close IndexWriter after adding
>>> few documents and, when closing, disable "wait for merge"? In that
>>> case, all merges are interrupted and the merge policy never has a
>>> chance to merge at all (because you are opening and closing
>>> IndexWriter all the time, cancelling all merges)?
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>> -----Original Message-----
>>>> From: Scott Smith [mailto:ssm...@mainstreamdata.com]
>>>> Sent: Friday, March 15, 2013 11:15 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Lucene slow performance
>>>>
>>>> We have a system that is using Lucene and the searches are very slow.
>>>> The number of documents is fairly small (less than 30,000) and each
>>>> document is typically only 2 to 10 kilo-characters. Yet, searches
>>>> are taking 15-16 seconds.
>>>>
>>>> One of the things I noticed was that the index directory has several
>>>> thousand (3000+) .cfs files. We do optimize the index once per day.
>>>> This is a system that probably gets several thousand document deletes
>>>> and additions per day (spread out across the day).
>>>>
>>>> Any thoughts? We didn't really notice this until we went to 4.x.
>>>>
>>>> Scott
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
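Pulling the thread's advice together, the one-off cleanup Uwe recommends can be sketched like this (dir and iwc as in Scott's writer configuration; an illustration of the pattern, not a drop-in fix):

```java
// One-time maintenance pass over an index fragmented into thousands of
// segments: merge down once, then stop calling forceMerge routinely.
IndexWriter writer = new IndexWriter(dir, iwc);
writer.forceMerge(1); // blocks until the index is merged into one segment
writer.close();       // close() commits any pending changes

// In normal operation, keep a single long-lived IndexWriter open and
// commit periodically; repeatedly opening and closing the writer cancels
// in-flight merges, which is how segment counts explode.
```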