Hi,

TieredMergePolicy, which has been the default since Lucene 3.2, favors merging segments that carry many deletions, so a periodic forceMerge(1) should not be needed.
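If deletes still accumulate faster than normal merging reclaims them, you can nudge TieredMergePolicy instead of rewriting the whole index. A rough sketch against the 4.4 API (the index path and the weight/size values are only illustrative, not recommendations for your setup):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeTuningSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // illustrative path

    TieredMergePolicy tmp = new TieredMergePolicy();
    // Bias merge selection towards segments with many deleted docs
    // (the default weight is 2.0; higher reclaims deletes more aggressively).
    tmp.setReclaimDeletesWeight(3.0);
    // Keep merged segments from growing huge, so single merges stay cheap.
    tmp.setMaxMergedSegmentMB(2048.0);

    IndexWriterConfig iwc =
        new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
    iwc.setMergePolicy(tmp);

    IndexWriter writer = new IndexWriter(dir, iwc);
    try {
      // ... add/update/delete documents as usual ...

      // Cheaper scheduled cleanup than forceMerge(1): only rewrites segments
      // that have accumulated deletions.
      writer.forceMergeDeletes();
    } finally {
      writer.close();
    }
  }
}

forceMergeDeletes() only targets segments carrying a meaningful share of deletes (10% by default with TieredMergePolicy), so it is much cheaper than forceMerge(1) if you want a scheduled cleanup at all.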
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael van Rooyen [mailto:mich...@loot.co.za]
> Sent: Thursday, September 26, 2013 12:26 PM
> To: java-user@lucene.apache.org
> Cc: Ian Lea
> Subject: Re: Lucene 4.4.0 mergeSegments OutOfMemoryError
>
> Yes, it happens as part of the early morning optimize, and yes, it's a
> forceMerge(1) which I've disabled for now.
>
> I haven't looked at the persistence mechanism for Lucene since 2.x, but if I
> remember correctly, the deleted documents would stay in an index segment
> until that segment was eventually merged. Without forcing a merge
> (optimize in old versions), the footprint on disk could be a multiple of the
> actual space required for the live documents, and this would have an impact
> on performance (the deleted documents would clutter the buffer cache).
>
> Is this still the case? I would have thought it good practice to force the
> dead space out of an index periodically, but if the underlying storage
> mechanism has changed and the current index files are more efficient at
> housekeeping, this may no longer be necessary.
>
> If someone could shed a little light on best practice for indexes where
> documents are frequently updated (i.e. deleted and re-added), that would
> be great.
>
> Michael.
>
>
> On 2013/09/26 11:43 AM, Ian Lea wrote:
> > Is this OOM happening as part of your early morning optimize or at
> > some other point? By optimize do you mean IndexWriter.forceMerge(1)?
> > You really shouldn't have to use that. If the index grows forever
> > without it then something else is going on which you might wish to
> > report separately.
> >
> >
> > --
> > Ian.
> >
> >
> > On Wed, Sep 25, 2013 at 12:35 PM, Michael van Rooyen <mich...@loot.co.za> wrote:
> >> We've recently upgraded to Lucene 4.4.0 and mergeSegments now causes
> >> an OOM error.
> >>
> >> As background, our index contains about 14 million documents (growing
> >> slowly) and we process about 1 million updates per day. It's about
> >> 8GB on disk. I'm not sure if the Lucene segments merge the way they
> >> used to in the early versions, but we've always optimized at 3am to
> >> get rid of dead space in the index, or otherwise it grows forever.
> >>
> >> The mergeSegments was working under 4.3.1 but the index has grown
> >> somewhat on disk since then, probably due to a couple of added
> >> NumericDocValues fields. The java process is assigned about 3GB (the
> >> maximum, as it's running on a 32 bit i686 Linux box), and it still goes
> >> OOM.
> >>
> >> Any advice as to the possible cause and how to circumvent it would be
> >> great.
> >>
> >> Here's the stack trace:
> >>
> >> org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
> >>   org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
> >>   org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
> >> Caused by: java.lang.OutOfMemoryError: Java heap space
> >>   org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
> >>   org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
> >>   org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
> >>   org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
> >>   org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
> >>   org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
> >>   org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
> >>   org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
> >>   org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> >>   org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> >>
> >> Thanks,
> >> Michael.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org