Hi,

> we use 4.8.1. We know that the javadoc advises against it. Like I wrote, the
> deletion of old documents (that appear during an update) would be done
> while closing the writer.

This is not true. The merge policy continuously merges segments that contain 
deletions. The problem you might have is the following:
If you call forceMerge(1) for the first time, your index is reduced from a
well-distributed multi-segment index to one single, large segment. If you then
apply deletes, they are applied against this large segment. Newly added
documents go into new segments. Those new segments are small, so they are
merged preferentially. The deletions in the huge single segment are very
unlikely to be merged away, because Lucene only touches this segment as a last
resort. So the problem starts when you call forceMerge for the first time!
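
To make the effect concrete, here is a toy model in plain Java. It is not
Lucene's TieredMergePolicy: the Segment class, the mergeSmall method, and all
document counts are hypothetical and only illustrate the idea that a merge
which skips oversized segments never reclaims their deletions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. This is NOT Lucene's TieredMergePolicy; Segment, mergeSmall
// and the doc counts are hypothetical and exist purely for illustration.
final class MergeToy {

    /** A segment with a total doc count, of which some are marked deleted. */
    static final class Segment {
        final long docs;
        final long deleted;

        Segment(long docs, long deleted) {
            this.docs = docs;
            this.deleted = deleted;
        }
    }

    /**
     * Merges all segments with fewer than maxEligibleDocs docs into one new
     * segment (dropping their deleted docs) and leaves larger ones untouched.
     */
    static List<Segment> mergeSmall(List<Segment> segments, long maxEligibleDocs) {
        List<Segment> result = new ArrayList<>();
        long mergedLive = 0;
        boolean mergedAny = false;
        for (Segment s : segments) {
            if (s.docs < maxEligibleDocs) {
                mergedLive += s.docs - s.deleted; // a merge copies live docs only
                mergedAny = true;
            } else {
                result.add(s); // oversized segment keeps its deleted docs
            }
        }
        if (mergedAny) {
            result.add(new Segment(mergedLive, 0));
        }
        return result;
    }

    /** Sums the deleted docs that still occupy space across all segments. */
    static long totalDeleted(List<Segment> segments) {
        long sum = 0;
        for (Segment s : segments) {
            sum += s.deleted;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(1_000_000, 200_000)); // the single forceMerge(1) segment
        index.add(new Segment(10_000, 1_000));      // small segments from new updates
        index.add(new Segment(10_000, 500));

        List<Segment> after = mergeSmall(index, 100_000);
        System.out.println("segments: " + after.size()
                + ", deleted docs remaining: " + totalDeleted(after));
    }
}
```

In this model the two small segments lose their 1,500 deleted documents in the
merge, while the 200,000 deleted documents in the big segment stay on disk.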

If you don’t call forceMerge and continuously index, your deletions will be 
removed quite fast. This is especially true if the deletions are 
well-distributed over the whole index! There are tons of instances with 
Elasticsearch and Lucene doing this all the time. They never ever close their 
writer. Be sure to use TieredMergePolicy (the default), because this one 
prefers segments that have many deletions. The old LogMergePolicy does not 
respect deletes, but should no longer be used, unless you rely on a specific 
index order of your documents.
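
As a rough sketch of what this looks like for Lucene 4.8.x: the class and
method names below (WriterSetup, openWriter) are made up for illustration, and
the Directory and Analyzer are whatever your application already uses. It needs
lucene-core 4.8.x on the classpath. Setting the policy explicitly is redundant,
since TieredMergePolicy is the default, but it guards against an older
configuration silently carrying over.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

final class WriterSetup {
    // Hypothetical helper; the caller supplies the Directory and Analyzer.
    static IndexWriter openWriter(Directory dir, Analyzer analyzer) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_48, analyzer);
        // TieredMergePolicy is already the default; set explicitly for clarity.
        config.setMergePolicy(new TieredMergePolicy());
        // Keep this writer open for the lifetime of the application.
        return new IndexWriter(dir, config);
    }
}
```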

> Unfortunately we can't close the writer and we
> chose the force merge as an alternative with less effort. Could
> forceMergeDeletes serve our purpose here?

It could, but it has the same problem as above. The only difference from 
forceMerge is that it only merges segments which have deletions.
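
If you still want to try it, a minimal sketch against the Lucene 4.8.x API
could look like this. DeleteCleanup and reclaimDeletes are hypothetical names,
and the writer is assumed to be your existing long-lived IndexWriter.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;

final class DeleteCleanup {
    // Hypothetical helper, intended to be called during a low-traffic window.
    static void reclaimDeletes(IndexWriter writer) throws IOException {
        // Asks the merge policy to merge segments that contain deletions;
        // the no-argument variant blocks until those merges have finished.
        writer.forceMergeDeletes();
        // Commit so the merged segments become the current commit point.
        writer.commit();
    }
}
```

Which segments are actually rewritten is still up to the configured merge
policy, so this may leave some deletions in place.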

> I will take a look at it with lsof, but I'm pretty sure the files will be
> held by some Java process.
> 
> Jürgen.
> 
> On 19.01.2015 at 13:36, Ian Lea wrote:
> > Do you need to call forceMerge(1) at all?  The javadoc, certainly for
> > recent versions of lucene, advises against it.  What version of lucene
> > are you running?
> >
> > It might be helpful to run lsof against the index directory
> > before/during/after the merge to see what files are coming or going,
> > or if there are any marked as deleted but still present.  That would
> > imply that something, somewhere, was holding on to the files.
> >
> >
> > --
> > Ian.
> >
> >
> > On Fri, Jan 16, 2015 at 1:57 PM, Jürgen Albert
> > <j.alb...@data-in-motion.biz> wrote:
> >> Hi,
> >>
> >> because we have constant updates on our index, we can't really close
> >> the index from time to time. Therefore we decided to trigger
> >> forceMerge when the traffic is lowest, to clean up.
> >>
> >> On our development laptops (Windows and Linux) it works as expected,
> >> but on the real servers we see some weird behaviour.
> >>
> >> Scenario:
> >>
> >> We create a fresh index and populate it. This results in an index
> >> with a size of 2 GB. If we trigger forceMerge(1) and a commit()
> >> afterwards for this index, the index grows over the next 10 minutes
> >> to 6 GB and does not shrink back. During the whole process no reader is
> >> opened on the index.
> >> If I try the same stunt with the same data on my Windows laptop, it
> >> does nothing at all and finishes after a few ms.
> >>
> >> Any Ideas?
> >>
> >> Technical details:
> >> We use an MMapDirectory and the Server is a Debian7 Kernel 3.2 in a
> >> KVM. The file system is Ext4.
> >>
> >> Thx,
> >>
> >> Jürgen Albert.
> >>
> >> --
> >> Jürgen Albert
> >> Managing Director
> >>
> >> Data In Motion UG (haftungsbeschränkt)
> >>
> >> Kahlaische Str. 4
> >> 07745 Jena
> >>
> >> Mobil:  0157-72521634
> >> E-Mail: j.alb...@datainmotion.de
> >> Web: www.datainmotion.de
> >>
> >> XING:   https://www.xing.com/profile/Juergen_Albert5
> >>
> >> Legal information
> >>
> >> Jena HBR 507027
> >> USt-IdNr: DE274553639
> >> St.Nr.: 162/107/04586
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>

