Hi,

On 19.01.2015 at 14:13, Uwe Schindler wrote:
Hi,

we use 4.8.1. We know that the javadoc advises against it. As I wrote, the
deletion of old documents (which accumulate during updates) would otherwise
only happen when closing the writer.
This is not true. The merge policy continuously merges segments that contain
deletions. The problem you might have is the following:
If you call forceMerge(1) for the first time, your index is reduced from a
well-distributed multi-segment index to one single, large segment. If you then
apply deletes, they are applied against this large segment. Newly added
documents go into new segments. Those new segments are small, so they are
merged with preference. The deletions in the huge single segment are very
unlikely to be merged away, because Lucene only touches this segment as a last
resort. So the problem starts when you call forceMerge for the first time!

If you don't call forceMerge and index continuously, your deletions will be
removed quite fast. This is especially true if the deletions are
well-distributed over the whole index! There are tons of Elasticsearch and
Lucene installations doing this all the time; they never close their writer.
Be sure to use TieredMergePolicy (the default), because it prefers segments
that have many deletions. The old LogMergePolicy does not take deletes into
account and should no longer be used, unless you rely on a specific index
order of your documents.
We use the default, which is the TieredMergePolicy as far as I can see. If
what you write is true, I wonder why our index started growing in the first
place. We have two indices: the bigger one receives an update on every
document every couple of days, and in the smaller one every document is
updated randomly over a period of roughly 3 minutes. After a couple of days,
the indices had grown to 12 GB each (the bigger one started at 2 GB and the
smaller one at a couple of megabytes). This should not happen if the merge
policy works as intended. Can unclosed readers cause such a problem? We use a
SearcherManager to avoid this, but the possibility always exists.
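For the unclosed-reader suspicion: a SearcherManager only prevents leaks if every acquire() is paired with a release(). A minimal sketch of that pattern, assuming an already-open IndexWriter named `writer` (Lucene 4.x API):

```java
// A missed release() keeps the underlying IndexReader -- and the segment
// files it references -- alive, which can explain index-directory growth.
SearcherManager mgr = new SearcherManager(writer, true, new SearcherFactory());
IndexSearcher searcher = mgr.acquire();
try {
    // run queries with `searcher` here
} finally {
    mgr.release(searcher);  // always release, even on exceptions
}
// After updates, refresh the manager instead of opening new readers:
mgr.maybeRefresh();
```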

On the other hand, we have the case I initially described: we have a fresh
index that we populate. No reader is opened and no additional updates have
been made. Therefore I see no reason why forceMerge should triple the size of
the index at all.
Unfortunately we can't close the writer, so we chose forceMerge as the
alternative with less effort. Could forceMergeDeletes serve our purpose here?
It could, but it has the same problem as above. The only difference from
forceMerge is that it only merges segments which have deletions.
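The difference Uwe describes, as a fragment (assuming an open IndexWriter `writer`; Lucene 4.x API):

```java
// forceMerge(1): rewrites the entire index down to a single segment.
writer.forceMerge(1);

// forceMergeDeletes(): only rewrites segments that currently carry
// deletions; clean segments are left untouched. The same caveat applies,
// though: the rewritten segments can still become very large.
writer.forceMergeDeletes();

writer.commit();  // make the merged state durable
```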

I will take a look with lsof, but I'm pretty sure the files will be held open
by some Java process.

Jürgen.

On 19.01.2015 at 13:36, Ian Lea wrote:
Do you need to call forceMerge(1) at all?  The javadoc, certainly for
recent versions of Lucene, advises against it.  What version of Lucene
are you running?

It might be helpful to run lsof against the index directory
before/during/after the merge to see what files are coming or going,
or if there are any marked as deleted but still present.  That would
imply that something, somewhere, was holding on to the files.
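Ian's suggestion can be run like this (the index path is illustrative):

```shell
# List every open file under the index directory.
lsof +D /path/to/index

# Show only files that were unlinked but are still held open --
# lsof flags these with "(deleted)" in the NAME column.
lsof +D /path/to/index | grep '(deleted)'

# Alternatively, inspect one specific JVM by pid:
lsof -p <jvm-pid> | grep /path/to/index
```

If deleted-but-open files show up, some process (most likely a leaked reader in the JVM) is still holding the old segment files.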


--
Ian.


On Fri, Jan 16, 2015 at 1:57 PM, Jürgen Albert
<j.alb...@data-in-motion.biz> wrote:
Hi,

because we have constant updates on our index, we can't really close
the index from time to time. Therefore we decided to trigger
forceMerge when the traffic is lowest, to clean up.

On our development laptops (Windows and Linux) it works as expected,
but on the real servers we see some weird behaviour.

Scenario:

We create a fresh index and populate it. This results in an index
with a size of 2 GB. If we trigger forceMerge(1) and a commit()
afterwards on this index, the index grows over the next 10 minutes
to 6 GB and does not shrink back. During the whole process no reader is
opened on the index.
If I try the same stunt with the same data on my Windows laptop, it
does nothing at all and finishes after a few ms.
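The sequence described above, as a fragment (assuming an open IndexWriter `writer`):

```java
// The reported sequence: afterwards the 2 GB index grows to 6 GB.
writer.forceMerge(1);  // rewrite all segments into one; this needs
                       // transient disk space, because old segment files
                       // are only deleted once no commit point (and no
                       // open reader) references them anymore
writer.commit();
```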

Any ideas?

Technical details:
We use an MMapDirectory, and the server is Debian 7 (kernel 3.2) running in a
KVM. The file system is ext4.

Thx,

Jürgen Albert.

--
Jürgen Albert
Geschäftsführer

Data In Motion UG (haftungsbeschränkt)

Kahlaische Str. 4
07745 Jena

Mobil:  0157-72521634
E-Mail: j.alb...@datainmotion.de
Web: www.datainmotion.de

XING:   https://www.xing.com/profile/Juergen_Albert5

Rechtliches

Jena HBR 507027
USt-IdNr: DE274553639
St.Nr.: 162/107/04586


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
