OK, got your concern now. Right: when docs are deleted they are only
marked as deleted; the actual data is _not_ purged (yet).
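
If it helps to picture it, here's a toy model of what "marked as deleted" means. This is plain Java, not Lucene's code, and the class name is made up. The maxDoc/numDocs/numDeletedDocs names mirror the real IndexReader methods, and the BitSet stands in for a segment's live-docs bits. Deleting flips a bit, and the stored data (and disk usage) stays exactly the same:

```java
import java.util.BitSet;

/**
 * Toy model (NOT Lucene code) of a segment whose deletions are only marked.
 * Deleting a doc sets a bit; the stored document data is untouched.
 */
final class ToySegment {
    private final String[] docs;   // stands in for the immutable segment files
    private final BitSet deleted;  // bit set = doc is marked deleted

    ToySegment(String... docs) {
        this.docs = docs.clone();
        this.deleted = new BitSet(docs.length);
    }

    /** Marks a doc as deleted; nothing is removed from the docs array. */
    void delete(int docId) {
        deleted.set(docId);
    }

    /** All doc slots, deleted or not (like IndexReader.maxDoc()). */
    int maxDoc() {
        return docs.length;
    }

    /** Live docs only (like IndexReader.numDocs()). */
    int numDocs() {
        return docs.length - deleted.cardinality();
    }

    /** Marked-deleted docs (like IndexReader.numDeletedDocs()). */
    int numDeletedDocs() {
        return deleted.cardinality();
    }

    /** Crude "size on disk": every slot still counts, deleted or not. */
    int storedChars() {
        int total = 0;
        for (String d : docs) {
            total += d.length();
        }
        return total;
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment("log-1", "log-2", "log-3", "keep");
        seg.delete(0);
        seg.delete(1);
        seg.delete(2);
        // prints "4 1 19": one live doc, but the "size" is unchanged
        System.out.println(seg.maxDoc() + " " + seg.numDocs() + " " + seg.storedChars());
    }
}
```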

As you add more documents to your index, segments will get merged as
part of normal processing. When segments are merged, the deleted data
is expunged. So if you're continually adding docs to your index,
you'll see the number of deleted documents go down over time.
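
The reason a merge reclaims the space: it writes a brand-new segment containing only the live docs of its inputs, then the old segments (deletes and all) are dropped. A toy sketch of just that copy step follows. Again, this is illustrative Java with made-up names, not how Lucene actually writes segments:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Toy model (NOT Lucene code): a merge copies only live docs into a new
 * segment, so the deleted docs' data disappears with the old segments.
 */
final class ToyMerge {

    /** A segment: immutable doc list plus a "marked deleted" bitset. */
    static final class Segment {
        final List<String> docs;
        final BitSet deleted = new BitSet();

        Segment(List<String> docs) {
            this.docs = List.copyOf(docs);
        }

        int maxDoc() { return docs.size(); }
        int numDocs() { return docs.size() - deleted.cardinality(); }
    }

    /** Builds one new segment from the live docs of all inputs. */
    static Segment merge(List<Segment> inputs) {
        List<String> live = new ArrayList<>();
        for (Segment s : inputs) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    live.add(s.docs.get(i));
                }
            }
        }
        return new Segment(live);
    }

    public static void main(String[] args) {
        Segment a = new Segment(List.of("a1", "a2", "a3"));
        Segment b = new Segment(List.of("b1", "b2"));
        a.deleted.set(0);
        b.deleted.set(1);
        // Before: 5 slots, 3 live. After: 3 slots, all live.
        Segment merged = merge(List.of(a, b));
        System.out.println(merged.maxDoc() + " " + merged.docs); // prints "3 [a2, a3, b1]"
    }
}
```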

You can also use expungeDeletes to get rid of just the deletes (in plain
Lucene, the equivalent is IndexWriter.forceMergeDeletes()).
See: https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
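
Roughly, forceMergeDeletes() asks the merge policy to rewrite only the segments whose share of deleted docs is above a threshold. With TieredMergePolicy that's forceMergeDeletesPctAllowed, 10% by default if I remember right. Below is a toy sketch of just that selection step. The class, the stats type and the strict ">" comparison are my assumptions for illustration, not Lucene's implementation:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (NOT Lucene code) of the selection step behind
 * IndexWriter.forceMergeDeletes(): pick segments whose deleted
 * percentage exceeds a threshold (assumed 10% here).
 */
final class ToyExpungeDeletes {

    /** Hypothetical per-segment stats: total slots and deleted count. */
    static final class SegStats {
        final String name;
        final int maxDoc;
        final int deleted;

        SegStats(String name, int maxDoc, int deleted) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        double deletedPct() {
            return maxDoc == 0 ? 0.0 : 100.0 * deleted / maxDoc;
        }
    }

    /** Names of segments with more than pctAllowed percent deleted docs. */
    static List<String> selectForExpunge(List<SegStats> segs, double pctAllowed) {
        List<String> picked = new ArrayList<>();
        for (SegStats s : segs) {
            if (s.deletedPct() > pctAllowed) {
                picked.add(s.name);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<SegStats> segs = List.of(
                new SegStats("_0", 1000, 50),   // 5% deleted: left alone
                new SegStats("_1", 1000, 400),  // 40% deleted: rewritten
                new SegStats("_2", 200, 0));    // no deletes: left alone
        System.out.println(selectForExpunge(segs, 10.0)); // prints "[_1]"
    }
}
```

The point is that only the heavily-deleted segments get rewritten, which is usually far cheaper than optimizing the whole index.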

Here's a wonderful video of this process; the third animation is the
default TieredMergePolicy:

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Do be aware that if you optimize, you'll be left with very large
segments (possibly one). Since TieredMergePolicy tries to merge
segments of "like size", the large segments will accumulate lots of
deletes before being merged. So, if at all possible, just let
Solr/Lucene take care of it.
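
To put rough numbers on that for an index your size: as far as I recall, the 6.x TieredMergePolicy skips any segment whose live size (deletes discounted) is at least half of maxMergedSegmentMB (5 GB by default) when picking natural merges. Under that assumption, and assuming deletes are spread evenly by size, here's the arithmetic for a single 700 GB segment left behind by an optimize. It's a simplified model for illustration, not Lucene code:

```java
/**
 * Simplified arithmetic (NOT Lucene code). Assumptions: a segment is
 * skipped by natural merges until its live size drops below half of the
 * max merged segment size, and deleted docs are evenly sized.
 */
final class BigSegmentMath {

    /** Fraction of the segment that must be deleted (0.0 if already eligible). */
    static double deletedFractionNeeded(double segmentGb, double maxMergedSegmentGb) {
        double eligibleBelowGb = maxMergedSegmentGb / 2.0;
        if (segmentGb < eligibleBelowGb) {
            return 0.0;
        }
        return 1.0 - eligibleBelowGb / segmentGb;
    }

    public static void main(String[] args) {
        // One 700 GB segment after optimizing; assumed 5 GB max merged size.
        double f = deletedFractionNeeded(700.0, 5.0);
        // prints "99.64% of the segment must be deleted"
        System.out.printf("%.2f%% of the segment must be deleted%n", 100.0 * f);
    }
}
```

In other words, under those assumptions that one big segment would carry its deleted docs more or less forever unless you optimize again.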

Best,
Erick


On Thu, Jun 8, 2017 at 12:31 AM, Ludovic Bertin
<l.ber...@lombardodier.com> wrote:
> Thanks, Erick, for your answer. We have a huge index: 700 GB, 350 million
> documents.
> We had a case of log flooding due to a bug in an application that generated
> 100,000,000 documents, so we deleted them, but there is no impact on the
> index size without optimizing.
> I think that's normal, right?
>
> Thanks
> Ludovic
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, 7 June 2017 17:57
> To: java-user <java-user@lucene.apache.org>
> Subject: Re: Lucene-6.2.1 -> impact of document removal on performance and 
> index size
>
> Try optimizing and measuring your performance. Some anecdotal reports
> put the improvement at 5%-10%. Some higher. Some lower.
>
> About the only time I recommend optimizing is if you have a relatively
> static index, i.e. one that's updated, say, once a day. For
> continuously changing indexes I generally don't recommend optimizing
> for a variety of reasons.
>
> Best,
> Erick
>
> On Wed, Jun 7, 2017 at 2:01 AM, Ludovic Bertin
> <l.ber...@lombardodier.com> wrote:
>> Hello Guys,
>>
>> What is the true impact of document removal on performance and index size,
>> without optimizing the index?
>>
>> Thanks in advance for your answers
>> Ludovic
>> [[ rethink everything. ]]<http://www.lombardodier.com>
>>
>> DISCLAIMER **********************************************
>> This message is intended only for use by the person to
>> whom it is addressed. It may contain information that is
>> privileged and confidential. Its content does not constitute
>> a formal commitment by Bank Lombard Odier & Co Ltd
>> or any of its branches or affiliates. If you are not the
>> intended recipient of this message, kindly notify the sender
>> immediately and destroy this message. Thank You.
>> ***************************************************************
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
