Hi Adrien and Will,

Thanks for your responses.  I work with Selva and he's busy right now with 
other things, so I'll add some more context to his question in an attempt to 
improve clarity.

The merge in question is part of our batch indexing workflow wherein we index 
new content for a given partition and then merge this new index with the big 
index of everything that was previously loaded on the given partition.  The 
increase in merge time we've seen since upgrading from 4.10 to 5.2 is on the 
order of 25%.  It varies from partition to partition, but 25% is a good 
ballpark estimate I think.  Maybe our case is non-standard: we have a large 
number of fields (> 200).

The reason we perform an index check after the merge is that this is the final 
index state that will be used for a given batch.  Since we have a 
batch-oriented workflow we are able to roll back to a previous batch if we find 
a problem with a given batch (Lucene or other problem).  However, due to disk 
space constraints, we can only keep a couple of batches.  If our indexing 
workflow completes without errors but the index is corrupt, we may not know 
right away and we might delete the previous good batch thinking the latest 
batch is OK, which would be very bad because it would require a full reload of 
all our content.

Checking the index prior to the merge would no doubt catch many issues, but it 
might not catch corruption that occurs during the merge step itself, so we 
implemented a check step once the index is in its final state to ensure that it 
is OK.
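
As a rough illustration of that ordering (a sketch under assumptions, not the 
actual workflow code): the previous batch is deleted only once the merged index 
has passed its check. The names `BatchPromotion`, `promote`, and `IndexVerifier` 
are hypothetical, and the verifier is a stand-in for whatever post-merge index 
check is used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class BatchPromotion {

    /** Hypothetical hook: stands in for the post-merge index check. */
    public interface IndexVerifier {
        boolean isClean(Path indexDir) throws IOException;
    }

    /**
     * Deletes the previous batch only if the merged batch verifies clean.
     * Returns the batch directory that should be used afterwards.
     */
    public static Path promote(Path previousBatch, Path mergedBatch, IndexVerifier verifier)
            throws IOException {
        if (!verifier.isClean(mergedBatch)) {
            // Keep the last known-good batch so a rollback is still possible.
            return previousBatch;
        }
        deleteRecursively(previousBatch);
        return mergedBatch;
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order so files are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

The point of the sketch is only the ordering: the check runs on the index in its 
final state, and nothing is deleted until it passes.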

So, since we want to do the check post-merge, is there a way to disable the 
check during merge so we don't have to do two checks?

Thanks!

Jim 

________________________________________
From: will martin <wmartin...@gmail.com>
Sent: 29 September 2015 12:08
To: java-user@lucene.apache.org
Subject: RE: Lucene 5 : any merge performance metrics compared to 4.x?

So, if it's new, it adds to pre-existing time? So it is a cost that needs to be 
understood I think.

And, I'm really curious, what happens to the result of the post-merge 
checkIntegrity IFF (if and only if) there was corruption pre-merge: I mean if 
you let it merge anyway could you get a false positive for integrity?  [see the 
concept of lazy-evaluation]

These are, imo, the kinds of engineering questions Selva's post raised in my 
triage mode of the scenario.

-----Original Message-----
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Tuesday, September 29, 2015 8:46 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene 5 : any merge performance metrics compared to 4.x?

Indeed this is new but I'm a bit surprised this is the source of your issues as 
it should be much faster than the merge itself. I don't understand your 
proposal to check the index after merge: the goal is to make sure that we do 
not propagate corruptions so it's better to check the index before the merge 
starts so that we don't even try to merge if there are corruptions?
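
For context on why such a check tends to be cheap relative to a merge: a 
checksum-based integrity check generally amounts to one sequential read of each 
file while recomputing a checksum and comparing it with a stored value. The 
sketch below illustrates the idea with `java.util.zip.CRC32` over a hypothetical 
layout in which the last 8 bytes of a file hold the CRC32 of everything before 
them. `ChecksumFooter`, `write`, and `verify` are made-up names, and this is 
neither Lucene's actual file format nor its code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class ChecksumFooter {
    // Assumed layout: payload bytes followed by an 8-byte big-endian CRC32 value.
    private static final int FOOTER_BYTES = Long.BYTES;

    /** Writes the payload followed by the CRC32 of the payload. */
    public static void write(Path file, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + FOOTER_BYTES);
        buf.put(payload).putLong(crc.getValue());
        Files.write(file, buf.array());
    }

    /** Streams the file once, recomputes the CRC32, and compares it with the footer. */
    public static boolean verify(Path file) throws IOException {
        long remaining = Files.size(file) - FOOTER_BYTES;
        if (remaining < 0) {
            return false; // too short to contain a footer
        }
        CRC32 crc = new CRC32();
        byte[] chunk = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            // Checksum the payload in chunks without holding it all in memory.
            while (remaining > 0) {
                int n = in.read(chunk, 0, (int) Math.min(chunk.length, remaining));
                if (n < 0) {
                    return false;
                }
                crc.update(chunk, 0, n);
                remaining -= n;
            }
            // Read the stored checksum from the footer.
            byte[] footer = new byte[FOOTER_BYTES];
            int off = 0;
            while (off < FOOTER_BYTES) {
                int n = in.read(footer, off, FOOTER_BYTES - off);
                if (n < 0) {
                    return false;
                }
                off += n;
            }
            return ByteBuffer.wrap(footer).getLong() == crc.getValue();
        }
    }
}
```

A check like this detects bytes that changed after the file was written; it says 
nothing about whether the content was logically correct when it was written.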

On Tue, Sep 15, 2015 at 00:40, Selva Kumar <selva.kumar.at.w...@gmail.com> wrote:

> it appears Lucene 5.2 index merge is running checkIntegrity on
> existing index prior to merging additional indices.
> This seems to be new.
>
> We have an existing checkIndex but this is run post index merge.
>
> Two follow-up questions:
> * Is there a way to turn off built-in checkIntegrity? Just for my understanding.
> No plan to turn this off.
> * Is running checkIntegrity prior to index merge better than running
> post merge?
>
>
> On Mon, Sep 14, 2015 at 12:24 PM, Selva Kumar <selva.kumar.at.w...@gmail.com> wrote:
>
> > We observe some merge slowness after we migrated from 4.10 to 5.2.
> > Is this expected? Any new tunable merge parameters in Lucene 5 ?
> >
> > -Selva


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
