OK I found one path whereby optimize would detect that the ConcurrentMergeScheduler had hit an exception while merging in a BG thread, and correctly throw an IOException back to its caller, but fail to set the root cause in that exception. I just committed it, so it should be fixed in 2.4:

    https://issues.apache.org/jira/browse/LUCENE-1397

Mike

Michael McCandless wrote:


vivek sar wrote:

Thanks Mike for the insight. I did check the stdout log and found it
was complaining of not having enough disk space. I thought we needed
only 2x the index size. Our index size is 10G (max) and we had 45G
left on that partition - should it still complain about space?

Is there a reader open on the index while optimize is running? That ties up potentially another 1X.

Are you certain you're closing all previously open readers?

On Linux, because the semantics is "delete on last close", it's hard to detect when you have IndexReaders still open: an "ls" won't show the deleted files, yet they are still consuming bytes on disk until the last open file handle is closed. You can try running "lsof" while optimize is running to see which files are held open.
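To make the accounting above concrete, here is a minimal sketch of the worst-case transient disk usage during an optimize (the 10G figure is from this thread; treating "1X for a still-open reader" as a simple additive term is my assumption for illustration):

```java
public class DiskUsageEstimate {
    public static void main(String[] args) {
        long indexGB = 10;            // current master index size, from the thread
        long optimizeCopyGB = indexGB; // optimize writes new segments alongside the old
        long openReaderGB = indexGB;   // an open IndexReader pins deleted files on disk
        long worstCaseGB = indexGB + optimizeCopyGB + openReaderGB;
        System.out.println(worstCaseGB); // prints 30
    }
}
```

So with a reader left open, a 10G index can transiently need on the order of 30G, which is why 45G free is closer to the limit than it first appears.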

Also, if you can call IndexWriter.setInfoStream(...) for all of the operations below, I can peek at it to try to see why it's using up so much intermediate disk space.
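For reference, a minimal sketch of turning on that diagnostic output, assuming a Lucene 2.3-era IndexWriter (the index path, analyzer, and log file name here are placeholders, not from the thread):

```java
import java.io.FileOutputStream;
import java.io.PrintStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Illustrative setup only: path and analyzer are assumptions.
IndexWriter writer = new IndexWriter("/path/to/master-index",
                                     new StandardAnalyzer(), false);

// Route merge/flush diagnostics to a file instead of stdout.
writer.setInfoStream(new PrintStream(new FileOutputStream("infoStream.log")));

// ... run addIndexesNoOptimize / optimize as usual; decisions are logged.
```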

Some comments/questions on other issues you raised,


We have 2 threads that index the data in two different indexes and
then we merge them into a master index with following call,

  masterWriter.addIndexesNoOptimize(indices);

Once the smaller indices have merged into the master index we delete
the smaller indices.

This process runs every 5 minutes. Master Index can grow up to 10G
before we partition it - move it to other directory and start a new
master index.
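The every-5-minutes merge step described above might look roughly like this (a sketch assuming the Lucene 2.3 API; the directory paths are hypothetical, and masterWriter is the already-open writer on the master index):

```java
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// The two small per-thread indexes; paths are illustrative only.
Directory[] indices = new Directory[] {
    FSDirectory.getDirectory("/path/small-index-1"),
    FSDirectory.getDirectory("/path/small-index-2"),
};

// Merge the small indexes into the master without forcing an optimize.
masterWriter.addIndexesNoOptimize(indices);

// Once this call has committed, the small index directories
// can safely be deleted on disk, as described above.
```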

Every hour we then optimize the master index using,

writer.optimize(optimizeSegment); // where optimizeSegment = 10

How long does that optimize take? And what do you do with the every-5-minutes job while optimize is running? Do you run it anyway, sharing the same writer (i.e., you're calling addIndexesNoOptimize while another thread is running the optimize)?


Here are my questions,

1) Is this process flawed in terms of performance and efficiency? What
would you recommend?

Actually I think your approach is the right approach.

2) When you say "partial optimize" what do you mean by that?

Actually, it's what you're already doing (passing 10 to optimize). This means the index just has to reduce itself to <= 10 segments, instead of the normal 1 segment for a full optimize.

Still, that particular merge strikes me as odd: it was merging 7 segments, the first of which was immense, and the final 6 were tiny. That's not an efficient merge to do. Seeing the infoStream output might help explain what led to it...


3) In Lucene 2.3 "segment merging is done in a background thread" -
how does it work, ie, how does it know which segments to merge? What
would cause this background merge exception?

The selection of segments to merge, and when, is done by the LogByteSizeMergePolicy, which you can swap out for your own merge policy (this should not in general be necessary). Once a merge is selected, the execution of that merge is controlled by ConcurrentMergeScheduler, which runs merges in background threads. You can also swap that out (eg for SerialMergeScheduler, which does the merging in the foreground thread, as Lucene did before 2.3).
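A sketch of swapping these components on the writer, assuming the Lucene 2.3-era setter names (the mergeFactor value here is illustrative, not a recommendation from this thread):

```java
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.SerialMergeScheduler;

// Keep the default policy class but tune how many segments merge at once.
LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
policy.setMergeFactor(10);           // illustrative value
writer.setMergePolicy(policy);

// Or run merges serially in the calling thread, pre-2.3 style:
writer.setMergeScheduler(new SerialMergeScheduler());
```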

I think the background merge exception is often disk full, but in general it can be anything that went wrong while merging. Such exceptions won't corrupt your index because the merge only commits the changes to the index if it completes successfully.


4) Can we turn off "background merge" if I'm running the optimize
every hour in any case? How do we turn it off?

Yes: IndexWriter.setMergeScheduler(new SerialMergeScheduler()) gets you back to the old (foreground-thread) way of running merges. But in general this gets you worse net performance, unless you are already using multiple threads when adding documents.

Mike


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
