Hey everyone,

I'm trying to cut the total wall-time of indexing for some fairly large
document collections on machines with a high CPU count (> 32 indexing
threads). So far my observations are:

1) I gave up on the concurrent merge scheduler (CMS) in favor of "same
thread" merging. This means the indexing thread that encounters a merge
just does it. The CMS is designed to favor concurrent searches over
indexing and it really didn't do anything I needed - in fact, I had to
disable most of what it offers. I/O throttling and thread stalling don't
make much sense on fast I/O in the absence of concurrent searches - you
can simply use as many merge threads as it takes to saturate the I/O.

2) Quite often everything churns along nicely until the last few merges,
which combine the smaller segments into huge ones and form a "long tail"
during which most cores sit idle... Here comes my question - can we
execute the individual "parts" involved in segment merging (the logic
inside SegmentMerger) in separate threads? On the surface it looks like
these steps could be done independently (even if they're executed
sequentially at the moment) but perhaps I'm missing something?
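
To make question 2 a bit more concrete, here is a rough sketch of the
kind of fan-out I have in mind, in plain Java with no Lucene classes. The
step names and the runSteps helper are made up for illustration and are
not the actual SegmentMerger code. It also assumes the steps share no
mutable state, which is exactly the part I'm unsure about.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from Lucene.
class ParallelMergeSketch {

    // Submits every step to the executor first, then waits for all of them.
    // Future.get() rethrows a step's failure wrapped in ExecutionException.
    static Map<String, Integer> runSteps(Map<String, Callable<Integer>> steps,
                                         ExecutorService executor)
            throws InterruptedException, ExecutionException {
        Map<String, Future<Integer>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Callable<Integer>> step : steps.entrySet()) {
            pending.put(step.getKey(), executor.submit(step.getValue()));
        }
        Map<String, Integer> results = new LinkedHashMap<>();
        for (Map.Entry<String, Future<Integer>> p : pending.entrySet()) {
            results.put(p.getKey(), p.getValue().get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder steps; each one stands in for merging one part of a
        // segment and returns a made-up count of items it processed.
        Map<String, Callable<Integer>> steps = new LinkedHashMap<>();
        steps.put("storedFields", () -> 100);
        steps.put("terms", () -> 250);
        steps.put("docValues", () -> 75);

        ExecutorService executor = Executors.newFixedThreadPool(steps.size());
        try {
            System.out.println(runSteps(steps, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

The point is only that a single large merge would occupy several cores
instead of one during the long tail; whether the real steps can be
decoupled like this is what I'm asking about.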

I'd like to ask before I try to tinker with it. Thanks for any feedback.

Dawid
