Hey everyone, I'm trying to cut the total wall-time of indexing for some fairly large document collections on machines with a high CPU count (> 32 indexing threads). So far my observations are:
1) I gave up on the concurrent merge scheduler (CMS) in favor of "same thread" merging: the indexing thread that encounters a merge just does it. The CMS is designed to favor concurrent searches over indexing, and it really didn't do anything I needed; in fact, I had to disable most of what it offers. I/O throttling and thread stalling are not really practical on fast I/O in the absence of concurrent searches: you can literally just use as many merge threads as needed to saturate the I/O.

2) Quite frequently everything churns along nicely until the last few merges, which combine huge segments and form a "long tail" during which most cores are just idle...

Here comes my question: can we execute the individual "parts" involved in segment merging (the logic inside SegmentMerger) in separate threads? On the surface it looks like these steps can be done independently (even if they're executed sequentially at the moment), but perhaps I'm missing something? I'd like to ask before I try to tinker with it.

Thanks for any feedback.

Dawid
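
P.S. To make the question concrete, here is a minimal sketch of the shape I have in mind. It is plain Java and does not touch any Lucene class: the class name, the `step` helper and the step labels are hypothetical placeholders, and it assumes the steps really are independent of each other, which is exactly the part I'm unsure about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMergeSketch {

    /**
     * Runs all steps concurrently on a fixed-size pool and returns their
     * results in submission order. Assumes the steps share no mutable state.
     */
    public static List<String> runSteps(List<Callable<String>> steps, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> finished = new ArrayList<>();
            // invokeAll blocks until every step has completed or failed and
            // returns the futures in the same order as the input list.
            for (Future<String> f : pool.invokeAll(steps)) {
                // get() rethrows a step's failure wrapped in ExecutionException.
                finished.add(f.get());
            }
            return finished;
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical stand-in for one part of a segment merge. */
    public static Callable<String> step(String label) {
        return () -> {
            // The real work for this part of the merge would go here.
            return label;
        };
    }

    public static void main(String[] args) throws Exception {
        // Labels are illustrative only, not actual SegmentMerger method names.
        List<Callable<String>> steps = Arrays.asList(
                step("terms"), step("storedFields"), step("docValues"), step("norms"));
        System.out.println(runSteps(steps, 4));
    }
}
```

The point of the sketch is only that the merging thread would fan the parts out and wait for all of them, so a single large merge could keep several cores busy instead of one.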
