Before I open an issue, I would like to double-check my sanity and see whether an issue is actually needed.

I have noticed that the javadoc for ConcurrentMergeScheduler says that it schedules smaller merges before larger merges.  In the past, I have seen evidence suggesting this is not actually the case, that it prefers larger merges first.

---- background ----

When importing millions of rows from a database using Solr's dataimport handler, the index is merged quite frequently while indexing occurs.  Eventually, it reaches a point where multiple merges are scheduled simultaneously, so the ongoing indexing thread is paused until the number of merges drops below maxMergeCount.

If the smallest merge were being done first, I don't think I would have seen the behavior I observed.  In the past, when a large merge got scheduled, indexing was paused long enough for the database connection to time out and be disconnected, so when the import tried to resume indexing, it couldn't -- the source database connection was gone.  For MySQL databases, this timeout takes about ten minutes to happen.  If the smallest merge had completed first, the merge count would have dropped below maxMergeCount long before the database connection could time out, and indexing would have resumed with no problems.
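
To make that reasoning concrete, here is a rough Java sketch of the stall as I understand it.  It is not Lucene code.  It assumes a single merge thread that runs the pending merges one at a time in the chosen order, and that indexing stays paused while the number of pending merges is at or above maxMergeCount.  The class name, the method name, and the merge durations are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only: hypothetical names, not part of Lucene or Solr.
class MergeStallSketch {

    /**
     * Returns how many seconds indexing stays paused, assuming one merge
     * thread works through the pending merges in the given order and the
     * pause lasts while the pending count is >= maxMergeCount.
     */
    static long stallSeconds(List<Long> mergeSeconds, int maxMergeCount, boolean smallestFirst) {
        if (maxMergeCount < 1) {
            throw new IllegalArgumentException("maxMergeCount must be at least 1");
        }
        List<Long> order = new ArrayList<>(mergeSeconds);
        Comparator<Long> ascending = Comparator.naturalOrder();
        order.sort(smallestFirst ? ascending : ascending.reversed());

        long elapsed = 0;
        int pending = order.size();
        for (long duration : order) {
            if (pending < maxMergeCount) {
                break; // count dropped below the limit, indexing resumes
            }
            elapsed += duration; // wait for this merge to finish
            pending--;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Made-up merge durations in seconds: 20 s, 2 min, 15 min.
        List<Long> merges = Arrays.asList(20L, 120L, 900L);
        long dbTimeoutSeconds = 600; // roughly the ten-minute timeout described above

        for (boolean smallestFirst : new boolean[] {true, false}) {
            long stall = stallSeconds(merges, 3, smallestFirst);
            System.out.println((smallestFirst ? "smallest first" : "largest first")
                    + ": stalled " + stall + " s, connection "
                    + (stall > dbTimeoutSeconds ? "times out" : "survives"));
        }
    }
}
```

With these made-up numbers and a limit of 3, smallest-first pauses indexing for 20 seconds, while largest-first pauses it for 900 seconds, which is past the ten-minute timeout.  Raising the limit to 6 in this model gives no pause at all, which matches what I saw after changing maxMergeCount.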

---- end background ----

The way that I have fixed this problem in the past is to increase maxMergeCount to 6.  When that's done, the incoming thread never gets paused, and the database connection doesn't time out.

I can see that the default for maxMergeCount was changed from 2 to 6 in 2014 by LUCENE-6119.  So 5.0 and later probably will not have the problems I encountered as long as the scheduler is left at defaults ... but I still suspect that merges run from larger to smaller, contrary to the javadoc.  The code is pretty dense and I haven't completely deciphered it yet.

Thanks,
Shawn

