Before I open an issue, I would like to double-check my sanity and see
whether an issue is actually needed.
I have noticed that the javadoc for ConcurrentMergeScheduler says that
it schedules smaller merges before larger merges. In the past, I have
seen evidence suggesting this is not actually the case, that it prefers
larger merges first.
---- background ----
When importing millions of rows from a database using Solr's dataimport
handler, the index will be merged quite frequently while that indexing
occurs. Eventually, it reaches a point where there are multiple merges
scheduled simultaneously, so the ongoing indexing thread will be
paused until the number of merges drops below maxMergeCount.
If the smallest merge were being done first, I don't think I would have
seen the behavior that I observed. What I saw in the past was that when
a large merge got scheduled, indexing was paused long enough for the
database connection to time out and be disconnected, so when the import
tried to resume indexing, it couldn't -- the source database connection
was gone. For MySQL databases, this timeout takes about ten minutes to
happen. If the smallest merge had completed first,
the count would have decreased long before the database connection could
time out, and indexing would have resumed with no problems.
---- end background ----
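
The reasoning in the background section can be illustrated with a toy
model. This is only a sketch, not ConcurrentMergeScheduler's actual
logic: it assumes a single merge thread that runs the pending merges
back to back, and that indexing resumes as soon as the pending count
drops below maxMergeCount. The class name, the merge durations, and the
limit of 3 are all hypothetical values chosen for illustration.

```java
import java.util.Arrays;

// Toy model only: this is NOT how ConcurrentMergeScheduler is implemented.
class MergeStallSketch {

    // Minutes the indexing thread stays paused, assuming a single merge
    // thread runs the pending merges back to back in the given order and
    // indexing resumes once the pending count drops below maxMergeCount.
    static long stallMinutes(long[] mergeMinutesInRunOrder, int maxMergeCount) {
        int pending = mergeMinutesInRunOrder.length;
        long elapsed = 0;
        for (long minutes : mergeMinutesInRunOrder) {
            if (pending < maxMergeCount) {
                break; // below the limit, so indexing may resume
            }
            elapsed += minutes;
            pending--;
        }
        return elapsed;
    }

    // Returns a copy of the durations sorted ascending.
    static long[] smallestFirst(long[] mergeMinutes) {
        long[] ordered = mergeMinutes.clone();
        Arrays.sort(ordered);
        return ordered;
    }

    // Returns a copy of the durations sorted descending.
    static long[] largestFirst(long[] mergeMinutes) {
        long[] ordered = smallestFirst(mergeMinutes);
        for (int i = 0, j = ordered.length - 1; i < j; i++, j--) {
            long tmp = ordered[i];
            ordered[i] = ordered[j];
            ordered[j] = tmp;
        }
        return ordered;
    }

    public static void main(String[] args) {
        long[] pendingMergeMinutes = {30, 2, 5}; // hypothetical durations
        int maxMergeCount = 3;                   // hypothetical limit
        long dbTimeoutMinutes = 10;              // roughly the MySQL timeout

        long small = stallMinutes(smallestFirst(pendingMergeMinutes), maxMergeCount);
        long large = stallMinutes(largestFirst(pendingMergeMinutes), maxMergeCount);

        System.out.println("smallest first: paused " + small
                + " min, connection lost: " + (small > dbTimeoutMinutes));
        System.out.println("largest first:  paused " + large
                + " min, connection lost: " + (large > dbTimeoutMinutes));
    }
}
```

With these made-up numbers, running the smallest merge first pauses
indexing for 2 minutes, while running the largest first pauses it for 30
minutes, well past a ten-minute timeout. Raising the limit to 6 in the
same model gives a pause of 0 in either order, because three pending
merges never reach the limit.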
The way that I have fixed this problem in the past is to increase
maxMergeCount to 6. When that's done, the incoming thread never gets
paused, and the database connection doesn't time out.
I can see that the default for maxMergeCount was changed from 2 to 6 in
2014 by LUCENE-6119. So 5.0 and later probably won't have the problems
I encountered as long as the scheduler is left at its defaults ...
but I suspect that the running order of merges goes larger to smaller,
contrary to the javadoc. The code is pretty dense and I haven't completely
deciphered it yet.
Thanks,
Shawn