If maxMergeCount were 2, I think you could get into a situation with three large merges: the largest would be paused, but the others could still take more than 10 minutes to complete. Are you sure that your observation is at odds with what the documentation says the scheduler is doing?
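To make that concrete, here is a toy model of the scenario. It is not Lucene's actual code. It assumes that indexing stalls while more than maxMergeCount merges are unfinished, that the smallest unfinished merges get the available merge threads, and that each running merge proceeds at full speed. The function name, parameters, and merge durations are all made up for illustration.

```python
def stall_minutes(merge_minutes, max_merge_count, max_thread_count):
    """Minutes until the number of unfinished merges drops to max_merge_count.

    Toy model only: merge_minutes holds how long each pending merge would
    take on its own, and the smallest merges are always run first.
    """
    if max_thread_count < 1:
        raise ValueError("max_thread_count must be at least 1")

    remaining = sorted(merge_minutes)  # smallest merge first
    elapsed = 0.0

    # Assumed stall condition: more unfinished merges than max_merge_count.
    while len(remaining) > max_merge_count:
        running = remaining[:max_thread_count]
        waiting = remaining[max_thread_count:]

        # The smallest running merge finishes next; advance time to that point.
        step = running[0]
        elapsed += step

        # Drop the finished merge and charge the elapsed time to the other
        # running merges. The list stays sorted because every running merge
        # started out no larger than any waiting merge.
        remaining = [m - step for m in running[1:]] + waiting

    return elapsed
```

With three merges that would take 12, 15, and 40 minutes, maxMergeCount=2, and a single merge thread, this model reports a 12 minute stall, which is longer than the roughly ten minute MySQL timeout, even though the smallest merge ran first. With maxMergeCount=6 the same merges cause no stall at all.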
On Wed, Oct 10, 2018 at 2:28 AM Shawn Heisey <apa...@elyograg.org> wrote:

> Before I open an issue, I would like to double-check my sanity and see
> whether an issue is needed.
>
> I have noticed that the javadoc for ConcurrentMergeScheduler says that
> it schedules smaller merges before larger merges. In the past, I have
> seen evidence suggesting this is not actually the case, that it prefers
> larger merges first.
>
> ---- background ----
>
> When importing millions of rows from a database using Solr's dataimport
> handler, the index will be merged quite frequently while that indexing
> occurs. Eventually, it reaches a point where there are multiple merges
> scheduled simultaneously, so the ongoing indexing thread will be
> paused until the number of merges drops below maxMergeCount.
>
> If the smallest merge was being done first, then I don't think the
> observed behavior would be what happens. What I would see happen in the
> past is that when a large merge gets scheduled, indexing is paused long
> enough for the database connection to time out and be disconnected, so
> when the import tries to resume indexing, it can't -- the source
> database connection is gone. For MySQL databases, this timeout takes
> about ten minutes to happen. If the smallest merge had completed first,
> the count would have decreased long before the database connection could
> time out, and indexing would have resumed with no problems.
>
> ---- end background ----
>
> The way that I have fixed this problem in the past is to increase
> maxMergeCount to 6. When that's done, the incoming thread never gets
> paused, and the database connection doesn't time out.
>
> I can see that the default for maxMergeCount was changed from 2 to 6 in
> 2014 by LUCENE-6119. So 5.0 and later probably would not have the
> problems I encountered as long as the scheduler is left at defaults ...
> but I suspect that the running order of merges goes larger to smaller,
> contrary to javadoc.
> The code is pretty dense and I haven't completely
> deciphered it yet.
>
> Thanks,
> Shawn