If maxMergeCount were 2, I think you could get into a situation with three
large merges; the largest would be paused, but the others could still
take more than 10 minutes to complete. Are you sure your observation is at
odds with what the javadoc says the scheduler is doing?
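
To make that concrete, here is a rough toy model. It is hypothetical and is
not the actual ConcurrentMergeScheduler code. It assumes that merge durations
are known up front, that maxThreadCount is 1 (so one merge runs while the
others are paused), and that an incoming indexing thread stays stalled while
the number of launched merges is >= maxMergeCount. The names MergeStallModel
and stallMinutes are made up for illustration.

```java
import java.util.Arrays;

public class MergeStallModel {

    // Hypothetical toy model, NOT Lucene's ConcurrentMergeScheduler code.
    // Assumptions: durations (minutes) are known, only one merge runs at a
    // time (maxThreadCount = 1) while the rest are paused, and indexing stays
    // stalled while the number of launched merges is >= maxMergeCount.
    public static double stallMinutes(double[] launchedMergeMinutes,
                                      int maxMergeCount,
                                      boolean smallestFirst) {
        if (maxMergeCount < 1) {
            throw new IllegalArgumentException("maxMergeCount must be >= 1");
        }
        double[] sorted = launchedMergeMinutes.clone();
        Arrays.sort(sorted); // ascending by duration
        // Merges that must finish before the count drops below maxMergeCount;
        // zero or negative means indexing never stalls.
        int mustFinish = sorted.length - maxMergeCount + 1;
        double stalled = 0.0;
        for (int i = 0; i < mustFinish; i++) {
            // Smallest-first finishes the short merges first; largest-first the long ones.
            stalled += smallestFirst ? sorted[i] : sorted[sorted.length - 1 - i];
        }
        return stalled;
    }

    public static void main(String[] args) {
        double timeoutMinutes = 10.0;     // roughly the MySQL timeout from the thread
        double[] launched = {40.0, 12.0}; // hypothetical: two large merges launched, one pending

        for (int maxMergeCount : new int[] {2, 6}) {
            double smallFirst = stallMinutes(launched, maxMergeCount, true);
            double largeFirst = stallMinutes(launched, maxMergeCount, false);
            System.out.printf(
                "maxMergeCount=%d: smallest-first stall=%.0f min, largest-first stall=%.0f min, "
                    + "timeout hit (smallest-first)=%b%n",
                maxMergeCount, smallFirst, largeFirst, smallFirst > timeoutMinutes);
        }
    }
}
```

Under these assumptions, with maxMergeCount=2 even a smallest-first order
stalls indexing for the whole 12-minute merge, which is past the timeout,
while maxMergeCount=6 never stalls at all. So a timeout on its own doesn't
prove the scheduler runs the larger merges first.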

On Wed, Oct 10, 2018 at 2:28 AM Shawn Heisey <apa...@elyograg.org> wrote:

> Before I open an issue, I would like to double-check my sanity and see
> whether an issue is needed.
>
> I have noticed that the javadoc for ConcurrentMergeScheduler says that
> it schedules smaller merges before larger merges.  In the past, I have
> seen evidence suggesting this is not actually the case, that it prefers
> larger merges first.
>
> ---- background ----
>
> When importing millions of rows from a database using Solr's dataimport
> handler, the index will be merged quite frequently while that indexing
> occurs.  Eventually, it reaches a point where there are multiple merges
> scheduled simultaneously, so the ongoing indexing thread will be
> paused until the number of merges drops below maxMergeCount.
>
> If the smallest merge were being done first, then I don't think I would
> see the behavior I observed.  What I saw happen in the
> past is that when a large merge gets scheduled, indexing is paused long
> enough for the database connection to time out and be disconnected, so
> when the import tries to resume indexing, it can't: the source
> database connection is gone.  For MySQL databases, this timeout takes
> about ten minutes. If the smallest merge had completed first,
> the count would have decreased long before the database connection could
> time out, and indexing would have resumed with no problems.
>
> ---- end background ----
>
> The way that I have fixed this problem in the past is to increase
> maxMergeCount to 6.  When that's done, the incoming thread never gets
> paused, and the database connection doesn't time out.
>
> I can see that the default for maxMergeCount was changed from 2 to 6 in
> 2014 by LUCENE-6119.  So 5.0 and later probably don't have the
> problems I encountered, as long as the scheduler is left at its defaults,
> but I suspect that merges run in order from larger to smaller,
> contrary to the javadoc.  The code is pretty dense and I haven't completely
> deciphered it yet.
>
> Thanks,
> Shawn
>