Yeah, if merging down to 1 segment, there’s no choice in the matter. But with the changes to TMP in 7.5, users have to explicitly use maxSegments=1 to get that behavior.
It also seems we could usefully predict how much free space we need: each merge thread has a list of segments and can estimate the disk space needed based on the pct of deleted docs in each segment, etc. I can imagine aborting the merge if free disk space falls below some threshold, too.

Off the top of my head, I can imagine each merge thread trying to predict how much space it needs and updating some global var with that number; that way other cores in the same JVM would have something to check to prevent race conditions. But even if something like that worked to perfection, it’s still possible to have N JVMs running at the same time going at the same disk, so the problem would still be subject to race conditions. I suppose the merge process could periodically check if there was still enough free space to succeed (plus some slop of course).

A lot of this is noodling. Right now I’m traveling and have some free time… I’m not ready to spend a lot of time on this soon.

> On Sep 13, 2019, at 2:34 PM, Michael McCandless <[email protected]> wrote:
>
> I think this is worth exploring?
>
> Essentially, after each large merge, we'd need to 1) commit, and 2) refresh
> any open readers (and close the old readers), to fully free up transient disk
> usage. Maybe we could somehow track the current transient extra disk usage
> of the index + open readers and once that exceeds a threshold, do something.
> The "something" could even be asynchronous, e.g. maybe the next merge kicks
> off, and then asynchronously your app calls commit / refresh? It could be an
> event/listener API that IW invokes maybe ...
>
> However, the final merge (if merging to a single segment) will necessarily
> consume up to 2X the index size (1X for the current index + 1X for the newly
> merged segment); I don't see how to reduce that requirement for the final
> merge.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Sep 13, 2019 at 12:54 PM Bram Van Dam <[email protected]> wrote:
> On 02/09/2019 17:19, Erick Erickson wrote:
> > 4> Don’t quite know what to do if maxSegments is 1 (or other very low
> > number).
>
> Having maxSegments set to > 5 (or whatever) seems like an acceptable
> constraint if it enables optimize without 200% disk usage.
>
> > Something like this would also pave the way for “background optimizing”.
> > Instead of a monolithic forceMerge, I can envision a process whereby we
> > created a low-level task that merged one max-sized segment at a time, came
> > up for air and reopened searchers then went back in and merged the next
> > one. With its own problems about coordinating ongoing updates, but that’s
> > another discussion ;).
> >
> > There’s lots of details to work out, throwing this out for discussion. I
> > can raise a JIRA if people think the idea has legs.
>
> Without having looked at the code, and going only on your assumptions
> and my own observations: it sounds like a good idea. The idea of a
> background optimizing process is particularly tantalizing.
>
> AFAICT there hasn't been any other feedback re this? :-/
>
> - Bram
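
P.S. To make the free-space idea at the top of this message a bit more concrete, here is a rough sketch in plain Java. Everything in it is hypothetical: MergeSpaceEstimator, SegmentStats, and the slop factor are made-up names, and nothing in Lucene or Solr works this way today. It assumes a merged segment shrinks roughly in proportion to its deleted docs, and the shared counter only coordinates merges inside one JVM, so the N-JVMs-on-one-disk race is still there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper for illustration only; not a Lucene or Solr API. */
final class MergeSpaceEstimator {

    /** Minimal stand-in for per-segment stats (hypothetical type). */
    static final class SegmentStats {
        final long sizeBytes;     // on-disk size of the segment
        final double pctDeleted;  // percentage of deleted docs, 0..100

        SegmentStats(long sizeBytes, double pctDeleted) {
            this.sizeBytes = sizeBytes;
            this.pctDeleted = pctDeleted;
        }
    }

    /** The "global var": bytes already claimed by in-flight merges in this JVM. */
    private static final AtomicLong RESERVED = new AtomicLong();

    /** Assumption: live data scales linearly with the fraction of live docs. */
    static long estimateMergedBytes(List<SegmentStats> segments) {
        long total = 0;
        for (SegmentStats s : segments) {
            total += (long) (s.sizeBytes * (1.0 - s.pctDeleted / 100.0));
        }
        return total;
    }

    /** Free space on the volume holding the index, as reported by the OS. */
    static long usableBytes(Path indexDir) throws IOException {
        return Files.getFileStore(indexDir).getUsableSpace();
    }

    /**
     * Tries to claim estimate * (1 + slop) bytes against the shared counter.
     * Returns the number of bytes reserved, or -1 if the merge should not
     * start (or should abort) because too little space would remain.
     */
    static long reserve(long estimatedBytes, long usableBytes, double slop) {
        long required = (long) (estimatedBytes * (1.0 + slop));
        while (true) {
            long current = RESERVED.get();
            if (current + required > usableBytes) {
                return -1L;
            }
            if (RESERVED.compareAndSet(current, current + required)) {
                return required;
            }
        }
    }

    /** Gives back a reservation once the merge finishes or is aborted. */
    static void release(long reservedBytes) {
        RESERVED.addAndGet(-reservedBytes);
    }
}
```

A merge thread would call estimateMergedBytes on its segment list, then reserve(...) with a fresh usableBytes(...) reading before starting, and release(...) in a finally block. The periodic "is there still enough room" check would just be another usableBytes(...) call compared against what that merge has left to write.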
