Optimizing for frequent changes sounds like a caching strategy, maybe “LRU merging”: prefer merging the segments that have not changed in a while?
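That “LRU merging” idea could be sketched, very roughly, as a selector that prefers the least-recently-updated segments. This is a toy illustration only: the `Segment` class, its timestamp field, and `mergeFactor` are made-up stand-ins, not Lucene's actual `MergePolicy`/`SegmentCommitInfo` API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of an index segment: just a name and the time it last changed.
// (Hypothetical; Lucene segments carry much richer metadata.)
final class Segment {
    final String name;
    final long lastChangedMillis;

    Segment(String name, long lastChangedMillis) {
        this.name = name;
        this.lastChangedMillis = lastChangedMillis;
    }
}

final class LruMergeSelector {
    // Pick up to mergeFactor segments, preferring the ones that have not
    // changed for the longest time (least-recently-updated first), on the
    // assumption that "cold" segments are unlikely to be invalidated by
    // further updates soon after the merge.
    static List<Segment> pick(List<Segment> segments, int mergeFactor) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.lastChangedMillis));
        return sorted.subList(0, Math.min(mergeFactor, sorted.size()));
    }
}
```

A real policy would also weigh segment sizes (as TieredMergePolicy does); this only shows the recency ordering in isolation.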
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Aug 1, 2017, at 5:50 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
>
> On Tue, Aug 1, 2017 at 14:04 Adrien Grand <jpou...@gmail.com> wrote:
>> The trade-off does not sound simple to me. This approach could lead to having more segments overall, making search requests and updates potentially slower and more I/O-intensive, since they have to iterate over more segments. I'm not saying this is a bad idea, but it could have unexpected side-effects.
>
> yes, that's my concern too.
>
>> Do you actually have a high commit rate or a high reopen rate (DirectoryReader.open(IndexWriter))?
>
> in my scenario both, but the commit rate far exceeds the reopen rate.
>
>> Maybe reopening instead of committing (and still committing, but less frequently) would decrease the I/O load, since NRT segments might never need to be actually written to disk if they are merged before the next commit happens and you give enough memory to the filesystem cache.
>
> makes sense in general; however, I am a bit constrained in how much I can avoid committing (states in an MVCC system are tied to commits, so it's trickier).
>
> In general I was wondering if we could have the merge policy look at both the commit rate and the number of segments, and decide whether or not to merge based on both, so that if segment growth stays within a threshold we could save some merges under high commit rates, though, as you say, we may then have to do bigger merges.
> I imagine this makes more sense when a lot of tiny changes are made to the index rather than a few big ones (in which case the bigger-merges problem should be less significant).
> Beyond my specific scenario, I am thinking that we can look again at the current merge policy algorithm and see if we can improve it, or make it more flexible to the way the "sneaky opponent" (Mike's ™ [1]) behaves.
>
> My 2 cents,
> Tommaso
>
> [1] : http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
>> On Tue, Aug 1, 2017 at 10:59, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
>>> Hi all,
>>>
>>> lately I have been looking a bit closer at merge policies, particularly the tiered one, and I was wondering if we can mitigate the number of possibly avoidable merges in high-commit-rate scenarios, especially when a high percentage of the commits happen on the same docs.
>>> I've observed several evolutions of merges in such scenarios, and it seemed to me the merge policy was too aggressive in merging, causing a large I/O overhead.
>>> I then tried the same workload with a merge policy that tentatively looked at the commit rate and skipped merges when that rate was above a threshold, which seemed to give slightly better results in reducing the unneeded I/O caused by avoidable merges.
>>>
>>> I know this is a bit abstract, but I would like to know if anyone has ideas or plans about mitigating the merge overhead in general and/or in similar cases.
>>>
>>> Regards,
>>> Tommaso
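The gating described in the thread (skip merges while the commit rate is above a threshold, but still merge once segment growth passes a budget) could be sketched roughly like this. It is a toy illustration, not Lucene's MergePolicy API; the class name, the sliding window, and all thresholds are invented for the example:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical commit-rate-aware merge gate. Each commit is timestamped;
// merges are deferred while the trailing commit rate exceeds a threshold,
// unless the segment count has grown past a budget, in which case we merge
// anyway (accepting the bigger merge noted in the discussion above).
final class CommitRateMergeGate {
    private final Deque<Long> commitTimesMillis = new ArrayDeque<>();
    private final long windowMillis;          // sliding window for rate estimation
    private final double maxCommitsPerSecond; // above this rate, defer merges
    private final int segmentBudget;          // above this count, merge regardless

    CommitRateMergeGate(long windowMillis, double maxCommitsPerSecond, int segmentBudget) {
        this.windowMillis = windowMillis;
        this.maxCommitsPerSecond = maxCommitsPerSecond;
        this.segmentBudget = segmentBudget;
    }

    // Record one commit at the given wall-clock time.
    void onCommit(long nowMillis) {
        commitTimesMillis.addLast(nowMillis);
    }

    // Commits per second over the trailing window ending at nowMillis,
    // dropping timestamps that have fallen out of the window.
    double commitsPerSecond(long nowMillis) {
        while (!commitTimesMillis.isEmpty()
                && commitTimesMillis.peekFirst() < nowMillis - windowMillis) {
            commitTimesMillis.removeFirst();
        }
        return commitTimesMillis.size() * 1000.0 / windowMillis;
    }

    // Merge once the segment count exceeds the budget; otherwise merge
    // only when the index is quiet enough that the merge won't soon be
    // invalidated by further commits.
    boolean shouldMerge(long nowMillis, int segmentCount) {
        if (segmentCount > segmentBudget) {
            return true;
        }
        return commitsPerSecond(nowMillis) <= maxCommitsPerSecond;
    }
}
```

This captures only the decision logic; wiring it into an actual MergePolicy, and picking sane defaults for the window and thresholds, is exactly the open question the thread raises.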