Optimizing for frequent changes sounds like a caching strategy, maybe “LRU 
merging”. Perhaps prefer merging segments that have not changed in a while?
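
Roughly what I have in mind, as a toy sketch only (this is not Lucene's
MergePolicy API; SegmentStats, LruMergeFilter and the cool-down window are all
made up for illustration): treat only segments that have not changed for a
while as preferred merge candidates.

    import java.util.List;
    import java.util.stream.Collectors;

    /** Toy model of a segment: just a name and when it last changed. */
    class SegmentStats {
      final String name;
      final long lastChangedMillis;
      SegmentStats(String name, long lastChangedMillis) {
        this.name = name;
        this.lastChangedMillis = lastChangedMillis;
      }
    }

    /** "LRU merging": only segments untouched for a cool-down window are candidates. */
    class LruMergeFilter {
      private final long coolDownMillis;
      LruMergeFilter(long coolDownMillis) {
        this.coolDownMillis = coolDownMillis;
      }

      List<SegmentStats> candidates(List<SegmentStats> segments, long nowMillis) {
        return segments.stream()
            .filter(s -> nowMillis - s.lastChangedMillis >= coolDownMillis)
            .collect(Collectors.toList());
      }
    }

The open question would be how to size the cool-down window so that frequently
changing segments still get merged eventually instead of piling up.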

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 1, 2017, at 5:50 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> 
> 
> 
> On Tue, Aug 1, 2017 at 14:04 Adrien Grand <jpou...@gmail.com> wrote:
> The trade-off does not sound simple to me. This approach could lead to having
> more segments overall, making search requests and updates potentially slower
> and more I/O-intensive since they would have to iterate over more segments.
> I'm not saying this is a bad idea, but it could have unexpected side-effects.
> 
> yes, that's my concern as well.
>  
> 
> Do you actually have a high commit rate or a high reopen rate 
> (DirectoryReader.open(IndexWriter))?
> 
> in my scenario both, but the commit rate is much higher than the reopen rate.
>  
> Maybe reopening instead of committing (and still committing, but less 
> frequently) would decrease the I/O load since NRT segments might never need 
> to be actually written to disk if they are merged before the next commit 
> happens and you give enough memory to the filesystem cache.
> 
> makes sense in general, however I am a bit constrained in how much I can
> avoid committing (states from an MVCC system are tied to commits, so it's
> trickier).
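
For reference, the reopen-often / commit-rarely pattern Adrien describes uses
the standard NRT reader calls; roughly (a fragment only: `writer` is an
already-open IndexWriter, setup and error handling omitted):

    // initial near-real-time reader taken straight from the writer, no commit needed
    DirectoryReader reader = DirectoryReader.open(writer);

    // ... index or update some documents ...

    // cheap refresh: searchers see the new docs without anything being fsync'ed
    DirectoryReader next = DirectoryReader.openIfChanged(reader, writer);
    if (next != null) {
      reader.close();
      reader = next;
    }

    // commit only when durability is actually required (e.g. once per MVCC state)
    writer.commit();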
> 
> In general I was wondering if we could have the merge policy look at both the
> commit rate and the number of segments and decide whether to merge based on
> both, so that if the segment growth stays within a threshold we could save
> some merges when we have high commit rates, but as you say we may then have
> to do bigger merges.
> I can imagine this making more sense when a lot of tiny changes are made to
> the index rather than a few big ones (in which case the bigger-merges problem
> should be less significant).
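
A very rough sketch of what that could look like, written against the 6.x
MergePolicy API (where findMerges receives the IndexWriter; the signature
differs across versions). The recordCommit() hook, the one-minute window and
the two thresholds are all invented for illustration, not anything Lucene
provides:

    import java.io.IOException;
    import java.util.Deque;
    import java.util.concurrent.ConcurrentLinkedDeque;

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.MergeTrigger;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.index.TieredMergePolicy;

    /** Sketch: defer merging while the commit rate is high, unless segments pile up. */
    public class CommitRateAwareMergePolicy extends TieredMergePolicy {

      private final double maxCommitsPerSec;  // above this rate, prefer to defer merging
      private final int maxSegmentBudget;     // but never let the segment count grow past this
      private final Deque<Long> recentCommits = new ConcurrentLinkedDeque<>();

      public CommitRateAwareMergePolicy(double maxCommitsPerSec, int maxSegmentBudget) {
        this.maxCommitsPerSec = maxCommitsPerSec;
        this.maxSegmentBudget = maxSegmentBudget;
      }

      /** Made-up application-level hook: call this wherever IndexWriter.commit() is called. */
      public void recordCommit() {
        long now = System.currentTimeMillis();
        recentCommits.addLast(now);
        // keep a sliding one-minute window of commit timestamps
        while (!recentCommits.isEmpty() && now - recentCommits.peekFirst() > 60_000L) {
          recentCommits.pollFirst();
        }
      }

      private double commitsPerSec() {
        return recentCommits.size() / 60.0;
      }

      @Override
      public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos,
                                           IndexWriter writer) throws IOException {
        // commit storm and segment count still within budget: skip this merge round
        if (commitsPerSec() > maxCommitsPerSec && infos.size() <= maxSegmentBudget) {
          return null;
        }
        return super.findMerges(trigger, infos, writer);
      }
    }

Returning null from findMerges just means "no merges this round", so the cost
of skipping is exactly the bigger merges mentioned above once the rate drops or
the segment budget is hit.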
> 
> Beyond my specific scenario, I am thinking that we could look again at the
> current MP algorithm and see whether we can improve it, or make it more
> flexible with respect to how the "sneaky opponent" (Mike's ™ [1]) behaves.
> 
> My 2 cents,
> Tommaso
> 
> [1] : 
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> 
> On Tue, Aug 1, 2017 at 10:59, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> Hi all,
> 
> lately I have been looking a bit closer at merge policies, particularly at
> the tiered one, and I was wondering if we can reduce the number of possibly
> avoidable merges in high-commit-rate scenarios, especially when a high
> percentage of the commits touch the same docs.
> I've observed how merges evolve in several such scenarios and it seemed to me
> the merge policy was merging too aggressively, causing a large I/O overhead.
> I then tried the same workload with a merge policy that tentatively looked at
> the commit rate and skipped merges whenever that rate was above a threshold;
> this seemed to give slightly better results in reducing the unneeded I/O
> caused by avoidable merges.
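
Plugging such a policy in would go through the usual IndexWriterConfig hook,
e.g. (CommitRateAwareMergePolicy being the hypothetical sketch earlier in this
mail; analyzer, dir and the numbers are placeholders):

    // defer merges above ~5 commits/sec, but never beyond 50 segments
    CommitRateAwareMergePolicy mp = new CommitRateAwareMergePolicy(5.0, 50);
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setMergePolicy(mp);
    IndexWriter writer = new IndexWriter(dir, iwc);
    // ... and every commit site would also call mp.recordCommit()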
> 
> I know this is a bit abstract, but I would like to know if anyone has any
> ideas or plans for mitigating the merge overhead in general and/or in
> similar cases.
> 
> Regards,
> Tommaso
> 
> 
> 
