Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory
Hi Shawn,

Thank you for the explanation.

Regards,
Edwin

On Wed, 30 Jan 2019 at 15:18, Shawn Heisey wrote:

> On 1/28/2019 10:14 AM, Zheng Lin Edwin Yeo wrote:
> > We have the following TieredMergePolicyFactory configuration in our
> > solrconfig.xml
> >
> > class="org.apache.solr.index.TieredMergePolicyFactory">
> > 10
> > 10
> > 10
>
> These three settings are the really important ones. Except for
> maxMergeAtOnceExplicit, you have these at the default settings. The
> default for maxMergeAtOnceExplicit is 30 ... and you shouldn't lower it
> without a really good reason. It mostly comes into play during an
> optimize ... when you lower it, optimizes may take longer than normal.
> It won't be able to merge as many segments at the same time, so the
> number of passes required to complete the optimize could increase.
>
> The most important setting here is segmentsPerTier ... this does not
> mean you will never have more than 10 total segments, it means that at
> each tier, Lucene will try to keep the number of segments below 10.
> With a large index, you are likely to have 3 or 4 tiers, possibly more.
>
> On an index where I spent a lot of time, my settings were, respective to
> yours, 35, 105, and 35. I often had more than 100 segments in those
> indexes. It was behaving correctly.
>
> > What could be the reason that it is not able to merge the segments to 3,
> > with each of the segment size to be 5 GB?
>
> It is working as designed, just not as you expected.
>
> Thanks,
> Shawn
Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory
On 1/28/2019 10:14 AM, Zheng Lin Edwin Yeo wrote:
> We have the following TieredMergePolicyFactory configuration in our
> solrconfig.xml
>
> 10 10 10

These three settings are the really important ones. Except for maxMergeAtOnceExplicit, you have these at the default settings. The default for maxMergeAtOnceExplicit is 30 ... and you shouldn't lower it without a really good reason. It mostly comes into play during an optimize ... when you lower it, optimizes may take longer than normal. It won't be able to merge as many segments at the same time, so the number of passes required to complete the optimize could increase.

The most important setting here is segmentsPerTier ... this does not mean you will never have more than 10 total segments; it means that at each tier, Lucene will try to keep the number of segments below 10. With a large index, you are likely to have 3 or 4 tiers, possibly more.

On an index where I spent a lot of time, my settings were, respective to yours, 35, 105, and 35. I often had more than 100 segments in those indexes. It was behaving correctly.

> What could be the reason that it is not able to merge the segments to 3,
> with each of the segment size to be 5 GB?

It is working as designed, just not as you expected.

Thanks,
Shawn
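To illustrate the point about maxMergeAtOnceExplicit and the number of optimize passes, here is a minimal sketch in Python. It is a toy model, not Lucene's actual merge scheduling: it assumes every pass simply merges groups of up to maxMergeAtOnceExplicit segments, and the function name `forced_merge_passes` is hypothetical. The 24-segment figure is the count reported in this thread.

```python
import math


def forced_merge_passes(segment_count, max_merge_at_once_explicit, target_segments=1):
    """Count passes needed to reach target_segments in a simplified model.

    Assumption: each pass merges groups of up to max_merge_at_once_explicit
    segments into one segment each. Real Lucene behaviour is more involved.
    """
    if max_merge_at_once_explicit < 2:
        raise ValueError("a merge must combine at least two segments")
    passes = 0
    while segment_count > target_segments:
        # One pass: every group of up to N segments collapses into one.
        merged = math.ceil(segment_count / max_merge_at_once_explicit)
        segment_count = max(target_segments, merged)
        passes += 1
    return passes


# 24 segments, as reported in this thread
print(forced_merge_passes(24, 10))  # 24 -> 3 -> 1, prints 2
print(forced_merge_passes(24, 30))  # 24 -> 1, prints 1
```

Under this model, lowering the setting from 30 to 10 doubles the number of passes for a 24-segment index, which is the kind of slowdown described above.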
Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory
Hi,

Does anyone have any insights on this?

Thank you in advance.

Regards,
Edwin

On Tue, 29 Jan 2019 at 01:14, Zheng Lin Edwin Yeo wrote:

> Hi,
>
> We have the following TieredMergePolicyFactory configuration in our
> solrconfig.xml
>
> 10
> 10
> 10
> 10
> 5120
> 0.1
> 2048
> 10.0
>
> However, when we index data to the collection, the number of segments that
> we are getting does not match what we configured.
> For example, our collection size is 13.7 GB. With the above
> TieredMergePolicyFactory configuration, we should expect to have 3 segments
> (since 13.7 / 5 = 2.74, which rounds up to 3). But we are getting 24
> segments in our collection, as shown in the screenshot linked below.
>
> https://drive.google.com/file/d/1hjIQVk_L2Bn9MYOmCdf2wKD_f_D2DNV6/view?usp=sharing
>
> What could be the reason that it is not able to merge the segments to 3,
> with each of the segment size to be 5 GB?
>
> Regards,
> Edwin
Number of segments in collection is more than what is set in TieredMergePolicyFactory
Hi,

We have the following TieredMergePolicyFactory configuration in our solrconfig.xml

    10 10 10 10 5120 0.1 2048 10.0

However, when we index data to the collection, the number of segments that we are getting does not match what we configured.

For example, our collection size is 13.7 GB. With the above TieredMergePolicyFactory configuration, we should expect to have 3 segments (since 13.7 / 5 = 2.74, which rounds up to 3). But we are getting 24 segments in our collection, as shown in the screenshot linked below.

https://drive.google.com/file/d/1hjIQVk_L2Bn9MYOmCdf2wKD_f_D2DNV6/view?usp=sharing

What could be the reason that it is not able to merge the segments to 3, with each of the segment size to be 5 GB?

Regards,
Edwin
Re: Number of segments
My main concern was just making sure we were getting the best search performance, and that we did not have too many segments. Every attempt I made to adjust the segment count made no difference (the segment count never changed).

Looking at that blog page, it looks like 30-40 segments is probably the norm.

On 04/08/2013 08:43 PM, Chris Hostetter wrote:
> : How do I determine how many tiers it has?
>
> You may find this blog post from mccandless helpful...
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> (don't ignore the videos! watching them is really helpful to understand
> what he is talking about)
>
> Once you've absorbed that, then please revisit your question, specifically
> Upayavira's key point: what is the problem you are trying to solve?
>
> https://people.apache.org/~hossman/#xyproblem
> XY Problem
>
> Your question appears to be an XY Problem ... that is: you are dealing
> with X, you are assuming Y will help you, and you are asking about Y
> without giving more details about the X so that we can understand the
> full issue. Perhaps the best solution doesn't involve Y at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
> -Hoss
Number of segments
I'm running Solr 4.0. I'm noticing my segments are staying in the 30+ range, even though I have these settings:

    <indexConfig>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="segmentsPerTier">10</int>
        <int name="maxMergeAtOnce">10</int>
        <int name="maxMergeAtOnceExplicit">10</int>
      </mergePolicy>
      <useCompoundFile>false</useCompoundFile>

Can anyone give me some advice on what I should change or check?
Re: Number of segments
On Mon, Apr 8, 2013, at 02:35 PM, Michael Long wrote:
> I'm running Solr 4.0. I'm noticing my segments are staying in the 30+
> range, even though I have these settings:
>
>     <indexConfig>
>       <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>         <int name="segmentsPerTier">10</int>
>         <int name="maxMergeAtOnce">10</int>
>         <int name="maxMergeAtOnceExplicit">10</int>
>       </mergePolicy>
>       <useCompoundFile>false</useCompoundFile>
>
> Can anyone give me some advice on what I should change or check?

How many documents do you have? How big are the files on disk?

Note it says segments per tier: you may have multiple tiers at play, meaning you can have more than ten segments.

There are also, I believe, properties that define the maximum size on disk for a segment and the like, which can prevent merges from happening.

Upayavira
Re: Number of segments
On 04/08/2013 09:41 AM, Upayavira wrote:
> How many documents do you have? How big are the files on disk?

2,795,601 documents, and the index dir is 50G.

> Note it says segments per tier: you may have multiple tiers at play,
> meaning you can have more than ten segments.

How do I determine how many tiers it has?

> There are also, I believe, properties that define the maximum size on
> disk for a segment and the like, which can prevent merges from happening.

I just have the defaults...nothing explicitly set

> Upayavira
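As a rough way to see how several tiers add up for the 50G index above, here is a minimal Python sketch. It is loosely modeled on how a tiered policy budgets segments, not a copy of Lucene's implementation. The function name `allowed_segment_count` is hypothetical, and the default values used (10 segments per tier, 10 merged at once, a 2 MB floor segment size, a 5 GB maximum merged segment size) are assumed to match the defaults mentioned in this message.

```python
import math


def allowed_segment_count(index_mb, segments_per_tier=10.0, max_merge_at_once=10,
                          floor_segment_mb=2.0, max_merged_segment_mb=5120.0):
    """Estimate how many segments a tiered policy tolerates before merging.

    Simplified model: each tier holds up to segments_per_tier segments, and
    the segment size grows by max_merge_at_once from one tier to the next,
    capped at max_merged_segment_mb.
    """
    level_size = floor_segment_mb
    mb_left = index_mb
    allowed = 0
    while True:
        segments_at_level = mb_left / level_size
        if segments_at_level < segments_per_tier:
            # Last tier: only partially filled.
            allowed += math.ceil(segments_at_level)
            break
        allowed += int(segments_per_tier)
        mb_left -= segments_per_tier * level_size
        level_size = min(max_merged_segment_mb, level_size * max_merge_at_once)
    return allowed


# 50 GB index with the assumed defaults
print(allowed_segment_count(50 * 1024))  # prints 46
```

Under these assumptions, the model tolerates roughly 46 segments for a 50 GB index, so a count that stays in the 30+ range is within budget even with segmentsPerTier set to 10.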
Re: Number of segments
On Mon, Apr 8, 2013, at 02:51 PM, Michael Long wrote:
> On 04/08/2013 09:41 AM, Upayavira wrote:
> > How many documents do you have? How big are the files on disk?
>
> 2,795,601 documents, and the index dir is 50G.
>
> > Note it says segments per tier: you may have multiple tiers at play,
> > meaning you can have more than ten segments.
>
> How do I determine how many tiers it has?
>
> > There are also, I believe, properties that define the maximum size on
> > disk for a segment and the like, which can prevent merges from happening.
>
> I just have the defaults...nothing explicitly set

What issue are you trying to solve here? Generally, the tiered merge policy works well, and if searches perform well, then having a reasonable number of segments needn't cause you any issues. Indeed, with larger indexes, having too few segments can cause issues, as merging can require copying large segments, which can be time-consuming.

Upayavira
Re: Number of segments
: How do I determine how many tiers it has?

You may find this blog post from mccandless helpful...

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

(don't ignore the videos! watching them is really helpful to understand what he is talking about)

Once you've absorbed that, then please revisit your question, specifically Upayavira's key point: what is the problem you are trying to solve?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss