Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

2019-01-30 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Thank you for the explanation.

Regards,
Edwin

On Wed, 30 Jan 2019 at 15:18, Shawn Heisey  wrote:

> On 1/28/2019 10:14 AM, Zheng Lin Edwin Yeo wrote:
> > We have the following TieredMergePolicyFactory configuration in our
> > solrconfig.xml
> >
> >  class="org.apache.solr.index.TieredMergePolicyFactory">
> >10
> >10
> >10
>
> These three settings are the really important ones.  Except for
> maxMergeAtOnceExplicit, you have these at the default settings.  The
> default for maxMergeAtOnceExplicit is 30 ... and you shouldn't lower it
> without a really good reason.  It mostly comes into play during an
> optimize ... when you lower it, optimizes may take longer than normal.
> It won't be able to merge as many segments at the same time, so the
> number of passes required to complete the optimize could increase.
>
> The most important setting here is segmentsPerTier ... this does not
> mean you will never have more than 10 total segments, it means that at
> each tier, Lucene will try to keep the number of segments below 10.
> With a large index, you are likely to have 3 or 4 tiers, possibly more.
>
> On an index where I spent a lot of time, my settings were, respective to
> yours, 35, 105, and 35.  I often had more than 100 segments in those
> indexes.  It was behaving correctly.
>
> > What could be the reason that it is not able to merge the segments down to 3,
> > with each segment being 5 GB?
>
> It is working as designed, just not as you expected.
>
> Thanks,
> Shawn
>


Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

2019-01-29 Thread Shawn Heisey

On 1/28/2019 10:14 AM, Zheng Lin Edwin Yeo wrote:

We have the following TieredMergePolicyFactory configuration in our
solrconfig.xml


   10
   10
   10


These three settings are the really important ones.  Except for 
maxMergeAtOnceExplicit, you have these at the default settings.  The 
default for maxMergeAtOnceExplicit is 30 ... and you shouldn't lower it 
without a really good reason.  It mostly comes into play during an 
optimize ... when you lower it, optimizes may take longer than normal. 
It won't be able to merge as many segments at the same time, so the 
number of passes required to complete the optimize could increase.
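To illustrate why a lower maxMergeAtOnceExplicit can mean more optimize passes, here is a deliberately simplified Python model. It is an assumption for illustration, not Lucene's actual forced-merge selection: each pass merges the remaining segments in groups of at most maxMergeAtOnceExplicit until one segment is left. The function name `optimize_passes` is hypothetical.

```python
import math


def optimize_passes(segment_count: int, max_merge_at_once_explicit: int) -> int:
    """Estimate merge passes needed to optimize down to a single segment.

    Simplified model (an assumption, not Lucene's real forced-merge logic):
    every pass merges the remaining segments in groups of at most
    max_merge_at_once_explicit, and each group becomes one new segment.
    """
    if max_merge_at_once_explicit < 2:
        raise ValueError("a merge must combine at least two segments")
    passes = 0
    while segment_count > 1:
        # Each group of up to k segments collapses into one segment.
        segment_count = math.ceil(segment_count / max_merge_at_once_explicit)
        passes += 1
    return passes


# 24 segments, the count reported in this thread:
print(optimize_passes(24, 30))  # 1 pass with the default of 30
print(optimize_passes(24, 10))  # 2 passes (24 -> 3 -> 1) with 10
```

In this model each extra pass rewrites the data once more, which is the intuition behind optimizes taking longer with a lowered setting.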


The most important setting here is segmentsPerTier ... this does not 
mean you will never have more than 10 total segments, it means that at 
each tier, Lucene will try to keep the number of segments below 10. 
With a large index, you are likely to have 3 or 4 tiers, possibly more.
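To see how 10 segments per tier can still add up to far more than 10 segments overall, the Python sketch below computes an approximate segment budget. It is modeled on the budget loop in Lucene's TieredMergePolicy as we understand it (details differ between Lucene versions), and it ignores deleted documents and the special treatment of segments that are already close to the size cap. The defaults floor_mb=2 (Lucene's long-standing floorSegmentMB default) and max_merged_mb=5120 (the 5 GB cap discussed in this thread) are assumptions, and `segment_budget` is a hypothetical name.

```python
import math


def segment_budget(total_mb: float,
                   segments_per_tier: float = 10,
                   max_merge_at_once: int = 10,
                   floor_mb: float = 2.0,
                   max_merged_mb: float = 5120.0) -> int:
    """Approximate how many segments a tiered merge policy tolerates.

    Assumption: simplified from Lucene's TieredMergePolicy budget loop.
    Start at the floor size, allow segments_per_tier segments per tier,
    and grow the tier size by the merge factor at each step.
    """
    merge_factor = min(max_merge_at_once, segments_per_tier)
    level_mb = floor_mb      # segment size at the smallest tier
    mb_left = total_mb       # index size not yet assigned to a tier
    allowed = 0.0
    while True:
        segs_at_level = mb_left / level_mb
        # Last tier: not enough data left to fill it, or the cap is reached.
        if segs_at_level < segments_per_tier or level_mb == max_merged_mb:
            allowed += math.ceil(segs_at_level)
            return int(allowed)
        allowed += segments_per_tier
        mb_left -= segments_per_tier * level_mb
        level_mb = min(max_merged_mb, level_mb * merge_factor)


print(segment_budget(13.7 * 1024))  # 36 for the 13.7 GB collection
```

Under these assumptions the budget for the 13.7 GB collection is 36 segments, so the 24 segments in the screenshot are within what the policy tolerates; 13.7 / 5 = 2.74 gives a lower bound on the segment count, not a target the policy works toward. With the same defaults, the sketch gives 46 for the 50 GB index in the 2013 thread below.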


On an index where I spent a lot of time, my settings were, respective to 
yours, 35, 105, and 35.  I often had more than 100 segments in those 
indexes.  It was behaving correctly.



What could be the reason that it is not able to merge the segments down to 3,
with each segment being 5 GB?


It is working as designed, just not as you expected.

Thanks,
Shawn


Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi,

Does anyone have any insights on this?

Thank you in advance.

Regards,
Edwin

On Tue, 29 Jan 2019 at 01:14, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> We have the following TieredMergePolicyFactory configuration in our
> solrconfig.xml
>
> 
>   10
>   10
>   10
>   10
>   5120
>   0.1
>   2048
>   10.0
> 
>
> However, when we index data to the collection, the number of segments that
> we are getting does not match what we configured.
> For example, our collection size is 13.7 GB. With the above
> TieredMergePolicyFactory configuration, we should expect to have 3 segments
> (since 13.7 / 5 = 2.74, which rounds up to 3). But we are getting 24
> segments in our collection; a screenshot is at the link below.
>
> https://drive.google.com/file/d/1hjIQVk_L2Bn9MYOmCdf2wKD_f_D2DNV6/view?usp=sharing
>
> What could be the reason that it is not able to merge the segments down to 3,
> with each segment being 5 GB?
>
> Regards,
> Edwin
>


Number of segments in collection is more than what is set in TieredMergePolicyFactory

2019-01-28 Thread Zheng Lin Edwin Yeo
Hi,

We have the following TieredMergePolicyFactory configuration in our
solrconfig.xml


  10
  10
  10
  10
  5120
  0.1
  2048
  10.0


However, when we index data to the collection, the number of segments that
we are getting does not match what we configured.
For example, our collection size is 13.7 GB. With the above
TieredMergePolicyFactory configuration, we should expect to have 3 segments
(since 13.7 / 5 = 2.74, which rounds up to 3). But we are getting 24
segments in our collection; a screenshot is at the link below.
https://drive.google.com/file/d/1hjIQVk_L2Bn9MYOmCdf2wKD_f_D2DNV6/view?usp=sharing

What could be the reason that it is not able to merge the segments down to 3,
with each segment being 5 GB?

Regards,
Edwin


Re: Number of segments

2013-04-09 Thread Michael Long
My main concern was just making sure we were getting the best search 
performance, and that we did not have too many segments. Every attempt I 
made to adjust the segment count resulted in no difference (segment 
count never changed). Looking at that blog page, it looks like 30-40 
segments is probably the norm.


On 04/08/2013 08:43 PM, Chris Hostetter wrote:

> : How do I determine how many tiers it has?
>
> You may find this blog post from mccandless helpful...
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> (don't ignore the videos! watching them is really helpful for understanding what
> he is talking about)
>
> Once you've absorbed that, then please revisit your question, specifically
> Upayavira's key point: what is the problem you are trying to solve?
>
> https://people.apache.org/~hossman/#xyproblem
> XY Problem
>
> Your question appears to be an XY Problem ... that is: you are dealing
> with X, you are assuming Y will help you, and you are asking about Y
> without giving more details about the X so that we can understand the
> full issue.  Perhaps the best solution doesn't involve Y at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss




Number of segments

2013-04-08 Thread Michael Long
I'm running Solr 4.0. I'm noticing my segments are staying in the 30+
range, even though I have these settings:


<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnce">10</int>
    <int name="maxMergeAtOnceExplicit">10</int>
  </mergePolicy>
  <useCompoundFile>false</useCompoundFile>

Can anyone give me some advice on what I should change or check?


Re: Number of segments

2013-04-08 Thread Upayavira
On Mon, Apr 8, 2013, at 02:35 PM, Michael Long wrote:
> I'm running Solr 4.0. I'm noticing my segments are staying in the 30+
> range, even though I have these settings:
>
> <indexConfig>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="segmentsPerTier">10</int>
>   <int name="maxMergeAtOnce">10</int>
>   <int name="maxMergeAtOnceExplicit">10</int>
> </mergePolicy>
> <useCompoundFile>false</useCompoundFile>
>
> Can anyone give me some advice on what I should change or check?

How many documents do you have? How big are the files on disk?

Note it says segments per tier, you may have multiple tiers at play
meaning you can have more than ten segments.

There are also, I believe, properties that define the maximum size on disk
for a segment and the like that can prevent merges from happening.
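The size limit Upayavira mentions corresponds to maxMergedSegmentMB, which defaults to 5 GB in TieredMergePolicy. The Python sketch below illustrates how such a cap blocks merges, assuming two rules we believe TieredMergePolicy applies during normal (non-optimize) merging: a merge is not chosen if its result would exceed the cap, and a segment already larger than about half the cap is left alone. Sizes ignore deleted documents, and the function names are hypothetical.

```python
def merge_fits(sizes_mb, max_merged_mb=5120.0):
    """Assumed rule: a merge is only chosen if its result stays under the cap."""
    return sum(sizes_mb) <= max_merged_mb


def eligible_for_natural_merge(size_mb, max_merged_mb=5120.0):
    """Assumed rule: segments above about half the cap are left alone."""
    return size_mb < max_merged_mb / 2


segments = [3000.0, 2900.0, 1800.0, 1200.0]  # hypothetical sizes in MB
print([eligible_for_natural_merge(s) for s in segments])  # [False, False, True, True]
print(merge_fits([3000.0, 2900.0]))  # False: 5900 MB exceeds 5120 MB
print(merge_fits([1800.0, 1200.0]))  # True: 3000 MB fits
```

Under these rules, normally merged segments tend to stop growing somewhere between about half the cap and the cap, so an index of a given size does not settle into the smallest possible number of maximum-size segments.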

Upayavira


Re: Number of segments

2013-04-08 Thread Michael Long


On 04/08/2013 09:41 AM, Upayavira wrote:

> How many documents do you have? How big are the files on disk?

2,795,601 and the index dir is 50G

> Note it says segments per tier, you may have multiple tiers at play
> meaning you can have more than ten segments.

How do I determine how many tiers it has?

> There are also, I believe, properties that define the maximum size on disk
> for a segment and the like that can prevent merges from happening.

I just have the defaults...nothing explicitly set

> Upayavira




Re: Number of segments

2013-04-08 Thread Upayavira

On Mon, Apr 8, 2013, at 02:51 PM, Michael Long wrote:
 
> On 04/08/2013 09:41 AM, Upayavira wrote:
> > How many documents do you have? How big are the files on disk?
> 2,795,601 and the index dir is 50G
>
> > Note it says segments per tier, you may have multiple tiers at play
> > meaning you can have more than ten segments.
> How do I determine how many tiers it has?
>
> > There are also, I believe, properties that define the maximum size on disk
> > for a segment and the like that can prevent merges from happening.
> I just have the defaults...nothing explicitly set

What issue are you trying to solve here? Generally, the tiered merge
policy works well, and if searches perform well, then having a
reasonable number of segments needn't cause you any issues.

Indeed, with larger indexes, having too few segments can cause issues as
merging can require copying large segments, which can be time-consuming.

Upayavira


Re: Number of segments

2013-04-08 Thread Chris Hostetter

: How do I determine how many tiers it has?

You may find this blog post from mccandless helpful...

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

(don't ignore the videos! watching them is really helpful for understanding what
he is talking about)

Once you've absorbed that, then please revisit your question, specifically
Upayavira's key point: what is the problem you are trying to solve?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss
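Lucene does not record a tier number for each segment, so the number of tiers is only an approximation. One hypothetical way to estimate it from the segment sizes Solr reports is to bucket the sizes by powers of the merge factor above the floor size, in the spirit of the visualizations in the blog post above. The defaults floor_mb=2 and factor=10 are assumptions, and `tier_of` and `tier_histogram` are illustrative names.

```python
from collections import Counter


def tier_of(size_mb: float, floor_mb: float = 2.0, factor: int = 10) -> int:
    """Approximate tier: tier 0 is below floor*factor, each tier is factor times larger.

    Assumption: an illustrative bucketing, not something Lucene stores.
    """
    if floor_mb <= 0 or factor < 2:
        raise ValueError("floor_mb must be positive and factor at least 2")
    tier = 0
    bound = floor_mb * factor
    while size_mb >= bound:
        tier += 1
        bound *= factor
    return tier


def tier_histogram(sizes_mb, floor_mb=2.0, factor=10):
    """Count how many segments fall into each approximate tier."""
    return Counter(tier_of(s, floor_mb, factor) for s in sizes_mb)


sizes = [1, 3, 25, 30, 250, 4000, 5000]  # hypothetical segment sizes in MB
print(sorted(tier_histogram(sizes).items()))  # [(0, 2), (1, 2), (2, 1), (3, 2)]
```

If each bucket holds roughly segmentsPerTier segments or fewer, the index is in about the shape the policy aims for; several populated buckets is how a 50 GB index can end up with 30 or more segments.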