Re: Increasing maxMergedSegmentMB value

2016-01-31 Thread Jack Krupansky
Make sure you fully digest Mike McCandless' blog post on segment merge
before trying to outguess his code:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

I don't think you would want to merge just two segments. Generally, you
should merge a bunch at a time, typically 10. IOW, take all the
segments on a tier and merge them into one segment at the next tier.

There is no documented practical upper limit for how big to make a single
segment, but very large segments are not likely to be optimized well in
Lucene, hence the default max merge size of 5GB. If you want to get a lot
above that, you're in uncharted territory. Besides, if you start pushing
your index well above the amount of available system memory your query
performance will suffer. I'd watch for the latter before pushing on the
former.
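
To make the tier idea concrete, here is a minimal sketch in plain Java. It is
not Lucene code: the class and method names are made up for illustration, it
only merges segments of exactly equal size, and the real TieredMergePolicy is
considerably more involved. It simply shows how a size cap stops merges at
the top tier and leaves more segments behind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of tiered merging; NOT the Lucene implementation.
final class TierMergeSketch {

    /**
     * Simulates `flushes` newly flushed segments of `flushMb` megabytes each.
     * Whenever `perTier` segments of the same size exist, they are merged
     * into one segment, unless the result would exceed `capMb`.
     * Returns the remaining segment sizes, largest first.
     */
    static List<Long> simulate(int flushes, long flushMb, int perTier, long capMb) {
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(flushMb);
            // A merge can fill up the next tier, so keep going (cascade).
            while (mergeOneTier(segments, perTier, capMb)) {
                // loop until no further merge is possible
            }
        }
        Collections.sort(segments, Collections.reverseOrder());
        return segments;
    }

    // Merges one full tier if any exists; returns true if a merge happened.
    private static boolean mergeOneTier(List<Long> segments, int perTier, long capMb) {
        for (Long size : new ArrayList<>(segments)) {
            long mergedMb = size * perTier;
            if (mergedMb > capMb) {
                continue; // merged segment would exceed the cap
            }
            if (Collections.frequency(segments, size) >= perTier) {
                for (int k = 0; k < perTier; k++) {
                    segments.remove(size); // remove(Object): drops one segment of this size
                }
                segments.add(mergedMb);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 100 flushes of 100MB each, 10 segments per tier.
        System.out.println(simulate(100, 100, 10, 5120).size());
        System.out.println(simulate(100, 100, 10, 10240).size());
    }
}
```

In this toy model the 5GB cap (5120MB) leaves ten 1000MB segments unmerged,
while a 10GB cap (10240MB) lets them collapse into a single segment. Whether
fewer, larger segments actually help query speed is a separate question.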


-- Jack Krupansky

On Sun, Jan 31, 2016 at 10:43 AM, Zheng Lin Edwin Yeo wrote:

> Thanks for your reply Shawn and Jack.
>
> I wanted to increase the segment size to 15GB so that there will be fewer
> segments to search during a query, which should potentially improve
> the query speed.
>
> What if I set the segment size to 20GB? Will all the existing 10GB segments
> be merged into 20GB segments, since merging two 10GB segments would now result
> in a 20GB segment?
>
> Regards,
> Edwin


Re: Increasing maxMergedSegmentMB value

2016-01-31 Thread Zheng Lin Edwin Yeo
Hi Jack,

Yes, I plan to merge all the 10GB segments into 20GB segments, not just two of
them. Sorry for the confusion.

I recently increased the system memory from 64GB to 192GB, but as our
index grows (which means more segments are created), I found that the
query speed actually slows down. So we have decided to increase the segment
size to see how it goes, as there will be fewer segments to search.

Regards,
Edwin


On 1 February 2016 at 01:37, Jack Krupansky wrote:

> Make sure you fully digest Mike McCandless' blog post on segment merge
> before trying to outguess his code:
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> I don't think you would want to merge just two segments. Generally, you
> should merge a bunch at a time, typically 10. IOW, take all the
> segments on a tier and merge them into one segment at the next tier.
>
> There is no documented practical upper limit for how big to make a single
> segment, but very large segments are not likely to be optimized well in
> Lucene, hence the default max merge size of 5GB. If you want to get a lot
> above that, you're in uncharted territory. Besides, if you start pushing
> your index well above the amount of available system memory your query
> performance will suffer. I'd watch for the latter before pushing on the
> former.
>
>
> -- Jack Krupansky

Re: Increasing maxMergedSegmentMB value

2016-01-31 Thread Zheng Lin Edwin Yeo
Thanks for your reply Shawn and Jack.

I wanted to increase the segment size to 15GB so that there will be fewer
segments to search during a query, which should potentially improve
the query speed.

What if I set the segment size to 20GB? Will all the existing 10GB segments
be merged into 20GB segments, since merging two 10GB segments would now result
in a 20GB segment?

Regards,
Edwin


On 31 January 2016 at 12:16, Jack Krupansky wrote:

> From the Lucene MergePolicy Javadoc:
>
> "Whenever the segments in an index have been altered by IndexWriter
> <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/IndexWriter.html>,
> either the addition of a newly flushed segment, addition of many segments
> from addIndexes* calls, or a previous merge that may now need to cascade,
> IndexWriter
> <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/IndexWriter.html>
> invokes findMerges(org.apache.lucene.index.MergeTrigger,
> org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.IndexWriter)
> <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html#findMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.IndexWriter)>
> to give the MergePolicy a chance to pick merges that are now required. This
> method returns a MergePolicy.MergeSpecification
> <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.MergeSpecification.html>
> instance describing the set of merges that should be done, or null if no
> merges are necessary. When IndexWriter.forceMerge is called, it calls
> findForcedMerges(SegmentInfos,int,Map, IndexWriter)
> <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html#findForcedMerges(org.apache.lucene.index.SegmentInfos, int, java.util.Map, org.apache.lucene.index.IndexWriter)>
> and the MergePolicy should then return the necessary merges."
>
> See:
>
> https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html
>
> IOW, the merge policy is consulted when the next commit occurs that closes
> and flushes the currently open segment.
>
> Nothing will happen to any existing 10GB segments, now or ever in the
> future since merging two 10GB segments would not be possible with a limit
> of only 15GB.
>
> Maybe you could clue us in as to what effect you are trying to achieve. I
> mean, why should any app care whether segments are 10GB or 15GB?
>
>
> -- Jack Krupansky


Re: Increasing maxMergedSegmentMB value

2016-01-30 Thread Shawn Heisey
On 1/30/2016 7:31 AM, Zheng Lin Edwin Yeo wrote:
> I would like to find out, when I increase the maxMergedSegmentMB from 10240
> (10GB) to 15360 (15GB), will all the 10GB segments that were created
> previously be automatically merge to 15GB?

Not necessarily.  It will make those 10GB+ segments eligible for further
merging, whereas they would have been ineligible before the change.

This might mean that one or more of those large segments will be merged
soon after the change and restart/reload, but I do not know when it
might happen.  It would probably wait until at least one new segment was
created, at which time the merge policy would be consulted.

Thanks,
Shawn



Re: Increasing maxMergedSegmentMB value

2016-01-30 Thread Jack Krupansky
From the Lucene MergePolicy Javadoc:

"Whenever the segments in an index have been altered by IndexWriter,
either the addition of a newly flushed segment, addition of many segments
from addIndexes* calls, or a previous merge that may now need to cascade,
IndexWriter invokes findMerges(org.apache.lucene.index.MergeTrigger,
org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.IndexWriter)
to give the MergePolicy a chance to pick merges that are now required. This
method returns a MergePolicy.MergeSpecification instance
describing the set of merges that should be done, or null if no merges are
necessary. When IndexWriter.forceMerge is called, it calls
findForcedMerges(SegmentInfos,int,Map, IndexWriter) and the
MergePolicy should then return the necessary merges."

See:
https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html

IOW, the merge policy is consulted when the next commit occurs that closes
and flushes the currently open segment.

Nothing will happen to any existing 10GB segments, now or ever in the
future since merging two 10GB segments would not be possible with a limit
of only 15GB.
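
The size arithmetic behind that statement can be sketched in a few lines of
plain Java. This is only an illustration: the class and method names are
hypothetical, treating a merge that lands exactly on the limit as allowed is
an assumption of this sketch, and it says nothing about whether the actual
merge policy would select such a merge, since it applies further criteria.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; NOT part of Lucene or Solr.
final class MergeSizeCheck {

    /**
     * Returns true if the combined size of the given segments (in MB)
     * stays within maxMergedSegmentMb. The boundary case (sum equal to
     * the limit) is treated as fitting, which is an assumption here.
     */
    static boolean fitsUnderCap(List<Long> segmentSizesMb, long maxMergedSegmentMb) {
        long totalMb = 0;
        for (long sizeMb : segmentSizesMb) {
            totalMb += sizeMb;
        }
        return totalMb <= maxMergedSegmentMb;
    }

    public static void main(String[] args) {
        List<Long> twoTenGbSegments = Arrays.asList(10240L, 10240L);
        // 10240 + 10240 = 20480MB, which is above a 15360MB (15GB) limit.
        System.out.println(fitsUnderCap(twoTenGbSegments, 15360L));
        // The same pair against a 20480MB (20GB) limit.
        System.out.println(fitsUnderCap(twoTenGbSegments, 20480L));
    }
}
```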

Maybe you could clue us in as to what effect you are trying to achieve. I
mean, why should any app care whether segments are 10GB or 15GB?


-- Jack Krupansky

On Sat, Jan 30, 2016 at 6:28 PM, Shawn Heisey  wrote:

> On 1/30/2016 7:31 AM, Zheng Lin Edwin Yeo wrote:
> > I would like to find out, when I increase the maxMergedSegmentMB from
> 10240
> > (10GB) to 15360 (15GB), will all the 10GB segments that were created
> > previously be automatically merge to 15GB?
>
> Not necessarily.  It will make those 10GB+ segments eligible for further
> merging, whereas they would have been ineligible before the change.
>
> This might mean that one or more of those large segments will be merged
> soon after the change and restart/reload, but I do not know when it
> might happen.  It would probably wait until at least one new segment was
> created, at which time the merge policy would be consulted.
>
> Thanks,
> Shawn
>
>