Re: Index optimization takes too long

2018-11-04 Thread Toke Eskildsen
On Sat, 2018-11-03 at 21:41 -0700, Wei wrote:
> Thanks everyone! I checked the system metrics during the optimization
> process. CPU usage is quite low, there is no I/O wait,  and memory
> usage is not much different from before the docValues change.  So I
> wonder what could be the bottleneck.

Are you looking at overall CPU usage or single-core? When we run force
merge, we have a single core at 100% while the rest are idle.
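
One way to tell the two apart, as a minimal sketch assuming a Linux host: take two snapshots of /proc/stat and compare the busy time per core. The function names below are made up for illustration and are not part of Solr or Lucene.

```python
import time


def parse_proc_stat(text):
    """Map each per-core label (cpu0, cpu1, ...) to (busy, total) jiffies."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line; keep cpu0, cpu1, ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # First eight counters: user nice system idle iowait irq softirq steal.
        # Counters missing on very old kernels are treated as zero.
        values = [int(v) for v in fields[1:9]]
        values += [0] * (8 - len(values))
        total = sum(values)
        idle = values[3] + values[4]  # idle + iowait
        stats[fields[0]] = (total - idle, total)
    return stats


def per_core_busy_percent(before, after):
    """Busy percentage per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for core, (busy, total) in second.items():
        if core not in first:
            continue
        prev_busy, prev_total = first[core]
        elapsed = total - prev_total
        usage[core] = 100.0 * (busy - prev_busy) / elapsed if elapsed > 0 else 0.0
    return usage


def sample_live(interval_seconds=1.0):
    """Read /proc/stat twice, interval_seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_seconds)
    with open("/proc/stat") as f:
        after = f.read()
    return per_core_busy_percent(before, after)
```

Calling sample_live() while the force merge runs and seeing one core near 100 with the others near 0 would match what we observe, even though the overall average looks low.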


NB: There is currently a thread "Static index, fastest way to do
forceMerge" on the Lucene users mailing list, which seems to run quite
parallel to this thread.

- Toke Eskildsen, Royal Danish Library




Re: Index optimization takes too long

2018-11-03 Thread Wei
Thanks everyone! I checked the system metrics during the optimization
process. CPU usage is quite low, there is no I/O wait,  and memory usage is
not much different from before the docValues change.  So I wonder what
could be the bottleneck.

Thanks,
Wei

On Sat, Nov 3, 2018 at 1:38 PM Erick Erickson wrote:

> Going from my phone so it'll be terse.  See uninvertingmergeupdateprocessor
> (or something like that). Also, there's an idea in SOLR-12259 IIRC, but
> that'll be in 7.6 at the earliest.
>
> On Sat, Nov 3, 2018, 07:13 Shawn Heisey wrote:
> > On 11/3/2018 5:32 AM, Dave wrote:
> > > On a side note, does adding docvalues to an already indexed field, and
> > > then optimizing, prevent the need to reindex to take advantage of
> > > docvalues? I was under the impression you had to reindex the content.
> >
> > You must reindex when changing the schema to add docValues.  An optimize
> > will not build the new data structures. It will only rebuild the data
> > structures that are already there.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Index optimization takes too long

2018-11-03 Thread Erick Erickson
Going from my phone so it'll be terse.  See uninvertingmergeupdateprocessor
(or something like that). Also, there's an idea in SOLR-12259 IIRC, but
that'll be in 7.6 at the earliest.

On Sat, Nov 3, 2018, 07:13 Shawn Heisey wrote:
> On 11/3/2018 5:32 AM, Dave wrote:
> > On a side note, does adding docvalues to an already indexed field, and
> > then optimizing, prevent the need to reindex to take advantage of
> > docvalues? I was under the impression you had to reindex the content.
>
> You must reindex when changing the schema to add docValues.  An optimize
> will not build the new data structures. It will only rebuild the data
> structures that are already there.
>
> Thanks,
> Shawn
>
>


Re: Index optimization takes too long

2018-11-03 Thread Shawn Heisey

On 11/3/2018 5:32 AM, Dave wrote:

> On a side note, does adding docvalues to an already indexed field, and then
> optimizing, prevent the need to reindex to take advantage of docvalues? I was
> under the impression you had to reindex the content.


You must reindex when changing the schema to add docValues.  An optimize 
will not build the new data structures. It will only rebuild the data 
structures that are already there.


Thanks,
Shawn



Re: Index optimization takes too long

2018-11-03 Thread Dave
On a side note, does adding docvalues to an already indexed field, and then 
optimizing, prevent the need to reindex to take advantage of docvalues? I was 
under the impression you had to reindex the content. 

> On Nov 3, 2018, at 4:41 AM, Deepak Goel wrote:
> 
> I would start by monitoring hardware (CPU, memory, disk) and software
> (heap, threads) utilization to see where the bottlenecks are, or what
> is being utilized the most, and then tune that parameter.
> 
> I would also look at profiling the software.
> 
> 
> Deepak
> 
>> On Sat, Nov 3, 2018 at 4:30 AM Wei wrote:
>> 
>> Hello,
>> 
>> After a recent schema change, it takes almost 40 minutes to optimize the
>> index.  The schema change enabled docValues for all sort/facet fields,
>> which increased the index size from 12G to 14G. Before the change it only
>> took 5 minutes to do the optimization.
>> 
>> I have tried to increase maxMergeAtOnceExplicit because the default 30
>> could be too low:
>> 
>> 100
>> 
>> But it doesn't seem to help. Any suggestions?
>> 
>> Thanks,
>> Wei
>> 


Re: Index optimization takes too long

2018-11-03 Thread Deepak Goel
I would start by monitoring hardware (CPU, memory, disk) and software
(heap, threads) utilization to see where the bottlenecks are, or what
is being utilized the most, and then tune that parameter.

I would also look at profiling the software.


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Sat, Nov 3, 2018 at 4:30 AM Wei wrote:

> Hello,
>
> After a recent schema change, it takes almost 40 minutes to optimize the
> index.  The schema change enabled docValues for all sort/facet fields,
> which increased the index size from 12G to 14G. Before the change it only
> took 5 minutes to do the optimization.
>
> I have tried to increase maxMergeAtOnceExplicit because the default 30
> could be too low:
>
> 100
>
> But it doesn't seem to help. Any suggestions?
>
> Thanks,
> Wei
>


Re: Index optimization takes too long

2018-11-02 Thread Shawn Heisey

On 11/2/2018 5:00 PM, Wei wrote:

> After a recent schema change, it takes almost 40 minutes to optimize the
> index.  The schema change enabled docValues for all sort/facet fields,
> which increased the index size from 12G to 14G. Before the change it only
> took 5 minutes to do the optimization.


An optimize is not just a straight data copy.  Lucene is actually 
completely recalculating the index data structures.  It will never 
proceed at the full data rate your disks are capable of achieving.


I do not know how docValues actually work during a segment merge, but 
given exactly how the info relates to the inverted index, it's probably 
even more complicated than the rest of the data structures in a Lucene 
index.


On one of the systems I used to manage, back in March of 2017, I was 
seeing a 50GB index take 1.73 hours to optimize.  I do not recall 
whether I had docValues at that point, but I probably did.


http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140
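
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the final index size is a fair stand-in for the amount of data processed (a merge also reads the old segments, so this is crude), it takes 1 GB as 1024 MB, and the helper name is made up for illustration.

```python
def merge_rate_mb_per_s(index_gb, minutes):
    """Average rate in MB/s, taking 1 GB = 1024 MB."""
    return index_gb * 1024 / (minutes * 60)


# (index size in GB, optimize time in minutes), as reported in this thread.
reported = {
    "12G index before docValues, 5 minutes": (12, 5),
    "14G index after docValues, 40 minutes": (14, 40),
    "50GB index, 1.73 hours": (50, 1.73 * 60),
}

for label, (size_gb, minutes) in reported.items():
    print(f"{label}: {merge_rate_mb_per_s(size_gb, minutes):.1f} MB/s")
```

By this crude measure the rates come out at roughly 41 MB/s before the change, 6 MB/s after it, and 8 MB/s for my 50GB index.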

There's not much you can do to make this go faster. Putting massively 
faster CPUs in the machine MIGHT make a difference, but it probably 
wouldn't be a BIG difference.  I'm talking about clock speed, not core 
count.


Thanks,
Shawn