Re: Index optimization takes too long

2018-11-03 Thread Wei
Thanks everyone! I checked the system metrics during the optimization
process. CPU usage is quite low, there is no I/O wait,  and memory usage is
not much different from before the docValues change.  So I wonder what
could be the bottleneck.

Thanks,
Wei

On Sat, Nov 3, 2018 at 1:38 PM Erick Erickson wrote:

> Going from my phone so it'll be terse. See uninvertingmergeupdateprocessor
> (or something like that). Also, there's an idea in SOLR-12259 IIRC, but
> that'll be in 7.6 at the earliest.
>
> On Sat, Nov 3, 2018, 07:13 Shawn Heisey wrote:
>
> > On 11/3/2018 5:32 AM, Dave wrote:
> > > On a side note, does adding docvalues to an already indexed field, and
> > > then optimizing, prevent the need to reindex to take advantage of
> > > docvalues? I was under the impression you had to reindex the content.
> >
> > You must reindex when changing the schema to add docValues.  An optimize
> > will not build the new data structures. It will only rebuild the data
> > structures that are already there.
> >
> > Thanks,
> > Shawn
> >
> >
>


Questions about stored fields and updates.

2018-11-03 Thread Ash Ramesh
Hi everyone,

My company currently uses SOLR to completely hydrate client objects by
storing all fields (stored=true). Therefore we have 2 types of fields:

   1. indexed=true | stored=true : For fields that will be used for
   searching, sorting, etc.
   2. indexed=false | stored=true: For fields that only need hydrating for
   clients

We are re-architecting this so that we will eventually only get the id from
SOLR (fl=id) and hydrate from another data source. This means we can
obviously delete all the indexed=false | stored=true fields to reduce our
index size.

However, when it comes to the indexed=true | stored=true fields, we are not
sure whether to also set them to be stored=false and perform in-place
updates or leave it as is and perform atomic updates. We've done a fair bit
of research on the archives of this mailing list, but are still a bit
confused:

1. Will converting the fields from indexed=true | stored=true ->
indexed=true | stored=false reduce our index size? Will it also
make indexing less compute-intensive, given the compression work done for
stored fields?
2. Are atomic updates preferred to in-place updates? Obviously if we move
to index only fields, then we have to do in-place updates all the time.
This isn't an issue for us, but we are a bit concerned about how SOLR's
indexing speed will suffer & deleted docs increase. Currently we perform
both.

Some points about our SOLR use case:
- 40-60M docs with 8 shards (PULL/TLOG structure), Solr 7.4
- No need for extremely fast indexing
- Need for high query throughput (which is why we only want to retrieve the id
field and hydrate from a faster db store)

Thanks everyone, always appreciate the good information being shared here
daily :)

Regards,

Ash
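
For readers comparing the two update styles asked about above, here is a
minimal sketch of what the request bodies look like. It only builds the JSON
and sends nothing; the document id and the field names (title, popularity)
are hypothetical. As far as I understand the Solr reference guide, both
styles use the same "set"/"inc" modifiers, and Solr applies an update in
place only when the target field is a single-valued, non-indexed, non-stored
numeric docValues field. Otherwise the request is handled as a regular atomic
update, which needs the document's other fields to be stored or have
docValues.

```python
import json


def atomic_set(doc_id, field, value):
    """Build an update document that replaces one field's value."""
    return {"id": doc_id, field: {"set": value}}


def atomic_inc(doc_id, field, delta):
    """Build an update document that increments a numeric field."""
    return {"id": doc_id, field: {"inc": delta}}


def update_body(docs):
    """Serialize update documents as the JSON array an /update request expects."""
    return json.dumps(list(docs))


# Hypothetical id and field names, for illustration only.
body = update_body([
    atomic_set("design-1", "title", "New title"),
    atomic_inc("design-1", "popularity", 1),
])
print(body)
```

Whether the second document is applied in place or as a full atomic update
would depend on how the popularity field is defined in the schema, not on the
request itself.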

Re: Index optimization takes too long

2018-11-03 Thread Erick Erickson
Going from my phone so it'll be terse. See uninvertingmergeupdateprocessor
(or something like that). Also, there's an idea in SOLR-12259 IIRC, but
that'll be in 7.6 at the earliest.

On Sat, Nov 3, 2018, 07:13 Shawn Heisey wrote:

> On 11/3/2018 5:32 AM, Dave wrote:
> > On a side note, does adding docvalues to an already indexed field, and
> > then optimizing, prevent the need to reindex to take advantage of
> > docvalues? I was under the impression you had to reindex the content.
>
> You must reindex when changing the schema to add docValues.  An optimize
> will not build the new data structures. It will only rebuild the data
> structures that are already there.
>
> Thanks,
> Shawn
>
>


Re: Index optimization takes too long

2018-11-03 Thread Shawn Heisey

On 11/3/2018 5:32 AM, Dave wrote:

> On a side note, does adding docvalues to an already indexed field, and then
> optimizing, prevent the need to reindex to take advantage of docvalues? I was
> under the impression you had to reindex the content.


You must reindex when changing the schema to add docValues.  An optimize 
will not build the new data structures. It will only rebuild the data 
structures that are already there.


Thanks,
Shawn



Re: Index optimization takes too long

2018-11-03 Thread Dave
On a side note, does adding docvalues to an already indexed field, and then 
optimizing, prevent the need to reindex to take advantage of docvalues? I was 
under the impression you had to reindex the content. 

> On Nov 3, 2018, at 4:41 AM, Deepak Goel wrote:
> 
> I would start by monitoring hardware (CPU, memory, disk) and software
> (heap, threads) utilization to see where the bottlenecks are, or what
> is being utilized the most, and then tune that parameter.
> 
> I would also look at profiling the software.
> 
> 
> Deepak
> 
>> On Sat, Nov 3, 2018 at 4:30 AM Wei wrote:
>> 
>> Hello,
>> 
>> After a recent schema change, it takes almost 40 minutes to optimize the
>> index. The schema change enabled docValues for all sort/facet fields,
>> which increased the index size from 12G to 14G. Before the change it took
>> only 5 minutes to optimize.
>> 
>> I have tried to increase maxMergeAtOnceExplicit because the default 30
>> could be too low:
>> 
>> <int name="maxMergeAtOnceExplicit">100</int>
>> 
>> But it doesn't seem to help. Any suggestions?
>> 
>> Thanks,
>> Wei
>> 


Re: Index optimization takes too long

2018-11-03 Thread Deepak Goel
I would start by monitoring hardware (CPU, memory, disk) and software
(heap, threads) utilization to see where the bottlenecks are, or what
is being utilized the most, and then tune that parameter.

I would also look at profiling the software.


Deepak

On Sat, Nov 3, 2018 at 4:30 AM Wei wrote:

> Hello,
>
> After a recent schema change, it takes almost 40 minutes to optimize the
> index. The schema change enabled docValues for all sort/facet fields,
> which increased the index size from 12G to 14G. Before the change it took
> only 5 minutes to optimize.
>
> I have tried to increase maxMergeAtOnceExplicit because the default 30
> could be too low:
>
> <int name="maxMergeAtOnceExplicit">100</int>
>
> But it doesn't seem to help. Any suggestions?
>
> Thanks,
> Wei
>
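
For anyone wanting to reproduce the timing comparison in this thread, here is
a small sketch that builds the update-handler URL for an explicit optimize.
The host, port and collection name are placeholders, and nothing is sent;
optimize, maxSegments and waitSearcher are the usual update request
parameters as I understand them, so check them against your Solr version.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments=1, wait_searcher=True):
    """Build the URL for an explicit optimize request (the request is not sent)."""
    params = {
        "optimize": "true",
        "maxSegments": max_segments,
        "waitSearcher": "true" if wait_searcher else "false",
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
print(optimize_url("http://localhost:8983/solr", "mycollection"))
```

Timing the request before and after a schema change, alongside the system
metrics mentioned in the replies, is one way to see where the extra time goes.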