Re: Index optimization takes too long
On Sat, 2018-11-03 at 21:41 -0700, Wei wrote:
> Thanks everyone! I checked the system metrics during the optimization
> process. CPU usage is quite low, there is no I/O wait, and memory
> usage is not much different from before the docValues change. So I
> wonder what could be the bottleneck.

Are you looking at overall CPU usage or single-core? When we run force merge, we have a single core at 100% while the rest are idle.

NB: There is currently a thread, "Static index, fastest way to do forceMerge", on the Lucene users mailing list, which seems to run quite parallel to this one.

- Toke Eskildsen, Royal Danish Library
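To make the overall versus single-core distinction concrete, here is a minimal sketch that computes per-core busy percentages from two snapshots of /proc/stat. It assumes a Linux host; the function names are illustrative and not part of Solr or Lucene. One pegged core on an otherwise idle machine shows up as a low average, which would match the "CPU usage is quite low" observation.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; the bare "cpu" line is the aggregate.
        if len(fields) < 5 or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal (guest columns are ignored).
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_busy_percent(before_text, after_text):
    """Busy percentage per core between two /proc/stat snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    usage = {}
    for name, (busy_after, total_after) in after.items():
        if name not in before:
            continue
        busy_before, total_before = before[name]
        delta_total = total_after - total_before
        usage[name] = 100.0 * (busy_after - busy_before) / delta_total if delta_total > 0 else 0.0
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return per_core_busy_percent(before, after)
```

Calling sample() while the optimize is running and looking at the maximum value rather than the mean would show whether one core is saturated.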
Re: Index optimization takes too long
Thanks everyone! I checked the system metrics during the optimization process. CPU usage is quite low, there is no I/O wait, and memory usage is not much different from before the docValues change. So I wonder what could be the bottleneck.

Thanks,
Wei
Re: Index optimization takes too long
Going from my phone so it'll be terse. See uninvertingmergeupdateprocessor (or something like that). Also, there's an idea in SOLR-12259 IIRC, but that'll be in 7.6 at the earliest.
Re: Index optimization takes too long
On 11/3/2018 5:32 AM, Dave wrote:
> On a side note, does adding docvalues to an already indexed field, and
> then optimizing, prevent the need to reindex to take advantage of
> docvalues? I was under the impression you had to reindex the content.

You must reindex when changing the schema to add docValues. An optimize will not build the new data structures. It will only rebuild the data structures that are already there.

Thanks,
Shawn
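As an illustration of the schema change under discussion, here is a minimal sketch that builds a Schema API replace-field command enabling docValues on one field. It assumes a managed schema; the base URL, collection name, field name and field type are all hypothetical. As noted above, the change only takes effect for documents indexed afterwards, so a full reindex still has to follow.

```python
import json
import urllib.request


def build_replace_field_command(name, field_type, **attrs):
    """Build a Schema API 'replace-field' command with docValues enabled.

    replace-field needs the complete field definition, so any other
    attributes the field already has (indexed, stored, ...) go in attrs.
    """
    definition = {"name": name, "type": field_type, "docValues": True}
    definition.update(attrs)
    return {"replace-field": definition}


def post_schema_command(base_url, collection, command):
    """POST a command to <base_url>/solr/<collection>/schema and return the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/solr/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, build_replace_field_command("price", "pfloat", indexed=True, stored=True) produces the payload for a hypothetical "price" field; post_schema_command would then send it to a running Solr instance.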
Re: Index optimization takes too long
On a side note, does adding docvalues to an already indexed field, and then optimizing, prevent the need to reindex to take advantage of docvalues? I was under the impression you had to reindex the content.
Re: Index optimization takes too long
I would start by monitoring hardware (CPU, memory, disk) and software (heap, threads) utilization to see where the bottlenecks are, or what is being utilized the most, and then tune that parameter.

I would also look at profiling the software.

Deepak
Re: Index optimization takes too long
On 11/2/2018 5:00 PM, Wei wrote:
> After a recent schema change, it takes almost 40 minutes to optimize the
> index. The schema change is to enable docValues for all sort/facet fields,
> which increased the index size from 12G to 14G. Before the change it only
> took 5 minutes to do the optimization.

An optimize is not just a straight data copy. Lucene is actually completely recalculating the index data structures. It will never proceed at the full data rate your disks are capable of achieving. I do not know how docValues actually work during a segment merge, but given exactly how the info relates to the inverted index, it's probably even more complicated than the rest of the data structures in a Lucene index.

On one of the systems I used to manage, back in March of 2017, I was seeing a 50GB index take 1.73 hours to optimize. I do not recall whether I had docValues at that point, but I probably did.

http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140

There's not much you can do to make this go faster. Putting massively faster CPUs in the machine MIGHT make a difference, but it probably wouldn't be a BIG difference. I'm talking about clock speed, not core count.

Thanks,
Shawn
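For a sense of scale, the figures quoted in this thread can be turned into effective merge rates. This is only back-of-the-envelope arithmetic: it treats "G" and "GB" as GiB and divides the final index size by wall-clock time, both of which are simplifying assumptions.

```python
def merge_rate_mib_per_s(index_gib, minutes):
    """Effective rate in MiB/s for merging an index of index_gib GiB in the given minutes."""
    return index_gib * 1024 / (minutes * 60)


# Figures taken from the messages in this thread.
cases = {
    "12G in 5 min (before docValues)": (12, 5),
    "14G in 40 min (after docValues)": (14, 40),
    "50GB in 1.73 h (March 2017)": (50, 1.73 * 60),
}

for label, (size_gib, minutes) in cases.items():
    print(f"{label}: {merge_rate_mib_per_s(size_gib, minutes):.2f} MiB/s")
```

That works out to roughly 41, 6 and 8 MiB/s respectively, so the 40-minute optimize is in the same range as the 50GB case, and both are far below what typical disks can sustain.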
Index optimization takes too long
Hello,

After a recent schema change, it takes almost 40 minutes to optimize the index. The schema change is to enable docValues for all sort/facet fields, which increased the index size from 12G to 14G. Before the change it only took 5 minutes to do the optimization.

I have tried to increase maxMergeAtOnceExplicit because the default 30 could be too low:

100

But it doesn't seem to help. Any suggestions?

Thanks,
Wei
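For anyone wanting to reproduce the timing, here is a minimal sketch of triggering an optimize through the update handler and measuring how long the request takes. The base URL and collection name are hypothetical, and the sketch assumes the HTTP request does not return until the merge has finished.

```python
import time
import urllib.parse
import urllib.request


def build_optimize_url(base_url, collection, max_segments=1):
    """Build an update-handler URL that requests an optimize down to max_segments."""
    params = urllib.parse.urlencode(
        {"optimize": "true", "maxSegments": max_segments, "waitSearcher": "true"}
    )
    return f"{base_url}/solr/{collection}/update?{params}"


def timed_optimize(base_url, collection, max_segments=1):
    """Issue the optimize request and return the elapsed wall-clock time in minutes."""
    url = build_optimize_url(base_url, collection, max_segments)
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        response.read()  # drain the response body before stopping the clock
    return (time.monotonic() - start) / 60.0
```

Running timed_optimize before and after a configuration change, on the same data, gives a like-for-like comparison of the two durations reported here.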