Re: Re: solr 4.2.1 index gets slower over time
This sounds interesting, I'll check this out.

Thanks!
Elisabeth

2014-04-02 8:54 GMT+02:00 Dmitry Kan:

> Thanks, Markus, that is useful.
> I'm guessing the higher the weight, the longer the op takes?
Re: Re: solr 4.2.1 index gets slower over time
Thanks, Markus, that is useful.
I'm guessing the higher the weight, the longer the op takes?

On Tue, Apr 1, 2014 at 10:39 PM, Markus Jelsma wrote:

> You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2
> to 3 or 4. By default it may keep too many deleted or updated docs in the
> index. This can increase index size by 50%!

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
Re: Re: solr 4.2.1 index gets slower over time
You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2 to 3 or 4. By default it may keep too many deleted or updated docs in the index. This can increase index size by 50%!

Dmitry Kan wrote:

> Elisabeth,
>
> Yes, I believe you are right that purging deletes is part of the optimize
> process. If you delete often, you may consider (if not already) the
> TieredMergePolicy, which is suited for this scenario. Check out this
> relevant discussion I had with Lucene committers:
> https://twitter.com/DmitryKan/status/399820408444051456
>
> HTH,
>
> Dmitry
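To judge how many deleted documents an index is carrying, one can compare maxDoc with numDocs, both of which Solr reports in its core statistics. The sketch below only does that arithmetic; the function names and the sample figures are illustrative assumptions, not numbers from this thread, and document counts are only a rough proxy for on-disk size.

```python
def deleted_fraction(max_doc: int, num_docs: int) -> float:
    """Fraction of all document slots in the index held by deleted docs."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def overhead_vs_live(max_doc: int, num_docs: int) -> float:
    """Deleted docs expressed relative to the number of live docs."""
    if num_docs <= 0:
        return 0.0
    return (max_doc - num_docs) / num_docs


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 live docs, 1,500,000 slots in total.
    max_doc, num_docs = 1_500_000, 1_000_000
    print(f"deleted fraction of index: {deleted_fraction(max_doc, num_docs):.1%}")
    print(f"overhead relative to live docs: {overhead_vs_live(max_doc, num_docs):.1%}")
```

With these made-up figures, deleted documents add half as many entries again on top of the live ones, which is the kind of overhead described above.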
Re: solr 4.2.1 index gets slower over time
Elisabeth,

Yes, I believe you are right that purging deletes is part of the optimize process. If you delete often, you may consider (if not already) the TieredMergePolicy, which is suited for this scenario. Check out this relevant discussion I had with Lucene committers:
https://twitter.com/DmitryKan/status/399820408444051456

HTH,

Dmitry

On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit wrote:

> Thanks a lot for your answers!
>
> Shawn, our GC configuration has far fewer parameters defined, so we'll
> check this out.
>
> Dmitry, about the expungeDeletes option, we'll add that to the delete
> process. But from what I read, this is already done as part of the
> optimize process (cf.
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html).
> Or maybe not?
>
> Thanks again,
> Elisabeth
Re: solr 4.2.1 index gets slower over time
Thanks a lot for your answers!

Shawn, our GC configuration has far fewer parameters defined, so we'll check this out.

Dmitry, about the expungeDeletes option, we'll add that to the delete process. But from what I read, this is already done as part of the optimize process (cf. http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html). Or maybe not?

Thanks again,
Elisabeth

2014-04-01 7:52 GMT+02:00 Dmitry Kan:

> Hi,
>
> We have noticed something like this as well, but with an older version of
> Solr, 3.4. In our setup we delete documents pretty often. Internally in
> Lucene, when a client requests that a document be deleted, it is not
> physically deleted, but only marked as "deleted". Our original assumption
> was that the "deleted" documents would get physically removed on each
> optimize command issued. We started to suspect this wasn't always true as
> the shards (especially relatively large shards) became slower over time.
> So we found out about the expungeDeletes option, which purges the
> "deleted" docs and is false by default. We have set it to true. If your
> Solr update lifecycle includes frequent deletes, try this out.
>
> This of course does not replace working towards finding better GC
> parameters.
>
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
Re: solr 4.2.1 index gets slower over time
Hi,

We have noticed something like this as well, but with an older version of Solr, 3.4. In our setup we delete documents pretty often. Internally in Lucene, when a client requests that a document be deleted, it is not physically deleted, but only marked as "deleted". Our original assumption was that the "deleted" documents would get physically removed on each optimize command issued. We started to suspect this wasn't always true as the shards (especially relatively large shards) became slower over time. So we found out about the expungeDeletes option, which purges the "deleted" docs and is false by default. We have set it to true. If your Solr update lifecycle includes frequent deletes, try this out.

This of course does not replace working towards finding better GC parameters.

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit wrote:

> Hello,
>
> We are currently using solr 4.2.1. Our index is updated on a daily basis.
> After noticing that solr query time had increased (to twice its initial
> value) without any change in index size or in solr configuration, we tried
> an optimize on the index but it didn't fix our problem. We checked the
> garbage collector, but everything seemed fine. What did in fact fix our
> problem was to delete all documents and reindex from scratch.
>
> It looks like over time our index gets "corrupted" and optimize doesn't
> fix it. Does anyone have a clue how to investigate this situation further?
>
> Elisabeth

--
Dmitry
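For readers who want to try the expungeDeletes suggestion, here is a minimal Python sketch that only builds the update URL for a commit with expungeDeletes=true. The host, port and core name are hypothetical placeholders, the helper name is made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url: str, core: str) -> str:
    """Build an update URL that commits and asks Solr to expunge deletes."""
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to the actual installation.
    print(expunge_deletes_url("http://localhost:8983/solr", "collection1"))
```

The resulting URL can be requested with any HTTP client after a batch of deletes. Since expunging rewrites the affected segments, it is probably best tried on a test index first.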
Re: solr 4.2.1 index gets slower over time
On 3/31/2014 9:03 AM, elisabeth benoit wrote:
> We use JVisualVM. The CPU usage is very high (90%), but the GC activity
> shows less than 0.01% average activity. Plus the heap usage stays low
> (below 4G while the max heap size is 16G).
>
> Do you have a different tool to suggest to check the GC? Do you think
> there is something else we might not see?

You can't get actual usable GC pause information from jvisualvm or jconsole, only totals and averages. Those tools seem to be geared more towards seeing problems when your heap is too small. To see real pause information, you can turn on GC logging and then run the log through a tool like GCLogViewer to see a graph of your collection pauses.

What I used to initially see the problem was a program called jHiccup, which will show *ANY* pause, not just those caused by garbage collection. GC is almost always the reason there is a pause, though.

http://www.azulsystems.com/jHiccup
https://code.google.com/p/gclogviewer/

You can still have long GC pauses even if your max heap isn't reached. Have you provided any GC-related options to your JVM at all? With a heap size of 4GB and a max heap of 16GB, I can absolutely guarantee that you will have pause problems unless you provide the JVM with a number of tuning options.

I was frequently having pauses as high as 10 to 12 seconds on an 8GB heap, even after I switched to CMS. Further tuning was required. These options made the situation a lot better, but I think they can probably be improved:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

More expanded info, which references the link above:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

Thanks,
Shawn
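As a rough illustration of looking at individual pauses rather than averages, the following Python sketch scans GC log lines for long application stops. It assumes a HotSpot JVM started with -XX:+PrintGCApplicationStoppedTime and the "Total time for which application threads were stopped" line format that flag produces; the function name, the threshold and the sample lines are made up for the example.

```python
import re

# Line format assumed from -XX:+PrintGCApplicationStoppedTime output.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Return pause durations (in seconds) at or above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_s:
            pauses.append(float(match.group(1)))
    return pauses


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, not real measurements.
    sample = [
        "Total time for which application threads were stopped: 0.0123450 seconds",
        "Total time for which application threads were stopped: 11.5000000 seconds",
        "some unrelated log line",
    ]
    print(long_pauses(sample))
```

A single multi-second entry in such a list is exactly what an average figure hides. Dedicated tools such as the ones linked above remain the better choice for real analysis.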
Re: solr 4.2.1 index gets slower over time
Hello,

Thanks for your answer.

We use JVisualVM. The CPU usage is very high (90%), but the GC activity shows less than 0.01% average activity. Plus the heap usage stays low (below 4G while the max heap size is 16G).

Do you have a different tool to suggest to check the GC? Do you think there is something else we might not see?

Thanks again,
Elisabeth

2014-03-31 16:26 GMT+02:00 Shawn Heisey:

> That seems very odd. I have one production copy of my index using
> 4.2.1, and it has been working fine for quite a long time. We are
> transitioning to Solr 4.6.1 now, so the other copy is running that
> version. We do occasionally do a full rebuild, but that is for index
> content, not for any problems.
>
> When you say you checked your garbage collector, what tools did you use?
> I was having GC pause problems, but I didn't know it until I started
> using different tools.
>
> Thanks,
> Shawn
Re: solr 4.2.1 index gets slower over time
On 3/31/2014 6:57 AM, elisabeth benoit wrote:
> We are currently using solr 4.2.1. Our index is updated on a daily basis.
> After noticing that solr query time had increased (to twice its initial
> value) without any change in index size or in solr configuration, we tried
> an optimize on the index but it didn't fix our problem. We checked the
> garbage collector, but everything seemed fine. What did in fact fix our
> problem was to delete all documents and reindex from scratch.
>
> It looks like over time our index gets "corrupted" and optimize doesn't
> fix it. Does anyone have a clue how to investigate this situation further?

That seems very odd. I have one production copy of my index using 4.2.1, and it has been working fine for quite a long time. We are transitioning to Solr 4.6.1 now, so the other copy is running that version. We do occasionally do a full rebuild, but that is for index content, not for any problems.

When you say you checked your garbage collector, what tools did you use? I was having GC pause problems, but I didn't know it until I started using different tools.

Thanks,
Shawn