Re: solr.NRTCachingDirectoryFactory
Thanks Mikhail. I am unable to locate the bottleneck so far. Will try jstack and other tools.

On 8/25/16 11:40 PM, Mikhail Khludnev wrote:
> Rough sampling under load makes sense as usual. JMC is one of the suitable tools for this. Sometimes even just jstack, or looking at Solr Admin > Threads, is enough. If only a small ratio of documents is updated and the bottleneck is the filterCache, you can experiment with segmented filters, which suit NRT better.
>
> http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html
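For the thread sampling Mikhail suggests, repeated `jstack <pid>` dumps under load show where threads pile up. A small, hypothetical Python helper for eyeballing such dumps (the sample dump text below is made up, not from this system):

```python
# Count thread states across a jstack dump. Feed it the text produced by
# `jstack <pid>`; repeated samples under load reveal where threads pile up
# (many RUNNABLE on one hot method, or many BLOCKED on a lock).
# The sample dump below is fabricated for illustration.
from collections import Counter
import re

dump = """
"qtp1-42" #42 prio=5 tid=0x1 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
"qtp1-43" #43 prio=5 tid=0x3 nid=0x4 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"commitScheduler-8" #44 prio=5 tid=0x5 nid=0x6 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
"""

states = Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", dump))
print(states.most_common())
```

In practice you would pipe several dumps taken a few seconds apart through this and compare the counts.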
Re: solr.NRTCachingDirectoryFactory
Rough sampling under load makes sense as usual. JMC is one of the suitable tools for this. Sometimes even just jstack, or looking at Solr Admin > Threads, is enough. If only a small ratio of documents is updated and the bottleneck is the filterCache, you can experiment with segmented filters, which suit NRT better.

http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html

On Fri, Aug 26, 2016 at 2:56 AM, Rallavagu wrote:
> Follow-up update ...
>
> Set autowarm count to zero on the caches for NRT, and I could negotiate latency from 2 min to 5 min :)
>
> However, I am still seeing high QTimes and wondering where else I can look. Should I debug the code or run some tools to isolate bottlenecks (disk I/O, CPU, or the query itself)? Looking for some tuning advice. Thanks.

--
Sincerely yours
Mikhail Khludnev
Re: solr.NRTCachingDirectoryFactory
Follow-up update ...

Set autowarm count to zero on the caches for NRT, and I could negotiate latency from 2 min to 5 min :)

However, I am still seeing high QTimes and wondering where else I can look. Should I debug the code or run some tools to isolate bottlenecks (disk I/O, CPU, or the query itself)? Looking for some tuning advice. Thanks.

On 7/26/16 9:42 AM, Erick Erickson wrote:
> And, I might add, you should look through your old logs and see how long it takes to open a searcher. Let's say Shawn's lower bound is what you see, i.e. it takes a minute each to execute all the autowarming in filterCache and queryResultCache... So your current latency is _at least_ 2 minutes between the time something is indexed and when it's available for search, just for autowarming.
>
> Plus up to another 2 minutes for your soft commit interval to expire.
>
> So if your business people haven't noticed a 4 minute latency yet, tell them they don't know what they're talking about when they insist on the NRT interval being a few seconds ;).
>
> Best,
> Erick
Re: solr.NRTCachingDirectoryFactory
And, I might add, you should look through your old logs and see how long it takes to open a searcher. Let's say Shawn's lower bound is what you see, i.e. it takes a minute each to execute all the autowarming in filterCache and queryResultCache... So your current latency is _at least_ 2 minutes between the time something is indexed and when it's available for search, just for autowarming.

Plus up to another 2 minutes for your soft commit interval to expire.

So if your business people haven't noticed a 4 minute latency yet, tell them they don't know what they're talking about when they insist on the NRT interval being a few seconds ;).

Best,
Erick
Re: solr.NRTCachingDirectoryFactory
On 7/26/16 5:46 AM, Shawn Heisey wrote:
> As Erick indicated, these settings are incompatible with Near Real Time updates. With those settings, every time you commit and create a new searcher, Solr will execute up to 1000 queries (potentially 500 for each of the caches above) before that new searcher will begin returning new results.
>
> I do not know how fast your filter queries execute when they aren't cached... but even if they only take 100 milliseconds each, that could take up to a minute for filterCache warming. If each one takes two seconds and there are 500 entries in the cache, then autowarming the filterCache would take nearly 17 minutes. You would also need to wait for the warming queries on queryResultCache.
>
> The autowarmCount on my filterCache is 4, and warming that cache *still* sometimes takes ten or more seconds to complete.
>
> If you want true NRT, you need to set all your autowarmCount values to zero. The tradeoff with NRT is that your caches are ineffective immediately after a new searcher is created.

Will look into this and make changes as suggested.

> Looking at the "top" screenshot ... you have plenty of memory to cache the entire index. Unless your queries are extreme, this is usually enough for good performance.
>
> One possible problem is that cache warming is taking far longer than your autoSoftCommit interval, and the server is constantly busy making thousands of warming queries. Reducing autowarmCount, possibly to zero, *might* fix that. I would expect higher CPU load than what your screenshot shows if this were happening, but it still might be the problem.

Great point. Thanks for the help.

> Thanks,
> Shawn
Re: solr.NRTCachingDirectoryFactory
On 7/22/2016 10:15 AM, Rallavagu wrote:
> <filterCache size="5000"
>     initialSize="5000"
>     autowarmCount="500"/>
>
> <queryResultCache size="2"
>     initialSize="2"
>     autowarmCount="500"/>

As Erick indicated, these settings are incompatible with Near Real Time updates.

With those settings, every time you commit and create a new searcher, Solr will execute up to 1000 queries (potentially 500 for each of the caches above) before that new searcher will begin returning new results.

I do not know how fast your filter queries execute when they aren't cached... but even if they only take 100 milliseconds each, that could take up to a minute for filterCache warming. If each one takes two seconds and there are 500 entries in the cache, then autowarming the filterCache would take nearly 17 minutes. You would also need to wait for the warming queries on queryResultCache.

The autowarmCount on my filterCache is 4, and warming that cache *still* sometimes takes ten or more seconds to complete.

If you want true NRT, you need to set all your autowarmCount values to zero. The tradeoff with NRT is that your caches are ineffective immediately after a new searcher is created.

Looking at the "top" screenshot ... you have plenty of memory to cache the entire index. Unless your queries are extreme, this is usually enough for good performance.

One possible problem is that cache warming is taking far longer than your autoSoftCommit interval, and the server is constantly busy making thousands of warming queries. Reducing autowarmCount, possibly to zero, *might* fix that. I would expect higher CPU load than what your screenshot shows if this were happening, but it still might be the problem.

Thanks,
Shawn
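Shawn's back-of-the-envelope warm-time figures can be checked with a few lines of Python; the per-query times are the hypothetical values from his message, not measurements from this system:

```python
# Estimate how long autowarming can delay a new searcher, using the
# hypothetical per-query times from Shawn's example (not measurements).
autowarm_count = 500               # autowarmCount from the filterCache config

fast_case = autowarm_count * 0.1   # 100 ms per uncached filter query
slow_case = autowarm_count * 2.0   # 2 s per uncached filter query

print(f"fast case: {fast_case:.0f} s")            # about a minute
print(f"slow case: {slow_case / 60:.1f} min")     # nearly 17 minutes
```

The same arithmetic applies again for queryResultCache warming, which runs in addition to the filterCache warming.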
Re: solr.NRTCachingDirectoryFactory
On 7/22/16 9:56 AM, Erick Erickson wrote:
> OK, scratch autowarming. In fact your autowarm counts are quite high; I suspect far past "diminishing returns". I usually see autowarm counts < 64, but YMMV. Are you seeing actual hit ratios that are decent on those caches (admin UI >> plugins/stats >> cache >> ...)? Your cache sizes are also quite high in my experience; it's probably worth measuring the utilization there as well. And, BTW, your filterCache can occupy up to 2G of your heap. That's probably not your central problem, but it's something to consider.

Will look into it.

> So I don't know why your queries are taking that long; my assumption is that they may simply be very complex queries, or you have grouping on.

Queries are a bit complex for sure.

> I guess the next thing I'd do is start trying to characterize what queries are slow. Grouping? Pivot faceting? 'Cause from everything you've said so far it's surprising that you're seeing queries take this long; something doesn't feel right, but what it is I don't have a clue.

Thanks

> Best,
> Erick
Re: solr.NRTCachingDirectoryFactory
Also, here is the link to the screenshot.

https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-07-22%20at%2010.40.21%20AM.png

Thanks

On 7/21/16 11:22 PM, Shawn Heisey wrote:
> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory details from that.
>
> Thanks,
> Shawn
Re: solr.NRTCachingDirectoryFactory
Here is the snapshot of memory usage from "top", as you mentioned. The first row is the "solr" process. Thanks.

  PID USER    PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
29468 solr    20   0 27.536g 0.013t 3.297g S 45.7 27.6   4251:45 java
21366 root    20   0 14.499g 217824  12952 S  1.0  0.4 192:11.54 java
 2077 root    20   0 14.049g 190824   9980 S  0.7  0.4  62:44.00 java
  511 root    20   0  125792  56848  56616 S  0.0  0.1   9:33.23 systemd-journal
  316 splunk  20   0  232056  44284  11804 S  0.7  0.1  84:52.74 splunkd
 1045 root    20   0  257680  39956   6836 S  0.3  0.1   7:05.78 puppet
32631 root    20   0  360956  39292   4788 S  0.0  0.1   4:55.37 mcollectived
  703 root    20   0  250372    9000    976 S  0.0  0.0   1:35.52 rsyslogd
 1058 nslcd   20   0  454192    6004   2996 S  0.0  0.0  15:08.87 nslcd

On 7/21/16 11:22 PM, Shawn Heisey wrote:
> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory details from that.
>
> Thanks,
> Shawn
Re: solr.NRTCachingDirectoryFactory
OK, scratch autowarming. In fact your autowarm counts are quite high; I suspect far past "diminishing returns". I usually see autowarm counts < 64, but YMMV. Are you seeing actual hit ratios that are decent on those caches (admin UI >> plugins/stats >> cache >> ...)? Your cache sizes are also quite high in my experience; it's probably worth measuring the utilization there as well. And, BTW, your filterCache can occupy up to 2G of your heap. That's probably not your central problem, but it's something to consider.

So I don't know why your queries are taking that long; my assumption is that they may simply be very complex queries, or you have grouping on.

I guess the next thing I'd do is start trying to characterize what queries are slow. Grouping? Pivot faceting? 'Cause from everything you've said so far it's surprising that you're seeing queries take this long; something doesn't feel right, but what it is I don't have a clue.

Best,
Erick

On Fri, Jul 22, 2016 at 9:15 AM, Rallavagu wrote:
> On 7/22/16 8:34 AM, Erick Erickson wrote:
>> Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:
>> 1> you say "We have a requirement to have updates available immediately (NRT)"
>> 2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.
>
> Yes. We have 120 seconds available.
>
> Here is the cache configuration.
>
> <filterCache size="5000"
>     initialSize="5000"
>     autowarmCount="500"/>
>
> <queryResultCache size="2"
>     initialSize="2"
>     autowarmCount="500"/>
>
> <documentCache size="10"
>     initialSize="10"
>     autowarmCount="0"/>
>
> We have run load tests using JMeter, both pointing directly to Solr and pointing to the application that queries Solr. In both cases we have noticed the results being slower.
>
> Thanks
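Erick's "up to 2G of heap" figure for the filterCache follows from each cached filter potentially being a bitset of maxDoc/8 bytes. A quick sketch of that bound; the document count used here is a hypothetical figure (the actual maxDoc is not given in this thread), so plug in your own core's value:

```python
# Worst-case filterCache heap use: each cached filter can be a bitset of
# maxDoc / 8 bytes, and the cache holds up to `size` entries.
# max_doc is a HYPOTHETICAL document count, not taken from this thread.
max_doc = 3_200_000          # hypothetical maxDoc; substitute your own
cache_size = 5000            # filterCache size from the posted config

bytes_per_entry = max_doc / 8
total_bytes = bytes_per_entry * cache_size
print(f"~{total_bytes / 2**30:.1f} GiB worst case")
```

Small result sets may be cached more compactly than a full bitset, so this is an upper bound rather than a typical value.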
Re: solr.NRTCachingDirectoryFactory
On 7/22/16 8:34 AM, Erick Erickson wrote:
> Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:
> 1> you say "We have a requirement to have updates available immediately (NRT)"
> 2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.

Yes. We have 120 seconds available.

> So assuming you really do have a 120 second autocommit time, you should be able to smooth out the spikes by appropriate autowarming. You also haven't indicated what your filterCache and queryResultCache settings are. They come with a default of 0 for autowarm. But what is their size? And do you see a correlation between longer queries every 2 minutes? And do you have some test harness in place (jmeter works well) to demonstrate whether differences in your configuration help or hurt? I can't over-emphasize the importance of this; otherwise, if you rely on somebody simply saying "it's slow", you have no way to know what effect changes have.

Here is the cache configuration.

<filterCache size="5000"
    initialSize="5000"
    autowarmCount="500"/>

<queryResultCache size="2"
    initialSize="2"
    autowarmCount="500"/>

<documentCache size="10"
    initialSize="10"
    autowarmCount="0"/>

We have run load tests using JMeter, both pointing directly to Solr and pointing to the application that queries Solr. In both cases we have noticed the results being slower.

Thanks

> Best,
> Erick
Re: solr.NRTCachingDirectoryFactory
Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:
1> you say "We have a requirement to have updates available immediately (NRT)"
2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.

So assuming you really do have a 120 second autocommit time, you should be able to smooth out the spikes by appropriate autowarming. You also haven't indicated what your filterCache and queryResultCache settings are. They come with a default of 0 for autowarm. But what is their size? And do you see a correlation between longer queries every 2 minutes? And do you have some test harness in place (jmeter works well) to demonstrate whether differences in your configuration help or hurt? I can't over-emphasize the importance of this; otherwise, if you rely on somebody simply saying "it's slow", you have no way to know what effect changes have.

Best,
Erick

On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory details from that.
>
> Thanks,
> Shawn
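Erick's question about a correlation between slow queries and the 2-minute commit interval can be checked by bucketing QTimes by their offset within the 120-second commit cycle. A sketch with made-up sample data; in practice the (timestamp, QTime) pairs would be parsed out of solr.log:

```python
# Do slow queries cluster right after the 2-minute soft commits?
# Bucket QTimes by offset within the 120 s commit cycle.
# The (seconds, qtime_ms) pairs below are FABRICATED sample data;
# real numbers would come from parsing solr.log.
from collections import defaultdict

samples = [(0, 900), (5, 850), (60, 40), (119, 35), (121, 880), (180, 30)]

buckets = defaultdict(list)
for ts, qtime in samples:
    offset = ts % 120                     # seconds since the last soft commit
    buckets[offset // 30].append(qtime)   # group into 30-second buckets

for bucket, qtimes in sorted(buckets.items()):
    lo, hi = bucket * 30, bucket * 30 + 30
    avg = sum(qtimes) / len(qtimes)
    print(f"{lo:3d}-{hi:3d}s after commit: avg QTime {avg:.0f} ms")
```

If the first bucket's average is consistently much higher than the rest, the slowness lines up with cold caches after each new searcher.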
Re: solr.NRTCachingDirectoryFactory
On 7/21/2016 11:25 PM, Rallavagu wrote:
> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>
> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015

Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:

https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0

If you know what to look for, you can figure out all the relevant memory details from that.

Thanks,
Shawn
Re: solr.NRTCachingDirectoryFactory
On 7/21/16 9:16 PM, Shawn Heisey wrote:
> On 7/21/2016 9:37 AM, Rallavagu wrote:
>> I suspect swapping as well. But, for my understanding - are the index files from disk memory mapped automatically at startup time?
>
> They are *mapped* at startup time, but they are not *read* at startup. The mapping just sets up a virtual address space for the entire file, but until something actually reads the data from the disk, it will not be in memory. Getting the data in memory is what makes mmap fast.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
>> We are not performing a "commit" after every update, and here is the configuration for softCommit and hardCommit.
>>
>> <autoCommit>
>>     <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>     <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>     <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
>> </autoSoftCommit>
>>
>> I am seeing QTimes (for searches) swing between 2 and 10 seconds. Some queries were showing slowness due to faceting (debug=true). We have since adjusted indexing, and facet times have improved, but basic query QTime is still high, so I am wondering where I can look. Is there a way to debug (instrument) a query on a Solr node?
>
> Assuming you have not defined the maxTime system properties mentioned in those configs, that config means you will potentially be creating a new searcher every two minutes ... but if you are sending explicit commits or using commitWithin on your updates, then the true situation may be very different from what's configured here.
>
>> We have allocated a significant amount of RAM (48G total physical memory, 12G heap; total index size on disk is 15G).
>
> Assuming there's no other software on the system besides the one instance of Solr with a 12GB heap, this would mean that you have enough room to cache the entire index. What OS are you running on? With that information, I may be able to relay some instructions that will help determine what the complete memory situation is on your server.

There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.

Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015

Thanks

> Thanks,
> Shawn
Re: solr.NRTCachingDirectoryFactory
On 7/21/2016 9:37 AM, Rallavagu wrote:
> I suspect swapping as well. But, for my understanding - are the index
> files from disk memory mapped automatically at startup time?

They are *mapped* at startup time, but they are not *read* at startup. The mapping just sets up a virtual address space for the entire file, but until something actually reads the data from the disk, it will not be in memory. Getting the data in memory is what makes mmap fast.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

> We are not performing "commit" after every update and here is the
> configuration for softCommit and hardCommit.
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:12}</maxTime>
> </autoSoftCommit>
>
> I am seeing QTimes (for searches) swing between 10 seconds and 2
> seconds. Some queries were slow due to faceting (seen with
> debug=true). Since we adjusted indexing, facet times have improved,
> but basic query QTime is still high, so I am wondering where else I
> can look. Is there a way to debug (instrument) a query on a Solr node?

Assuming you have not defined the maxTime system properties mentioned in those configs, that config means you will potentially be creating a new searcher every two minutes ... but if you are sending explicit commits or using commitWithin on your updates, then the true situation may be very different than what's configured here.

>>> We have allocated a significant amount of RAM (48G total physical
>>> memory, 12G heap; total index size on disk is 15G)

Assuming there's no other software on the system besides the one instance of Solr with a 12GB heap, this would mean that you have enough room to cache the entire index. What OS are you running on? With that information, I may be able to relay some instructions that will help determine what the complete memory situation is on your server.

Thanks,
Shawn
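[Editor's note: the "mapped at startup, but read lazily" behavior Shawn describes can be demonstrated with the JDK's own memory-mapping API. This is a minimal sketch, not anything Solr-specific; the file name and contents are illustrative.]

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // map() only reserves virtual address space for the file; no bytes are
    // read from disk until the buffer is actually accessed.
    static byte firstMappedByte(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // This get() is the first access: it page-faults the data in from
            // disk (or serves it from the OS page cache if already resident).
            return buf.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        Files.write(tmp, new byte[]{42, 1, 2, 3});
        System.out.println(firstMappedByte(tmp)); // prints 42
        Files.delete(tmp);
    }
}
```

The second time the same page is touched it is served from the page cache, which is exactly why leaving RAM free for the OS matters more here than a large JVM heap.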
Re: solr.NRTCachingDirectoryFactory
Thanks Erick.

On 7/21/16 8:25 AM, Erick Erickson wrote:
> bq: map index files so "reading from disk" will be as simple and quick
> as reading from memory hence would not incur any significant
> performance degradation.
>
> Well, if
> 1> the read has already been done. First time a page of the file is
> accessed, it must be read from disk.
> 2> You have enough physical memory that _all_ of the files can be held
> in memory at once.
>
> <2> is a little tricky since the big slowdown comes from swapping
> eventually. But in an LRU scheme, that may be OK if the oldest pages
> are the stored=true data which are only accessed to return the top N,
> not to satisfy the search.

I suspect swapping as well. But, for my understanding - are the index files from disk memory mapped automatically at startup time?

> What are your QTimes anyway? Define "optimal".
>
> I'd really push back on this statement: "We have a requirement to have
> updates available immediately (NRT)". Truly? Can't you set
> expectations that 5 seconds will be needed (or 10?). Often this is an
> artificial requirement that does no real service to the user; it's
> just something people think they want.
>
> If this means you're sending a commit after every document, it's
> actually a really bad practice that'll get you into trouble
> eventually. Plus you won't be able to do any autowarming, which would
> read data from disk into the OS memory and smooth out any spikes.

We are not performing a "commit" after every update; here is the configuration for softCommit and hardCommit:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:12}</maxTime>
</autoSoftCommit>

I am seeing QTimes (for searches) swing between 10 seconds and 2 seconds. Some queries were slow due to faceting (seen with debug=true). Since we adjusted indexing, facet times have improved, but basic query QTime is still high, so I am wondering where else I can look. Is there a way to debug (instrument) a query on a Solr node?

> FWIW,
> Erick
>
> On Thu, Jul 21, 2016 at 8:18 AM, Rallavagu wrote:
>> Solr 5.4.1 with embedded jetty with cloud enabled
>>
>> We have a Solr deployment (approximately 3 million documents) with
>> both write and search operations happening. We have a requirement to
>> have updates available immediately (NRT). Configured with the default
>> "solr.NRTCachingDirectoryFactory" for the directory factory.
>> Considering the fact that every time there is an update, caches are
>> invalidated and re-built, I assume that
>> "solr.NRTCachingDirectoryFactory" would memory-map index files so
>> "reading from disk" will be as simple and quick as reading from
>> memory, and hence would not incur any significant performance
>> degradation. Am I right in my assumption?
>>
>> We have allocated a significant amount of RAM (48G total physical
>> memory, 12G heap; total index size on disk is 15G) but I am not sure
>> if I am seeing optimal QTimes (for searches). Any inputs are welcome.
>> Thanks in advance.
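[Editor's note: on the "is there a way to instrument a query" question - Solr's built-in debug component can break QTime down per search component, not just per query. A hedged example request; host, port, and collection name are placeholders:]

```
http://localhost:8983/solr/collection1/select?q=*:*&debug=timing&indent=true
```

The response then carries a "debug" section with prepare/process timings for each component (query, facet, highlight, ...), which is usually enough to tell whether the time is going to the query itself or to a component such as faceting.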
Re: solr.NRTCachingDirectoryFactory
bq: map index files so "reading from disk" will be as simple and quick as reading from memory hence would not incur any significant performance degradation.

Well, if
1> the read has already been done. First time a page of the file is accessed, it must be read from disk.
2> You have enough physical memory that _all_ of the files can be held in memory at once.

<2> is a little tricky since the big slowdown comes from swapping eventually. But in an LRU scheme, that may be OK if the oldest pages are the stored=true data which are only accessed to return the top N, not to satisfy the search.

What are your QTimes anyway? Define "optimal".

I'd really push back on this statement: "We have a requirement to have updates available immediately (NRT)". Truly? Can't you set expectations that 5 seconds will be needed (or 10?). Often this is an artificial requirement that does no real service to the user; it's just something people think they want.

If this means you're sending a commit after every document, it's actually a really bad practice that'll get you into trouble eventually. Plus you won't be able to do any autowarming, which would read data from disk into the OS memory and smooth out any spikes.

FWIW,
Erick

On Thu, Jul 21, 2016 at 8:18 AM, Rallavagu wrote:
> Solr 5.4.1 with embedded jetty with cloud enabled
>
> We have a Solr deployment (approximately 3 million documents) with both
> write and search operations happening. We have a requirement to have
> updates available immediately (NRT). Configured with the default
> "solr.NRTCachingDirectoryFactory" for directory factory. Considering
> the fact that every time there is an update, caches are invalidated and
> re-built, I assume that "solr.NRTCachingDirectoryFactory" would memory
> map index files so "reading from disk" will be as simple and quick as
> reading from memory hence would not incur any significant performance
> degradation. Am I right in my assumption?
>
> We have allocated a significant amount of RAM (48G total physical
> memory, 12G heap; total index disk size is 15G) but not sure if I am
> seeing the optimal QTimes (for searches). Any inputs are welcome.
> Thanks in advance.
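[Editor's note: the autowarming trade-off Erick mentions lives in the cache definitions in solrconfig.xml. Disabling it for an NRT setup looks roughly like the following; the class and size values are illustrative placeholders, not recommendations.]

```
<!-- Illustrative cache definitions with autowarming disabled for NRT.
     autowarmCount="0" means each new searcher starts cold but becomes
     usable immediately after a soft commit, instead of replaying
     hundreds of warming queries first. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
```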
solr.NRTCachingDirectoryFactory
Solr 5.4.1 with embedded jetty with cloud enabled.

We have a Solr deployment (approximately 3 million documents) with both write and search operations happening. We have a requirement to have updates available immediately (NRT). Configured with the default "solr.NRTCachingDirectoryFactory" for the directory factory. Considering the fact that every time there is an update, caches are invalidated and re-built, I assume that "solr.NRTCachingDirectoryFactory" would memory-map index files so "reading from disk" will be as simple and quick as reading from memory, and hence would not incur any significant performance degradation. Am I right in my assumption?

We have allocated a significant amount of RAM (48G total physical memory, 12G heap; total index size on disk is 15G) but I am not sure if I am seeing optimal QTimes (for searches). Any inputs are welcome. Thanks in advance.
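[Editor's note: Shawn's later observation that there is "enough room to cache the entire index" is simple arithmetic on the numbers quoted here - RAM left over after the JVM heap is what the OS page cache can use for the index files. A minimal sketch using the thread's figures:]

```java
public class CacheHeadroom {
    // RAM not claimed by the JVM heap is (roughly) available to the OS
    // page cache; this ignores other per-process overhead for simplicity.
    static long pageCacheHeadroomGb(long totalRamGb, long heapGb) {
        return totalRamGb - heapGb;
    }

    public static void main(String[] args) {
        long headroom = pageCacheHeadroomGb(48, 12); // 48G RAM, 12G heap (from the thread)
        long indexGb = 15;                           // on-disk index size (from the thread)
        System.out.println("Page cache headroom: " + headroom + "G"); // prints 36G
        System.out.println("Index fits in RAM: " + (headroom >= indexGb)); // prints true
    }
}
```

With roughly 36G of headroom against a 15G index, a fully warmed page cache can hold every index file, so sustained slow QTimes point away from disk I/O and toward commit/warming behavior or the queries themselves.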