Re: solr4 performance question
bq:

    <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
       <maxDocs>100000</maxDocs>
       <openSearcher>true</openSearcher>
    </autoCommit>

Every 100K documents or 10 minutes (whichever comes first), your current searchers will be closed and a new searcher opened, and all the warmup queries etc. might happen. I suspect you're not doing much with autowarming and/or newSearcher queries, so occasionally your search has to wait for caches to be read, terms to be populated, etc. Some possibilities to test this:

1> Create some newSearcher queries in solrconfig.xml.
2> Specify a reasonable autowarm count for queryResultCache (don't go crazy here; start with 16 or something similar).
3> Set openSearcher to false above. In this case you won't be able to see the documents until either a hard or soft commit happens; you could cure this with a single hard commit at the end of your indexing run. It all depends on what latency you can tolerate in terms of searching newly-indexed documents.

Here's a reference:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Apr 8, 2014 at 12:11 PM, Joshi, Shital wrote:
> We don't do any soft commit. This is our hard commit setting:
>
>    <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
>       <maxDocs>100000</maxDocs>
>       <openSearcher>true</openSearcher>
>    </autoCommit>
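For reference, Erick's three suggestions correspond to solrconfig.xml stanzas like the following. This is a sketch with illustrative values: the warming query, cache sizes, and autowarmCount are placeholders to tune for your install, not recommendations.

```xml
<!-- 3> don't open a new searcher on every hard commit -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
  <maxDocs>100000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- 2> a modest autowarm count for the query result cache -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="16"/>

<!-- 1> warm each new searcher with a representative query -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
```

With openSearcher=false, a single explicit hard commit (or a soft commit) at the end of the indexing run makes the new documents visible, as described above.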
RE: solr4 performance question
We don't do any soft commit. This is our hard commit setting:

   <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
      <maxDocs>100000</maxDocs>
      <openSearcher>true</openSearcher>
   </autoCommit>

We use this update command:

solr_command=$(cat << EnD
time zcat --force $file2load | /usr/bin/curl --proxy "" --silent --show-error --max-time 3600 \
"http://$solr_url/solr/$solr_core/update/csv?\
commit=false\
&separator=|\
&escape=\\\
&trim=true\
&header=false\
&skipLines=2\
&overwrite=true\
&_shard_=$shardid\
&fieldnames=$fieldnames\
&f.cs_rep.split=true\
&f.cs_rep.separator=%5E" --data-binary @- -H 'Content-type:text/plain; charset=utf-8'
EnD)
Re: solr4 performance question
What do you have for your _softcommit_ settings in solrconfig.xml? I'm guessing you're using SolrJ or similar, but the solrconfig settings will trip a commit as well.

For that matter, what are all your commit settings in solrconfig.xml, both hard and soft?

Best,
Erick
Re: solr4 performance question
Hi Joshi;

Click the Plugins / Stats section under your collection in the Solr Admin UI. You will see the cache statistics for the different types of caches. hitratio and evictions are good statistics to look at first. On the other hand, you should read here: https://wiki.apache.org/solr/SolrPerformanceFactors

Thanks;
Furkan KAMACI
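The same statistics shown in the Admin UI can also be fetched over HTTP from the core's /admin/mbeans handler (e.g. `.../solr/<core>/admin/mbeans?cat=CACHE&stats=true&wt=json`) and summarized programmatically. A sketch, assuming the 4.x response shape in which "solr-mbeans" is a flat list alternating category names and dicts; the sample numbers below are made up for illustration:

```python
import json

# Sample shaped like a Solr 4.x /admin/mbeans?cat=CACHE&stats=true&wt=json
# response; in real use you would fetch this JSON from the core's URL.
sample = json.loads("""
{
  "solr-mbeans": [
    "CACHE",
    {
      "queryResultCache": {"stats": {"lookups": 1000, "hits": 950,
                                     "hitratio": "0.95", "evictions": 12}},
      "filterCache":      {"stats": {"lookups": 500,  "hits": 100,
                                     "hitratio": "0.20", "evictions": 340}}
    }
  ]
}
""")

def cache_stats(mbeans):
    """Return {cache_name: (hitratio, evictions)} from an mbeans response."""
    beans = mbeans["solr-mbeans"]
    # The list alternates category name, category dict, ...
    caches = dict(zip(beans[::2], beans[1::2]))["CACHE"]
    return {name: (float(info["stats"]["hitratio"]),
                   info["stats"]["evictions"])
            for name, info in caches.items()}

for name, (ratio, ev) in sorted(cache_stats(sample).items()):
    print(f"{name}: hitratio={ratio:.2f} evictions={ev}")
```

A low hitratio with many evictions suggests the cache is sized too small or is being invalidated by frequent searcher reopens.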
solr4 performance question
Hi,

We have a 10-node Solr Cloud (5 shards, 2 replicas) with a 30 GB JVM on a 60 GB machine and 40 GB of index.

We're constantly noticing that Solr queries take longer while an update (with the commit=false setting) is in progress. A query which usually takes 0.5 seconds can take up to 2 minutes while updates are in progress. And this is not the case with all queries; it is very sporadic behavior.

Any pointer to help nail down this issue would be appreciated.

Is there a way to find how much of a query result came from the cache? Can we enable any log settings to start printing what came from the cache vs. what was queried?

Thanks!
RE: Solr4 performance
Thanks. We find little evidence that the page/disk cache is causing this issue. We use sar to collect statistics. Here are the statistics on the node where the query took maximum time (out of 5 shards, the one with the most data takes a long time). However, we're reducing the heap size and testing in QA.

          CPU    %user   %nice  %system  %iowait  %steal   %idle
17:00:01  all     2.11    0.00     0.04     0.00    0.00   97.85
Average:  all     7.52    0.00     0.16     0.02    0.00   92.31

            tps    rtps    wtps  bread/s  bwrtn/s
17:00:01  10.63    0.00   10.63     0.00   140.56
Average:  73.90    2.65   71.24   314.24  1507.93

          pgpgin/s  pgpgout/s  fault/s  majflt/s
17:00:01      0.00      23.42   367.95      0.00
Average:     52.37     251.32   586.79      0.82

Our current JVM is 30G and usage is ~26G. If we reduce the JVM to 25G, we're afraid of hitting an OOM error in Java.
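For reference, counters like those quoted above come from the sysstat suite; illustrative invocations (assuming the sysstat package is installed and 60-second sampling):

```shell
# CPU utilization (%user, %iowait, %idle), 10 samples at 60s intervals
sar -u 60 10
# I/O transfer rates (tps, rtps, wtps, bread/s, bwrtn/s)
sar -b 60 10
# Paging statistics (pgpgin/s, pgpgout/s, fault/s, majflt/s)
sar -B 60 10
# Per-device utilization and wait times during a slow-query window
iostat -x 5
```

A majflt/s near zero (as above) means queries are rarely waiting on pages to be read from disk, which is why the page cache looks unlikely as the culprit here.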
Re: Solr4 performance
You would get more room for disk cache by reducing your large heap. Otherwise, you'd have to add more RAM to your systems, or shard your index to more nodes to gain more RAM that way.

The Linux VM subsystem actually has a number of tuning parameters (like vm.bdflush, vm.swappiness and vm.pagecache), but I don't know if there's any definitive information about how to set them appropriately for Solr.

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc.
"The Science of Influence Marketing"
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | w: appinions.com <http://www.appinions.com/>
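The vm.* knobs Michael mentions are kernel sysctls. An illustrative way to inspect and adjust one of them on Linux (vm.swappiness exists on modern kernels; vm.bdflush and vm.pagecache were parameters of older kernels):

```shell
# Inspect the current value
sysctl vm.swappiness
# Lower swappiness so the kernel prefers evicting page cache
# over swapping out the JVM heap
sudo sysctl -w vm.swappiness=10
# Persist the setting across reboots
echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/90-solr.conf
```

The value 10 is a commonly used starting point, not a Solr-specific recommendation.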
Re: Solr4 performance
On 2/27/2014 1:09 PM, Joshi, Shital wrote:
> If page cache is the issue, what is the solution?

What operating system are you using, and what tool are you looking at to see your memory usage? Can you share a screenshot with us? Use a file-sharing website for that - the list generally doesn't like attachments.

Thanks,
Shawn
RE: Solr4 performance
Hi Michael,

If page cache is the issue, what is the solution?

Thanks!
Re: Solr4 performance
I'm not sure how you're measuring free RAM. Maybe this will help:

http://www.linuxatemyram.com/play.html

Michael Della Bitta
Applications Developer
appinions inc.
w: appinions.com <http://www.appinions.com/>
RE: Solr4 performance
Thanks.

We found some evidence that this could be the issue. We're monitoring closely to confirm this.

One question though: none of our nodes show more than 50% of physical memory used. So there is enough memory available for memory-mapped files. Can this kind of pause still happen?

-----Original Message-----
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Friday, February 21, 2014 5:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

It could be that your query is churning the page cache on that node sometimes, so Solr pauses so the OS can drag those pages off of disk. Have you tried profiling your iowait in top or iostat during these pauses? (assuming you're using Linux).

Michael Della Bitta
Applications Developer
appinions inc.

On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital wrote:
> Thanks for your answer.
>
> We confirmed that it is not a GC issue.
>
> The auto-warming query looks good too, and queries before and after the long-running query come back really quickly. The only thing that stands out is that the shard on which the query takes a long time has a couple million more documents than the other shards.
>
> -----Original Message-----
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, February 20, 2014 5:26 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr4 performance
>
> Hi,
>
> As for your first question, setting openSearcher to true means you will see the new docs after every hard commit. Soft and hard commits only become isolated from one another with that set to false.
>
> Your second problem might be explained by your large heap and garbage collection. Walking a heap that large can take an appreciable amount of time. You might consider turning on the JVM options for logging GC and seeing if you can correlate your slow responses to times when your JVM is garbage collecting.
>
> Hope that helps,
>
> On Feb 20, 2014 4:52 PM, "Joshi, Shital" wrote:
>> Hi!
>>
>> I have a few other questions regarding the Solr4 performance issue we're facing.
>>
>> We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use commit=false in the update URL. We have only the hard commit setting in the Solr4 config:
>>
>>    <autoCommit>
>>       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
>>       <maxDocs>100000</maxDocs>
>>       <openSearcher>true</openSearcher>
>>    </autoCommit>
>>
>> Since we're not using soft commit at all (commit=false), the caches will not get reloaded for every commit and recently added documents will not be visible, correct?
>>
>> What we see is that queries which usually take a few milliseconds take ~40 seconds once in a while. Can high IO during a hard commit cause queries to slow down?
>>
>> For some shards we see 98% full physical memory. We have a 60GB machine (30 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high physical memory would cause queries to slow down. We're in the process of reducing the JVM size anyway.
>>
>> We have never run optimization till now. QA optimization didn't yield a performance gain.
>>
>> Thanks much for all the help.
>>
>> -----Original Message-----
>> From: Shawn Heisey [mailto:s...@elyograg.org]
>> Sent: Tuesday, February 18, 2014 4:55 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr4 performance
>>
>> On 2/18/2014 2:14 PM, Joshi, Shital wrote:
>>> Thanks much for all the suggestions. We're looking into reducing the allocated heap size of the Solr4 JVM.
>>>
>>> We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? Can someone please confirm?
>>
>> In Solr, NRTCachingDirectory does indeed use MMapDirectory as its default delegate. That's probably also the case with Lucene -- these are Lucene classes, after all.
>>
>> MMapDirectory is almost always the most efficient way to handle on-disk indexes.
>>
>> Thanks,
>> Shawn
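The GC-logging suggestion in this thread maps to standard HotSpot flags on the JDK 7/8 line. A hypothetical Solr launch line (heap sizes mirror the 30 GB heap discussed here; in JDK 9+ these flags were replaced by the unified -Xlog:gc option):

```shell
# Hypothetical Solr 4.x start line with GC logging enabled (JDK 7/8 flags)
java -Xms30g -Xmx30g \
     -verbose:gc \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -Xloggc:/var/log/solr/gc.log \
     -jar start.jar
```

Long "Total time for which application threads were stopped" entries in gc.log that line up with the slow responses would point at GC pauses; their absence (as Joshi later confirmed) rules GC out.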
Re: Solr4 performance
It could be that your query is churning the page cache on that node sometimes, so Solr pauses so the OS can drag those pages off of disk. Have you tried profiling your iowait in top or iostat during these pauses? (assuming you're using Linux).

Michael Della Bitta
Applications Developer
appinions inc.
w: appinions.com <http://www.appinions.com/>
RE: Solr4 performance
Thanks for your answer. We confirmed that it is not a GC issue.

The autowarming query looks good too, and the queries before and after the long-running query come back really quickly. The only thing that stands out is that the shard on which the query takes a long time has a couple million more documents than the other shards.

-----Original Message-----
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Thursday, February 20, 2014 5:26 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr4 performance
> …
RE: Solr4 performance
Hi,

As for your first question: setting openSearcher to true means you will see the new docs after every hard commit. Soft and hard commits only become isolated from one another with that set to false.

Your second problem might be explained by your large heap and garbage collection. Walking a heap that large can take an appreciable amount of time. You might consider turning on the JVM options for logging GC and seeing if you can correlate your slow responses with times when your JVM is garbage collecting.

Hope that helps,

On Feb 20, 2014 4:52 PM, "Joshi, Shital" wrote:
> …
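The GC-logging idea above can be wired in through JVM startup flags. A sketch for the HotSpot JVMs of the Solr 4.x era; the variable name and log path are illustrative, not from the thread:

```shell
# Append HotSpot GC-logging flags to whatever script starts Solr.
# Flag names are the Java 7-era HotSpot ones (Java 9+ replaced these
# with the unified -Xlog:gc* option).
JAVA_OPTS="$JAVA_OPTS \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/solr/gc.log"
```

PrintGCApplicationStoppedTime is the one to watch: it records total stop-the-world time, which is what would line up with 40-second query outliers.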
RE: Solr4 performance
Hi!

I have a few other questions regarding the Solr4 performance issue we're facing.

We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use commit=false in the update URL. We have only the hard commit setting in the Solr4 config:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
    <maxDocs>10</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>

Since we're not using soft commit at all (commit=false), the caches will not get reloaded for every commit and recently added documents will not be visible, correct?

What we see is that queries which usually take a few milliseconds occasionally take ~40 seconds. Can high IO during a hard commit cause queries to slow down?

For some shards we see 98% full physical memory. We have 60GB machines (30 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high physical memory usage would cause queries to slow down. We're in the process of reducing the JVM size anyway.

We have never run optimization until now. QA optimization didn't yield a performance gain.

Thanks much for all the help.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Tuesday, February 18, 2014 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance
> …
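For reference, the usual way to separate durability from visibility in solrconfig.xml is a hard autoCommit that does not open a searcher, plus a separate soft commit. A hypothetical sketch; the intervals are illustrative, not the poster's values:

```xml
<!-- Hard commit: flush and fsync segments, truncate the transaction
     log, but keep serving from the current searcher. -->
<autoCommit>
  <maxTime>600000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: make newly indexed documents visible without the
     heavy fsync; this is what invalidates and rewarms the caches. -->
<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>
```

With openSearcher=true on the hard commit (as in the quoted config), every hard commit opens a new searcher anyway, so caches are dropped on that schedule regardless of commit=false on the update requests.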
Re: Solr4 performance
On 2/18/2014 2:14 PM, Joshi, Shital wrote:
> Thanks much for all suggestions. We're looking into reducing the allocated heap size of the Solr4 JVM.
>
> We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? Can someone please confirm?

In Solr, NRTCachingDirectory does indeed use MMapDirectory as its default delegate. That's probably also the case with Lucene -- these are Lucene classes, after all.

MMapDirectory is almost always the most efficient way to handle on-disk indexes.

Thanks,
Shawn
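Shawn's answer can be confirmed directly in solrconfig.xml. The stock Solr 4.x example config declares the factory roughly like this (the system-property indirection is how the shipped configs do it; treat this as a sketch, not the poster's exact file):

```xml
<!-- NRTCachingDirectoryFactory caches small, recently flushed segments
     in RAM for NRT reads and delegates everything else to the platform
     default, which is MMapDirectory on 64-bit Linux/Windows/Solaris. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```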
RE: Solr4 performance
Hi,

Thanks much for all the suggestions. We're looking into reducing the allocated heap size of the Solr4 JVM.

We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? Can someone please confirm?

Would optimization help with performance? We did that in QA (it took about 13 hours for 700 million documents).

Thanks!

-----Original Message-----
From: Roman Chyla [mailto:roman.ch...@gmail.com]
Sent: Wednesday, February 12, 2014 3:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance
> …
Re: Solr4 performance
And perhaps one other, but very pertinent, recommendation: allocate only as little heap as is necessary. By allocating more, you are working against the OS caching. Knowing how much is enough is a bit tricky, though.

Best,
roman

On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey wrote:
> …
Re: Solr4 performance
On 2/12/2014 12:07 PM, Greg Walters wrote:
> Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory-mapped files. I don't believe that the default configuration for Solr is to use MMapDirectory, but even if it does, my understanding is that the entire file won't be forcibly cached by Solr. The OS's filesystem cache should control what's actually in RAM, and the eviction process will depend on the OS.

I only have a little bit to add. Here's the first thing that Uwe's blog post (linked above) says:

"Since version 3.1, Apache Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; since version 3.3 also for 64bit Linux systems."

The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory by default under the hood.

A summary of all this that should be relevant to the original question:

It's the *operating system* that handles memory mapping, including any caching that happens. Assuming that you don't have a badly configured virtual machine setup, I'm fairly sure that only real memory gets used, never swap space on disk. If something else on the system makes a memory allocation, the operating system will instantly give up memory used for caching and mapping. One of the strengths of mmap is that it can't exceed available resources unless it's used incorrectly.

Thanks,
Shawn
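The reclaimable-cache behavior Shawn describes is easy to observe on Linux. A generic diagnostic sketch, not something from the thread itself:

```shell
#!/bin/sh
# Show how much RAM the kernel is holding in the page cache. "Cached"
# memory includes mmap'd index pages; the kernel hands it back instantly
# when a process allocates, so a large value here is healthy, not a leak.
grep -E '^(MemTotal|MemFree|Cached):' /proc/meminfo
```

A host that looks "96% full" in a monitoring GUI is often mostly Cached memory, which is exactly what you want for a memory-mapped index.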
Re: Solr4 performance
Shital,

Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory-mapped files. I don't believe that the default configuration for Solr is to use MMapDirectory, but even if it does, my understanding is that the entire file won't be forcibly cached by Solr. The OS's filesystem cache should control what's actually in RAM, and the eviction process will depend on the OS.

Thanks,
Greg

On Feb 12, 2014, at 12:57 PM, "Joshi, Shital" wrote:
> …
Re: Solr4 performance
No, Solr doesn't load the entire index into memory. I think you'll find Uwe's blog most helpful on this matter:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

On Thu, Feb 13, 2014 at 12:27 AM, Joshi, Shital wrote:
> …

--
Regards,
Shalin Shekhar Mangar.
RE: Solr4 performance
Does Solr4 load the entire index into a memory-mapped file? What is the eviction policy of this memory-mapped file? Can we control it?

__________
From: Joshi, Shital [Tech]
Sent: Wednesday, February 05, 2014 12:00 PM
To: 'solr-user@lucene.apache.org'
Subject: Solr4 performance
> …
Solr4 performance
Hi,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes (cloud). We're using local disk (/local/data) to store the Solr index files. All hosts have 60GB RAM, and the Solr4 JVMs are running with a max 30GB heap size. So far we have 470 million documents. We are using custom sharding, and all shards have ~9-10 million documents. We have a GUI sending queries to this cloud, and the GUI has a 30-second timeout.

Lately we're getting many timeouts on the GUI, and upon checking we found that all timeouts are happening on 2 hosts. The admin GUI for one of the hosts shows 96% of physical memory used, but the other host looks perfectly good. The two hosts are on different shards. Would increasing the RAM of these two hosts make these timeouts go away? What else can we check?

Many Thanks!