Re: Queries regarding solr cache
On 1/4/2017 3:45 AM, kshitij tyagi wrote:
> Problem:
>
> 1. I am indexing on my master and committing frequently. What I am
> noticing is that my slaves are also committing very frequently, the
> cache is not being built properly, and so my hit ratio is almost zero.
>
> 2. What changes do I need to make so that the cache builds up properly
> even after commits and can actually be used? This is wasting a lot of
> my resources and also slowing down the queries.

Whenever you commit with openSearcher set to true (which is the
default), Solr immediately throws the cache away.  This is by design --
the cache contains internal document IDs from the previous index, and
due to merging, the new index might have entirely different ID values
for the same documents.

A commit on the master will cause the slave to copy the index on its
next configured replication interval, and then basically do a commit of
its own to signal that a new searcher is needed.

The caches have a feature called autowarming, which takes the top N
entries in the cache and re-executes the queries that produced those
entries to populate the new cache before the new searcher starts.  If
you set autowarmCount too high, it makes commits take a really long
time.

If you are committing so frequently that your cache is ineffective,
then you need to commit less frequently.  Whenever you do a commit on
the master, the slave will also do a commit after it copies the new
index.

Thanks,
Shawn
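To make the autowarming idea concrete, here is a toy Python model (not Solr's actual code; the `autowarm` function and all names are invented for illustration): the most recently used keys from the old cache are re-executed against the new searcher, so the fresh cache does not start empty.

```python
from collections import OrderedDict

def autowarm(old_cache, execute_query, autowarm_count):
    """Toy model of cache autowarming: re-run the most recently used
    queries from the old cache so the new searcher's cache starts warm.
    Results must come from the NEW index, so each query is executed
    again rather than copied over."""
    new_cache = OrderedDict()
    for query in list(old_cache)[-autowarm_count:]:
        new_cache[query] = execute_query(query)
    return new_cache

# Three cached queries; autowarmCount=2 re-runs only the two most recent.
old = OrderedDict([("q=a", 1), ("q=b", 2), ("q=c", 3)])
warmed = autowarm(old, lambda q: len(q), 2)
print(list(warmed))    # ['q=b', 'q=c']
```

The model also shows why a large autowarmCount slows commits: every warmed entry is a full query re-executed before the new searcher can serve traffic.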
Re: Queries regarding solr cache
Hi Shawn,

Need your help.  I am using a master-slave architecture in my system,
and here is the solrconfig.xml:

${enable.master:false} startup commit 00:00:10 managed-schema
${enable.slave:false} http://${MASTER_CORE_URL}/${solr.core.name}
${POLL_TIME}

Problem: I am noticing that my slaves are not able to use caching
properly:

1. I am indexing on my master and committing frequently. What I am
noticing is that my slaves are also committing very frequently, the
cache is not being built properly, and so my hit ratio is almost zero.

2. What changes do I need to make so that the cache builds up properly
even after commits and can actually be used? This is wasting a lot of
my resources and also slowing down the queries.

On Mon, Dec 5, 2016 at 9:06 PM, Shawn Heisey wrote:
> On 12/5/2016 6:44 AM, kshitij tyagi wrote:
> > - lookups:381
> > - hits:24
> > - hitratio:0.06
> > - inserts:363
> > - evictions:0
> > - size:345
> > - warmupTime:2932
> > - cumulative_lookups:294948
> > - cumulative_hits:15840
> > - cumulative_hitratio:0.05
> > - cumulative_inserts:277963
> > - cumulative_evictions:70078
> >
> > How can I increase my hit ratio? I am not able to understand the
> > Solr caching mechanism clearly. Please help.
>
> This means that out of the nearly 300,000 queries executed by that
> handler, only five percent (about 15,000) were found in the cache.
> The rest of them were not found in the cache at the moment they were
> made.  Since these numbers come from the queryResultCache, this
> refers to the "q" parameter.  The filterCache handles things in the
> fq parameter.  The documentCache holds actual documents from your
> index and fills in stored data in results so the document doesn't
> have to be fetched from the index.
>
> Possible reasons:
>
> 1) Your users are rarely entering the same query more than once.
> 2) Your client code is adding something unique to every query (q
> parameter) so very few of them are the same.
> 3) You are committing so frequently that the cache never has a chance
> to get large enough to make a difference.
>
> Here are some queryResultCache stats from one of my indexes:
>
> class:org.apache.solr.search.FastLRUCache
> version:1.0
> description:Concurrent LRU Cache(maxSize=512, initialSize=512,
> minSize=460, acceptableSize=486, cleanupThread=true, autowarmCount=8,
> regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
> src:$URL:
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
> lookups: 3496
> hits: 3145
> hitratio: 0.9
> inserts: 335
> evictions: 0
> size: 338
> warmupTime: 2209
> cumulative_lookups: 12394606
> cumulative_hits: 11247114
> cumulative_hitratio: 0.91
> cumulative_inserts: 1110375
> cumulative_evictions: 409887
>
> These numbers indicate that 91 percent of the queries made to this
> handler were served from the cache.
>
> Thanks,
> Shawn
Re: Queries regarding solr cache
On 12/5/2016 6:44 AM, kshitij tyagi wrote:
> - lookups:381
> - hits:24
> - hitratio:0.06
> - inserts:363
> - evictions:0
> - size:345
> - warmupTime:2932
> - cumulative_lookups:294948
> - cumulative_hits:15840
> - cumulative_hitratio:0.05
> - cumulative_inserts:277963
> - cumulative_evictions:70078
>
> How can I increase my hit ratio? I am not able to understand the Solr
> caching mechanism clearly. Please help.

This means that out of the nearly 300,000 queries executed by that
handler, only five percent (about 15,000) were found in the cache.  The
rest of them were not found in the cache at the moment they were made.
Since these numbers come from the queryResultCache, this refers to the
"q" parameter.  The filterCache handles things in the fq parameter.
The documentCache holds actual documents from your index and fills in
stored data in results so the document doesn't have to be fetched from
the index.

Possible reasons:

1) Your users are rarely entering the same query more than once.
2) Your client code is adding something unique to every query (q
parameter) so very few of them are the same.
3) You are committing so frequently that the cache never has a chance
to get large enough to make a difference.

Here are some queryResultCache stats from one of my indexes:

class:org.apache.solr.search.FastLRUCache
version:1.0
description:Concurrent LRU Cache(maxSize=512, initialSize=512,
minSize=460, acceptableSize=486, cleanupThread=true, autowarmCount=8,
regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
src:$URL:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
lookups: 3496
hits: 3145
hitratio: 0.9
inserts: 335
evictions: 0
size: 338
warmupTime: 2209
cumulative_lookups: 12394606
cumulative_hits: 11247114
cumulative_hitratio: 0.91
cumulative_inserts: 1110375
cumulative_evictions: 409887

These numbers indicate that 91 percent of the queries made to this
handler were served from the cache.
Thanks, Shawn
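For reference, those hit ratios fall straight out of the stats as posted; a quick arithmetic check in Python:

```python
# Stats from the queryResultCache discussed above
lookups, hits = 381, 24
cum_lookups, cum_hits = 294948, 15840

hitratio = round(hits / lookups, 2)              # current searcher
cum_hitratio = round(cum_hits / cum_lookups, 2)  # since core startup

print(hitratio, cum_hitratio)   # 0.06 0.05
```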
Re: Queries regarding solr cache
Hi Shawn,

Thanks for the reply.  Here are the details for the query result cache
(I am not using NOW in my queries, and most of the queries are common):

- class:org.apache.solr.search.LRUCache
- version:1.0
- description:LRU Cache(maxSize=1000, initialSize=1000,
  autowarmCount=10,
  regenerator=org.apache.solr.search.SolrIndexSearcher$3@73380510)
- src:null
- stats:
- lookups:381
- hits:24
- hitratio:0.06
- inserts:363
- evictions:0
- size:345
- warmupTime:2932
- cumulative_lookups:294948
- cumulative_hits:15840
- cumulative_hitratio:0.05
- cumulative_inserts:277963
- cumulative_evictions:70078

How can I increase my hit ratio? I am not able to understand the Solr
caching mechanism clearly. Please help.

On Thu, Dec 1, 2016 at 8:19 PM, Shawn Heisey wrote:
> On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> > I am using Solr and serving a huge number of requests in my
> > application.
> >
> > I need to know how I can utilize caching in Solr.
> >
> > As of now I am clicking Core Selector → [core name] → Plugins /
> > Stats.
> >
> > I am seeing my hit ratio as 0 for all the caches. What does this
> > mean and how can this be optimized?
>
> If your hitratio is zero, then none of the queries related to that
> cache are finding matches.  This means that your client systems are
> never sending the same query twice.
>
> One possible reason for a zero hitratio is using "NOW" in date
> queries -- NOW changes every millisecond, and the actual timestamp
> value is what ends up in the cache.  This means that the same query
> with NOW executed more than once will actually be different from the
> cache's perspective.  The solution is date rounding -- using things
> like NOW/HOUR or NOW/DAY.  You could use NOW/MINUTE, but the window
> for caching would be quite small.
>
> 5000 entries for your filterCache is almost certainly too big.  Each
> filterCache entry tends to be quite large.  If the core has ten
> million documents in it, then each filterCache entry would be 1.25
> million bytes in size -- the entry is a bitset of all documents in
> the core.  This includes deleted docs that have not yet been
> reclaimed by merging.  If a filterCache for an index that size (which
> is not all that big) were to actually fill up with 5000 entries, it
> would require over six gigabytes of memory just for the cache.
>
> The 1000 that you have on queryResultCache is also rather large, but
> probably not a problem.  There's also documentCache, which generally
> is OK to have sized at several thousand -- I have 16384 on mine.  If
> your documents are particularly large, then you probably would want
> to have a smaller number.
>
> It's good that your autowarmCount values are low.  High values here
> tend to make commits take a very long time.
>
> You do not need to send your message more than once.  The first
> repeat was after less than 40 minutes.  The second was after about
> two hours.  Waiting a day or two for a response, particularly for a
> difficult problem, is not unusual for a mailing list.  I began this
> reply as soon as I saw your message -- about 7:30 AM in my timezone.
>
> Thanks,
> Shawn
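Shawn's filterCache sizing estimate is easy to verify with arithmetic; a sketch (the real per-entry cost is somewhat higher once Java object overhead is counted):

```python
docs = 10_000_000                       # documents in the core
entries = 5000                          # filterCache maxSize setting

bytes_per_entry = docs // 8             # one bit per document in the bitset
total_bytes = bytes_per_entry * entries

print(bytes_per_entry)                  # 1250000 -- the "1.25 million bytes"
print(total_bytes)                      # 6250000000 -- "over six gigabytes"
```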
Re: Queries regarding solr cache
I found this, which intends to explore the usage of RoaringDocIdSet for
Solr:

https://issues.apache.org/jira/browse/SOLR-9008

This suggests Lucene's filter cache already uses it, or did at one
point:

https://issues.apache.org/jira/browse/LUCENE-6077

I was playing with id set implementations earlier this year for
https://issues.apache.org/jira/browse/LUCENE-7211.  I know I tried a
SparseFixedBitSet there, and I think that I observed that
RoaringDocIdSet existed and tried that too, but I apparently didn't
write anything down.  My vague recollection is that I couldn't use it
for my use case, due to the in-order insertion requirement.

On 12/1/16, 8:10 AM, "Shawn Heisey" wrote:

> On 12/1/2016 8:16 AM, Dorian Hoxha wrote:
> > @Shawn
> > Any idea why the cache doesn't use roaring bitsets?
>
> I had to look that up to even know what it was.  Apparently Lucene
> does have an implementation of that, a class called RoaringDocIdSet.
> It was incorporated into the source code in October 2014 with this
> issue:
>
> https://issues.apache.org/jira/browse/LUCENE-5983
>
> As for the reason that it wasn't used for the filterCache, I think
> that's because the filterCache existed LONG before that bitset
> implementation was available, and when things work well (which
> describes the filterCache), devs try not to mess with them too much.
>
> I have mentioned the idea on a recently-filed issue regarding bitset
> memory efficiency:
>
> https://issues.apache.org/jira/browse/SOLR-9764
>
> Thanks,
> Shawn
Re: Queries regarding solr cache
On 12/1/2016 8:16 AM, Dorian Hoxha wrote:
> @Shawn
> Any idea why the cache doesn't use roaring bitsets?

I had to look that up to even know what it was.  Apparently Lucene does
have an implementation of that, a class called RoaringDocIdSet.  It was
incorporated into the source code in October 2014 with this issue:

https://issues.apache.org/jira/browse/LUCENE-5983

As for the reason that it wasn't used for the filterCache, I think
that's because the filterCache existed LONG before that bitset
implementation was available, and when things work well (which
describes the filterCache), devs try not to mess with them too much.

I have mentioned the idea on a recently-filed issue regarding bitset
memory efficiency:

https://issues.apache.org/jira/browse/SOLR-9764

Thanks,
Shawn
Re: Queries regarding solr cache
@Shawn
Any idea why the cache doesn't use roaring bitsets?

On Thu, Dec 1, 2016 at 3:49 PM, Shawn Heisey wrote:
> On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> > I am using Solr and serving a huge number of requests in my
> > application.
> >
> > I need to know how I can utilize caching in Solr.
> >
> > As of now I am clicking Core Selector → [core name] → Plugins /
> > Stats.
> >
> > I am seeing my hit ratio as 0 for all the caches. What does this
> > mean and how can this be optimized?
>
> If your hitratio is zero, then none of the queries related to that
> cache are finding matches.  This means that your client systems are
> never sending the same query twice.
>
> One possible reason for a zero hitratio is using "NOW" in date
> queries -- NOW changes every millisecond, and the actual timestamp
> value is what ends up in the cache.  This means that the same query
> with NOW executed more than once will actually be different from the
> cache's perspective.  The solution is date rounding -- using things
> like NOW/HOUR or NOW/DAY.  You could use NOW/MINUTE, but the window
> for caching would be quite small.
>
> 5000 entries for your filterCache is almost certainly too big.  Each
> filterCache entry tends to be quite large.  If the core has ten
> million documents in it, then each filterCache entry would be 1.25
> million bytes in size -- the entry is a bitset of all documents in
> the core.  This includes deleted docs that have not yet been
> reclaimed by merging.  If a filterCache for an index that size (which
> is not all that big) were to actually fill up with 5000 entries, it
> would require over six gigabytes of memory just for the cache.
>
> The 1000 that you have on queryResultCache is also rather large, but
> probably not a problem.  There's also documentCache, which generally
> is OK to have sized at several thousand -- I have 16384 on mine.  If
> your documents are particularly large, then you probably would want
> to have a smaller number.
>
> It's good that your autowarmCount values are low.  High values here
> tend to make commits take a very long time.
>
> You do not need to send your message more than once.  The first
> repeat was after less than 40 minutes.  The second was after about
> two hours.  Waiting a day or two for a response, particularly for a
> difficult problem, is not unusual for a mailing list.  I began this
> reply as soon as I saw your message -- about 7:30 AM in my timezone.
>
> Thanks,
> Shawn
Re: Queries regarding solr cache
On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> I am using Solr and serving a huge number of requests in my
> application.
>
> I need to know how I can utilize caching in Solr.
>
> As of now I am clicking Core Selector → [core name] → Plugins /
> Stats.
>
> I am seeing my hit ratio as 0 for all the caches. What does this mean
> and how can this be optimized?

If your hitratio is zero, then none of the queries related to that
cache are finding matches.  This means that your client systems are
never sending the same query twice.

One possible reason for a zero hitratio is using "NOW" in date queries
-- NOW changes every millisecond, and the actual timestamp value is
what ends up in the cache.  This means that the same query with NOW
executed more than once will actually be different from the cache's
perspective.  The solution is date rounding -- using things like
NOW/HOUR or NOW/DAY.  You could use NOW/MINUTE, but the window for
caching would be quite small.

5000 entries for your filterCache is almost certainly too big.  Each
filterCache entry tends to be quite large.  If the core has ten million
documents in it, then each filterCache entry would be 1.25 million
bytes in size -- the entry is a bitset of all documents in the core.
This includes deleted docs that have not yet been reclaimed by merging.
If a filterCache for an index that size (which is not all that big)
were to actually fill up with 5000 entries, it would require over six
gigabytes of memory just for the cache.

The 1000 that you have on queryResultCache is also rather large, but
probably not a problem.  There's also documentCache, which generally is
OK to have sized at several thousand -- I have 16384 on mine.  If your
documents are particularly large, then you probably would want to have
a smaller number.

It's good that your autowarmCount values are low.  High values here
tend to make commits take a very long time.

You do not need to send your message more than once.  The first repeat
was after less than 40 minutes.  The second was after about two hours.
Waiting a day or two for a response, particularly for a difficult
problem, is not unusual for a mailing list.  I began this reply as soon
as I saw your message -- about 7:30 AM in my timezone.

Thanks,
Shawn
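To illustrate the NOW problem, here is a toy Python model (this is not Solr's actual date math, and the `timestamp` field name is made up for the example): two identical queries issued milliseconds apart embed different raw NOW values, but identical NOW/DAY values, so only the rounded form can ever repeat in the cache.

```python
from datetime import datetime, timezone

def now_per_day(ts):
    # Emulate NOW/DAY: truncate the timestamp to midnight UTC
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# Two identical queries arriving a few milliseconds apart
t1 = datetime(2016, 12, 1, 10, 15, 30, 123000, tzinfo=timezone.utc)
t2 = datetime(2016, 12, 1, 10, 15, 30, 456000, tzinfo=timezone.utc)

q1 = f"timestamp:[* TO {t1.isoformat()}]"   # raw NOW: differs every call
q2 = f"timestamp:[* TO {t2.isoformat()}]"
assert q1 != q2                              # never a cache hit

r1 = f"timestamp:[* TO {now_per_day(t1).isoformat()}]"
r2 = f"timestamp:[* TO {now_per_day(t2).isoformat()}]"
assert r1 == r2                              # identical -> cacheable
```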
Fwd: Queries regarding solr cache
-- Forwarded message --
From: kshitij tyagi
Date: Thu, Dec 1, 2016 at 4:34 PM
Subject: Queries regarding solr cache
To: solr-user@lucene.apache.org

Hi All,

I am using Solr and serving a huge number of requests in my
application.

I need to know how I can utilize caching in Solr.

As of now I am clicking Core Selector → [core name] → Plugins / Stats.

I am seeing my hit ratio as 0 for all the caches. What does this mean
and how can this be optimized?

My current Solr configurations are:

Regards,
Kshitij
Queries regarding solr cache
Hi All,

I am using Solr and serving a huge number of requests in my
application.

I need to know how I can utilize caching in Solr.

As of now I am clicking Core Selector → [core name] → Plugins / Stats.

I am seeing my hit ratio as 0 for all the caches. What does this mean
and how can this be optimized?

My current Solr configurations are:

Regards,
Kshitij