Re: Solr hangs / LRU operations are heavy on cpu
We use filters very heavily because we run an e-commerce site that has a lot of faceting and drill-downs configured at different paths on the store. We use master-slave replication, with the slaves serving queries to support higher QPS.

filterCache: Concurrent LFU Cache(maxSize=10000, initialSize=4000, minSize=9000, acceptableSize=9500, cleanupThread=true, timeDecay=true)

We see a 95-99% hit ratio on the filter cache, and most of our evictions are on the filter cache. These are figures from one of our prod boxes:

- size: 9260
- warmupTime: 272007
- timeDecay: true
- cumulative_lookups: 9220776
- cumulative_hits: 9048703
- cumulative_hitratio: 0.98

We had the default (untuned) cache settings two years ago and our performance numbers were really bad. We got roughly a 25% latency improvement by tuning our caches properly, so tuning the caches was well worth the effort.

On Thu, Mar 19, 2015 at 6:35 PM, Sergey Shvets ser...@bintime.com wrote:

Hi,

we have quite a problem with Solr. We are running it in a config 6x3, and suddenly Solr started to hang, taking all the available CPU on the nodes.

In the thread dumps we noticed that things like this can eat a lot of CPU time:

- org.apache.solr.search.LRUCache.put(LRUCache.java:116)
- org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:705)
- org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:155)
- org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:183)
- org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:88)
- org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:158)
- org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:148)
- org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:242)
- org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153)
- org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:96)
- org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:52)
- org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:758)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
- org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
- org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
- org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
- org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
- org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
- org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
- org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
- org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)

The cache configuration itself is very minimal:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <fieldValueCache class="solr.FastLRUCache" size="1024" autowarmCount="256" showItems="10"/>
    <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

Solr version is 4.10.3.

Any help is appreciated!

sergey
Re: Solr hangs / LRU operations are heavy on cpu
On 3/19/2015 8:49 PM, Umesh Prasad wrote:
: It might be because LRUCache by default will try to evict its entries on
: each call to put and putAll. LRUCache is built on top of java's
: LinkedHashMap. Check the javadoc of removeEldestEntry
: http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html#removeEldestEntry%28java.util.Map.Entry%29
:
: Try using LFUCache and a separate cleanup thread .. We have been using
: that for over 2 yrs now without any issues ..

All cache implementations evict old entries on put if the cache is full, including LFUCache. What's different is how the evicted entry is chosen and how efficient the eviction process is.

I wrote the LFUCache implementation that's currently in Solr. It is the most basic naive implementation of LFU that you can write, the kind of thing that a beginning Computer Science student would write to show a correct implementation. :) It's probably suitable for very small cache sizes (double digits), but if the cache size is large, LFUCache is very inefficient at eviction. With a large size, it might hit the CPU even harder than LRUCache.

I have written a much better implementation that's more efficient; I need to polish the code and commit it.

As a general rule, I would expect the LRU implementations to always be more efficient at eviction than any implementation of LFU, but some query patterns will have a higher cache hitCount with an LFU implementation, so the tradeoff might be worth making.

Thanks,
Shawn
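To make the eviction-cost point above concrete, here is a minimal sketch of what a "naive" LFU can look like. It is not the LFUCache code that ships with Solr; the class name NaiveLfuSketch and everything inside it are made up for illustration. The thing to notice is that each eviction scans every entry to find the one with the fewest hits, so the cost of a put on a full cache grows with the cache size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Solr's LFUCache: all names are invented
// for illustration only.
class NaiveLfuSketch<K, V> {
    // A cached value together with the number of times it has been read.
    private static final class Counted<T> {
        T value;
        long hits;
        Counted(T value) { this.value = value; }
    }

    private final Map<K, Counted<V>> map = new HashMap<>();
    private final int maxSize;

    NaiveLfuSketch(int maxSize) { this.maxSize = maxSize; }

    synchronized V get(K key) {
        Counted<V> c = map.get(key);
        if (c == null) return null;
        c.hits++;                       // frequency bookkeeping on every hit
        return c.value;
    }

    synchronized void put(K key, V value) {
        Counted<V> existing = map.get(key);
        if (existing != null) {         // overwrite in place, keep the hit count
            existing.value = value;
            return;
        }
        if (map.size() >= maxSize) {
            evictLeastFrequentlyUsed(); // runs on the inserting thread
        }
        map.put(key, new Counted<>(value));
    }

    synchronized boolean containsKey(K key) { return map.containsKey(key); }

    // O(n) scan over the whole cache to find the entry with the fewest hits.
    private void evictLeastFrequentlyUsed() {
        K victim = null;
        long fewest = Long.MAX_VALUE;
        for (Map.Entry<K, Counted<V>> e : map.entrySet()) {
            if (e.getValue().hits < fewest) {
                fewest = e.getValue().hits;
                victim = e.getKey();
            }
        }
        if (victim != null) map.remove(victim);
    }

    public static void main(String[] args) {
        NaiveLfuSketch<String, Integer> cache = new NaiveLfuSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");
        cache.get("a");      // "a" now has 2 hits, "b" has 0
        cache.put("c", 3);   // cache is full, so "b" is evicted
        System.out.println(cache.containsKey("a") + " "
                + cache.containsKey("b") + " " + cache.containsKey("c"));
    }
}
```

With two entries the scan is trivial; with thousands of entries and a steady stream of misses, the same scan runs on nearly every put, which is the inefficiency described above.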
Re: Solr hangs / LRU operations are heavy on cpu
Hello Umesh,

Thank you, that has indeed given positive results so far. We switched completely to LFU, and today it went quite okay. We will wait until it shows more stability and then work out the optimal cache size.

Below is a summary of the changes.

- <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
- <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
- <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
- <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
+ <filterCache class="solr.LFUCache" size="512" initialSize="512" autowarmCount="0" cleanupThread="True"/>
+ <queryResultCache class="solr.LFUCache" size="512" initialSize="512" autowarmCount="0" cleanupThread="True"/>
+ <documentCache class="solr.LFUCache" size="512" initialSize="512" autowarmCount="0" cleanupThread="True"/>
+ <fieldValueCache class="solr.LFUCache" size="512" autowarmCount="256" showItems="10" cleanupThread="True"/>
+ <cache name="perSegFilter" class="solr.LFUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" cleanupThread="True"/>

--
Best regards,
Sergey
mailto:ser...@bintime.com
Re: Solr hangs / LRU operations are heavy on cpu
Hello Shawn,

In that case, the behavior we observed is a bit strange. LRU was heavy on the CPU in the thread dumps, and I don't have a reasonable explanation for that. However, switching to LFU seems to have solved the problem.

--
Best regards,
Sergey
mailto:ser...@bintime.com
Re: Solr hangs / LRU operations are heavy on cpu
The document cache is not really going to be taking up time here. How many concurrent requests (threads) are you testing with here?

One thing I've seen over the years is a false sense of what is taking up time when benchmarks with a lot of threads are used. The reason is that when there are a lot more threads than CPUs, it's natural for context switches to happen where synchronizations happen. You look at a profiler or thread dumps, and you see a bunch of threads piled up on synchronization. This does not mean that removing that synchronization will really help anything... the threads can't all run at once.

-Yonik
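Yonik's point about threads piling up on synchronization can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: the class MonitorPileUp and its method countBlocked are hypothetical names, and the "critical section" is empty. It takes a snapshot of thread states the way a thread dump would, at a moment when one thread holds a monitor that all the others want.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not taken from Solr: shows what a thread-dump snapshot
// sees when many threads contend for one monitor.
class MonitorPileUp {
    // Starts `workers` threads that all need the same lock and returns how
    // many of them are BLOCKED while the calling thread holds that lock.
    static int countBlocked(int workers) {
        final Object lock = new Object();
        List<Thread> threads = new ArrayList<>();
        int blocked = 0;
        try {
            synchronized (lock) {
                for (int i = 0; i < workers; i++) {
                    Thread t = new Thread(() -> {
                        synchronized (lock) {
                            // tiny critical section, e.g. one cache insert
                        }
                    });
                    t.start();
                    threads.add(t);
                }
                // Wait until each worker has reached the monitor; it cannot
                // get past it while this thread still holds the lock.
                for (Thread t : threads) {
                    while (t.getState() != Thread.State.BLOCKED) {
                        Thread.sleep(1);
                    }
                    blocked++;
                }
            }
            for (Thread t : threads) {
                t.join();   // lock released: workers run through one by one
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("threads seen BLOCKED in the snapshot: " + countBlocked(8));
    }
}
```

Every worker shows up as blocked on the same monitor in the snapshot, even though the work done under the lock is negligible. The snapshot says where threads were waiting, not how much CPU the guarded code costs.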
Re: Solr hangs / LRU operations are heavy on cpu
: we have quite a problem with Solr. We are running it in a config 6x3, and
: suddenly solr started to hang, taking all the available cpu on the nodes.
:
: In the threads dump noticed things like this can eat lot of CPU time
:
:    - org.apache.solr.search.LRUCache.put(LRUCache.java:116)
:    - org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:705)

That specific code path pertains to the documentCache. This particular thread appears to be blocked on inserting docs into that (synchronized) map because some other thread is already doing an insert.

Depending on your usage patterns, you may find it better to just disable the documentCache completely. It's primarily useful when you have lots of stored fields in your docs, and a lot of hot documents that are frequently returned by a lot of different searches (ie: because you always sort on the same sets of fields) ... but if you aren't seeing any hits on your documentCache, just get rid of it.

The choice of having a documentCache, and what type of cache impl to use for the doc cache, can be completely independent of what impl you use for other caches (maybe you disable the doc cache, use LRU for the filterCache, and LFU for the queryResultCache -- they are all independently configurable, one size doesn't necessarily fit all).

-Hoss
http://www.lucidworks.com/
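The "are you seeing any hits" check above comes down to one ratio per cache: cumulative hits divided by cumulative lookups. The sketch below is plain arithmetic, not a Solr API; the class CacheHitRatio and the method hitRatio are hypothetical names. It uses the filterCache figures that Umesh posted elsewhere in this thread (cumulative_lookups 9220776, cumulative_hits 9048703) as sample input.

```java
import java.util.Locale;

// Hypothetical helper for reading cache statistics; not part of Solr.
class CacheHitRatio {
    // Fraction of lookups served from the cache; 0.0 when nothing was looked up.
    static double hitRatio(long hits, long lookups) {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    public static void main(String[] args) {
        // filterCache figures quoted in this thread
        double filterCache = hitRatio(9048703L, 9220776L);
        System.out.println(String.format(Locale.ROOT, "%.2f", filterCache)); // prints 0.98
    }
}
```

A documentCache whose ratio stays near zero under production traffic is a candidate for removal, since every miss still pays for the synchronized insert seen in the stack trace.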
Re: Solr hangs / LRU operations are heavy on cpu
Are you faceting? That can sometimes use one of the caches (just glanced at stack trace...) as entries are pushed into and removed from the cache during the same request.

Shot in the dark.

Best,
Erick
Re: Solr hangs / LRU operations are heavy on cpu
It might be because LRUCache by default will try to evict its entries on each call to put and putAll. LRUCache is built on top of Java's LinkedHashMap. Check the javadoc of removeEldestEntry:
http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html#removeEldestEntry%28java.util.Map.Entry%29

Try using LFUCache and a separate cleanup thread. We have been using that for over 2 years now without any issues.

For a comparison of the cache implementations in Solr, you can check this link:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig

--
Thanks & Regards
Umesh Prasad
Tech Lead @ flipkart.com
in.linkedin.com/pub/umesh-prasad/6/5bb/580/
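For readers who have not used the removeEldestEntry hook mentioned above, here is a minimal sketch of an LRU cache built on LinkedHashMap. It is not Solr's LRUCache source; the class name LruSketch and the synchronized wrapper methods are assumptions made for this example. LinkedHashMap calls removeEldestEntry after each insertion, and returning true drops the least recently used entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only, NOT Solr's LRUCache: the class name and the
// synchronized wrappers are assumptions made for this sketch.
class LruSketch<K, V> {
    private final Map<K, V> map;

    LruSketch(final int maxSize) {
        // accessOrder=true keeps iteration order least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Invoked by LinkedHashMap after each new entry is inserted.
                return size() > maxSize;
            }
        };
    }

    // Every operation takes the same monitor, so concurrent callers queue up.
    synchronized V get(K key) { return map.get(key); }
    synchronized V put(K key, V value) { return map.put(key, value); }
    synchronized boolean containsKey(K key) { return map.containsKey(key); }
    synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        LruSketch<String, Integer> cache = new LruSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // touch "a", so "b" becomes the eldest entry
        cache.put("c", 3);   // size exceeds 2, so "b" is evicted
        System.out.println(cache.containsKey("a") + " "
                + cache.containsKey("b") + " " + cache.containsKey("c"));
        // prints: true false true
    }
}
```

The eviction itself is a constant-time unlink of the eldest entry; the single lock around every get and put is what shows up when many request threads hit the cache at once.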