Hi,

We're running Solr 4.10.1 on Linux under Tomcat. It is a distributed environment: 40 virtual servers with high resources. The workload is concurrent queries that are quite complex (possibly hundreds of terms), NRT indexing, and a few hundred facet fields, some of which may have many (hundreds of thousands of) distinct values.
We've configured a 6GB JVM heap and, after quite a bit of work, it seems to be pretty well tuned GC-wise (we're using CMS and ParNew).

The problem is this: once every couple of hours we suddenly start getting "concurrent-mode-failure" on one or more servers. Memory keeps climbing and the "concurrent-mode-failure" messages continue. Naturally, during this time Solr is unresponsive and queries time out. Eventually it may pass (a GC succeeds) after 5-10 minutes. Sometimes the phenomenon goes on for a long time, with one server recovering and then another failing, and so forth.

Memory dumps point to ConcurrentLRUCache (used by filterCache and fieldValueCache). Mathematically speaking, the sizes I see in the dumps do not make sense: the configured cache sizes shouldn't take up more than a few hundred MB.

Any ideas? Has anyone seen this kind of problem?
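
For reference, a rough way to sanity-check the "few hundred MB" expectation for filterCache is sketched below. It assumes the worst case, where every cached entry is a full bitset with one bit per document in the index (small result sets are typically stored more compactly, so this should be an upper bound). The function names are illustrative, and the numbers in the example are placeholders, not our actual maxDoc or cache size. It also covers only one searcher's cache on one core; with NRT reopening, more than one searcher (and therefore more than one cache instance) may be alive at the same time.

```python
def bitset_bytes(max_doc: int) -> int:
    """Bytes needed for a bitset with one bit per document,
    rounded up to whole 64-bit words."""
    words = (max_doc + 63) // 64
    return words * 8


def filter_cache_upper_bound(max_doc: int, cache_size: int) -> int:
    """Worst-case bytes for one filterCache instance, assuming every
    one of the cache_size entries is a full bitset over the index."""
    return bitset_bytes(max_doc) * cache_size


if __name__ == "__main__":
    # Placeholder values for illustration only (assumptions, not real config).
    max_doc = 10_000_000   # documents in one core
    cache_size = 512       # the "size" attribute of filterCache
    total = filter_cache_upper_bound(max_doc, cache_size)
    print(f"worst case per searcher: {total / 2**20:.1f} MiB")
```

Plugging in the real maxDoc and configured size for each core, then multiplying by the number of searchers that can be open at once, gives a figure to compare against what the heap dump attributes to ConcurrentLRUCache.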