[ https://issues.apache.org/jira/browse/SOLR-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated SOLR-5691:
------------------------------

    Fix Version/s: 4.7
                   5.0

> Unsynchronized WeakHashMap in SolrDispatchFilter causing issues in SolrCloud
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-5691
>                 URL: https://issues.apache.org/jira/browse/SOLR-5691
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.6.1
>            Reporter: Bojan Smid
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7
>
>
> I have a large SolrCloud setup: 7 nodes, each hosting a few thousand cores
> (leaders and replicas of the same shard exist on different nodes), which may
> make the problem easier to notice.
>
> A node can randomly get into a state where it "stops" responding to PeerSync
> /get requests from other nodes. When that happens, a thread dump of that node
> shows multiple entries like this one (one entry for each "blocked" request
> from another node; they don't go away with time):
>
> "http-bio-8080-exec-1781" daemon prio=5 tid=0x440177200000 nid=0x25ae [ JVM locked by VM at safepoint, polling bits: safep ]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.WeakHashMap.get(WeakHashMap.java:471)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>
> WeakHashMap's internal state can easily get corrupted when it is used in an
> unsynchronized way, in which case it is known to enter an infinite loop in
> the .get() call. It is very likely that this happens here too. The reason
> others may not see this issue could be related to the huge number of cores I
> have in this system. The problem usually appears while some node is
> starting. It also doesn't happen on every start; it depends on the "correct"
> timing of the events that lead to the map's corruption.
>
> The fix may be as simple as changing:
>
>     protected final Map<SolrConfig, SolrRequestParsers> parsers =
>         new WeakHashMap<SolrConfig, SolrRequestParsers>();
>
> to:
>
>     protected final Map<SolrConfig, SolrRequestParsers> parsers =
>         Collections.synchronizedMap(
>             new WeakHashMap<SolrConfig, SolrRequestParsers>());
>
> but there may be performance considerations around this, since this is the
> entry point into Solr.
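
The change proposed in the report can be tried outside Solr with a small standalone sketch. Everything below is illustrative and not taken from Solr's source: `Config` and `Parsers` are hypothetical stand-ins for `SolrConfig` and `SolrRequestParsers`, and the class name, `parsersFor`, and `hammer` are assumptions made for the example. It wraps a `WeakHashMap` in `Collections.synchronizedMap` and has several threads look up and insert entries at once, which is the access pattern the report suspects of corrupting the unsynchronized map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedParsersSketch {

    // Hypothetical stand-in for SolrConfig (the weakly held map key).
    static final class Config {
        final String name;
        Config(String name) { this.name = name; }
    }

    // Hypothetical stand-in for SolrRequestParsers. It deliberately keeps no
    // reference to its Config: a value that strongly references its own key
    // would prevent a WeakHashMap entry from ever being collected.
    static final class Parsers {
        final String configName;
        Parsers(String configName) { this.configName = configName; }
    }

    // Same shape as the fix suggested in the report: every single call on the
    // map now runs under the wrapper's lock.
    private final Map<Config, Parsers> parsers =
        Collections.synchronizedMap(new WeakHashMap<Config, Parsers>());

    Parsers parsersFor(Config config) {
        // get followed by put is a compound operation, so hold the wrapper's
        // lock across both calls to avoid creating two Parsers for one key.
        synchronized (parsers) {
            Parsers p = parsers.get(config);
            if (p == null) {
                p = new Parsers(config.name);
                parsers.put(config, p);
            }
            return p;
        }
    }

    // Runs concurrent lookups and inserts; returns true if every worker
    // finished within the timeout (i.e. no thread got stuck in the map).
    static boolean hammer(int threads, int lookupsPerThread) {
        SynchronizedParsersSketch filter = new SynchronizedParsersSketch();
        Config[] shared = new Config[64];
        for (int i = 0; i < shared.length; i++) {
            shared[i] = new Config("core" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < lookupsPerThread; i++) {
                    filter.parsersFor(shared[i % shared.length]);
                    // Short-lived keys force inserts and resizes as well.
                    filter.parsersFor(new Config("tmp" + i));
                }
            });
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all lookups finished: " + hammer(8, 100_000));
    }
}
```

The single lock serializes every lookup, which is the performance concern the report raises for a map that sits on the request entry path; this sketch does not measure that cost.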