[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721767#comment-13721767 ]
Erick Erickson commented on SOLR-5081:
--------------------------------------

You really want to go for broke? Try SOLR-4816 (note: this assumes you're indexing from SolrJ).

The deadlock I've seen has to do with intra-shard routing: forwarding the packets to other shards can lead to this situation if there are enough packets. That JIRA is about having SolrJ just send the documents to the right leader, so that Solr does not have to route the docs to other shards. We'd be really interested to see if that worked in the real world...

NOTE: I'm not sure what the current state of that patch is. I think it was ready to rock-n-roll but just missed the cut for 4.4.

> Highly parallel document insertion hangs SolrCloud
> --------------------------------------------------
>
>                 Key: SOLR-5081
>                 URL: https://issues.apache.org/jira/browse/SOLR-5081
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>   Affects Versions: 4.3.1
>            Reporter: Mike Schrag
>         Attachments: threads.txt
>
>
> If I do a highly parallel document load using a Hadoop cluster into an 18
> node SolrCloud cluster, I can deadlock Solr every time.
> The ulimits on the nodes are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1031181
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515590
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think
> this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.

--
This message is automatically generated by JIRA.
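Editor's illustration of the idea behind SOLR-4816 as Erick describes it above: the client works out which shard owns each document and sends it straight to that shard's leader, so no node has to forward it. This is only a conceptual sketch in Python, not SolrJ code and not the patch itself. The leader URLs, the `shard_for` and `group_by_leader` helpers, and the use of CRC32 modulo the shard count are all assumptions made for illustration; Solr's own router uses a different hash and per-shard hash ranges.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard index with a stable hash.

    CRC32 is a stand-in chosen for this sketch; it is not Solr's routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def group_by_leader(docs, leaders):
    """Bucket documents by the leader URL of the shard that owns each ID."""
    batches = defaultdict(list)
    for doc in docs:
        leader = leaders[shard_for(doc["id"], len(leaders))]
        batches[leader].append(doc)
    return dict(batches)


# Hypothetical leader URLs, one per shard (in a real cluster these would
# come from the cluster state rather than being hard-coded).
leaders = [
    "http://node1:8983/solr/collection1_shard1",
    "http://node2:8983/solr/collection1_shard2",
    "http://node3:8983/solr/collection1_shard3",
]

docs = [{"id": "doc-%d" % i} for i in range(10)]
batches = group_by_leader(docs, leaders)

# Each batch would be posted directly to its leader, so the receiving
# node never has to forward documents to another shard.
for url, batch in sorted(batches.items()):
    print(url, [d["id"] for d in batch])
```

With batches built this way, each indexing thread talks only to the leader that will actually index its documents, which is the forwarding step the comment above suspects of contributing to the deadlock.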