[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726831#comment-13726831 ]
Erick Erickson commented on SOLR-5081:
--------------------------------------

Yeah, that is odd. The stack traces you sent basically showed no deadlocks, nothing interesting at all. I suspect pursuing whether anything is getting to Solr or not is a good idea....

Hmmmm, blunt-instrument test when the cluster is hung. What happens if you, say, submit a query directly to one of the nodes? Does it respond or do you see anything in the solr log on that node? Tip: adding &distrib=false to the _query_ will not try to send sub-queries to other shards.

And I wonder what happens if you, say, use post.jar (comes with the example) to try to send a doc to Solr when it's hung, anything?

Clearly I'm grasping at straws here, but I'm kind of out of good ideas.

> Highly parallel document insertion hangs SolrCloud
> --------------------------------------------------
>
>                 Key: SOLR-5081
>                 URL: https://issues.apache.org/jira/browse/SOLR-5081
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>   Affects Versions: 4.3.1
>            Reporter: Mike Schrag
>         Attachments: threads.txt
>
>
> If I do a highly parallel document load using a Hadoop cluster into an 18
> node solrcloud cluster, I can deadlock solr every time.
> The ulimits on the nodes are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1031181
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515590
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think
> this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.
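
For anyone wanting to try the direct-query check suggested in the comment, here is a rough sketch of one way to do it. It sends a query to a single node with distrib=false so that the node answers from its own core and does not fan out to other shards. The host names, the core name "collection1", the /select handler path, and the helper names are assumptions for illustration and are not taken from the issue.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_local_query_url(base_url, core, query="*:*", rows=0):
    """Build a select URL that asks one core to answer locally.

    distrib=false tells the node not to send sub-queries to other shards.
    The /select path and the core name are assumptions about the setup.
    """
    params = urlencode(
        {"q": query, "rows": rows, "wt": "json", "distrib": "false"}
    )
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def probe_node(base_url, core, timeout=10.0):
    """Return (ok, detail); ok is True if the node answered within timeout.

    Network-level failures (refused connection, timeout, HTTP error status)
    are reported as ok=False with the exception text as detail.
    """
    url = build_local_query_url(base_url, core)
    try:
        with urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSErrors
        return False, repr(exc)
```

Calling probe_node("http://solr-node1:8983/solr", "collection1") against each node in turn (the node address here is hypothetical) would show which nodes still answer a local query while the cluster appears hung. A node that times out here is worth checking in its own solr log.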