[ 
https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726831#comment-13726831
 ] 

Erick Erickson commented on SOLR-5081:
--------------------------------------

Yeah, that is odd. The stack traces you sent showed no deadlocks, nothing 
interesting at all. I suspect it's worth checking whether anything is 
reaching Solr at all....

Hmmmm, here's a blunt-instrument test for when the cluster is hung. What 
happens if you submit a query directly to one of the nodes? Does it respond, 
or do you see anything in the Solr log on that node? Tip: adding 
&distrib=false to the _query_ keeps that node from sending sub-queries to 
other shards.
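
The direct-to-node check could be scripted roughly like this. It is only a 
sketch: the port (8983) and core name (collection1) are assumptions taken from 
the stock Solr example setup, not from this issue, and the helper names are 
made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_direct_query_url(host, port=8983, core="collection1", q="*:*"):
    """Build a select URL that asks one node to answer from its local index only.

    port and core are assumed defaults from the Solr example, not from the issue.
    """
    params = urlencode({"q": q, "rows": 0, "distrib": "false", "wt": "json"})
    return "http://%s:%d/solr/%s/select?%s" % (host, port, core, params)


def probe_node(host, timeout=10, **kwargs):
    """Return (HTTP status, body), or (None, error text) if the node fails or times out."""
    url = build_direct_query_url(host, **kwargs)
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return None, str(exc)
```

Running probe_node against each node in turn while the cluster is hung would 
show which nodes still answer locally and which time out.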

And I wonder what happens if you use post.jar (it comes with the example) to 
try to send a doc to Solr while it's hung. Anything?
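
If post.jar isn't handy on the box, a single-document post can be approximated 
with a few lines of Python. Again a sketch under assumptions: the port, the 
core name (collection1), and a schema that accepts a document carrying only an 
id field are all guesses based on the Solr example, and the function names are 
hypothetical.

```python
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape


def build_add_xml(doc_id):
    """Build a minimal Solr XML <add> message containing one document with only an id."""
    return '<add><doc><field name="id">%s</field></doc></add>' % escape(str(doc_id))


def post_doc(host, doc_id, port=8983, core="collection1", timeout=10):
    """POST one document to the node's update handler.

    Returns the HTTP status, or None if the request fails or times out.
    port and core are assumed defaults from the Solr example, not from the issue.
    """
    url = "http://%s:%d/solr/%s/update?commit=true" % (host, port, core)
    req = Request(
        url,
        data=build_add_xml(doc_id).encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except OSError:  # URLError, HTTPError and socket timeouts all land here
        return None
```

A None result after the timeout, with nothing showing up in the Solr log on 
that node, would suggest the request never got as far as the update handler.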

Clearly I'm grasping at straws here, but I'm kind of out of good ideas.
                
> Highly parallel document insertion hangs SolrCloud
> --------------------------------------------------
>
>                 Key: SOLR-5081
>                 URL: https://issues.apache.org/jira/browse/SOLR-5081
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.3.1
>            Reporter: Mike Schrag
>         Attachments: threads.txt
>
>
> If I do a highly parallel document load using a Hadoop cluster into an 18 
> node solrcloud cluster, I can deadlock solr every time.
> The ulimits on the nodes are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1031181
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515590
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think 
> this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.
