[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748047#comment-13748047 ]

Kevin Osborn commented on SOLR-5081:
------------------------------------

I may have this issue as well. I am posting batches of 1000 documents through 
SolrJ. I have autoCommit set to 15000 ms with openSearcher=false, and 
autoSoftCommit set to 30000 ms. During my initial testing, I was able to 
reproduce the problem after just a couple of updates. I then raised the 
open-file limit for the process from 4096 to 15000. This seemed to help, but 
only up to a point.
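
Roughly, the indexing side looks like the sketch below (placeholder ZooKeeper 
address and hypothetical field names; no explicit commits are issued, so 
visibility depends entirely on the autoCommit/autoSoftCommit settings above):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            // Connect via ZooKeeper (address is a placeholder).
            CloudSolrServer server = new CloudSolrServer("zkhost:2181");
            server.setDefaultCollection("collection1");

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 100000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);           // hypothetical fields
                doc.addField("title_s", "document " + i);
                batch.add(doc);

                // Send each batch of 1000 documents. No explicit commit here;
                // durability/visibility is left to autoCommit (15000 ms,
                // openSearcher=false) and autoSoftCommit (30000 ms) in
                // solrconfig.xml.
                if (batch.size() == 1000) {
                    server.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);
            }
            server.shutdown();
        }
    }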

If all my updates happen at once, indexing seems to succeed, but if there are 
pauses between updates, it runs into problems. I have also only seen this 
error when there is more than one node in my SolrCloud cluster.

I also took a look at netstat output; there seemed to be a lot of connections 
between my two nodes. Could the frequency of my updates be overwhelming the 
connections from the leader to the replica?

Deletes also fail, but queries still seem to work.

Restarting the nodes fixes the problem.
                
> Highly parallel document insertion hangs SolrCloud
> --------------------------------------------------
>
>                 Key: SOLR-5081
>                 URL: https://issues.apache.org/jira/browse/SOLR-5081
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.3.1
>            Reporter: Mike Schrag
>         Attachments: threads.txt
>
>
> If I do a highly parallel document load from a Hadoop cluster into an 18-node 
> SolrCloud cluster, I can deadlock Solr every time.
> The ulimits on the nodes are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1031181
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515590
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think 
> this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.
