[ https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436243#comment-13436243 ]

Uwe Schindler commented on SOLR-3685:
-------------------------------------

Hi,
I also don't think MMap is the reason for this, but it's good that you tested it. 
You are saying that this happened with NIOFS, too, so my only guess is:

As noted before (in my last comment), there seems to be something using 
off-heap memory (RES does not include mmap, so if RES rises, it's definitely 
not mmap): some other kind of "direct memory". I am not sure which other 
components in Solr might use direct memory. Maybe ZooKeeper? It's hard to 
find those things in external libraries. Can you try limiting 
-XX:MaxDirectMemorySize to zero and see if exceptions occur? It would also be 
good to have the output of "pmap <pid>", which shows allocated and mapped 
memory; we should look at the anonymous mappings and how many there are. 
pmap is in the procps package.
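
To make the "direct memory" hypothesis concrete: direct ByteBuffers are 
allocated outside the Java heap, so they grow RES (and show up as anonymous 
mappings in pmap) while heap usage stays flat. A minimal standalone sketch, 
assuming nothing about Solr internals (the class name DirectMemoryProbe is 
made up for illustration):

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class DirectMemoryProbe {
        public static void main(String[] args) {
            List<ByteBuffer> held = new ArrayList<ByteBuffer>();
            try {
                while (true) {
                    // Each buffer lives off-heap and is charged against
                    // -XX:MaxDirectMemorySize, not against -Xmx.
                    held.add(ByteBuffer.allocateDirect(16 * 1024 * 1024));
                    System.out.println("direct buffers held: " + held.size());
                }
            } catch (OutOfMemoryError oom) {
                // With a small MaxDirectMemorySize this fires quickly
                // ("Direct buffer memory").
                System.out.println("direct memory exhausted: " + oom);
            }
        }
    }

Run it with something like "java -XX:MaxDirectMemorySize=64m 
DirectMemoryProbe"; while it loops, "pmap -x <pid>" should show the 
corresponding anonymous mappings growing even though the heap does not.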
                
> Solr Cloud sometimes skipped peersync attempt and replicated instead due to 
> tlog flags not being cleared when no updates were buffered during a previous 
> replication.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3685
>                 URL: https://issues.apache.org/jira/browse/SOLR-3685
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java), SolrCloud
>    Affects Versions: 4.0-ALPHA
>         Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>            Reporter: Markus Jelsma
>            Assignee: Yonik Seeley
>            Priority: Critical
>             Fix For: 4.0, 5.0
>
>         Attachments: info.log, oom-killer.log
>
>
> There's a serious problem involving restarting nodes, old or unused index 
> directories not being cleaned up, sudden replication, and Java being killed 
> by the OS due to excessive memory allocation. Since SOLR-1781 was fixed, 
> index directories get cleaned up when a node is restarted cleanly; however, 
> old or unused index directories still pile up if Solr crashes or is killed 
> by the OS, which is what happens here.
> We have a six-node 64-bit Linux test cluster with each node having two 
> shards. There's 512MB RAM available and no swap. Each index is roughly 27MB, 
> so about 50MB per node; this fits easily and works fine. However, if a node 
> is restarted, Solr will consistently crash because it immediately eats up 
> all RAM. If swap is enabled, Solr will eat an additional few hundred MB 
> right after startup.
> This cannot be solved by restarting Solr; it will just crash again and leave 
> index directories in place until the disk is full. The only way I can restart 
> a node safely is to delete the index directories and have it replicate from 
> another node. If I then restart the node, it will crash almost consistently.
> I'll attach a log of one of the nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
