[jira] [Updated] (SOLR-10987) Solr Cloud (5 nodes and 70 million documents) going down, when the overseer node becomes unreachable. Issue Started Recently

RAHAT BHALLA (JIRA) Fri, 30 Jun 2017 09:09:38 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


RAHAT BHALLA updated SOLR-10987:
--------------------------------
    Summary: Solr Cloud (5 nodes and 70 million documents) going down, when the 
overseer node becomes unreachable. Issue Started Recently  (was: Solr Cloud (5 
nodes and 70 million documents) going down, when the overseer node becomes 
unreachable. Started Recently)

> Solr Cloud (5 nodes and 70 million documents) going down, when the overseer 
> node becomes unreachable. Issue Started Recently
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10987
>                 URL: https://issues.apache.org/jira/browse/SOLR-10987
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.1
>         Environment: *The following is the usage on each of the Solr Nodes:*
> Tasks: 254 total,   1 running, 252 sleeping,   0 stopped,   1 zombie
> %Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem : 20392276 total,  4169296 free,  2917012 used, 13305968 buff/cache
> KiB Swap:  5111804 total,  5111636 free,      168 used. 16058184 avail Mem
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 21250 solr      20   0 23.599g 1.184g 228440 S   2.0  6.1  59:55.91 java
> *Solr is running on 5 machines with similar configuration:*
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    1
> Core(s) per socket:    2
> Socket(s):             2
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 62
> Model name:            Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> Stepping:              4
> CPU MHz:               2799.033
> BogoMIPS:              5600.00
> Hypervisor vendor:     VMware
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0-3
>            Reporter: RAHAT BHALLA
>              Labels: assistance, critical, customer, impacting, issue, need, 
> production
>
> We host a SolrCloud cluster of 5 Solr nodes, with 3 ZooKeeper nodes to 
> coordinate it. We have over 70 million documents spread across 13 
> collections, with about 40K more documents added every day in near real 
> time, at intervals of 5 to 6 minutes.
> The system worked as expected for the last 7 months, until we suddenly saw 
> the following exception and all of our instances went offline. We restarted 
> the instances and the cloud ran smoothly for three days before it crashed 
> again.
> *The exception logged before it goes down is as follows:*
> 3542285 ERROR (OverseerCollectionConfigSetProcessor-98221003671470081-prod-solr-node01:9080_solr-n_0000000106) [   ] o.a.s.c.OverseerTaskProcessor
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer_elect/leader
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:348)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:345)
>         at org.apache.solr.cloud.OverseerTaskProcessor.amILeader(OverseerTaskProcessor.java:384)
>         at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:191)
>         at java.lang.Thread.run(Unknown Source)
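
The trace passes through ZkCmdExecutor.retryOperation before the ConnectionLoss error reaches OverseerTaskProcessor.amILeader, which suggests the read of /overseer_elect/leader was retried and still failed. The retry-then-give-up pattern can be illustrated with a generic sketch. This is not Solr's code: ConnectionLoss, retry_operation, make_flaky_read, and the linear backoff are all hypothetical stand-ins chosen for illustration.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for ZooKeeper's ConnectionLossException."""


def retry_operation(operation, retry_count=5, retry_delay_s=0.0, sleep=time.sleep):
    """Call operation(); on ConnectionLoss, wait and retry up to retry_count attempts.

    Re-raises the last ConnectionLoss once all attempts are used up.
    """
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(retry_count):
        try:
            return operation()
        except ConnectionLoss as err:
            last_error = err
            # Assumed linear backoff: wait a little longer after each failure,
            # but do not sleep after the final attempt.
            if attempt < retry_count - 1:
                sleep((attempt + 1) * retry_delay_s)
    raise last_error


def make_flaky_read(failures):
    """Return a fake read that raises ConnectionLoss `failures` times, then succeeds."""
    state = {"calls": 0}

    def read():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionLoss("/overseer_elect/leader")
        return "leader-data"

    return read
```

With a short outage (fewer failures than attempts) the caller never sees the error; with a longer one, the exception propagates to the caller, as it does in the log above.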



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
