Gregg Donovan created SOLR-5805:
-----------------------------------

             Summary: SolrCloud: run a healthcheck in a background thread
                 Key: SOLR-5805
                 URL: https://issues.apache.org/jira/browse/SOLR-5805
             Project: Solr
          Issue Type: Improvement
          Components: SolrCloud
    Affects Versions: 4.7
            Reporter: Gregg Donovan


From a [discussion|http://search-lucene.com/m/QTPaJeWIM/] on the mailing list:

We had a brief SolrCloud outage this weekend when a node's SSD began to fail 
but the node still appeared to be up to the rest of the SolrCloud cluster (i.e. 
still green in clusterstate.json). Distributed queries that reached this node 
would fail but whatever heartbeat keeps the node in the clusterstate.json must 
have continued to succeed.

We eventually had to power the node down to get it to be removed from 
clusterstate.json.

Mark Miller:
"One simple improvement might even be a background thread that periodically 
checks some local readings and depending on the results, pulls itself out of 
the mix as best it can (remove itself from clusterstate.json or simply close 
its zk connection)."
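
A minimal sketch of the kind of background thread described above, using only 
the JDK. Everything named here is hypothetical and not part of Solr: the class 
BackgroundHealthCheck, the canWriteTo probe (a small write to a data directory 
standing in for "local readings"), and the onUnhealthy callback, which is where 
a node would remove itself from clusterstate.json or close its zk connection. 
How that withdrawal is actually done is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: periodically probes local health and withdraws once on failure. */
public class BackgroundHealthCheck implements AutoCloseable {

    // Single daemon thread so the check never blocks JVM shutdown.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "local-healthcheck");
            t.setDaemon(true);
            return t;
        });

    private final AtomicBoolean withdrawn = new AtomicBoolean(false);
    private final BooleanSupplier probe;   // returns true while the node looks healthy
    private final Runnable onUnhealthy;    // assumption: e.g. closes the zk connection

    public BackgroundHealthCheck(BooleanSupplier probe, Runnable onUnhealthy) {
        this.probe = probe;
        this.onUnhealthy = onUnhealthy;
    }

    /** Schedules the probe to run repeatedly with the given delay between runs. */
    public void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::checkOnce, period, period, unit);
    }

    /** Runs the probe once; the callback fires at most once over the object's lifetime. */
    public void checkOnce() {
        boolean healthy;
        try {
            healthy = probe.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a probe that throws is treated as a failed check
        }
        if (!healthy && withdrawn.compareAndSet(false, true)) {
            onUnhealthy.run();
        }
    }

    public boolean isWithdrawn() {
        return withdrawn.get();
    }

    /** Example probe: can a small temp file be created, written, and deleted in dir? */
    public static boolean canWriteTo(Path dir) {
        try {
            Path tmp = Files.createTempFile(dir, "healthcheck", ".tmp");
            Files.write(tmp, new byte[] {1});
            Files.delete(tmp);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A node might wire this up as 
new BackgroundHealthCheck(() -> BackgroundHealthCheck.canWriteTo(dataDir), withdraw).start(30, TimeUnit.SECONDS), 
where dataDir and withdraw are placeholders. Note that a failing disk can make 
a write hang rather than fail, so a real check would probably also need a 
timeout around the probe.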



--
This message was sent by Atlassian JIRA
(v6.2#6252)
