Varun Thacker created SOLR-6923:
-----------------------------------

             Summary: kill -9 doesn't change the replica state in clusterstate.json
                 Key: SOLR-6923
                 URL: https://issues.apache.org/jira/browse/SOLR-6923
             Project: Solr
          Issue Type: Bug
            Reporter: Varun Thacker


- I ran the following:
{code}
./solr start -e cloud -noprompt

kill -9 <pid-of-node2>   # not the node which is running ZK
{code}

- /live_nodes reflects that the node is gone.

- This is the only message logged on the node1 server after killing node2:

{code}
45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
{code}

- The graph shows node2 in the 'Gone' state.

- clusterstate.json keeps showing the replica as 'active'

{code}
{"collection1":{
    "shards":{"shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"collection1",
            "node_name":"169.254.113.194:8983_solr",
            "base_url":"http://169.254.113.194:8983/solr",
            "leader":"true"},
          "core_node2":{
            "state":"active",
            "core":"collection1",
            "node_name":"169.254.113.194:8984_solr",
            "base_url":"http://169.254.113.194:8984/solr"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"1",
    "autoAddReplicas":"false",
    "autoCreated":"true"}}
{code}
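
The mismatch above can be checked mechanically. The following is a minimal
Python sketch, not Solr code: it assumes the contents of clusterstate.json and
the children of /live_nodes have already been fetched and are passed in as a
dict and a set. The function name stale_active_replicas is hypothetical, and
the sample data is a trimmed copy of the state shown above.

```python
def stale_active_replicas(cluster_state, live_nodes):
    """Return (collection, shard, replica) triples that are still marked
    'active' although their node_name is missing from live_nodes."""
    stale = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                if (replica.get("state") == "active"
                        and replica.get("node_name") not in live_nodes):
                    stale.append((coll_name, shard_name, replica_name))
    return stale


# Trimmed copy of the clusterstate.json content from the report.
state = {
    "collection1": {
        "shards": {
            "shard1": {
                "state": "active",
                "replicas": {
                    "core_node1": {
                        "state": "active",
                        "node_name": "169.254.113.194:8983_solr",
                    },
                    "core_node2": {
                        "state": "active",
                        "node_name": "169.254.113.194:8984_solr",
                    },
                },
            }
        }
    }
}

# Assumed /live_nodes content after the kill: only the 8983 node remains.
live = {"169.254.113.194:8983_solr"}

print(stale_active_replicas(state, live))
# prints [('collection1', 'shard1', 'core_node2')]
```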


One immediate problem is that autoAddReplicas doesn't work, since
clusterstate.json never changes. Other features may be affected by this
as well.

A first idea for handling this: could the shard leader listen for changes on
/live_nodes and, if it has replicas on a node that disappeared, mark them as
'down' in clusterstate.json?
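
To make the idea concrete, here is a rough Python sketch of what such a
listener might do when /live_nodes changes. It is an illustration only, not
Solr's implementation: the function name mark_lost_replicas_down is
hypothetical, the old and new live-node sets are assumed to be supplied by
whatever watches /live_nodes, and writing the result back to ZooKeeper is left
out.

```python
import copy


def mark_lost_replicas_down(cluster_state, old_live, new_live):
    """Return a copy of cluster_state in which every replica hosted on a
    node that disappeared from live_nodes is marked 'down'."""
    lost = set(old_live) - set(new_live)      # nodes that went away
    updated = copy.deepcopy(cluster_state)    # leave the input untouched
    for coll in updated.values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if replica.get("node_name") in lost:
                    replica["state"] = "down"
    return updated


# Hypothetical two-node state modelled on the report.
before = {
    "collection1": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "active",
                       "node_name": "169.254.113.194:8983_solr"},
        "core_node2": {"state": "active",
                       "node_name": "169.254.113.194:8984_solr"},
    }}}}
}
old_nodes = {"169.254.113.194:8983_solr", "169.254.113.194:8984_solr"}
new_nodes = {"169.254.113.194:8983_solr"}

after = mark_lost_replicas_down(before, old_nodes, new_nodes)
replicas = after["collection1"]["shards"]["shard1"]["replicas"]
print(replicas["core_node1"]["state"], replicas["core_node2"]["state"])
# prints: active down
```

Only replicas on nodes that actually vanished between two views of
/live_nodes are touched, so replicas that were already 'down' or 'recovering'
on live nodes keep their state.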






