[ 
https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268278#comment-14268278
 ] 

Timothy Potter commented on SOLR-6923:
--------------------------------------

The actual runtime state of a replica is determined by two things: 1) the 
state recorded in clusterstate.json, and 2) whether the node hosting the 
replica is listed under /live_nodes. If the node is not live, the state 
reported in clusterstate.json can be "stale" for some time. It has always 
worked this way in SolrCloud. Thus, AutoAddReplicas needs to consult live 
nodes before treating a replica as active.
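
To make the rule concrete, here is a minimal sketch of the two-part check. 
It is not Solr code: the function name effective_state and the choice to 
report "down" for a replica on a dead node are assumptions made for 
illustration. The replica entries are taken from the clusterstate.json in 
the report below.

```python
# Replica entries copied from the clusterstate.json in the report below.
REPLICAS = {
    "core_node1": {"state": "active", "node_name": "169.254.113.194:8983_solr"},
    "core_node2": {"state": "active", "node_name": "169.254.113.194:8984_solr"},
}

# Children of /live_nodes after node2 was killed (only node1 remains).
LIVE_NODES = {"169.254.113.194:8983_solr"}


def effective_state(replica, live_nodes):
    """Hypothetical helper: combine the recorded state with node liveness."""
    # A replica whose node is not live cannot be serving requests,
    # whatever clusterstate.json still says about it.
    if replica["node_name"] not in live_nodes:
        return "down"  # label chosen for this sketch
    # Node is live, so the recorded state can be trusted.
    return replica["state"]


for name, replica in sorted(REPLICAS.items()):
    print(name, effective_state(replica, LIVE_NODES))
```

With this check, core_node2 is no longer treated as active even though its 
recorded state never changed.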

> kill -9 doesn't change the replica state in clusterstate.json
> -------------------------------------------------------------
>
>                 Key: SOLR-6923
>                 URL: https://issues.apache.org/jira/browse/SOLR-6923
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Varun Thacker
>
> - I did the following 
> {code}
> ./solr start -e cloud -noprompt
> kill -9 <pid-of-node2>   # not the node which is running ZK
> {code}
> - /live_nodes reflects that the node is gone.
> - This is the only message which gets logged on the node1 server after 
> killing node2
> {code}
> 45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  
> org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x14ac40f26660001, likely client has closed socket
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>     at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> - The graph shows node2 in the 'Gone' state
> - clusterstate.json keeps showing the replica as 'active'
> {code}
> {"collection1":{
>     "shards":{"shard1":{
>         "range":"80000000-7fffffff",
>         "state":"active",
>         "replicas":{
>           "core_node1":{
>             "state":"active",
>             "core":"collection1",
>             "node_name":"169.254.113.194:8983_solr",
>             "base_url":"http://169.254.113.194:8983/solr",
>             "leader":"true"},
>           "core_node2":{
>             "state":"active",
>             "core":"collection1",
>             "node_name":"169.254.113.194:8984_solr",
>             "base_url":"http://169.254.113.194:8984/solr"}}}},
>     "maxShardsPerNode":"1",
>     "router":{"name":"compositeId"},
>     "replicationFactor":"1",
>     "autoAddReplicas":"false",
>     "autoCreated":"true"}}
> {code}
> One immediate problem I can see is that AutoAddReplicas doesn't work, since 
> the clusterstate.json never changes. Other features might be affected by 
> this as well.
> As a first thought on how to handle this: could the shard leader listen 
> to changes on /live_nodes and, if it has replicas on a node that went away, 
> mark them as 'down' in the clusterstate.json?
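
The proposal in the last quoted paragraph can be sketched as follows. This 
is only an illustration of the idea, not how Solr implements or should 
implement it: the function replicas_to_mark_down is hypothetical, the 
ZooKeeper watch is replaced by two plain snapshots of /live_nodes, and the 
cluster state is a trimmed copy of the JSON quoted above.

```python
# Trimmed copy of the clusterstate.json quoted above.
CLUSTER_STATE = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {
                        "state": "active",
                        "node_name": "169.254.113.194:8983_solr",
                    },
                    "core_node2": {
                        "state": "active",
                        "node_name": "169.254.113.194:8984_solr",
                    },
                }
            }
        }
    }
}


def replicas_to_mark_down(cluster_state, old_live, new_live):
    """Hypothetical helper: list replicas hosted on nodes that just vanished."""
    lost_nodes = old_live - new_live  # nodes present before, absent now
    stale = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                on_lost_node = replica["node_name"] in lost_nodes
                if on_lost_node and replica["state"] != "down":
                    stale.append((coll_name, shard_name, replica_name))
    return stale


# Snapshots of /live_nodes before and after `kill -9` on node2.
before = {"169.254.113.194:8983_solr", "169.254.113.194:8984_solr"}
after = {"169.254.113.194:8983_solr"}

print(replicas_to_mark_down(CLUSTER_STATE, before, after))
```

In a real cluster the two snapshots would come from a watch on /live_nodes, 
and each returned replica would be the subject of a state update rather 
than a print.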


