[jira] [Commented] (SOLR-6923) kill -9 doesn't change the replica state in clusterstate.json

2015-01-11 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272953#comment-14272953
 ] 

Varun Thacker commented on SOLR-6923:
--------------------------------------

Thanks Tim for pointing it out. I was not aware of this.

I'll rename the issue appropriately with this information and come up with a 
patch for AutoAddReplicas to consult live nodes too.
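
For illustration, the check I have in mind is along these lines: a minimal 
sketch against the SolrJ cluster-state API, not the actual patch (the class 
and method names are mine):

{code}
// Sketch: a replica only counts as usable when clusterstate.json says
// "active" AND the node hosting it still appears under /live_nodes.
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.ZkStateReader;

public class ReplicaLiveness {
  static boolean isUsable(ClusterState clusterState, Replica replica) {
    boolean markedActive =
        "active".equals(replica.getStr(ZkStateReader.STATE_PROP));
    // /live_nodes entries are ephemeral, so they disappear as soon as the
    // killed node's ZooKeeper session expires.
    boolean nodeLive = clusterState.liveNodesContain(
        replica.getStr(ZkStateReader.NODE_NAME_PROP));
    return markedActive && nodeLive;
  }
}
{code}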

> kill -9 doesn't change the replica state in clusterstate.json
> -------------------------------------------------------------
>
> Key: SOLR-6923
> URL: https://issues.apache.org/jira/browse/SOLR-6923
> Project: Solr
>  Issue Type: Bug
>Reporter: Varun Thacker
>
> - I did the following:
> {code}
> ./solr start -e cloud -noprompt
> kill -9  //Not the node which is running ZK
> {code}
> - /live_nodes reflects that the node is gone.
> - This is the only message that gets logged on the node1 server after 
> killing node2:
> {code}
> 45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  
> org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x14ac40f26660001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> - The graph shows node2 in the 'Gone' state.
> - clusterstate.json keeps showing the replica as 'active':
> {code}
> {"collection1":{
> "shards":{"shard1":{
> "range":"8000-7fff",
> "state":"active",
> "replicas":{
>   "core_node1":{
> "state":"active",
> "core":"collection1",
> "node_name":"169.254.113.194:8983_solr",
> "base_url":"http://169.254.113.194:8983/solr";,
> "leader":"true"},
>   "core_node2":{
> "state":"active",
> "core":"collection1",
> "node_name":"169.254.113.194:8984_solr",
> "base_url":"http://169.254.113.194:8984/solr",
> "maxShardsPerNode":"1",
> "router":{"name":"compositeId"},
> "replicationFactor":"1",
> "autoAddReplicas":"false",
> "autoCreated":"true"}}
> {code}
> One immediate problem I can see is that AutoAddReplicas doesn't work, since 
> clusterstate.json never changes. There might be more features that are 
> affected by this.
> On first thought, I think we can handle this: the shard leader could watch 
> for changes on /live_nodes, and if any of its replicas were on a node that 
> went away, mark them as 'down' in clusterstate.json?
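> A very rough sketch of that idea (illustrative only; markReplicasDown() is a 
> hypothetical helper, since publishing a 'down' state would really have to go 
> through the Overseer):
> {code}
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
> import org.apache.zookeeper.WatchedEvent;
> import org.apache.zookeeper.Watcher;
> import org.apache.zookeeper.ZooKeeper;
>
> // The leader keeps a watch on /live_nodes; when a node drops out of the
> // list, its replicas get marked 'down'.
> public class LiveNodesWatcher implements Watcher {
>   private final ZooKeeper zk;
>   private volatile Set<String> lastLive = new HashSet<>();
>
>   LiveNodesWatcher(ZooKeeper zk) { this.zk = zk; }
>
>   void start() throws Exception {
>     lastLive = new HashSet<>(zk.getChildren("/live_nodes", this));
>   }
>
>   @Override
>   public void process(WatchedEvent event) {
>     try {
>       // Re-read the children; passing "this" re-arms the watch.
>       Set<String> nowLive = new HashSet<>(zk.getChildren("/live_nodes", this));
>       for (String node : lastLive) {
>         if (!nowLive.contains(node)) {
>           markReplicasDown(node); // hypothetical: publish state=down for its replicas
>         }
>       }
>       lastLive = nowLive;
>     } catch (Exception e) {
>       // Real code must handle session expiry and reconnects here.
>     }
>   }
>
>   private void markReplicasDown(String nodeName) {
>     System.out.println("would mark replicas on " + nodeName + " as down");
>   }
> }
> {code}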



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6923) kill -9 doesn't change the replica state in clusterstate.json

2015-01-07 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268278#comment-14268278
 ] 

Timothy Potter commented on SOLR-6923:
--------------------------------------

The actual runtime state of a replica is determined by 1) what's in 
clusterstate.json and 2) whether the node hosting the replica is live. If 
the node is not live, the state reported in clusterstate.json can be "stale" 
for some time; it has always worked this way in SolrCloud. Thus, 
AutoAddReplicas needs to consult live nodes before treating a replica as live.
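
The /live_nodes entries are ephemeral znodes, so they vanish as soon as the 
killed node's ZooKeeper session times out, while clusterstate.json is only 
rewritten by the Overseer. The lag is easy to see with a plain ZooKeeper 
client; a minimal sketch, assuming the embedded ZK on port 9983 from the 
cloud example:

{code}
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class LiveNodes {
  public static void main(String[] args) throws Exception {
    // Connect to the embedded ZooKeeper started by "solr start -e cloud".
    ZooKeeper zk = new ZooKeeper("localhost:9983", 30000, event -> {});
    // One ephemeral child per live Solr node, e.g. 169.254.113.194:8983_solr
    List<String> live = zk.getChildren("/live_nodes", false);
    System.out.println("live nodes: " + live);
    zk.close();
  }
}
{code}

Run it before and after the kill -9: the killed node drops out of the list 
within the session timeout, even while clusterstate.json still reports its 
replica as 'active'.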
