[ 
https://issues.apache.org/jira/browse/SOLR-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493888#comment-13493888
 ] 

Per Steffensen commented on SOLR-3785:
--------------------------------------

Well, I believe the entire thing with the Overseer is a bad idea. It requires 
that at least one Solr is running before you can trust the state-descriptions 
in ZK - even if this particular "issue" SOLR-3785 is solved using the Overseer. 
We have clients that use the state-descriptions (through 
CloudSolrServer/ZkStateReader) to detect if the Solr cluster is running well 
enough to use it. If all Solrs are down, I believe this cannot be seen from the 
state itself (you can check live-nodes, and if no Solrs are running you know 
that you cannot trust it).

I think you should remove the Overseer entirely and modify ZkStateReader to be 
able to, single-handedly, look at the ZK state and calculate the correct 
ClusterState. E.g. shard-state could be maintained by the Solr running the 
shard (as it is today), but as an ephemeral node that disappears when the Solr 
is not running. ZkStateReader should have logic that, when calculating a 
shard-state, looks at this ephemeral node and, if it is missing, assumes 
"down"-state.
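
To make the idea concrete, here is a minimal sketch of the lookup rule only. It is not Solr code: the class name, the method name and the map standing in for the ephemeral nodes currently present in ZK are all hypothetical, and a real implementation would read the nodes through ZooKeeper instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not part of Solr or ZkStateReader. */
final class EphemeralShardState {
    static final String DOWN = "down";

    /**
     * Returns the state for a shard given a snapshot of the ephemeral state
     * nodes that currently exist (shard id -> state published by its Solr).
     * The map is an assumption standing in for a read of ZK.
     */
    static String stateOf(String shardId, Map<String, String> ephemeralStates) {
        String published = ephemeralStates.get(shardId);
        // A missing ephemeral node means the owning Solr is not running.
        return published != null ? published : DOWN;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("collectionName/sliceName/core1", "active");

        // Node present: the published state is used as-is.
        System.out.println(stateOf("collectionName/sliceName/core1", snapshot));
        // Node missing: the reader assumes "down" without asking anyone.
        System.out.println(stateOf("collectionName/sliceName/core2", snapshot));
    }
}
```

The point of the sketch is that the reader needs no running Solr (and no Overseer) to reach the "down" conclusion; the absence of the node is the signal.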

Regards, Per Steffensen
                
> Cluster-state inconsistent
> --------------------------
>
>                 Key: SOLR-3785
>                 URL: https://issues.apache.org/jira/browse/SOLR-3785
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0
>         Environment: Self-build Solr release built on Apache Solr revision 
> 1355667 from 4.x branch
>            Reporter: Per Steffensen
>         Attachments: SOLR-3785.patch
>
>
> Information in CloudSolrServer.getZkStateReader().getCloudState() (called 
> cloudState below) seems to be inconsistent. 
> I have a Solr running the leader of slice "sliceName" in collection 
> "collectionName" - no replica to take over. I shut down this Solr, and I want 
> to detect that there is now no leader active. 
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || 
> !leader.containsKey(ZkStateReader.STATE_PROP) || 
> !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE);
> {code}
> This does not work. It seems like the state of a shard is not changed when 
> this Solr goes down.
> If I instead do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || 
> !leader.containsKey(ZkStateReader.STATE_PROP) || 
> !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE) ||
> !leader.containsKey(ZkStateReader.NODE_NAME_PROP) || 
> !cloudState.getLiveNodes().contains(leader.get(ZkStateReader.NODE_NAME_PROP));
> {code}
> This works.
> It seems like live-nodes of cloudState is updated when Solr goes down, but 
> that some of the other info available through cloudState is not - e.g. 
> getLeader().
> This might already have been solved on the 4.x branch in a revision later 
> than 1355667. If so, please just tell me - thanks.
> Regards, Per Steffensen

