[jira] [Comment Edited] (SOLR-5473) Make one state.json per collection

Noble Paul (JIRA) Sun, 22 Jun 2014 11:22:48 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039800#comment-14039800
 ]


Noble Paul edited comment on SOLR-5473 at 6/22/14 6:21 PM:
-----------------------------------------------------------

Patch updated to trunk, incorporating most of the comments:
# All external references are eliminated from the APIs
# The node is given the suffix "/state.json" instead of "/state"
# Removed the redundant attribute external/stateVersion from the state object.
The version is now derived automatically from the znode from which the object
is read
# Thread-safety issues addressed
# Added javadocs

(and many other subtle cleanups)
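
As a rough illustration of the stateVersion point above, the sketch below models a znode as data plus a version counter, and stamps the parsed state with the version of the znode it was read from instead of storing a version inside the JSON. It assumes the /collections/collectionname/state.json layout from the issue description. The names Znode, CollectionState, statePath, write and read are hypothetical and the store is an in-memory stand-in; none of this is the code in the patch or the ZooKeeper client API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for ZooKeeper, for illustration only.
class StateVersionSketch {

    // A znode: its payload plus a data version that changes on every write.
    static final class Znode {
        final String data;
        final int version;

        Znode(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    // Parsed collection state. The version travels alongside the JSON;
    // it is not an attribute inside the JSON itself.
    static final class CollectionState {
        final String name;
        final String json;
        final int znodeVersion;

        CollectionState(String name, String json, int znodeVersion) {
            this.name = name;
            this.json = json;
            this.znodeVersion = znodeVersion;
        }
    }

    private final Map<String, Znode> znodes = new HashMap<>();

    // Per-collection state path, following the layout named in the issue.
    static String statePath(String collection) {
        return "/collections/" + collection + "/state.json";
    }

    // The first write creates the znode at version 0; each later write adds 1.
    void write(String path, String data) {
        Znode old = znodes.get(path);
        int next = (old == null) ? 0 : old.version + 1;
        znodes.put(path, new Znode(data, next));
    }

    // The returned state takes its version from the znode it was read from.
    CollectionState read(String collection) {
        Znode z = znodes.get(statePath(collection));
        if (z == null) {
            return null;
        }
        return new CollectionState(collection, z.data, z.version);
    }

    public static void main(String[] args) {
        StateVersionSketch zk = new StateVersionSketch();
        zk.write(statePath("c1"), "{\"shards\":{}}");
        zk.write(statePath("c1"), "{\"shards\":{\"shard1\":{}}}");
        CollectionState state = zk.read("c1");
        System.out.println(statePath("c1") + " version=" + state.znodeVersion);
    }
}
```

Because the version comes from the store, a writer cannot forget to bump it and two readers of the same znode always agree on it.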

The comments that are not addressed are:
# The selective watching of collection nodes by Solr nodes. There are only 3
choices when it comes to watching states:
#* Watch all nodes: this would be equivalent to or worse than the current
clusterstate.json solution. Every node would be notified of each state change
(multiple times, once per collection it is a member of).
#* Watch none: either fetch the state data just in time (which would kill ZK)
or cache it, which means the node will not have an up-to-date state to make
the right decision at the right time.
#* Watch selectively: this is the approach we have taken here.
# Maintaining the zkStateReader reference in ClusterState. Agreed, that is not
elegant. The ideal solution would be to get rid of ClusterState.java
completely, because that node is going to go away, and we would have only
ZkStateReader and DocCollection with nothing in between. The problem is that
clusterstate.json exists now and will remain for at least a couple of
releases. So, torn between the choices, I decided to go with the
not-so-elegant option of ClusterState keeping a reference to ZkStateReader, so
that all APIs work fine. My suggestion is to eliminate ClusterState.java when
we deprecate the old format.
# The ephemeralCollectionData in ZkStateReader. This is again not so elegant,
but it is simple and performant and has minimal impact on the code. I'm happy
to hear any simpler ideas to make it better.
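
To make the trade-off between the three watching choices concrete, here is a minimal sketch of selective watching: each node registers interest only in the collections it hosts, so a state change notifies just those nodes. The class and method names (SelectiveWatchSketch, watch, stateChanged, notificationsFor) are hypothetical and the registry is purely in-memory; this is not how the patch or ZooKeeper implements watches.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of per-collection watches, for illustration only.
class SelectiveWatchSketch {

    // collection name -> nodes that registered a watch on its state
    private final Map<String, Set<String>> watchers = new HashMap<>();

    // node name -> number of notifications received so far
    private final Map<String, Integer> notifications = new HashMap<>();

    // A node watches only the collections it is a member of.
    void watch(String node, String collection) {
        watchers.computeIfAbsent(collection, k -> new LinkedHashSet<>()).add(node);
    }

    // Called when a collection's state changes; returns how many nodes were notified.
    int stateChanged(String collection) {
        Set<String> interested =
                watchers.getOrDefault(collection, Collections.emptySet());
        for (String node : interested) {
            notifications.merge(node, 1, Integer::sum);
        }
        return interested.size();
    }

    int notificationsFor(String node) {
        return notifications.getOrDefault(node, 0);
    }

    public static void main(String[] args) {
        SelectiveWatchSketch registry = new SelectiveWatchSketch();
        registry.watch("node1", "c1");
        registry.watch("node2", "c1");
        registry.watch("node2", "c2");
        registry.watch("node3", "c3");

        // Only node1 and node2 hear about a change to c1; node3 is left alone.
        System.out.println("notified for c1: " + registry.stateChanged("c1"));
        System.out.println("node3 notifications: " + registry.notificationsFor("node3"));
    }
}
```

With a single shared clusterstate.json, every node in this example would be notified of every change, which is the cost the selective approach avoids.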

We have done extensive testing on this patch internally with very large
clusters (120+ nodes) and a very large number of collections (1000+). The
solr-5473 branch already has this code committed.

If there are no objections, I plan to commit this fairly soon.




> Make one state.json per collection
> ----------------------------------
>
>                 Key: SOLR-5473
>                 URL: https://issues.apache.org/jira/browse/SOLR-5473
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 5.0
>
>         Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



