[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980053#comment-13980053 ]

Timothy Potter commented on SOLR-5473:
--------------------------------------

Thought I'd add my 2 cents on this one as I've worked on some of this code and 
want to get a better sense of how to move forward. Reverting and moving out to 
a branch sounds like a good idea.

In general, I think it would be good to split the discussion of this topic 
into three sections: 1) overall design / architecture, 2) implementation and 
impact on the public API, and 3) testing. Moving forward, we should start by 
identifying where we have common ground in these areas and which aspects are 
more controversial and need more hashing out between us. 

Here's what I think I know, but please correct me where I'm off base:

1) Overall Design / Architecture

It sounds like we're all on board with splitting cluster state into a 
per-collection state znode. Do we intend to support both formats, or do we 
intend to just migrate to the split approach? I think the answer is the latter: 
going forward, SolrCloud will keep state in a separate znode per collection.
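
To make the two layouts concrete, here is a minimal Java sketch of where a 
collection's state would live under each approach. The paths follow the issue 
description (/clusterstate.json for the unified format, 
/collections/<name>/state.json for the split format); the class and method 
names are hypothetical and are not Solr's actual API.

```java
import java.util.List;

/** Hypothetical helper: maps a collection name to the znode holding its state. */
final class StatePaths {
    // Unified format: every collection shares this one znode.
    static final String UNIFIED_STATE = "/clusterstate.json";

    private StatePaths() {}

    /** Split format: one state znode per collection, as described in SOLR-5473. */
    static String stateJsonPath(String collection) {
        if (collection == null || collection.isEmpty() || collection.contains("/")) {
            throw new IllegalArgumentException("bad collection name: " + collection);
        }
        return "/collections/" + collection + "/state.json";
    }

    public static void main(String[] args) {
        for (String c : List.of("products", "logs")) {
            System.out.println(c + " -> " + stateJsonPath(c));
        }
        // products -> /collections/products/state.json
        // logs -> /collections/logs/state.json
    }
}
```

With the split format, a change to one collection rewrites only that 
collection's znode instead of the shared file.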

Noble's idea is that once the state is split, cores only need to watch the 
znode for the collection/shard they're linked to. In other words, each SolrCore 
watches a specific state znode and thus does not receive any state change 
updates for other collections.
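
The watch scoping can be illustrated with an in-memory stand-in for ZooKeeper. 
This is a minimal sketch under assumptions: PerCollectionWatches and its 
methods are hypothetical, not ZooKeeper's or Solr's API, and unlike classic 
ZooKeeper watches (which fire once and must be re-registered) these callbacks 
stay registered to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical in-memory stand-in for ZooKeeper. Watchers are registered per path,
 * so an update to one collection's state.json notifies only that collection's watchers.
 * Callbacks stay registered here; classic ZooKeeper watches are one-shot.
 */
final class PerCollectionWatches {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    static String statePath(String collection) {
        return "/collections/" + collection + "/state.json";
    }

    void watch(String path, Consumer<String> watcher) {
        watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    void setData(String path, String json) {
        data.put(path, json);
        // Only watchers registered on this exact path are notified.
        for (Consumer<String> w : watchers.getOrDefault(path, List.of())) {
            w.accept(json);
        }
    }

    String getData(String path) {
        return data.get(path);
    }

    public static void main(String[] args) {
        PerCollectionWatches zk = new PerCollectionWatches();
        List<String> seenByCoreOfA = new ArrayList<>();
        zk.watch(statePath("a"), seenByCoreOfA::add);
        zk.setData(statePath("b"), "{\"b\":{}}"); // other collection: not delivered
        zk.setData(statePath("a"), "{\"a\":{}}"); // own collection: delivered
        System.out.println(seenByCoreOfA);          // [{"a":{}}]
    }
}
```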

In terms of what's watched and what isn't, this patch includes code from 
SOLR-5474 (as they were too intimately tied together to keep separate), which 
doesn't watch collection state changes on the client side. Instead, the client 
relies on a _stateVer_ check during request processing and receives an error 
from the server if its cached state is stale. I too think this is a little 
controversial / confusing, and maybe we don't have to keep it as part of this 
solution. It was our mistake to merge those two into a single patch. We 
originally thought SOLR-5474 was needed to keep the number of watchers on a 
znode to a minimum in the event of many clients using many collections. 
However, I do think this feature can be split out and dealt with in a better 
way, if at all. In other words, for this issue, split state znodes would be 
watched from both the server and the client side. 
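
The client-side alternative described above (no client watches, a version 
check per request) could work roughly as follows. This is a minimal sketch 
under assumptions: StaleStateException, Server.handle, Client.request, and the 
retry-once policy are hypothetical, and the sketch does not reproduce the 
actual _stateVer_ wire format from the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class StateVersionCheck {
    /** Hypothetical error the server returns when the client's cached version is stale. */
    static final class StaleStateException extends Exception {
        final int currentVersion;

        StaleStateException(int currentVersion) {
            super("stale state, current version " + currentVersion);
            this.currentVersion = currentVersion;
        }
    }

    /** Server side: knows the authoritative state version of each collection. */
    static final class Server {
        final Map<String, Integer> versions = new HashMap<>();

        String handle(String collection, int clientVersion) throws StaleStateException {
            int current = versions.getOrDefault(collection, 0);
            if (clientVersion != current) {
                throw new StaleStateException(current);
            }
            return "ok:" + collection;
        }
    }

    /** Client side: caches versions instead of watching znodes; refreshes on error. */
    static final class Client {
        final Map<String, Integer> cachedVersions = new HashMap<>();
        int refreshes = 0;

        String request(Server server, String collection) throws StaleStateException {
            int cached = cachedVersions.getOrDefault(collection, 0);
            try {
                return server.handle(collection, cached);
            } catch (StaleStateException e) {
                // A real client would re-read the collection's state here;
                // this sketch just adopts the version reported by the server.
                refreshes++;
                cachedVersions.put(collection, e.currentVersion);
                return server.handle(collection, e.currentVersion); // retry once
            }
        }
    }

    public static void main(String[] args) throws StaleStateException {
        Server server = new Server();
        Client client = new Client();
        System.out.println(client.request(server, "c1")); // ok:c1 (versions agree)
        server.versions.put("c1", 3);                     // state changes, no watch fires
        System.out.println(client.request(server, "c1")); // ok:c1 after one refresh
        System.out.println(client.refreshes);             // 1
    }
}
```

The trade-off is that the client holds no watches at all, but every request 
carries a version and a stale client pays an extra round trip.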

Are there any other things design / architecture wise that are controversial?

2) Implementation (and API impact)

This seems like the biggest area of contention right now. The main issue is 
that the API changes still give the impression of two state-tracking formats, 
whereas we really only want one format.

The common ground here is that there should be no mention of "external" in any 
public method, or in the state format for that matter, right?

Noble: Assuming we're moving forward with stateFormat == 2 and the unified 
/clusterstate.json is going away, is it possible to not change any of the 
existing public methods? We're changing the internals of where state is kept, 
so why does that have to impact the public API? If that isn't possible, let's 
come up with a plan for each change and how we can minimize its impact. It 
seems to me that we need to be more diligent about the API impact of this 
change and focus on not breaking the public view of cluster state as much as 
possible. It would be helpful to have a bullet list of the API changes that are 
needed, so we don't have to scour the patch looking for all possible changes.
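
One way to keep the public view stable is to route existing accessors through a 
single lookup that hides where state is stored. A minimal sketch under 
assumptions: ClusterStateReader, getCollectionState, and the stateFormat 
mapping (1 = shared /clusterstate.json, 2 = per-collection state.json) are 
illustrative and are not the actual Solr classes or the patch's code.

```java
import java.util.Map;

final class StableClusterStateApi {
    /** Public-facing lookup: callers see the same method whatever the storage layout. */
    interface ClusterStateReader {
        String getCollectionState(String collection);
    }

    /** Assumed mapping: stateFormat 1 = shared /clusterstate.json, 2 = one state.json per collection. */
    static String statePath(int stateFormat, String collection) {
        switch (stateFormat) {
            case 1:
                return "/clusterstate.json";
            case 2:
                return "/collections/" + collection + "/state.json";
            default:
                throw new IllegalArgumentException("unknown stateFormat " + stateFormat);
        }
    }

    /**
     * znodes models ZooKeeper as path -> (collection name -> state JSON).
     * Path resolution stays internal, so no "external" flag leaks into the API.
     */
    static ClusterStateReader reader(Map<String, Map<String, String>> znodes, int stateFormat) {
        return collection ->
            znodes.getOrDefault(statePath(stateFormat, collection), Map.of()).get(collection);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> unified =
            Map.of("/clusterstate.json", Map.of("a", "{A}", "b", "{B}"));
        Map<String, Map<String, String>> split = Map.of(
            "/collections/a/state.json", Map.of("a", "{A}"),
            "/collections/b/state.json", Map.of("b", "{B}"));
        // Same call, same answer, different storage underneath.
        System.out.println(reader(unified, 1).getCollectionState("b")); // {B}
        System.out.println(reader(split, 2).getCollectionState("b"));   // {B}
    }
}
```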

3) Testing

I just wanted to mention that we've been doing a fair amount of integration 
testing with hundreds of "external" collections per cluster. I realize this is 
a big change, but we have been testing it extensively in our QA labs. I only 
mention this so that others know we have been concentrating on hardening this 
feature over the past couple of months. Once we sort out the API problems, I'm 
confident that this approach will be solid.

To recap, I see a lot of common ground here. To move forward, we need to move 
this off trunk and onto a branch, where we'll focus on cleaning up the API 
impacts of this work and on supporting only the split format going forward 
(with a migration plan for existing installations). We also want to revisit the 
thinking behind not watching state changes on the client, as that wasn't clear 
in the patch up to this point.
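
A migration for existing installations would need to split the shared state 
into per-collection entries. A minimal in-memory sketch, assuming the unified 
state has already been parsed into a map keyed by collection name; 
SplitStateMigration and split are hypothetical names, and a real migration 
would also have to write each znode in ZooKeeper and cope with clients still 
reading /clusterstate.json.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SplitStateMigration {
    /** Returns znode path -> that collection's state, one entry per collection. */
    static Map<String, String> split(Map<String, String> unifiedState) {
        Map<String, String> perCollection = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : unifiedState.entrySet()) {
            perCollection.put("/collections/" + e.getKey() + "/state.json", e.getValue());
        }
        return perCollection;
    }

    public static void main(String[] args) {
        Map<String, String> unified = new LinkedHashMap<>();
        unified.put("a", "{\"shards\":{}}");
        unified.put("b", "{\"shards\":{}}");
        // Prints one line per collection, in insertion order.
        split(unified).forEach((path, state) -> System.out.println(path + " = " + state));
    }
}
```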



> Make one state.json per collection
> ----------------------------------
>
>                 Key: SOLR-5473
>                 URL: https://issues.apache.org/jira/browse/SOLR-5473
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 5.0
>
>         Attachments: SOLR-5473-74.patch (24 versions), SOLR-5473.patch 
> (18 versions), ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)
