[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980053#comment-13980053 ]
Timothy Potter commented on SOLR-5473:
--------------------------------------

Thought I'd add my 2 cents on this one, as I've worked on some of this code and want to get a better sense of how to move forward. Reverting and moving out to a branch sounds like a good idea.

In general, I think it would be good to split the discussion about this topic into 3 sections: 1) overall design / architecture, 2) implementation and impact on the public API, 3) testing. Moving forward, we should start by identifying where we have common ground in these areas and which aspects are more controversial and need more hashing out between us. Here's what I think I know, but please correct me where I'm off-base:

1) Overall Design / Architecture

It sounds like we're all on board with splitting cluster state into a per-collection state znode. Do we intend to support both formats, or do we intend to just migrate to the split approach? I think the answer is the latter: going forward, SolrCloud will keep state in a separate znode per collection.

Noble's idea is that once the state is split, cores only need to watch the znode for the collection/shard they're linked to. In other words, each SolrCore watches a specific state znode and thus does not receive state-change updates for other collections (see the watcher sketch below for roughly what I mean).

In terms of what's watched and what is not, this patch includes code from SOLR-5474 (they were too intimately tied together to keep separated) which doesn't watch collection state changes on the client side. Instead, the client relies on a _stateVer_ check during request processing and receives an error from the server if its cached state is stale (see the second sketch below). I too think this is a little controversial / confusing, and maybe we don't have to keep it as part of this solution. It was our mistake to merge those two into a single patch. We originally thought SOLR-5474 was needed to keep the number of watchers on a znode to a minimum when many clients use many collections. However, I do think this feature can be split out and dealt with in a better way, if at all. In other words, split state znodes would be watched from both the server and the client side. Are there any other design / architecture points that are controversial?

2) Implementation (and API impact)

This seems like the biggest area of contention right now. The main issue is that the API changes still give the impression of two state-tracking formats, whereas we really only want one format. The common ground here is that there should be no mention of "external" in any public method, or in the state format for that matter, right?

Noble: assuming we're moving forward with stateFormat == 2 and the unified /clusterstate.json is going away, is it possible to not change any of the existing public methods? In other words, we're changing the internals of where state is kept, so why does that have to impact the public API? If it must, let's come up with a plan for each change and how we can minimize its impact. It seems to me that we need to be more diligent about the API impacts of this change and focus on not breaking the public view of cluster state as much as possible. It would be helpful to have a bullet list of the API changes that are needed, so we don't have to scour the patch looking for them.

3) Testing

I just wanted to mention that we've been doing a fair amount of integration testing with hundreds of "external" collections per cluster. So while I realize this is a big change, we have been testing it extensively in our QA labs.
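For concreteness, here's a minimal sketch of the per-collection watch described in section 1, written against the raw ZooKeeper client rather than the actual patch code. The path layout (/collections/<name>/state.json) matches this issue's description, but the class name and the error handling are just illustrative:

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Illustrative only: one watcher per collection, so a core linked to
// "mycollection" never sees state changes for any other collection.
public class CollectionStateWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String collection;

  public CollectionStateWatcher(ZooKeeper zk, String collection) {
    this.zk = zk;
    this.collection = collection;
  }

  // Read /collections/<name>/state.json and (re-)register this watch.
  public byte[] readAndWatch() throws KeeperException, InterruptedException {
    String path = "/collections/" + collection + "/state.json";
    Stat stat = new Stat();
    return zk.getData(path, this, stat); // stat.getVersion() = znode version
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDataChanged) {
      try {
        readAndWatch(); // ZK watches are one-shot, so refresh and re-set
      } catch (Exception e) {
        // real code would handle session expiry / reconnect here
      }
    }
  }
}
{code}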
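And a rough sketch of the _stateVer_ idea from SOLR-5474: the client tags each request with the collection state version it last saw, and the server rejects the request when that version is behind the current znode version. The "collection:version" param format and the check itself are my reading of the intent, not the exact patch behavior:

{code:java}
// Sketch of the server-side staleness check. The client would send
// something like: params.set("_stateVer_", collection + ":" + cachedVersion)
public class StateVerCheck {

  /** Returns true if the client's cached state version is current. */
  public static boolean isClientStateCurrent(String stateVer,
                                             String collection,
                                             int currentZkVersion) {
    // Assumed wire format: "collectionName:version"
    String[] parts = stateVer.split(":");
    if (parts.length != 2 || !parts[0].equals(collection)) {
      return false; // malformed or wrong collection -> force a refresh
    }
    return Integer.parseInt(parts[1]) >= currentZkVersion;
  }
}
{code}

On a stale result the server returns an error rather than serving against mismatched state, and the client re-reads the collection's state znode and retries; that is what lets the client skip the watch entirely.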
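On migration (see the recap below), a one-time conversion could look something like this sketch, which reads the old unified /clusterstate.json and writes one state.json per collection. The JSON-splitting helper is hypothetical and elided, and real code would need to verify and clean up afterwards:

{code:java}
import java.util.Map;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ClusterStateMigrator {

  // Copy each collection's state out of the old unified znode into its
  // own /collections/<name>/state.json node (the /collections/<name>
  // parent already exists for every collection).
  public static void migrate(ZooKeeper zk) throws Exception {
    byte[] unified = zk.getData("/clusterstate.json", false, null);
    for (Map.Entry<String, byte[]> e : splitByCollection(unified).entrySet()) {
      String path = "/collections/" + e.getKey() + "/state.json";
      zk.create(path, e.getValue(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.PERSISTENT);
    }
    // After verifying the new nodes, the unified node could be emptied.
  }

  // Hypothetical helper: parse the unified JSON and split it by collection.
  private static Map<String, byte[]> splitByCollection(byte[] json) {
    throw new UnsupportedOperationException("JSON parsing elided in sketch");
  }
}
{code}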
I only mention this so that others know that we have been concentrating on hardening this feature over the past couple of months. Once we sort out the API problems, I'm confident this approach will be solid.

To recap: I see a lot of common ground here. To move forward, we need to move this work out to a branch and off trunk, where we'll focus on cleaning up the API impacts and on supporting only the split format going forward (with a migration plan for existing installations along the lines sketched above). We also want to revisit the thinking behind not watching state changes on the client side, as the rationale wasn't clear in the patch up to this point.

> Make one state.json per collection
> ----------------------------------
>
>                 Key: SOLR-5473
>                 URL: https://issues.apache.org/jira/browse/SOLR-5473
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 5.0
>
>         Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log,
> ec2-50-16-38-73_solr.log
>
> As defined in the parent issue, store the state of each collection under
> its own /collections/collectionname/state.json node