[
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980053#comment-13980053
]
Timothy Potter commented on SOLR-5473:
--------------------------------------
Thought I'd add my 2 cents on this one as I've worked on some of this code and
want to get a better sense of how to move forward. Reverting and moving out to
a branch sounds like a good idea.
In general, I think it would be good to split the discussion about this topic
into three sections: 1) overall design / architecture, 2) implementation and its
impact on the public API, and 3) testing. Moving forward, we should start by
identifying where we have common ground in these areas and which aspects are
more controversial and need more hashing out between us.
Here's what I think I know, but please correct me where I'm off-base:
1) Overall Design / Architecture
It sounds like we're all on board with splitting the cluster state into a
per-collection state znode. Do we intend to support both formats, or do we
intend to just migrate to the split approach? I think the answer is the latter:
going forward, SolrCloud will keep state in a separate znode per collection.
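For clarity, here's the ZK layout I understand we're converging on (taken from
the parent issue; the collection names are just placeholders):

{noformat}
/clusterstate.json                (legacy unified format, goes away after migration)
/collections
    /collection1
        /state.json               (per-collection state, stateFormat == 2)
    /collection2
        /state.json
{noformat}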
Noble's idea is that once the state is split, then cores only need to watch the
znode for the collection/shard it's linked to. In other words, each SolrCore
watches a specific state znode and thus does not receive any state change
updates for other collections.
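To make sure I'm reading that intent correctly, here's a rough sketch of what
each core would do, written against the raw ZooKeeper client rather than the
actual Solr classes (the class and method names here are illustrative, not code
from the patch):

{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CollectionStateWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String collection;

  public CollectionStateWatcher(ZooKeeper zk, String collection) {
    this.zk = zk;
    this.collection = collection;
  }

  // Each core watches only its own collection's state znode, so it never
  // receives change events for any other collection.
  public byte[] fetchAndWatch() throws Exception {
    String path = "/collections/" + collection + "/state.json";
    Stat stat = new Stat();
    // Passing 'this' as the watcher (re-)arms the watch on every read.
    return zk.getData(path, this, stat);
  }

  @Override
  public void process(WatchedEvent event) {
    // ZK watches are one-shot: on any change, re-read the data (which
    // re-registers the watch) and update the local cluster-state view.
    try {
      fetchAndWatch();
    } catch (Exception e) {
      // handle reconnect / session expiration as appropriate
    }
  }
}
{code}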
In terms of what's watched and what is not watched, this patch includes code
from SOLR-5474 (the two were too intimately tied together to keep separate)
which doesn't watch collection state changes on the client side. Instead, the
client relies on a _stateVer_ check during request processing and receives an
error from the server if the client's state is stale. I too think this is a
little controversial / confusing, and maybe we don't have to keep it as part of
this solution. It was our mistake to merge those two issues into a single
patch. We originally thought SOLR-5474 was needed to keep the number of
watchers on a znode to a minimum in the event of many clients using many
collections. However, I do think this feature can be split out and dealt with
in a better way, if at all. In other words, split state znodes would be watched
from both the server and the client side.
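To spell out the _stateVer_ idea as I understand it (again, a minimal sketch;
the names below are illustrative, not the actual patch code): the client sends
the version of its cached state with each request, and the server fails the
request if that version doesn't match its own, prompting the client to re-fetch
state.json and retry.

{code:java}
// Minimal sketch of the server-side stateVer check; names are illustrative.
class StaleStateException extends RuntimeException {
  StaleStateException(String msg) { super(msg); }
}

class StateVerCheck {
  // clientStateVer arrives on the request; serverStateVer is the znode
  // version of /collections/<collection>/state.json as the server sees it.
  void check(String collection, int clientStateVer, int serverStateVer) {
    if (clientStateVer != serverStateVer) {
      // Fail fast so the client re-fetches the collection's state.json and
      // retries with the current version, instead of holding a ZK watch.
      throw new StaleStateException("stateVer mismatch for " + collection
          + ": client=" + clientStateVer + ", server=" + serverStateVer);
    }
  }
}
{code}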
Are there any other design / architecture issues that are controversial?
2) Implementation (and API impact)
This seems like the biggest area of contention right now. The main issue is
that the API changes still give the impression of two state tracking formats,
whereas we really only want one format.
The common ground here is that there should be no mention of "external" in any
public method, or in the state format for that matter, right?
Noble: Assuming we're moving forward with stateFormat == 2 and the unified
/clusterstate.json is going away, is it possible to not change any of the
existing public methods? In other words, we're changing the internals of where
state is kept, so why does that have to impact the public API? If it must,
let's come up with a plan for each change and how to minimize its impact. It
seems to me that we need to be more diligent about the API impacts of this
change and focus on preserving the public view of cluster state as much as
possible. It would be helpful to have a bullet list of the API impacts needed
for this so we don't have to scour the patch looking for all possible changes.
3) Testing
I just wanted to mention that we've been doing a fair amount of integration
testing with hundreds of "external" collections per cluster. So while I realize
this is a big change, we have been testing it extensively in our QA labs. I
only mention this so that others know we have been concentrating on hardening
this feature over the past couple of months. Once we sort out the API problems,
I'm confident that this approach will be solid.
To recap, I see a lot of common ground here. To move forward, we need to move
this work out to a branch and off trunk, where we'll focus on cleaning up the
API impacts, supporting only the split format going forward (with a migration
plan for existing installations). We also want to revisit the thinking behind
not watching state changes on the client, as that wasn't clear in the patch up
to this point.
> Make one state.json per collection
> ----------------------------------
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch,
> SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch,
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log,
> ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under
> /collections/collectionname/state.json node