[
https://issues.apache.org/jira/browse/SOLR-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212498#comment-15212498
]
Hoss Man commented on SOLR-8907:
--------------------------------
The motivation for creating this issue came out of a situation i noticed while
working on SOLR-445.
The goal was to test that updates were working reliably regardless of which
node they were routed to.
The test, in a nutshell, looked like this...
{code}
// test setup...
cluster.createCollection(...);
CLOUD_CLIENT = cluster.getSolrClient();
NODE_CLIENTS = new ArrayList<SolrClient>(numServers);
// one client per jetty node, in the order the cluster returns them
for (JettySolrRunner jetty : cluster.getJettySolrRunners()) {
  URL jettyURL = jetty.getBaseUrl();
  NODE_CLIENTS.add(new HttpSolrClient(jettyURL.toString() + "/" + COLLECTION_NAME + "/"));
}
// in a loop...
// randomly pick either the cloud client or the client for one specific node
SolrRequest req = makeRandomUpdateRequest(random());
SolrClient client = random().nextBoolean() ? CLOUD_CLIENT
  : NODE_CLIENTS.get(TestUtil.nextInt(random(), 0, NODE_CLIENTS.size()-1));
assertSomeStuffAboutResponse(req.process(client));
{code}
There was a bug in the code such that in some specific situations (based on the
output of {{makeRandomUpdateRequest(...)}}) updates meeting certain criteria
would fail _unless_ they were sent to the leader of a particular shard
(particular because it was the leader for all the Ids generated by
{{makeRandomUpdateRequest(...)}} in that particular loop iteration).
This meant that there were particular seeds that would reproduce the failure
_most of the time_, but in roughly {{1 / numServers}} of the attempts the
leader for the shard in question would happen to be assigned to the jetty
instance whose HttpSolrClient was randomly (but consistently for this seed)
being selected at this point, so the update would succeed and the failure
would not reproduce.
That made the test far more confusing to try and debug than if the leaders for
the shards were being consistently assigned to the same jetty nodes (relative
to their ordering in the list returned by {{cluster.getJettySolrRunners()}})
... like how older, pre-cloud, distributed update tests used to work.
In short: given a fixed seed, the test code was doing everything in its power
to be 100% consistent w/ the requests it generated and the jetty nodes those
requests were sent to -- but the test still wasn't very reproducible because
the shard & leader assignments were random.
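To put a rough number on that, here's a minimal sketch (plain Java, no Solr involved) that models the situation under one loud assumption: the leader for the shard in question lands on a uniformly random node each run, while the seed always picks the same node client. The class and method names are hypothetical and only exist for this illustration.

```java
import java.util.Random;

// Hypothetical model, not Solr code: how often does a randomly placed leader
// coincide with the one node that a fixed seed always sends the update to?
class LeaderCoincidenceSketch {

  /** Fraction of runs in which the leader lands on the node the seed always picks. */
  static double coincidenceRate(int numServers, int fixedClientIndex, int runs, long placementSeed) {
    // ASSUMPTION: leader placement is uniform across nodes and independent of the test seed
    Random leaderPlacement = new Random(placementSeed);
    int hits = 0;
    for (int i = 0; i < runs; i++) {
      int leaderIndex = leaderPlacement.nextInt(numServers);
      if (leaderIndex == fixedClientIndex) {
        hits++; // update goes straight to the leader, so the failure does not reproduce
      }
    }
    return (double) hits / runs;
  }

  public static void main(String[] args) {
    int numServers = 5;
    double rate = coincidenceRate(numServers, 2, 100_000, 42L);
    System.out.printf("observed %.3f, expected roughly %.3f%n", rate, 1.0 / numServers);
  }
}
```

Under that assumption the observed rate settles near {{1 / numServers}}, which is why a "reproducible" seed would still intermittently pass.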
----
I suspect that the best way to try and implement something like this would be
to use the [rule based replica placement|https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement]
feature -- perhaps with a special "Snitch" designed for use in
MiniSolrCloudCluster tests? ... But i'm not really sure how it would work
because i don't really understand how to use / extend that feature.
So assuming for the sake of argument that it's not possible using the rule
based placement stuff, here's a description of the approach that initially
occurred to me to serve as a straw man for discussion...
* If it's not already, {{MiniSolrCloudCluster}} should ensure every Jetty
instance is started up with a consistent node name (sequentially numbered or
whatever)
* If it's not already, {{MiniSolrCloudCluster.getJettySolrRunners()}} should
return the jetty instances in a consistently sorted order (based on something
like node name -- not something non-deterministic like the port#, or the order
in which they started up)
* {{MiniSolrCloudCluster.createCollection(...)}} (or some new method with a
similar signature) should be changed to more explicitly do a lot of work
currently done implicitly by the {{CREATE}} API call...
** use the {{shards}} param to provide explicitly generated names for every
shard
** use the {{createNodeSet=EMPTY}} param
** Once the collection is created (w/o any replicas)...
*** {{ADDREPLICA}} and {{ADDREPLICAPROP}} should be used explicitly to create a
preferredLeader for each (named) {{shard}} and assign it to a predictably chosen
{{node}} (by name).
*** Additional {{ADDREPLICA}} calls should then be made as needed to add the
expected number of replicas for each {{shard}} on predictably chosen {{node}}s
(by name).
* {{MiniSolrCloudCluster}} could then support some new convenience methods for
tests to use:
** Things like...
*** {{List<HttpSolrClient> getClientsForAllReplicas(String collectionName)}}
*** {{List<HttpSolrClient> getClientsForShard(String collectionName, String
shardName)}}
*** {{SortedMap<String,HttpSolrClient> getClientsForLeaders(String
collectionName) // keyed by shardName}}
*** {{HttpSolrClient getClientForLeader(String collectionName, String
shardName)}}
** These methods should do a "live" lookup of the data currently in ZK, so that
even if a test shuts down nodes, or adds replicas, or triggers some bit of
chaos they can still subsequently look up a useful SolrClient to test some
action with.
** Obviously these methods should return all clients in a consistent order (ie:
sort by core node name)
** (See {{TestTolerantUpdateProcessorCloud.createMiniSolrCloudCluster()}} for
some sample code of building up SolrClients targeting shard leaders)
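As a rough illustration of the "predictably chosen node" part of the straw man above, here's a minimal sketch in plain Java that only computes a placement plan and the matching {{ADDREPLICA}} parameter maps. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the node names are made up, the round-robin rule is just one possible deterministic choice, and the parameter maps simply mirror the {{collection}} / {{shard}} / {{node}} names and are not validated against any Solr version. The {{ADDREPLICAPROP}} step is left out because it needs a replica name that is only known once the replica exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in MiniSolrCloudCluster.
class PlacementPlanSketch {

  /**
   * Maps each generated shard name to an ordered list of node names.
   * The first node in each list is the one intended to host the preferred leader.
   */
  static Map<String, List<String>> plan(List<String> sortedNodeNames, int numShards, int replicasPerShard) {
    Map<String, List<String>> plan = new LinkedHashMap<>();
    for (int s = 0; s < numShards; s++) {
      List<String> nodes = new ArrayList<>();
      for (int r = 0; r < replicasPerShard; r++) {
        // round-robin over the sorted node list, offset by the shard index
        nodes.add(sortedNodeNames.get((s + r) % sortedNodeNames.size()));
      }
      plan.put("shard" + (s + 1), nodes);
    }
    return plan;
  }

  /** Builds one ADDREPLICA parameter map per planned replica: all intended leaders first, then the rest. */
  static List<Map<String, String>> addReplicaParams(String collection, Map<String, List<String>> plan) {
    List<Map<String, String>> calls = new ArrayList<>();
    // first pass: the replica meant to become leader of each shard
    for (Map.Entry<String, List<String>> e : plan.entrySet()) {
      calls.add(params(collection, e.getKey(), e.getValue().get(0)));
    }
    // second pass: the remaining replicas of each shard
    for (Map.Entry<String, List<String>> e : plan.entrySet()) {
      List<String> nodes = e.getValue();
      for (int i = 1; i < nodes.size(); i++) {
        calls.add(params(collection, e.getKey(), nodes.get(i)));
      }
    }
    return calls;
  }

  private static Map<String, String> params(String collection, String shard, String node) {
    Map<String, String> p = new LinkedHashMap<>();
    p.put("action", "ADDREPLICA");
    p.put("collection", collection);
    p.put("shard", shard);
    p.put("node", node);
    return p;
  }

  public static void main(String[] args) {
    // made-up, sequentially numbered node names
    List<String> nodes = Arrays.asList("node1", "node2", "node3");
    Map<String, List<String>> plan = plan(nodes, 2, 2);
    for (Map<String, String> call : addReplicaParams("test_collection", plan)) {
      System.out.println(call);
    }
  }
}
```

Because the plan depends only on the sorted node names and the shard/replica counts, the same inputs always yield the same shard-to-node mapping, which is the property the straw man is after.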
...what do folks think?
is this possible/easy using a custom "snitch" ?
> add features to MiniSolrCloudCluster to make shard/leader/replica placement
> more reproducible
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-8907
> URL: https://issues.apache.org/jira/browse/SOLR-8907
> Project: Solr
> Issue Type: Improvement
> Reporter: Hoss Man
>
> I think MiniSolrCloudCluster would be greatly improved if (by default)
> collections created for test purposes had predictable shard/leader/core
> assignment across the jetty instances that are spun up. Even though the
> port#s used by the jettys will obviously vary every time a test is run,
> ideally a given seed should ensure that the following are all consistent:
> * the node_name used by each JettySolrRunner
> * which nodes host which shards
> * the core names used on each jetty instance
> * which core is the leader for each shard
> Obviously this wouldn't make sense for tests where the entire purpose is to
> ensure that the automatic assignment of these things works properly when
> creating a collection, or when explicitly testing things like
> "preferredLeader", but for tests of non-collection API related features (ie:
> update requests, search requests, sorting, etc...) where the test setup
> already takes advantage of methods like
> {{MiniSolrCloudCluster.createCollection(...)}} as a short cut to using the
> API directly, this type of consistency would make potential test failures a
> lot more reproducible && easier to diagnose.