[
https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382930#comment-16382930
]
Hoss Man commented on SOLR-12050:
---------------------------------
I've attached a sample log file from running this test after my assert/logging
updates. If you look for the new logging messages, it's pretty easy to see that
while the 2nd UTILIZENODE command is causing a replica to be moved onto the new
node (jettyY), it seems to be completely ignoring the fact that there is a core
hosted on a "blacklisted" (per the policy) port (jettyX) that should be the
first candidate for being moved...
{noformat}
// in this particular run, the first UTILIZENODE command works,
// it moves a replica off a random node to jettyX/34444
//
// (although see TODO in test -- based on how the docs are worded,
// it's not clear if there's any requirement that it do so)
9201 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ]
o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyX
(127.0.0.1:34444_solr)
9204 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ]
o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params
node=127.0.0.1:34444_solr&action=UTILIZENODE&wt=javabin&version=2 and
sendToOCPQueue=true
...
9355 INFO
(OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr)
[n:127.0.0.1:46180_solr ] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved
to node 127.0.0.1:34444_solr:
core_node8:{"core":"utilizenodecoll_shard2_replica_n7","base_url":"http://127.0.0.1:33567/solr","node_name":"127.0.0.1:33567_solr","state":"active","type":"NRT"}
9361 INFO
(OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr)
[n:127.0.0.1:46180_solr ] o.a.s.c.a.c.AddReplicaCmd Node Identified
127.0.0.1:34444_solr for creating new replica
...
10078 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections
params={node=127.0.0.1:34444_solr&action=UTILIZENODE&wt=javabin&version=2}
status=0 QTime=874
// next up, sanity check which replicas jettyX/34444 now has,
// then set a new policy saying that port 34444 should have 0 replicas...
10079 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ]
o.a.s.c.TestUtilizeNode jettyX replicas prior to being blacklisted:
[core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:34444/solr","node_name":"127.0.0.1:34444_solr","state":"recovering","type":"NRT"}]
10079 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ]
o.a.s.c.TestUtilizeNode Setting new policy to blacklist jettyX
(127.0.0.1:34444_solr) port=34444
...
10143 INFO (qtp1498399719-27) [n:127.0.0.1:33567_solr ]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/autoscaling
params={wt=javabin&version=2} status=0 QTime=59
// now spin up another new node: jettyY/55619,
// and redundantly sanity check the replicas on jettyX again...
10144 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ]
o.a.s.c.TestUtilizeNode Spinning up additional jettyY...
...
10361 INFO (zkConnectionManagerCallback-78-thread-1) [ ]
o.a.s.c.c.ConnectionManager zkClient has connected
10365 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ]
o.a.s.c.TestUtilizeNode jettyX replicas prior to utilizing jettyY:
[core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:34444/solr","node_name":"127.0.0.1:34444_solr","state":"recovering","type":"NRT"}]
// Now send a UTILIZENODE command for jettyY/55619,
// this *should* move the replica from jettyX->jettyY
// (in order to resolve the existing policy violation)
10365 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ]
o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyY
(127.0.0.1:55619_solr)
10366 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ]
o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params
node=127.0.0.1:55619_solr&action=UTILIZENODE&wt=javabin&version=2 and
sendToOCPQueue=true
...
10448 INFO
(OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr)
[n:127.0.0.1:46180_solr ] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved
to node 127.0.0.1:55619_solr:
core_node6:{"core":"utilizenodecoll_shard2_replica_n5","base_url":"http://127.0.0.1:46180/solr","node_name":"127.0.0.1:46180_solr","state":"active","type":"NRT","leader":"true"}
10450 INFO
(OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr)
[n:127.0.0.1:46180_solr ] o.a.s.c.a.c.AddReplicaCmd Node Identified
127.0.0.1:55619_solr for creating new replica
...
12710 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections
params={node=127.0.0.1:55619_solr&action=UTILIZENODE&wt=javabin&version=2}
status=0 QTime=2343
// but as you can see above, the replica that's added to jettyY/55619
// comes from a completely different node on port 46180
{noformat}
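To make the expectation concrete, here is a minimal sketch of the ordering the docs seem to describe: replicas that violate the policy are considered first, and only then is load taken into account. This is purely illustrative Python, not Solr's implementation; the {{Replica}} class, the {{pick_replica_to_move}} function and the {{banned_ports}} parameter are hypothetical names, and the load step is deliberately simplified to "most replicas per node". The sample data uses only the two replicas visible in the log above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Replica:
    """Hypothetical stand-in for a replica entry in the cluster state."""
    name: str  # e.g. "core_node10"
    node: str  # e.g. "127.0.0.1:34444_solr"
    port: int  # port of the node hosting the replica


def pick_replica_to_move(replicas: Iterable[Replica],
                         target_node: str,
                         banned_ports: Set[int]) -> Optional[Replica]:
    """Pick the replica that should be moved onto target_node.

    Assumed ordering (from the documentation quote, not from Solr's code):
    1. a replica that currently violates the policy, if any;
    2. otherwise a replica from the node hosting the most replicas.
    """
    # Replicas already on the target node are never candidates.
    candidates: List[Replica] = [r for r in replicas if r.node != target_node]
    if not candidates:
        return None

    # Step 1: policy violators take priority.
    violators = [r for r in candidates if r.port in banned_ports]
    if violators:
        return violators[0]

    # Step 2 (simplified): fall back to the most heavily loaded node.
    load = Counter(r.node for r in candidates)
    busiest = max(load, key=load.get)
    return next(r for r in candidates if r.node == busiest)


# The two replicas that appear in the log excerpt.
replicas = [
    Replica("core_node10", "127.0.0.1:34444_solr", 34444),  # on jettyX
    Replica("core_node6", "127.0.0.1:46180_solr", 46180),
]
chosen = pick_replica_to_move(replicas, "127.0.0.1:55619_solr", {34444})
print(chosen.name)
```

Under these assumptions the sketch selects core_node10 (the replica on the blacklisted port), whereas the log shows core_node6 being moved instead.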
> UTILIZENODE does not enforce policy rules
> -----------------------------------------
>
> Key: SOLR-12050
> URL: https://issues.apache.org/jira/browse/SOLR-12050
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
> Attachments: SOLR-12050.log.txt
>
>
> I've been poking around TestUtilizeNode and some of its recent Jenkins
> failures -- AFAICT {{UTILIZENODE}} is not behaving correctly per its
> current documentation...
> bq. It tries to fix any policy violations first and then it tries to move
> some load off of the most loaded nodes according to the preferences
> ...based on my testing w/a slightly modified testcase that does additional
> logging/asserts, it will frequently choose a "random" replica to move,
> even when there are existing replicas that violate the policy.
> I will be committing my current improvements to the test while citing this
> issue, and marking the test @AwaitsFix. Then I'll attach some logs/comments
> showing what I mean.