[ 
https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382930#comment-16382930
 ] 

Hoss Man commented on SOLR-12050:
---------------------------------


I've attached a sample log file from running this test after my assert/logging 
updates. If you look for the new logging messages, it's pretty easy to see that 
while the 2nd UTILIZENODE command does cause a replica to be moved onto the new 
node (jettyY), it seems to completely ignore the fact that there is a core 
hosted on jettyX, whose port is "blacklisted" by the policy, and which should 
be the first candidate for being moved...

{noformat}
  // in this particular run, the first UTILIZENODE command works,
  // it moves a replica off a random node to jettyX/34444
  //
  // (although see the TODO in the test -- based on how the docs are worded,
  // it's not clear if there's any requirement that it do so)
  
9201 INFO  (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [    ] 
o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyX 
(127.0.0.1:34444_solr)
9204 INFO  (qtp1498399719-45) [n:127.0.0.1:33567_solr    ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params 
node=127.0.0.1:34444_solr&action=UTILIZENODE&wt=javabin&version=2 and 
sendToOCPQueue=true
  ...
9355 INFO  
(OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr) 
[n:127.0.0.1:46180_solr    ] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved 
to node 127.0.0.1:34444_solr: 
core_node8:{"core":"utilizenodecoll_shard2_replica_n7","base_url":"http://127.0.0.1:33567/solr","node_name":"127.0.0.1:33567_solr","state":"active","type":"NRT"}
9361 INFO  
(OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr) 
[n:127.0.0.1:46180_solr    ] o.a.s.c.a.c.AddReplicaCmd Node Identified 
127.0.0.1:34444_solr for creating new replica
  ...
10078 INFO  (qtp1498399719-45) [n:127.0.0.1:33567_solr    ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections 
params={node=127.0.0.1:34444_solr&action=UTILIZENODE&wt=javabin&version=2} 
status=0 QTime=874
  
  // next up, sanity check which replicas jettyX/34444 now has,
  // then set a new policy saying that port 34444 should have 0 replicas...

10079 INFO  (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [    ] 
o.a.s.c.TestUtilizeNode jettyX replicas prior to being blacklisted: 
[core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:34444/solr","node_name":"127.0.0.1:34444_solr","state":"recovering","type":"NRT"}]
10079 INFO  (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [    ] 
o.a.s.c.TestUtilizeNode Setting new policy to blacklist jettyX 
(127.0.0.1:34444_solr) port=34444
  ...
10143 INFO  (qtp1498399719-27) [n:127.0.0.1:33567_solr    ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/autoscaling 
params={wt=javabin&version=2} status=0 QTime=59

  // now spin up another new node: jettyY/55619,
  // and redundantly sanity check the replicas on jettyX again

10144 INFO  (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [    ] 
o.a.s.c.TestUtilizeNode Spinning up additional jettyY...
  ...
10361 INFO  (zkConnectionManagerCallback-78-thread-1) [    ] 
o.a.s.c.c.ConnectionManager zkClient has connected
10365 INFO  (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [    ] 
o.a.s.c.TestUtilizeNode jettyX replicas prior to utilizing jettyY: 
[core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:34444/solr","node_name":"127.0.0.1:34444_solr","state":"recovering","type":"NRT"}]

  // Now send a UTILIZENODE command for jettyY/55619,
  // this *should* move the replica from jettyX->jettyY
  // (in order to resolve the existing policy violation)

10365 INFO  (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [    ] 
o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyY 
(127.0.0.1:55619_solr)
10366 INFO  (qtp1498399719-45) [n:127.0.0.1:33567_solr    ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params 
node=127.0.0.1:55619_solr&action=UTILIZENODE&wt=javabin&version=2 and 
sendToOCPQueue=true
  ...
10448 INFO  
(OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr) 
[n:127.0.0.1:46180_solr    ] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved 
to node 127.0.0.1:55619_solr: 
core_node6:{"core":"utilizenodecoll_shard2_replica_n5","base_url":"http://127.0.0.1:46180/solr","node_name":"127.0.0.1:46180_solr","state":"active","type":"NRT","leader":"true"}
10450 INFO  
(OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr) 
[n:127.0.0.1:46180_solr    ] o.a.s.c.a.c.AddReplicaCmd Node Identified 
127.0.0.1:55619_solr for creating new replica
  ...
12710 INFO  (qtp1498399719-45) [n:127.0.0.1:33567_solr    ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections 
params={node=127.0.0.1:55619_solr&action=UTILIZENODE&wt=javabin&version=2} 
status=0 QTime=2343

  // but as you can see above, the replica that's added to jettyY/55619
  // comes from a completely different node, on port 46180

{noformat}
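
The ordering that the documentation seems to promise (fix policy violations first, then rebalance) can be illustrated with a minimal, self-contained Java sketch. Everything here is hypothetical: {{UtilizeNodeSketch}}, {{ReplicaInfo}} and {{orderCandidates}} are illustration-only names, not Solr classes or APIs, and this is not how the Overseer actually selects a replica. It only shows the expected behavior, using the node names from the log above: a replica on a node that violates the policy (jettyX/34444) should be considered before a replica on a compliant node (46180).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class UtilizeNodeSketch {

    /** Hypothetical, simplified stand-in for a replica; NOT Solr's Replica class. */
    static final class ReplicaInfo {
        final String coreNode;
        final String nodeName;

        ReplicaInfo(String coreNode, String nodeName) {
            this.coreNode = coreNode;
            this.nodeName = nodeName;
        }
    }

    /**
     * Returns the replicas that could be moved to targetNode, ordered so that
     * replicas on policy-violating nodes come first. The sort is stable, so the
     * original order is preserved within each group.
     */
    static List<ReplicaInfo> orderCandidates(List<ReplicaInfo> replicas,
                                             Set<String> violatingNodes,
                                             String targetNode) {
        List<ReplicaInfo> candidates = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            // A replica already on the target node is not a move candidate.
            if (!r.nodeName.equals(targetNode)) {
                candidates.add(r);
            }
        }
        // Boolean.FALSE sorts before Boolean.TRUE, so "not compliant" goes first.
        candidates.sort(Comparator.comparing(
            (ReplicaInfo r) -> !violatingNodes.contains(r.nodeName)));
        return candidates;
    }

    public static void main(String[] args) {
        List<ReplicaInfo> replicas = Arrays.asList(
            new ReplicaInfo("core_node6", "127.0.0.1:46180_solr"),
            new ReplicaInfo("core_node10", "127.0.0.1:34444_solr"));
        Set<String> violating = Collections.singleton("127.0.0.1:34444_solr");

        List<ReplicaInfo> ordered =
            orderCandidates(replicas, violating, "127.0.0.1:55619_solr");
        // The first candidate is the replica on the blacklisted node.
        System.out.println(ordered.get(0).coreNode);
    }
}
```

With the inputs above, the first candidate is {{core_node10}} (the replica on jettyX), whereas the log shows {{core_node6}} being moved instead.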



> UTILIZENODE does not enforce policy rules
> -----------------------------------------
>
>                 Key: SOLR-12050
>                 URL: https://issues.apache.org/jira/browse/SOLR-12050
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-12050.log.txt
>
>
> I've been poking around TestUtilizeNode and some of its recent Jenkins 
> failures -- AFAICT the {{UTILIZENODE}} command is not behaving correctly per 
> its current documentation...
> bq. It tries to fix any policy violations first and then it tries to move 
> some load off of the most loaded nodes according to the preferences
> ...based on my testing with a slightly modified testcase that does additional 
> logging/asserts, it will frequently choose a "random" replica to move, even 
> when there are existing replicas that violate the policy.
> I will be committing my current improvements to the test while citing this 
> issue, and marking the test @AwaitsFix. Then I'll attach some logs/comments 
> showing what I mean.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
