[ https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382930#comment-16382930 ]
Hoss Man commented on SOLR-12050:
---------------------------------

I've attached a sample log file from running this test after my assert/logging updates. If you look for the new logging messages, it's pretty easy to see that while the 2nd UTILIZE command does cause a replica to be moved onto the new node (jettyY), it seems to be completely ignoring the fact that there is a core hosted on a "blacklisted" (per the policy) port (jettyX) that should be the first candidate for being moved...

{noformat}
// in this particular run, the first UTILIZENODE command works,
// it moves a replica off a random node to jettyX/34444
//
// (although see TODO in test -- based on how the docs are worded,
// it's not clear if there's any requirement that it do so)
9201 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ] o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyX (127.0.0.1:34444_solr)
9204 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params node=127.0.0.1:34444_solr&action=UTILIZENODE&wt=javabin&version=2 and sendToOCPQueue=true
...
9355 INFO (OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr ] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved to node 127.0.0.1:34444_solr: core_node8:{"core":"utilizenodecoll_shard2_replica_n7","base_url":"http://127.0.0.1:33567/solr","node_name":"127.0.0.1:33567_solr","state":"active","type":"NRT"}
9361 INFO (OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr ] o.a.s.c.a.c.AddReplicaCmd Node Identified 127.0.0.1:34444_solr for creating new replica
...
10078 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={node=127.0.0.1:34444_solr&action=UTILIZENODE&wt=javabin&version=2} status=0 QTime=874

// next up, sanity check which replicas jettyX/34444 now has,
// then set a new policy saying that port 34444 should have 0 replicas...
10079 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ] o.a.s.c.TestUtilizeNode jettyX replicas prior to being blacklisted: [core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:34444/solr","node_name":"127.0.0.1:34444_solr","state":"recovering","type":"NRT"}]
10079 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ] o.a.s.c.TestUtilizeNode Setting new policy to blacklist jettyX (127.0.0.1:34444_solr) port=34444
...
10143 INFO (qtp1498399719-27) [n:127.0.0.1:33567_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/autoscaling params={wt=javabin&version=2} status=0 QTime=59

// now spin up another new node: jettyY/55619,
// redundantly sanity check the replicas on jettyX again,
10144 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ] o.a.s.c.TestUtilizeNode Spinning up additional jettyY...
...
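// (editorial aside, not from the log: the "blacklist" policy set above isn't
// echoed in the log output -- based on the test's stated intent it is
// presumably a cluster policy roughly along these lines, POSTed to the
// /admin/autoscaling endpoint shown above; the exact payload is a
// reconstruction, not copied from the test)
//
//   { "set-cluster-policy" : [ {"replica": 0, "port": "34444"} ] }
//
// i.e. nodes listening on port 34444 (jettyX) may host zero replicas, so the
// core currently on jettyX is now an existing policy violation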
10361 INFO (zkConnectionManagerCallback-78-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
10365 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ] o.a.s.c.TestUtilizeNode jettyX replicas prior to utilizing jettyY: [core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:34444/solr","node_name":"127.0.0.1:34444_solr","state":"recovering","type":"NRT"}]

// Now send a UTILIZENODE command for jettyY/55619,
// this *should* move the replica from jettyX->jettyY
// (in order to resolve the existing policy violation)
10365 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [ ] o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyY (127.0.0.1:55619_solr)
10366 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params node=127.0.0.1:55619_solr&action=UTILIZENODE&wt=javabin&version=2 and sendToOCPQueue=true
...
10448 INFO (OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr ] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved to node 127.0.0.1:55619_solr: core_node6:{"core":"utilizenodecoll_shard2_replica_n5","base_url":"http://127.0.0.1:46180/solr","node_name":"127.0.0.1:46180_solr","state":"active","type":"NRT","leader":"true"}
10450 INFO (OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr ] o.a.s.c.a.c.AddReplicaCmd Node Identified 127.0.0.1:55619_solr for creating new replica
...
12710 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={node=127.0.0.1:55619_solr&action=UTILIZENODE&wt=javabin&version=2} status=0 QTime=2343

// but as you can see above, the replica that's added to jettyY/55619
// comes from a completely different node on port 46180
{noformat}


> UTILIZENODE does not enforce policy rules
> -----------------------------------------
>
>                 Key: SOLR-12050
>                 URL: https://issues.apache.org/jira/browse/SOLR-12050
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-12050.log.txt
>
>
> I've been poking around TestUtilizeNode and some of its recent jenkins failures -- AFAICT the {{UTILIZENODE}} command is not behaving correctly per its current documentation...
> bq. It tries to fix any policy violations first and then it tries to move some load off of the most loaded nodes according to the preferences
> ...based on my testing w/ a slightly modified testcase that does additional logging/asserts, it will frequently choose a "random" replica to move, even when there are existing replicas that violate the policy.
> I will be committing my current improvements to the test while citing this issue, and marking the test @AwaitsFix. Then I'll attach some logs/comments showing what I mean.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)