[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725884#comment-16725884 ]
Jason Gerlowski commented on SOLR-13045:
----------------------------------------

One of the remaining failures in TestSimPolicyCloud occurs in {{testCreateCollectionAddShardUsingPolicy}}, when the initial collection creation (and subsequent shard creation) seem to violate a policy which specifies that all replicas should be created on the same node.

After looking closer, this appears to come down to a race condition of sorts between two threads attempting to set the autoscaling.json "ZK" node. Two different threads touch the autoscaling config node in this test: the OverseerTriggerThread tries to set the default nodeAdded trigger, and the test code tries to set a policy that the test relies on. These threads rely on optimistic concurrency versioning to ensure that updates don't clobber one another.

But SimDistribStateManager has a bug which prevents this from working correctly all the time. The initial node version in the sim framework is -1, which is also the flag used to indicate "I don't care about concurrency, just overwrite the node". (For comparison, ZkDistribStateManager has node versions start at 0.) Depending on timing, this lets the default nodeAdded trigger clobber the policy that our test relies on, causing the test to fail.

So one fix that'll make this test (and probably others in the sim framework) more reliable is to ensure that SimDistribStateManager's node versioning lines up better with ZkDistribStateManager's, or at least that it avoids this -1 edge case.

I've been testing variations of a patch to accomplish this, and will upload my results shortly.

> Harden TestSimPolicyCloud
> -------------------------
>
>                 Key: SOLR-13045
>                 URL: https://issues.apache.org/jira/browse/SOLR-13045
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: AutoScaling
>   Affects Versions: master (8.0)
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz
>
>
> Several tests in TestSimPolicyCloud, but especially
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after
> Mark's recent test-fix commit. This JIRA covers looking into and (hopefully)
> fixing this test failure.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
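
For anyone who wants to see the -1 edge case from the comment above in isolation, here is a rough Java sketch. It does not use Solr's actual classes: {{VersionedNodeSketch}}, {{Node}}, {{ANY_VERSION}} and {{staleWriteAccepted}} are hypothetical names made up for illustration, and the interleaving is forced sequentially rather than with real threads. It assumes only what the comment states: writes carry an expected version, and -1 means "overwrite regardless".

```java
/** Hypothetical stand-in for a versioned config node; NOT Solr's real classes. */
class VersionedNodeSketch {

  /** Sentinel meaning "skip the version check and just overwrite" (assumed to be -1, per the comment). */
  static final int ANY_VERSION = -1;

  static final class Node {
    private int version;
    private String data = "";

    Node(int initialVersion) {
      this.version = initialVersion;
    }

    synchronized int getVersion() {
      return version;
    }

    synchronized String getData() {
      return data;
    }

    /** Succeeds only if expectedVersion matches the current version or is the sentinel. */
    synchronized boolean setData(String newData, int expectedVersion) {
      if (expectedVersion != ANY_VERSION && expectedVersion != version) {
        return false; // stale write rejected
      }
      data = newData;
      version++;
      return true;
    }
  }

  /**
   * Simulates the unlucky interleaving: both writers read the version before
   * either one writes. Returns true if the second (stale) write was accepted,
   * i.e. the first writer's data was clobbered.
   */
  static boolean staleWriteAccepted(int initialVersion) {
    Node node = new Node(initialVersion);
    int seenByTest = node.getVersion();     // test code reads the version
    int seenByTrigger = node.getVersion();  // trigger thread reads the same version
    node.setData("policy", seenByTest);     // test writes its policy first
    return node.setData("default nodeAdded trigger", seenByTrigger);
  }

  public static void main(String[] args) {
    System.out.println("initial version -1: stale write accepted = " + staleWriteAccepted(-1));
    System.out.println("initial version  0: stale write accepted = " + staleWriteAccepted(0));
  }
}
```

When the node starts at -1, the version each writer read is indistinguishable from the "overwrite" sentinel, so the stale second write goes through. When the node starts at 0, the same interleaving is rejected and the second writer would have to re-read and retry.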