[ https://issues.apache.org/jira/browse/SOLR-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson resolved SOLR-7936.
----------------------------------
    Resolution: Cannot Reproduce

Can't get this to fail now.

> Bogus failure when deleting collections.
> ----------------------------------------
>
>                 Key: SOLR-7936
>                 URL: https://issues.apache.org/jira/browse/SOLR-7936
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> When looking at the CDCR test failures, we began to wonder whether the
> problem was
> 1> the cdcr code itself
> 2> the test framework
> 3> Solr
> Some of the failures seem to be "impossible" assuming collection
> creation/deletion work OK.
> So I wrote a little program to exercise collection creation/deletion outside
> the test framework by just adding and deleting the same collection over and
> over and over again, and it started regularly failing in
> OverseerCollectionMessageHandler.deleteCollection about line 780 it would
> throw the "Could not fully remove the collection" exception:
> {code}
>     TimeOut timeout = new TimeOut(30, TimeUnit.SECONDS);
>     boolean removed = false;
>     while (! timeout.hasTimedOut()) {
>       Thread.sleep(100);
>       // WORKS SO FAR IF UNCOMMENTED zkStateReader.updateClusterState();
>       removed = !zkStateReader.getClusterState().hasCollection(collection);
>       if (removed) {
>         Thread.sleep(500); // just a bit of time so it's more likely other
>                            // readers see on return
>         break;
>       }
>     }
>     if (!removed) {
>       throw new SolrException(ErrorCode.SERVER_ERROR,
>           "Could not fully remove collection: " + collection);
>     }
> {code}
> However, the collection is really gone from clusterstate. When I put the
> updateClusterState() in above, it doesn't seem to fail. Is it as simple as
> the updateClusterState() call?
> Without the update in place, it failed within 20 reps very regularly. So far,
> with the update in place we're at 132 and counting. Any comments?
> If this runs 1,000 times tonight, I'll check it in if there are no
> objections. I don't know what it means for CDCR yet though.
> I'm also suspicious of the 500ms sleep. Anyone have a clue what that's in
> there for?
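
The stale-read hypothesis in the quoted description can be illustrated outside Solr with a small, self-contained sketch. Everything in it is hypothetical: CachedStateReader, DeleteWaitSketch and waitForRemoval are made-up names, not Solr APIs, and a plain in-memory set stands in for the authoritative cluster state. The reader here changes its snapshot only when refresh() is called, which is a deliberate simplification and not a claim about how ZkStateReader keeps its state current. The sketch only shows why a polling loop that never re-reads its source can time out even though the collection is really gone.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a reader that keeps a local snapshot of cluster state.
// Not a Solr class; the snapshot changes only when refresh() is called.
class CachedStateReader {
    private final Set<String> source;     // authoritative state (stands in for the real store)
    private volatile Set<String> cached;  // local snapshot, possibly stale

    CachedStateReader(Set<String> source) {
        this.source = source;
        this.cached = Collections.unmodifiableSet(new HashSet<>(source));
    }

    // Re-read the authoritative state into the local snapshot.
    void refresh() {
        cached = Collections.unmodifiableSet(new HashSet<>(source));
    }

    boolean hasCollection(String name) {
        return cached.contains(name);
    }
}

class DeleteWaitSketch {
    // Polls until the reader stops reporting the collection or the timeout elapses.
    // With refreshEachPoll == false the loop only ever sees the cached snapshot.
    static boolean waitForRemoval(CachedStateReader reader, String collection,
                                  long timeoutMs, boolean refreshEachPoll) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
            if (refreshEachPoll) {
                reader.refresh();
            }
            if (!reader.hasCollection(collection)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> state = ConcurrentHashMap.newKeySet();
        state.add("test1");
        CachedStateReader reader = new CachedStateReader(state);
        state.remove("test1"); // the collection is really gone from the source

        System.out.println(waitForRemoval(reader, "test1", 200, false)); // false: stale snapshot
        System.out.println(waitForRemoval(reader, "test1", 200, true));  // true: refreshed view
    }
}
```

In this toy model the unrefreshed loop can never succeed, whereas the quoted report describes an intermittent failure, so the sketch captures only the general shape of the problem and not its actual cause in Solr.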