[ https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258110#comment-15258110 ]
Shalin Shekhar Mangar commented on SOLR-9030:
---------------------------------------------

You can reproduce using:
{code}
ant -Dtestcase=AsyncCallRequestStatusResponseTest -Dbeast.iters=50 beast
{code}

This particular assert is tripped when the following hold true:
# control_collection is in state format 1
# collection1 is in state format 2
# collection1 is created but not written to ZK -- znode version is -1
# downnode modifies at least two collections (in the above test, it modifies control_collection and collection1)
# maybeFlushBefore logic is tripped when modifying control_collection (because it is in a different state format) -- this flushes collection1 -- znode version is now 0
# now the change made by downnode to collection1 trips the assert because when 'downnode' was executed, the returned ZkWriteCommand had collection1 with version -1 but now it is 0.

One way to solve this problem, i.e. to avoid the overwriting, would be to use a sentinel value other than -1 for znodeVersion, because ZK considers -1 as any version in CAS. If we can use, say, Integer.MAX_VALUE, then the logic to retry on version conflict will automatically take care of this problem and we can remove the assert too.

> The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9030
>                 URL: https://issues.apache.org/jira/browse/SOLR-9030
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>             Fix For: master, 6.1
>
>
> While working on SOLR-9014 I came across a strange test failure.
> {code}
>    [junit4] ERROR   16.9s | AsyncCallRequestStatusResponseTest.testAsyncCallStatusResponse <<<
>    [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=46, name=OverseerStateUpdate-95769832112259076-127.0.0.1:51135_z_oeg%2Ft-n_0000000000, state=RUNNABLE, group=Overseer state updater.]
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3:CBF7E84BCF328A1A]:0)
>    [junit4]    > Caused by: java.lang.AssertionError
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3]:0)
>    [junit4]    >        at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:231)
>    [junit4]    >        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:240)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
> {code}
> The underlying problem can manifest by tripping the above assert or a BadVersionException as well. I found that this was introduced in SOLR-7281 where a new 'downnode' command was added.
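
To illustrate the sentinel idea from the comment, here is a minimal sketch. It is a toy, in-memory stand-in for a versioned znode, not the ZooKeeper client and not Solr's ZkStateWriter; the names VersionedStore, NOT_WRITTEN and replay are hypothetical and exist only for this example. The only rule it models is the one the comment relies on: an expected version of -1 matches any version, so a stale write carrying -1 overwrites silently, while a write carrying some other sentinel is rejected and could be left to the retry-on-conflict logic.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy versioned key/value store (hypothetical; only mimics the "-1 matches any version" rule). */
class VersionedStore {
    /** An expected version of -1 matches any current version. */
    static final int ANY_VERSION = -1;
    /** Assumed alternative sentinel for "not yet written", as suggested in the comment. */
    static final int NOT_WRITTEN = Integer.MAX_VALUE;

    private final Map<String, Integer> versions = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();

    /** Creates a node; a freshly created node starts at version 0. */
    void create(String path, String value) {
        if (versions.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        versions.put(path, 0);
        data.put(path, value);
    }

    /**
     * Conditional write. Returns false on a missing node or a version mismatch
     * (standing in for the exceptions a real client would throw).
     */
    boolean setData(String path, String value, int expectedVersion) {
        Integer current = versions.get(path);
        if (current == null) {
            return false;
        }
        if (expectedVersion != ANY_VERSION && expectedVersion != current) {
            return false;
        }
        versions.put(path, current + 1);
        data.put(path, value);
        return true;
    }

    String get(String path) {
        return data.get(path);
    }

    /**
     * Replays the sequence from the comment: the node is flushed first (version 0),
     * then a stale command computed before the flush is applied with staleVersion.
     */
    static String replay(int staleVersion) {
        String path = "/collections/collection1/state.json";
        VersionedStore store = new VersionedStore();
        store.create(path, "flushed");
        boolean written = store.setData(path, "stale", staleVersion);
        return (written ? "overwritten:" : "conflict:") + store.get(path);
    }

    public static void main(String[] args) {
        System.out.println(replay(ANY_VERSION));
        System.out.println(replay(NOT_WRITTEN));
    }
}
```

With -1 as the stale version the flushed state is replaced; with the Integer.MAX_VALUE sentinel the write is refused and the flushed state survives, which is the point at which a retry would re-read the current version.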