[ https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258110#comment-15258110 ]

Shalin Shekhar Mangar commented on SOLR-9030:
---------------------------------------------

You can reproduce using:
{code}
ant -Dtestcase=AsyncCallRequestStatusResponseTest -Dbeast.iters=50 beast
{code}

This particular assert is tripped when all of the following are true:
# control_collection is in state format 1
# collection1 is in state format 2
# collection1 has been created but not yet written to ZK, so its znode version is -1
# 'downnode' modifies at least two collections (in the above test, it modifies control_collection and collection1)
# the maybeFlushBefore logic is tripped when modifying control_collection (because it is in a different state format); this flushes collection1, whose znode version is now 0
# the change made by 'downnode' to collection1 now trips the assert because, when 'downnode' was executed, the returned ZkWriteCommand had collection1 at version -1, but it is now at version 0.
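The sequence above can be modeled with a tiny simulation. This is a hypothetical sketch, not ZkStateWriter's real code: the class DownNodeRaceSketch, the flush method and the in-memory version map are invented names that only mimic how collection1's version moves from -1 to 0 between the moment 'downnode' computes its write command and the moment that command is applied.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failing sequence; this is NOT the real ZkStateWriter.
class DownNodeRaceSketch {
    // Sentinel for a collection that exists in memory but has no znode yet.
    static final int NOT_WRITTEN = -1;

    // Znode version per state-format-2 collection, as tracked by the (simulated) writer.
    static final Map<String, Integer> znodeVersions = new HashMap<>();

    // Simulates flushing a pending collection: creating the znode yields version 0,
    // updating an existing znode bumps its version by one.
    static void flush(String collection) {
        int current = znodeVersions.get(collection);
        znodeVersions.put(collection, current == NOT_WRITTEN ? 0 : current + 1);
    }

    // True if the version captured when 'downnode' ran still matches the writer's view,
    // which is the condition the assert expects to hold.
    static boolean versionStillMatches(String collection, int versionAtCommandTime) {
        return znodeVersions.get(collection) == versionAtCommandTime;
    }

    public static void main(String[] args) {
        // collection1 has been created but not yet written to ZK.
        znodeVersions.put("collection1", NOT_WRITTEN);

        // 'downnode' builds its write command for collection1 while it is still at -1.
        int capturedVersion = znodeVersions.get("collection1");

        // Modifying control_collection (state format 1) triggers a flush of collection1.
        flush("collection1");

        // Applying the collection1 change now sees version 0 instead of -1.
        System.out.println("captured=" + capturedVersion
                + " current=" + znodeVersions.get("collection1")
                + " matches=" + versionStillMatches("collection1", capturedVersion));
    }
}
{code}

Running main prints the captured version (-1), the current version (0) and matches=false, which is the mismatch the assert in writePendingUpdates catches.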

One way to solve this problem, i.e. to avoid the overwrite, would be to use a 
sentinel value other than -1 for znodeVersion, because ZK treats -1 as "any 
version" in a compare-and-set (CAS) update. If we used, say, Integer.MAX_VALUE 
instead, the existing logic that retries on a version conflict would 
automatically take care of this problem, and we could remove the assert too.
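To illustrate why the choice of sentinel matters, here is an in-memory simulation of ZooKeeper's conditional-update rule, in which an expected version of -1 matches any version. FakeZnode, ConflictException and writeWithRetry are hypothetical stand-ins: the real client call is ZooKeeper.setData(path, data, version), which throws KeeperException.BadVersionException on a mismatch. The sketch also assumes Integer.MAX_VALUE is never reached as a real znode version.

{code:java}
// Hypothetical in-memory model of a versioned conditional write; not the ZooKeeper client API.
class CasSentinelSketch {
    // ZooKeeper's "match any version" value.
    static final int ANY_VERSION = -1;
    // Proposed "not yet written" sentinel (assumption: never a real znode version).
    static final int NOT_YET_WRITTEN = Integer.MAX_VALUE;

    static class ConflictException extends Exception {
        ConflictException(String msg) { super(msg); }
    }

    static class FakeZnode {
        int version = 0;
        String data = "";

        // Conditional write: succeeds only if expectedVersion is -1 or equals the current version.
        void setData(String newData, int expectedVersion) throws ConflictException {
            if (expectedVersion != ANY_VERSION && expectedVersion != version) {
                throw new ConflictException("expected " + expectedVersion + " but was " + version);
            }
            data = newData;
            version++;
        }
    }

    // Writes with the given expected version; on a conflict, re-reads the version and retries once.
    static String writeWithRetry(FakeZnode node, String newData, int expectedVersion) {
        try {
            node.setData(newData, expectedVersion);
            return "written without conflict";
        } catch (ConflictException e) {
            // A real writer would merge its change into the freshly read state before retrying.
            try {
                node.setData(newData, node.version);
                return "conflict detected, retried";
            } catch (ConflictException unexpected) {
                // Cannot happen in this single-threaded sketch.
                throw new IllegalStateException(unexpected);
            }
        }
    }

    public static void main(String[] args) {
        // In both cases collection1 has already been flushed, so its znode is at version 0.
        FakeZnode withMinusOne = new FakeZnode();
        System.out.println("-1 sentinel: " + writeWithRetry(withMinusOne, "downnode", ANY_VERSION));

        FakeZnode withMaxValue = new FakeZnode();
        System.out.println("MAX_VALUE sentinel: " + writeWithRetry(withMaxValue, "downnode", NOT_YET_WRITTEN));
    }
}
{code}

With -1 the stale write silently succeeds and overwrites the flushed state; with Integer.MAX_VALUE the mismatch surfaces as a conflict, which is what lets the existing retry path handle it.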

> The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9030
>                 URL: https://issues.apache.org/jira/browse/SOLR-9030
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>             Fix For: master, 6.1
>
>
> While working on SOLR-9014 I came across a strange test failure.
> {code}
>    [junit4] ERROR   16.9s | AsyncCallRequestStatusResponseTest.testAsyncCallStatusResponse <<<
>    [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=46, name=OverseerStateUpdate-95769832112259076-127.0.0.1:51135_z_oeg%2Ft-n_0000000000, state=RUNNABLE, group=Overseer state updater.]
>    [junit4]    >      at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3:CBF7E84BCF328A1A]:0)
>    [junit4]    > Caused by: java.lang.AssertionError
>    [junit4]    >      at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3]:0)
>    [junit4]    >      at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:231)
>    [junit4]    >      at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:240)
>    [junit4]    >      at java.lang.Thread.run(Thread.java:745)
> {code}
> The underlying problem can manifest either by tripping the above assert or as a 
> BadVersionException. I found that it was introduced in SOLR-7281, which added 
> the new 'downnode' command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
