[jira] [Commented] (SOLR-7118) ChaosMonkeyNothingIsSafeTest fails with too many update fails

Shalin Shekhar Mangar (JIRA) Wed, 18 Feb 2015 01:28:33 -0800

    [ https://issues.apache.org/jira/browse/SOLR-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325641#comment-14325641 ]


Shalin Shekhar Mangar commented on SOLR-7118:
---------------------------------------------

bq. An indexing request comes in but there is no leader in the cached state. Every subsequent indexing request will continue to fail until the cache entry expires i.e. 60 seconds

Reading the code again, I realized that when this condition occurs, the cached state is evicted, so the statement above isn't true. What is really happening here is that, during the time it takes to elect a new leader, many updates are rejected (as they should be), and the test then fails saying that too many updates failed.

From the logs:
# The leader election process logs the following at time=1646937:
{code}
   [junit4]   2> 1646937 T8923 oasc.ShardLeaderElectionContext.runLeaderProcess I may be the new leader - try and sync
{code}
# Then it sleeps for 2500 ms to wait for ongoing updates to finish (see ShardLeaderElectionContext.runLeaderProcess).
# By the time it wakes up at time=1649437, the monkey has finished and the test is being torn down:
{code}
   [junit4]   2> 1648644 T8922 oasc.ChaosMonkey.monkeyLog monkey: finished
   [junit4]   2> 1648645 T8922 oasc.ChaosMonkey.monkeyLog monkey: I ran for 7.931sec. I stopped 1 and I started 0. I also expired 0 and caused 0 connection losses
   [junit4]   2> added docs:123 with 24 fails deletes:44
   [junit4]   2> num searches done:3 with 0 fails
   [junit4]   2> ASYNC  NEW_CORE C3537 name=collection1 org.apache.solr.core.SolrCore@1f81ebc url=http://127.0.0.1:38608/collection1 node=127.0.0.1:38608_ C3537_STATE=coll:collection1 core:collection1 props:{core=collection1, base_url=http://127.0.0.1:38608, node_name=127.0.0.1:38608_, state=active}
   [junit4]   2> 1649437 T8923 C3537 P38608 oasc.SyncStrategy.sync Sync replicas to http://127.0.0.1:38608/collection1/
{code}
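The timeline in the steps above is simple arithmetic on the logged timestamps; the sketch below just spells it out. The class and method names are made up for illustration; the three constants are copied from the log lines and from the 2500 ms wait mentioned above.

```java
// Timestamps (ms) copied from the log excerpt; the 2500 ms wait is the sleep
// in ShardLeaderElectionContext.runLeaderProcess described in the steps.
final class LeaderElectionTimeline {
    static final long ELECTION_START_MS = 1_646_937L;  // "I may be the new leader - try and sync"
    static final long WAIT_FOR_UPDATES_MS = 2_500L;    // wait for ongoing updates to finish
    static final long MONKEY_FINISHED_MS = 1_648_644L; // "monkey: finished"

    // When the candidate wakes up and starts syncing replicas.
    static long syncStartMs() {
        return ELECTION_START_MS + WAIT_FOR_UPDATES_MS;
    }

    // How long after the monkey stopped the candidate was still asleep.
    static long stillWaitingAfterMonkeyMs() {
        return syncStartMs() - MONKEY_FINISHED_MS;
    }

    public static void main(String[] args) {
        System.out.println("sync starts at " + syncStartMs()); // 1649437
        System.out.println("still waiting " + stillWaitingAfterMonkeyMs()
                + " ms after the monkey finished");          // 793
    }
}
```

The computed sync start matches the `SyncStrategy.sync` log line, and the candidate is still waiting roughly 0.8 s after the monkey has stopped.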

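In summary, there is no bug, just another spurious failure: updates are correctly rejected while a new leader is being elected, and the test fails whenever the number of rejected updates exceeds an arbitrarily chosen threshold.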
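A hypothetical model of why a fixed failure threshold is fragile here. `FAIL_TOLERANCE` and the update rate below are assumed values, not taken from ChaosMonkeyNothingIsSafeTest; the point is only that the number of rejected updates grows with the leaderless window and the indexing rate.

```java
// Hypothetical model of the "too many update fails" check. FAIL_TOLERANCE and the
// update rate are assumed values, not taken from ChaosMonkeyNothingIsSafeTest.
final class UpdateFailCheck {
    static final int FAIL_TOLERANCE = 20; // assumed fixed threshold

    static boolean tooManyFails(int fails) {
        return fails > FAIL_TOLERANCE;
    }

    // Updates rejected if every update sent while the shard has no leader fails.
    static long expectedFails(long leaderlessMillis, double updatesPerSecond) {
        return Math.round(leaderlessMillis / 1000.0 * updatesPerSecond);
    }

    public static void main(String[] args) {
        // 24 fails, as reported in the log, trips the assumed threshold.
        System.out.println(tooManyFails(24));             // true
        // At an assumed 10 updates/s, the 2500 ms wait alone rejects about 25 updates.
        System.out.println(expectedFails(2_500, 10.0));   // 25
    }
}
```

Under these assumptions a single leader election can push the count past the threshold without any bug in the update path.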

> ChaosMonkeyNothingIsSafeTest fails with too many update fails
> -------------------------------------------------------------
>
>                 Key: SOLR-7118
>                 URL: https://issues.apache.org/jira/browse/SOLR-7118
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud, Tests
>    Affects Versions: 5.0
>            Reporter: Shalin Shekhar Mangar
>             Fix For: Trunk, 5.1
>
>         Attachments: SOLR-7118.patch
>
>
> There are frequent failures on both trunk and branch_5x with the following message:
> {code}
> java.lang.AssertionError: There were too many update fails - we expect it can happen, but shouldn't easily
>       at __randomizedtesting.SeedInfo.seed([786DB0FD42626C16:F98B3EE5353D0C2A]:0)
>       at org.junit.Assert.fail(Assert.java:93)
>       at org.junit.Assert.assertTrue(Assert.java:43)
>       at org.junit.Assert.assertFalse(Assert.java:68)
>       at org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:224)
>       at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:878)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
