[ 
https://issues.apache.org/jira/browse/SOLR-12607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563230#comment-16563230
 ] 

Shalin Shekhar Mangar commented on SOLR-12607:
----------------------------------------------

The testSplitWithChaosMonkey failures increased noticeably after SOLR-11665 was 
committed. I looked at the logs of a recent failure and here's what I found:

# The shard split succeeds in creating the new sub-shards and their replicas
# The leader node is killed by the chaos monkey before the new replicas can 
become active
# SOLR-11665 kicks in and deletes the sub-shards that are still in 
construction, including all their state in ZK
# The old leader node is started up again and re-registers its local cores, 
thereby creating state in ZK again. However, since the parent shard 
information was deleted by the cleanup, the re-created state is missing the 
parent and range, and the slice state is set to active.
# This causes the assertions in the test to fail, because they expect that 
either no sub-shards exist or, if they do, that they are active and recovered

There are two bugs in play here:
# The async API status of the split shard command is COMPLETED instead of 
FAILED, which leads the test to believe that the sub-shard slices and replicas 
should exist when they don't.
# By default, our tests still use legacyCloud=true unless set otherwise.

I'll set legacyCloud=false for this test and open another issue to set this to 
false by default throughout the test suite.
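
To make the sequence above easier to follow, here is a rough, self-contained 
sketch of it. This is a toy model and not Solr code: the class 
LegacyCloudSketch, the Slice type, the map standing in for ZK, and the hash 
ranges are all made up for illustration. It also assumes that with 
legacyCloud=false a core that is missing from the cluster state is refused at 
registration instead of re-creating the state.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LegacyCloudSketch {

    /** Toy stand-in for a slice entry in the cluster state (not a Solr class). */
    public static final class Slice {
        public final String parent; // null when the parent info is missing
        public final String range;  // null when the hash range is missing
        public final String state;  // "construction" or "active"

        public Slice(String parent, String range, String state) {
            this.parent = parent;
            this.range = range;
            this.state = state;
        }

        @Override
        public String toString() {
            return "{parent=" + parent + ", range=" + range + ", state=" + state + "}";
        }
    }

    /** Replays steps 1-4 against a map that plays the role of the state in ZK. */
    public static Map<String, Slice> simulate(boolean legacyCloud) {
        Map<String, Slice> zk = new LinkedHashMap<>();
        zk.put("shard1", new Slice(null, "80000000-7fffffff", "active"));

        // Step 1: the split creates the sub-shards in construction state.
        zk.put("shard1_0", new Slice("shard1", "80000000-ffffffff", "construction"));
        zk.put("shard1_1", new Slice("shard1", "0-7fffffff", "construction"));

        // Step 2: the leader node dies before the sub-shard replicas become
        // active. Its cores for the sub-shards are still on local disk.
        List<String> coresOnDisk = Arrays.asList("shard1", "shard1_0", "shard1_1");

        // Step 3: the cleanup deletes the sub-shards and all their state.
        zk.remove("shard1_0");
        zk.remove("shard1_1");

        // Step 4: the node restarts and tries to register every local core.
        for (String sliceName : coresOnDisk) {
            register(zk, sliceName, legacyCloud);
        }
        return zk;
    }

    /** Registers a local core's slice; behaviour depends on the assumed mode. */
    static void register(Map<String, Slice> zk, String sliceName, boolean legacyCloud) {
        if (zk.containsKey(sliceName)) {
            return; // the slice is already known, nothing to re-create
        }
        if (legacyCloud) {
            // Assumed legacy behaviour: the core re-creates the slice, but the
            // parent and range are gone, and the slice comes back as active.
            zk.put(sliceName, new Slice(null, null, "active"));
        }
        // Assumed legacyCloud=false behaviour: the registration is refused,
        // so the deleted sub-shard stays deleted.
    }

    public static void main(String[] args) {
        System.out.println("legacyCloud=true:  " + simulate(true));
        System.out.println("legacyCloud=false: " + simulate(false));
    }
}
```

In this model, legacyCloud=true brings shard1_0 and shard1_1 back as active 
slices without a parent or range, which matches what the logs showed. With 
legacyCloud=false only shard1 remains, so the test's "no sub-shards exist" 
branch would hold.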

> Investigate ShardSplitTest failures
> -----------------------------------
>
>                 Key: SOLR-12607
>                 URL: https://issues.apache.org/jira/browse/SOLR-12607
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Major
>             Fix For: master (8.0), 7.5
>
>
> There have been many recent ShardSplitTest failures. 
> According to http://fucit.org/solr-jenkins-reports/failure-report.html
> {code}
> Class: org.apache.solr.cloud.api.collections.ShardSplitTest
> Method: testSplitWithChaosMonkey
> Failures: 72.32% (81 / 112)
> Class: org.apache.solr.cloud.api.collections.ShardSplitTest
> Method: test
> Failures: 26.79% (30 / 112)
> {code} 


