[
https://issues.apache.org/jira/browse/SOLR-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated SOLR-18277:
--------------------------------
Attachment: OUTPUT-org.apache.solr.cloud.api.collections.ShardSplitTest.txt
> SplitShard cleanupAfterFailure race flaw
> ----------------------------------------
>
> Key: SOLR-18277
> URL: https://issues.apache.org/jira/browse/SOLR-18277
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: David Smiley
> Priority: Major
> Attachments:
> OUTPUT-org.apache.solr.cloud.api.collections.ShardSplitTest.txt
>
>
> {{testSplitAfterFailedSplit2}} fails because the parent shard (shard1) is
> permanently stuck in INACTIVE state after a failed split attempt, preventing
> the retry split from succeeding.
> _Disclaimer: issue is AI generated_
> h3. Root Cause
> There is a race condition in {{SplitShardCmd.cleanupAfterFailure()}}:
> # The normal split flow queues an Overseer state update: {{shard1→inactive,
> shard1_0→active, shard1_1→active}}
> # {{cleanupAfterFailure()}} calls {{forceUpdateCollection()}}, but reads
> the collection state *before* the Overseer has processed the update queued
> in step 1 (message 1)
> # Cleanup still sees shard1 as ACTIVE, so it does *not* include
> {{shard1→active}} in its corrective state update
> # Cleanup queues its own update (message 2): {{shard1_0→construction,
> shard1_1→construction}}
> # Overseer processes message 1: shard1 goes INACTIVE
> # Overseer processes message 2: sub-shards go to CONSTRUCTION (no fix for
> shard1)
> # Sub-shards are then deleted. shard1 is permanently stuck INACTIVE with no
> sub-shards.
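
The ordering above can be replayed with a toy model. This is only a sketch: the class, the maps, and the FIFO queue are hypothetical stand-ins for the Overseer queue and cluster state, not Solr classes, and the initial sub-shard state is an assumption that does not affect the outcome.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Toy model only: none of these types are Solr classes.
class SplitRaceSketch {

    /** Replays the seven steps and returns the final shard states. */
    static Map<String, String> replay() {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("shard1", "active");
        state.put("shard1_0", "recovery"); // assumed starting state
        state.put("shard1_1", "recovery"); // assumed starting state

        // Stand-in for the Overseer state update queue (FIFO).
        Queue<Map<String, String>> queue = new ArrayDeque<>();

        // Step 1: the normal split flow queues message 1.
        Map<String, String> message1 = new LinkedHashMap<>();
        message1.put("shard1", "inactive");
        message1.put("shard1_0", "active");
        message1.put("shard1_1", "active");
        queue.add(message1);

        // Steps 2-3: cleanup reads state before message 1 is applied,
        // so the parent still looks active and is left out of the fix.
        Map<String, String> message2 = new LinkedHashMap<>();
        if (!"active".equals(state.get("shard1"))) {
            message2.put("shard1", "active");
        }

        // Step 4: cleanup queues message 2 for the sub-shards only.
        message2.put("shard1_0", "construction");
        message2.put("shard1_1", "construction");
        queue.add(message2);

        // Steps 5-6: the queue is processed in order.
        while (!queue.isEmpty()) {
            state.putAll(queue.poll());
        }

        // Step 7: the sub-shards are deleted.
        state.remove("shard1_0");
        state.remove("shard1_1");
        return state;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints {shard1=inactive}
    }
}
```

Nothing after message 1 ever writes to shard1, which is why the parent ends up inactive with no sub-shards left.
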
> h3. Impact
> The retry split fails with: {{Parent slice is not active: collection1/
> shard1, state=inactive}}
> h3. Suggested Fix
> {{cleanupAfterFailure()}} should unconditionally include
> {{parentShard→active}} in its state update propMap (or re-read state after
> ensuring the Overseer queue is drained), rather than relying on a
> point-in-time read that may be stale because the Overseer is processing
> updates concurrently.
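
A minimal sketch of the first suggestion (always include the parent shard in the cleanup update) might look like the following. The helper name buildCleanupUpdate and the plain string maps are hypothetical; the real cleanup message in SplitShardCmd may carry additional properties that are omitted here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the actual SplitShardCmd code.
class CleanupUpdateSketch {

    /** Builds the cleanup update without consulting a possibly stale state read. */
    static Map<String, String> buildCleanupUpdate(String parentShard, List<String> subShards) {
        Map<String, String> propMap = new LinkedHashMap<>();
        // Always restore the parent, whatever the last read reported.
        propMap.put(parentShard, "active");
        for (String subShard : subShards) {
            propMap.put(subShard, "construction");
        }
        return propMap;
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("shard1", "active");

        // Update queued earlier by the normal split flow.
        Map<String, String> splitUpdate = new LinkedHashMap<>();
        splitUpdate.put("shard1", "inactive");
        splitUpdate.put("shard1_0", "active");
        splitUpdate.put("shard1_1", "active");

        Map<String, String> cleanupUpdate =
            buildCleanupUpdate("shard1", Arrays.asList("shard1_0", "shard1_1"));

        // Applied in queue order, the cleanup update now overrides the split update.
        state.putAll(splitUpdate);
        state.putAll(cleanupUpdate);
        System.out.println(state.get("shard1")); // prints "active"
    }
}
```

Because the cleanup update is queued after the split update, writing the parent state unconditionally makes the result independent of when the cleanup read happened.
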