[
https://issues.apache.org/jira/browse/SOLR-12729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017907#comment-18017907
]
Jason Gerlowski commented on SOLR-12729:
----------------------------------------
Ran into some lock-orphaning issues recently, which brought me to this JIRA
ticket. My understanding of the design is as follows (rough sketch after the list):
* lock is created early on in SplitShardCmd, as the ephemeral znode
"/collections/myCollName/<shardName>-splitting"
* SplitShardCmd cleans up the lock in a try-finally, handling the failure case
(i.e. when "success" flag is false)
* when SplitShardCmd finishes successfully, (i.e. "success" flag is true), the
lock is left in place
* ReplicaMutator.setState cleans up the lock when it detects that the last
sub-shard replica has finished recovery, which aims to cover the successful
case.
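To make that concrete, here's a rough sketch of the pattern as I understand it.
This is not the actual SplitShardCmd/SolrZkClient code, just the raw ZooKeeper
client with illustrative names and paths:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ShardSplitLockSketch {

  // Mirrors the shape of the locking described above; names and paths are
  // illustrative only, not Solr's real code.
  public static void splitShard(ZooKeeper zk, String collection, String shard) throws Exception {
    String lockPath = "/collections/" + collection + "/" + shard + "-splitting";

    // Take the lock by creating an ephemeral znode; a concurrent split sees
    // NodeExistsException and is rejected.
    try {
      zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    } catch (KeeperException.NodeExistsException e) {
      throw new IllegalStateException("Shard " + shard + " is already being split", e);
    }

    boolean success = false;
    try {
      // ... create sub-shards, add replicas, kick off recovery, etc. ...
      success = true;
    } finally {
      if (!success) {
        // Failure path: the command removes the lock itself.
        zk.delete(lockPath, -1);
      }
      // Success path: the lock is intentionally left behind, to be removed
      // later (by ReplicaMutator.setState) once the last sub-shard replica
      // goes ACTIVE.
    }
  }
}
{code}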
Assuming that understanding is correct and I'm not missing other pieces of the
design, I think we've got some issues:
First, it seems like we're trying to cover replica-recovery in the locking, but
we do so inconsistently. In the happy path, the lock is left in place until
after the sub-shard replicas recover. But if the overseer node restarts during
that recovery phase, the ephemeral lock is lost (the znode goes away with the
overseer's ZooKeeper session).
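For what it's worth, the session-scoping is easy to demonstrate with the plain
ZooKeeper client. Connection string, timeout, and paths below are just
placeholders, and the collection znode is assumed to already exist:

{code:java}
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralLockLifetime {

  // Waits for the session to be established before returning the handle.
  private static ZooKeeper connect(String hosts) throws Exception {
    CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(hosts, 15000, event -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();
    return zk;
  }

  public static void main(String[] args) throws Exception {
    // Assumes /collections/myCollName already exists.
    String lockPath = "/collections/myCollName/shard1-splitting";

    // The "overseer" session creates the ephemeral lock znode.
    ZooKeeper overseerSession = connect("localhost:2181");
    overseerSession.create(lockPath, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Simulate the overseer restarting: its session ends and ZooKeeper drops
    // every ephemeral znode that session owned, including the lock.
    overseerSession.close();

    ZooKeeper observer = connect("localhost:2181");
    System.out.println("lock still present? "
        + (observer.exists(lockPath, false) != null)); // prints false
    observer.close();
  }
}
{code}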
Second (and admittedly less important), the lock can be orphaned if users
attempt to delete sub-shard replicas (or the sub-shard itself) while they're
still recovering. In particular, I've seen this occur when workflow automation
gives up on a long-running split and runs DELETESHARD on the two sub-shards.
DELETESHARD succeeds, but as a result the ReplicaMutator lock-clearing will
never trigger. I suspect this would also occur if a user runs DELETEREPLICA on
a recovering sub-shard replica, but I haven't tested it myself.
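As a side note, when this happens the orphaned lock is easy to spot by listing
the collection's znode for leftover "-splitting" children. Again this is just
the raw ZooKeeper client with an illustrative path, assuming an
already-connected handle:

{code:java}
import org.apache.zookeeper.ZooKeeper;

public class FindOrphanedSplitLocks {

  // Prints any leftover "<shard>-splitting" lock znodes under a collection.
  // The path layout matches the lock described above.
  public static void printOrphanedSplitLocks(ZooKeeper zk, String collection) throws Exception {
    String collectionPath = "/collections/" + collection;
    for (String child : zk.getChildren(collectionPath, false)) {
      if (child.endsWith("-splitting")) {
        System.out.println("possible orphaned split lock: " + collectionPath + "/" + child);
      }
    }
  }
}
{code}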
> SplitShardCmd should lock the parent shard to prevent parallel splitting
> requests
> ---------------------------------------------------------------------------------
>
> Key: SOLR-12729
> URL: https://issues.apache.org/jira/browse/SOLR-12729
> Project: Solr
> Issue Type: Bug
> Components: AutoScaling
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: 7.6, 8.0
>
>
> This scenario was discovered by the simulation framework, but it exists also
> in the non-simulated code.
> When {{IndexSizeTrigger}} requests SPLITSHARD, which is then successfully
> started and “completed” from the point of view of {{ExecutePlanAction}}, in
> reality it can still take a significant amount of time until the new replicas
> fully recover and cause the switch of shard states (parent to INACTIVE,
> child from RECOVERY to ACTIVE).
> If this time is longer than the trigger's {{waitFor}}, the trigger will issue
> the same SPLITSHARD request again. {{SplitShardCmd}} doesn't prevent this new
> request from being processed because the parent shard is still ACTIVE.
> However, a section of the code in {{SplitShardCmd}} will realize that
> sub-slices with the target names already exist and they are not active, at
> which point it will delete the new sub-slices ({{SplitShardCmd:182}}).
> The end result is an infinite loop, where {{IndexSizeTrigger}} will keep
> generating SPLITSHARD, and {{SplitShardCmd}} will keep deleting the
> recovering sub-slices created by the previous command.
> A simple solution is for the parent shard to be marked to indicate that it’s
> in the process of splitting, so that no other split is attempted on the same
> shard. Furthermore, {{IndexSizeTrigger}} could temporarily exclude such
> shards from monitoring.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]