[
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297710#comment-15297710
]
Noble Paul commented on SOLR-8744:
----------------------------------
bq. In OverseerTaskProcessor, handing off the lock object (up to and including
the tpe.execute call) should probably move inside the try block; on
successfully executing the task, null out the lock local, and then put an "if
lock != null unlock" into a finally block.
The runner is always executed in a different thread, so the finally block will
always evaluate {{lock != null}} to true. Actually, OverseerTaskProcessor should
unlock it only in case of an exception.
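To illustrate the point, here is a minimal sketch of that hand-off, not the code in the patch. {{DemoLock}}, {{HandOffDemo}}, {{submit}} and {{runBothPaths}} are hypothetical names made up for this example. The runner releases the lock when the task finishes, and the submitting thread unlocks only if the hand-off itself throws.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a lock handed out by the Overseer; starts out held.
final class DemoLock {
    private final AtomicBoolean held = new AtomicBoolean(true);
    void unlock() { held.set(false); }
    boolean isHeld() { return held.get(); }
}

final class HandOffDemo {
    // Hands the task (and the lock) to the pool. Returns false if the hand-off failed.
    static boolean submit(ExecutorService tpe, DemoLock lock, Runnable task) {
        try {
            tpe.execute(() -> {
                try {
                    task.run();
                } finally {
                    lock.unlock(); // normal path: the runner thread releases the lock
                }
            });
            return true;
        } catch (RuntimeException e) {
            lock.unlock(); // the runner never started, so the submitter must release
            return false;
        }
    }

    // Exercises both paths: {accepted, lockReleased, acceptedAfterShutdown, lockReleased}.
    static boolean[] runBothPaths() {
        ExecutorService tpe = Executors.newSingleThreadExecutor();
        DemoLock normal = new DemoLock();
        boolean accepted = submit(tpe, normal, () -> { });
        tpe.shutdown();
        try {
            tpe.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // A pool that has been shut down rejects new work, which forces the exception path.
        DemoLock rejected = new DemoLock();
        boolean acceptedAfterShutdown = submit(tpe, rejected, () -> { });
        return new boolean[] {
            accepted, !normal.isHeld(), acceptedAfterShutdown, !rejected.isHeld()
        };
    }
}
```

A finally block in the submitting thread would run before the runner has finished, which is why the release is split between the two threads here.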
bq. Do we need some kind of complete reset in the event stuff really blows up?
Should there be a session-level clear that just unlocks everything?
Yes, we need it, but not at the session level. If I clear up everything for
each session, it defeats the purpose. The session is valid for only one batch.
The locks should survive for multiple batches.
A better solution is to periodically check if the running tasks are empty and,
if yes, just create a new {{LockTree}}. I was planning to do it anyway.
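A rough sketch of that periodic check, assuming the set of running task ids and the tree are guarded by one monitor. {{LockTreeStub}} and {{LockTreeResetter}} are hypothetical names for this example only; the real {{LockTree}} is in the patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical placeholder for the real lock tree.
final class LockTreeStub { }

final class LockTreeResetter {
    private final Set<String> runningTasks = new HashSet<>();
    private LockTreeStub lockTree = new LockTreeStub();

    synchronized void taskStarted(String taskId) { runningTasks.add(taskId); }

    synchronized void taskFinished(String taskId) { runningTasks.remove(taskId); }

    synchronized LockTreeStub current() { return lockTree; }

    // Meant to be called periodically. The tree is replaced only when nothing
    // is running, so no live task can still be holding a lock in the old tree.
    synchronized boolean resetIfIdle() {
        if (!runningTasks.isEmpty()) {
            return false;
        }
        lockTree = new LockTreeStub();
        return true;
    }
}
```

Any lock leaked by a task that blew up disappears with the old tree, while locks of tasks that are still running are never discarded.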
bq. I think MigrateStateFormat could be collection level rather than cluster
level?
I have set the lock levels more pessimistically. We should re-evaluate them.
> Overseer operations need more fine grained mutual exclusion
> -----------------------------------------------------------
>
> Key: SOLR-8744
> URL: https://issues.apache.org/jira/browse/SOLR-8744
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 5.4.1
> Reporter: Scott Blum
> Assignee: Noble Paul
> Labels: sharding, solrcloud
> Attachments: SOLR-8744.patch
>
>
> SplitShard creates a mutex over the whole collection, but, in practice, this
> is a big scaling problem. Multiple split shard operations could happen at
> the same time, as long as different shards are being split. In practice,
> those shards often reside on different machines, so there's no I/O bottleneck
> in those cases, just the mutex in Overseer forcing the operations to be done
> serially.
> Given that a single split can take many minutes on a large collection, this
> is a bottleneck at scale.
> Here is the proposed new design
> There are various Collection operations performed at Overseer. They may need
> exclusive access at various levels. Each operation must define the Access
> level at which the access is required. Access level is an enum.
> CLUSTER(0)
> COLLECTION(1)
> SHARD(2)
> REPLICA(3)
> The Overseer node maintains a tree of these locks. The lock tree would look
> as follows. The tree can be created lazily as and when tasks come up.
> {code}
> Legend:
> C1, C2 -> Collections
> S1, S2 -> Shards
> R1,R2,R3,R4 -> Replicas
>
>                  Cluster
>                /         \
>             /               \
>          C1                  C2
>         /   \               /   \
>       /       \           /       \
>     S1        S2        S1        S2
>   R1,R2     R3,R4     R1,R2     R3,R4
> {code}
> When the overseer receives a message, it tries to acquire the appropriate
> lock from the tree. For example, if an operation needs a lock at a Collection
> level and it needs to operate on Collection C1, the node C1 and all child
> nodes of C1 must be free.
> h2. Lock acquiring logic
> Each operation would start from the root of the tree (Level 0 -> Cluster) and
> start moving down depending upon the operation. After it reaches the right
> node, it checks if all the children are free from a lock. If it fails to
> acquire a lock, it remains in the work queue. A scheduler thread waits for
> notification from the current set of tasks. Every task would do a
> {{notify()}} on the monitor of the scheduler thread. The thread would start
> from the head of the queue and check all tasks to see if that task is able to
> acquire the right lock. If yes, it is executed; if not, the task is left in
> the work queue.
> When a new task arrives in the work queue, the scheduler thread wakes and just
> tries to schedule that task.
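The lock acquiring logic described above can be sketched roughly as follows. This is an illustration only, not the patch: {{LockTreeSketch}}, {{tryLock}} and the use of a {{Runnable}} as the unlock handle are assumptions made for this example. It also assumes that a lock held on an ancestor node blocks everything below it, which the description implies but does not state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LockTreeSketch {
    // Levels from the issue description: CLUSTER(0), COLLECTION(1), SHARD(2), REPLICA(3).
    enum Level { CLUSTER, COLLECTION, SHARD, REPLICA }

    private static final class Node {
        final Map<String, Node> children = new HashMap<>(); // filled lazily
        boolean locked;

        // True if neither this node nor any descendant is locked.
        boolean subtreeFree() {
            if (locked) {
                return false;
            }
            for (Node child : children.values()) {
                if (!child.subtreeFree()) {
                    return false;
                }
            }
            return true;
        }
    }

    private final Node root = new Node(); // the Cluster node

    // path holds the names below the root, e.g. ["C1", "S1"] for a SHARD-level lock.
    // Returns a handle that releases the lock, or null if the lock is not available.
    synchronized Runnable tryLock(Level level, List<String> path) {
        if (path.size() != level.ordinal()) {
            throw new IllegalArgumentException("path length must match the level");
        }
        Node node = root;
        for (String name : path) {
            if (node.locked) {
                return null; // assumption: a locked ancestor blocks its descendants
            }
            node = node.children.computeIfAbsent(name, k -> new Node());
        }
        if (!node.subtreeFree()) {
            return null; // the target node or one of its children is busy
        }
        node.locked = true;
        final Node target = node;
        return () -> {
            synchronized (LockTreeSketch.this) {
                target.locked = false;
            }
        };
    }
}
```

With this sketch, two SHARD-level locks on C1/S1 and C1/S2 can be held together, while a COLLECTION-level lock on C1 is refused until both are released. A task that gets null back would simply stay in the work queue for the next scheduling pass.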