[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863798#comment-13863798
 ] 

Hoss Man commented on SOLR-5477:
--------------------------------

A few small suggestions from someone who hasn't thought through much of this 
but has done similar async setups in other systems in another lifetime...

1) on where the (core task) queues should live...

bq. I'm still debating between having even the CoreAdmin to use zk (which means 
it'd only work in SolrCloud mode) or just have a local map of running tasks. 

I think it would be wise to keep them in ZK -- if for no other reason than 
because the primary use case you expect is for the async core calls to be made 
by the async overseer calls; and by keeping the async core queues in zk, the 
overseer can watch those queues directly for "completed" instead of needing to 
wake up, poll every replica, and go back to sleep.
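
To make the watch-instead-of-poll idea concrete, here is a rough in-memory 
sketch. It does not use the real ZooKeeper client; SharedTaskStore, submit, 
watch and complete are hypothetical names standing in for a zk-backed queue 
with one-shot watches.

```python
from collections import defaultdict


class SharedTaskStore:
    """Hypothetical in-memory stand-in for a zk-backed task queue."""

    def __init__(self):
        self._status = {}
        self._watchers = defaultdict(list)

    def submit(self, task_id):
        self._status[task_id] = "running"

    def watch(self, task_id, callback):
        # Register interest once; no periodic polling is needed afterwards.
        self._watchers[task_id].append(callback)

    def complete(self, task_id, status):
        self._status[task_id] = status
        # Fire each watcher once and drop it, similar in spirit to one-shot watches.
        for callback in self._watchers.pop(task_id, []):
            callback(task_id, status)


store = SharedTaskStore()
finished = {}  # what the "overseer" has learned so far


def on_done(task_id, status):
    finished[task_id] = status


for replica in ("replicaA", "replicaB"):
    task_id = f"task-1/{replica}"
    store.submit(task_id)
    store.watch(task_id, on_done)

# The replicas report their results; the overseer is notified without polling.
store.complete("task-1/replicaA", "completed")
store.complete("task-1/replicaB", "failed")
```

The overseer side only registers callbacks and reacts; it never has to wake 
up and ask each replica in turn.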

However, a secondary concern (I think) is what should happen if/when a node 
gets rebooted -- if the core admin task queues are in RAM then you could 
easily get into a situation where the overseer asks 10 replicas to do 
something, replicaA succeeds or fails quickly and then reboots, the overseer 
checks back once all replicas are done and finds that replicaA can't say one 
way or another whether it succeeded or failed -- its queues are totally empty.
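
The reboot problem can be illustrated with a small simulation. This is only 
a sketch under assumptions: InMemoryNode and DurableNode are made-up classes, 
and a JSON file in a temp directory stands in for any storage that survives a 
restart (zk in the proposal above).

```python
import json
import os
import tempfile


class InMemoryNode:
    """Keeps task results in RAM only (hypothetical)."""

    def __init__(self):
        self.results = {}

    def record(self, task_id, status):
        self.results[task_id] = status

    def reboot(self):
        self.results = {}  # everything held in RAM is lost

    def status(self, task_id):
        return self.results.get(task_id, "unknown")


class DurableNode:
    """Keeps task results in a file that outlives the process (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def record(self, task_id, status):
        data = self._load()
        data[task_id] = status
        with open(self.path, "w") as f:
            json.dump(data, f)

    def reboot(self):
        pass  # nothing to lose: no state is held in RAM

    def status(self, task_id):
        return self._load().get(task_id, "unknown")


def run_scenario(node):
    # The task finishes quickly, then the node restarts before anyone asks.
    node.record("task-42", "completed")
    node.reboot()
    return node.status("task-42")


with tempfile.TemporaryDirectory() as tmp:
    ram_answer = run_scenario(InMemoryNode())
    durable_answer = run_scenario(DurableNode(os.path.join(tmp, "tasks.json")))
```

With the in-RAM node the overseer gets "unknown" back after the reboot, while 
the durable node can still report "completed".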

2) on generating the task/request IDs.

in my experience, when implementing an async callback API like this, it can be 
handy to require the *client* to specify the magical id that you use to keep 
track of things -- you just ensure it's unique among the existing async jobs 
you know about (either in the queue, or in the recently completed/failed 
queues).  Sometimes single threaded (or centrally managed) client apps can 
generate a unique id more easily than your distributed system, and/or they may 
already have a one-to-one mapping between some id they've already got and the 
task they are asking you to do, and re-using that id makes the client's life 
easier for debugging/audit-logs.
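
A minimal sketch of that uniqueness check, assuming three simple maps for 
running, completed and failed jobs. AsyncTaskRegistry, DuplicateTaskId and the 
example id and command strings are all hypothetical, not anything from the 
patch.

```python
class DuplicateTaskId(Exception):
    """Raised when a client-supplied id is already known (hypothetical)."""


class AsyncTaskRegistry:
    def __init__(self):
        self.running = {}
        self.completed = {}
        self.failed = {}

    def _queues(self):
        return (
            ("running", self.running),
            ("completed", self.completed),
            ("failed", self.failed),
        )

    def submit(self, client_id, command):
        # The id must be unique across running AND recently finished jobs.
        if any(client_id in queue for _, queue in self._queues()):
            raise DuplicateTaskId(client_id)
        self.running[client_id] = command
        return client_id

    def finish(self, client_id, ok=True):
        command = self.running.pop(client_id)
        target = self.completed if ok else self.failed
        target[client_id] = command

    def status(self, client_id):
        for name, queue in self._queues():
            if client_id in queue:
                return name
        return "unknown"


registry = AsyncTaskRegistry()
registry.submit("client-job-17", "example-command")
registry.finish("client-job-17", ok=True)

# Re-using an id that is still in the completed queue is rejected.
try:
    registry.submit("client-job-17", "example-command")
    rejected = False
except DuplicateTaskId:
    rejected = True
```

Rejecting the duplicate up front keeps the status lookup unambiguous: one id 
always maps to at most one job.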

in the case of async collection commands -> async core commands, it would also 
mean the overseer could reuse whatever id the client passed in for the 
collection commands when talking to each of the replicas.
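
Roughly, the id reuse could look like the sketch below. fan_out and 
collection_status are hypothetical helpers, and the aggregation rule (any 
failure fails the whole command) is an assumption for illustration only.

```python
def fan_out(request_id, command, replicas):
    """Build one per-replica request, all carrying the client's original id."""
    return [
        {"replica": replica, "command": command, "async_id": request_id}
        for replica in replicas
    ]


def collection_status(request_id, replica_statuses):
    """Combine per-replica statuses, all looked up under the same id."""
    statuses = [
        known.get(request_id, "unknown") for known in replica_statuses.values()
    ]
    if "failed" in statuses:
        return "failed"
    if all(status == "completed" for status in statuses):
        return "completed"
    return "running"


requests = fan_out("client-job-17", "example-command", ["replicaA", "replicaB"])

overall = collection_status(
    "client-job-17",
    {
        "replicaA": {"client-job-17": "completed"},
        "replicaB": {"client-job-17": "running"},
    },
)
```

Because every replica sees the same id, the client, the overseer and the core 
logs can all be searched for one string when debugging.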


> Async execution of OverseerCollectionProcessor tasks
> ----------------------------------------------------
>
>                 Key: SOLR-5477
>                 URL: https://issues.apache.org/jira/browse/SOLR-5477
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Anshum Gupta
>         Attachments: SOLR-5477-CoreAdminStatus.patch
>
>
> Typical collection admin commands are long running and it is very common for 
> the requests to time out.  It is more of a problem if the cluster is 
> very large. Add an option to run these commands asynchronously:
> add an extra param async=true for all collection commands;
> the task is written to ZK and the caller is returned a task id. 
> A separate collection admin command will be added to poll the status of the 
> task:
> command=status&id=7657668909
> If id is not passed, all running async tasks should be listed.
> A separate queue is created to store in-process tasks. After the tasks are 
> completed the queue entry is removed. OverseerCollectionProcessor will perform 
> these tasks in multiple threads.



