[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863798#comment-13863798 ]
Hoss Man commented on SOLR-5477:
--------------------------------

A few small suggestions from someone who hasn't read through much of this but has done similar async setups in other systems in another lifetime...

1) On where the (core task) queues should live...

bq. I'm still debating between having even the CoreAdmin to use zk (which means it'd only work in SolrCloud mode) or just have a local map of running tasks.

I think it would be wise to keep them in ZK, if for no other reason than because the primary use case you expect is for the async core calls to be made by the async overseer calls. By keeping the async core queues in ZK, the overseer can watch those queues directly for "completed" instead of needing to wake up, poll every replica, and go back to sleep.

However, a secondary concern (I think) is what should happen if/when a node gets rebooted. If the core admin task queues are in RAM, you could easily get into a situation where the overseer asks 10 replicas to do something, replicaA succeeds or fails quickly and then reboots, and when the overseer checks back once all replicas are done, it finds that replicaA can't say one way or another whether it succeeded or failed: its queues are totally empty.

2) On generating the task/request IDs.

In my experience, when implementing an async callback API like this, it can be handy to require the *client* to specify the magical id that you use to keep track of things; you just ensure it's unique among the existing async jobs you know about (either in the queue, or in the recently completed/failed queues). Sometimes single-threaded (or centrally managed) client apps can generate a unique id more easily than your distributed system can, and/or they may already have a one-to-one mapping between some id they've already got and the task they are asking you to do, and reusing that id makes the client's life easier for debugging/audit logs.
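A minimal sketch of suggestion 2, assuming a single node keeps task states in an in-memory map (a ZK-backed store, per suggestion 1, would take the map's place). `AsyncTaskRegistry`, `submit`, `finish`, `status` and `purge` are hypothetical names, not Solr APIs. The only rule it enforces is the one described above: a client-chosen id is accepted only if no queued/running, recently completed, or recently failed task already uses it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry for async tasks whose ids are chosen by the client.
class AsyncTaskRegistry {
    enum State { RUNNING, COMPLETED, FAILED }

    // Every id this node still knows about (running or recently finished) -> state.
    private final Map<String, State> tasks = new LinkedHashMap<>();

    /** Returns true if the client's id was accepted, false if it collides with a known task. */
    synchronized boolean submit(String clientId) {
        if (clientId == null || clientId.isEmpty()) {
            throw new IllegalArgumentException("client must supply a task id");
        }
        if (tasks.containsKey(clientId)) {
            return false; // id is in use, or still reserved by a recently completed/failed task
        }
        tasks.put(clientId, State.RUNNING);
        return true;
    }

    /** Records the outcome of a running task. */
    synchronized void finish(String clientId, boolean success) {
        if (tasks.get(clientId) != State.RUNNING) {
            throw new IllegalStateException("no running task with id " + clientId);
        }
        tasks.put(clientId, success ? State.COMPLETED : State.FAILED);
    }

    /** Current state of the task, or null if the id is unknown. */
    synchronized State status(String clientId) {
        return tasks.get(clientId);
    }

    /** Forgets a completed/failed task so its id may be reused; running tasks are kept. */
    synchronized void purge(String clientId) {
        State s = tasks.get(clientId);
        if (s == State.COMPLETED || s == State.FAILED) {
            tasks.remove(clientId);
        }
    }
}
```

A client could pass an id it already tracks (a job name, a ticket number) and poll with that same id. How long completed/failed ids stay reserved before `purge` is called is a policy choice the sketch leaves open.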
In the case of async collection commands -> async core commands, client-supplied ids would also mean the overseer could reuse whatever id the client passed in for the collection command when talking to each of the replicas.

> Async execution of OverseerCollectionProcessor tasks
> ----------------------------------------------------
>
>                 Key: SOLR-5477
>                 URL: https://issues.apache.org/jira/browse/SOLR-5477
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Anshum Gupta
>         Attachments: SOLR-5477-CoreAdminStatus.patch
>
>
> Typical collection admin commands are long running and it is very common to
> have the requests get timed out. It is more of a problem if the cluster is
> very large. Add an option to run these commands asynchronously:
> add an extra param async=true for all collection commands
> the task is written to ZK and the caller is returned a task id.
> a separate collection admin command will be added to poll the status of the
> task
> command=status&id=7657668909
> if id is not passed all running async tasks should be listed
> A separate queue is created to store in-process tasks. After the tasks are
> completed the queue entry is removed. OverseerCollectionProcessor will perform
> these tasks in multiple threads



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
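A rough model of how the quoted lifecycle and the id-reuse suggestion fit together, assuming plain in-memory maps stand in for the ZK queues. The overseer accepts a collection command under the client's id, fans it out to every replica under that same id, and a status call aggregates the replica results; with no id, `listRunning` returns all running tasks, and finished entries leave the in-process map, as the issue description asks. `ReplicaQueue`, `OverseerSketch` and the status strings are hypothetical. A replica with no record of the id (e.g. rebooted with a RAM-only queue) makes the result "unknown", the failure mode described in suggestion 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one replica's core-admin task queue (RAM-only here).
class ReplicaQueue {
    final String name;
    // task id -> "running" | "completed" | "failed"
    final Map<String, String> tasks = new HashMap<>();

    ReplicaQueue(String name) {
        this.name = name;
    }
}

// Hypothetical overseer-side bookkeeping for async collection commands.
class OverseerSketch {
    private final List<ReplicaQueue> replicas;
    // In-process collection tasks: client id -> replicas the task was sent to.
    private final Map<String, List<ReplicaQueue>> inProcess = new HashMap<>();
    // Final results, kept so the id stays reserved and can still be queried.
    private final Map<String, String> finished = new HashMap<>();

    OverseerSketch(List<ReplicaQueue> replicas) {
        this.replicas = new ArrayList<>(replicas);
    }

    /** Accepts a command under the client's id and reuses that id on every replica. */
    boolean submit(String clientId) {
        if (inProcess.containsKey(clientId) || finished.containsKey(clientId)) {
            return false; // id already in use
        }
        for (ReplicaQueue r : replicas) {
            r.tasks.put(clientId, "running");
        }
        inProcess.put(clientId, new ArrayList<>(replicas));
        return true;
    }

    /** Aggregates replica states; the in-process entry is removed once none is running. */
    String status(String id) {
        if (finished.containsKey(id)) {
            return finished.get(id);
        }
        List<ReplicaQueue> sentTo = inProcess.get(id);
        if (sentTo == null) {
            return "notfound";
        }
        boolean anyRunning = false, anyFailed = false, anyLost = false;
        for (ReplicaQueue r : sentTo) {
            String s = r.tasks.get(id);
            if (s == null) {
                anyLost = true; // no record on this replica, e.g. rebooted with a RAM-only queue
            } else if (s.equals("running")) {
                anyRunning = true;
            } else if (s.equals("failed")) {
                anyFailed = true;
            }
        }
        if (anyRunning) {
            return "running";
        }
        String result = anyLost ? "unknown" : (anyFailed ? "failed" : "completed");
        inProcess.remove(id);
        finished.put(id, result);
        return result;
    }

    /** The "no id passed" case: ids of all still-running async tasks, sorted. */
    List<String> listRunning() {
        List<String> running = new ArrayList<>();
        for (String id : new ArrayList<>(inProcess.keySet())) {
            if (status(id).equals("running")) {
                running.add(id);
            }
        }
        Collections.sort(running);
        return running;
    }
}
```

Keeping the replica queues in ZK instead of RAM would remove the "unknown" outcome and let the overseer watch for completion rather than calling `status` in a loop.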