----> To take it up even one more level, you could have your spawning query spawn only a limited number of batches (10 maybe), and then spawn itself to do the remainder. All spawns end up on the task queue, which is processed in parallel already. The creation of tasks on the queue would be paced down when spawning the query that creates the batches, so only a very limited number of tasks would be on the queue on average. That would prevent overflow, and also leave room for other tasks, like evals from other processes and scheduled tasks. --<
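For anyone following along, here is a minimal sketch of the scheme quoted above, written as a plain Python simulation rather than as MarkLogic code. Everything in it is an assumption made for illustration: the names run_job, spawn, worker and controller are hypothetical, the "task queue" is an in-memory FIFO drained by a single loop, and a real task server would run tasks in parallel, so the queue depth it reports is only indicative.

```python
from collections import deque


def run_job(items, batch_size=100, fanout=10, process=lambda batch: None):
    """Simulate a self-respawning controller on a FIFO task queue.

    Hypothetical model: tasks are zero-argument callables, and the queue
    is drained serially instead of by parallel task-server threads.
    """
    queue = deque()
    stats = {"batches": 0, "controllers": 0, "max_queue_depth": 0}

    def spawn(task):
        # Put a task on the queue and record the deepest the queue has been.
        queue.append(task)
        stats["max_queue_depth"] = max(stats["max_queue_depth"], len(queue))

    def worker(start):
        # Worker: process exactly one batch, then exit.
        process(items[start:start + batch_size])
        stats["batches"] += 1

    def controller(start):
        # Controller: spawn at most `fanout` workers, one per batch ...
        stats["controllers"] += 1
        for _ in range(fanout):
            if start >= len(items):
                return  # nothing left, so no respawn either
            spawn(lambda s=start: worker(s))
            start += batch_size
        # ... then respawn itself to handle the remainder, if any.
        if start < len(items):
            spawn(lambda s=start: controller(s))

    spawn(lambda: controller(0))
    while queue:
        queue.popleft()()  # take the oldest task and run it
    return stats


if __name__ == "__main__":
    seen = []
    print(run_job(list(range(2500)), process=seen.extend))
```

Because the respawned controller is queued behind the workers it just created, the queue in this serial model never holds more than fanout + 1 tasks, which is the pacing effect described in the quote.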
This is interesting. If I am reading it right, it's like this:

  T  : spawn controller ( processes batch, then respawns T )
  T' : spawn worker ( processes batch only )
  n = batch#, m = controller#

  T(m): spawns 10 T' workers, one per batch ( T'(n,n+10+m) ) (runs 1 batch?), then spawns itself as T(m+1)
  T'  : processes 1 batch, then exits

Unless the processing time is exactly equal, this could get ahead of itself ... if T finishes before all of its T' workers do (say 3 T' workers are left when T is done and spawns 10 more, repeat).

I've done similar things also ... It's a great technique, but I find every time I do it, it's trickier than I thought, and each job has different needs, so reusing the old code is hard. This really calls for a generic framework/library that separates out the task queue management from the work process and all the tiddly bits that add up to 99% of the work.

< read back one message > < face palm > wow! https://github.com/mblakele/taskbot

Mike, why didn't you read my mind when I needed this for [insert recent project]? This is really nice. Of course I can immediately see some additions I would have needed ... (replace the list with a function, checkpointing state for restart persistence, ...) and a billion features I don't need but would be potentially useful (resource capping by querying the Meters DB occasionally, cross-server spawns - may need to serialize function items for that ... hmmm, and of course a GUI!).

If only it were open source, on GitHub, had a license that allowed commercial use without worrying about the lawyers ... had logging and exception support, and was written by a nice person that wouldn't mind the pull requests ... ... oh wait ... it is! wow.

One question before I put this on my infinite queue of jobs for my clones to work on in parallel universes: how does using $tb:OPTIONS-SYNC avoid the problem of the calling task timing out if the job takes too long?
That, and the problem of the original list itself being too expensive to create, were my big stumbling blocks on a recent project (the rest was just a PITA of repetitive work this would have eliminated).

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com