Greg Brockman <g...@ksplice.com> added the comment:

Before I forget, looks like we also need to deal with the result from a worker being un-unpickleable:

"""
#!/usr/bin/env python
import multiprocessing

def foo(x):
    global bar
    def bar(x):
        pass
    return bar

p = multiprocessing.Pool(1)
p.apply(foo, [1])
"""
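For what it's worth, the wrinkle in this example is that the function pickles fine inside the worker (where __main__.bar exists by the time the result is sent) but fails to unpickle in the parent, where no such global was ever defined. So the guard has to live on the receiving side. A rough sketch of the shape it might take (safe_get is a made-up name, not anything in multiprocessing today):

"""
def safe_get(result_queue):
    # An object that pickled cleanly in the worker can still fail to
    # unpickle here (e.g. a function defined only in the worker's
    # namespace), so catch the error and hand back a picklable,
    # descriptive exception instead of letting the result handler die
    # and wedge the pool.
    try:
        return result_queue.get()
    except Exception as exc:
        return RuntimeError("result could not be unpickled: %s" % exc)
"""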
This shouldn't require much more work, but I'll hold off on submitting a patch until we have a better idea of where we're going in this arena.

> Instead of restarting crashed worker processes it will simply bring down
> the pool, right?

Yep. Again, as things stand, once you've lost a worker, you've lost a task, and you can't really do much about it. I guess that depends on your application, though... is your use-case such that you can lose a task without it mattering? If tasks are idempotent, one could have the task handler resubmit them, etc. (see the sketch at the end of this message). But really, thinking about the failure modes I've seen (OOM kills/user-initiated interrupt), I'm not sure under what circumstances I'd like the pool to try to recover.

The idea of recording the mapping of tasks -> workers seems interesting. Getting all of the corner cases right could be hard (e.g. making the removal of a task from the queue and the recording of which worker removed it atomic, or detecting whether a worker crashed while still holding the queue lock), and doing this would require extra mechanism. This feature does seem useful for pools running many different jobs, because that way a crashed worker need only terminate one job.

Anyway, I'd be curious to know more about the kinds of crashes you've encountered from which you'd like to be able to recover. Is it just Unpickleable exceptions, or are there others?
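To make the idempotent-resubmission idea concrete, here's roughly what I mean, sketched at the application level rather than inside the pool. run_with_retries and the timeout-based loss detection are my invention, not an existing API; from outside the pool, a timeout is about the only signal you get that a worker died mid-task, since a crashed worker simply never replies:

"""
#!/usr/bin/env python
import multiprocessing

def work(x):
    # stand-in for an idempotent task
    return x * x

def run_with_retries(pool, func, args, retries=3, timeout=30):
    # Resubmit an idempotent task if its result never arrives,
    # e.g. because the worker was OOM-killed partway through.
    for attempt in range(retries):
        async_result = pool.apply_async(func, args)
        try:
            return async_result.get(timeout)
        except multiprocessing.TimeoutError:
            continue  # assume the task was lost; submit it again
    raise RuntimeError("task failed after %d attempts" % retries)

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    print(run_with_retries(pool, work, (7,)))
"""

Inside the pool proper, the same logic would presumably key off noticing that the worker holding the task has exited (which is where the tasks -> workers mapping would come in), rather than off a timeout.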