Greg Brockman <g...@mit.edu> added the comment: I'll take another stab at this. In the attachment (assign-tasks.patch), I've combined a lot of the ideas presented on this issue, so thank you both for your input. Anyway:
- The basic idea of the patch is to record the mapping of tasks to workers. I've added a protocol between the parent process and the workers that lets this happen without introducing a race condition between recording the task and the child dying. (A rough sketch of what I mean is at the end of this message.)
- If a child dies unexpectedly, the worker_handler pretends that all of the jobs currently assigned to that worker raised a RuntimeError. (Multiple jobs can be assigned to a single worker if the result handler is being slow.)
- The guarantee I try to provide is that each job will be started at most once. There is enough information to instead ensure that each job is run exactly once, but whether that's acceptable or useful is really only known at the application level.

Some notes:
- I haven't implemented this approach for the ThreadPool yet.
- The test suite runs but occasionally hangs on shutting down the pool in Ask's tests from multiprocessing-tr...@82502-termination-trackjobs.patch. My experiments indicate this happens when a worker dies while holding a queue lock. So a next step is to deal with workers dying while holding a queue lock, although this seems unlikely in practice. I have some ideas about how to fix it, if we decide it's worth trying.

Anyway, please let me know what you think of this approach/sample implementation. If we decide it seems promising, I'd be happy to build it out further.

Added file: http://bugs.python.org/file18513/assign-tasks.patch
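Here's a rough sketch of the flavor of the parent/worker protocol I mean. This is illustrative only, not the patch itself; the names (worker, ackqueue, assignments) are made up for the example. Each worker acknowledges a job, together with its own pid, before running it, so the parent's record of which worker owns which job stays current:

    import os
    from multiprocessing import Process, Queue

    def worker(inqueue, ackqueue, outqueue):
        # Acknowledge each job before running it, so the parent always
        # knows which jobs are in flight on which pid.
        for job_id, func, args in iter(inqueue.get, None):
            ackqueue.put((job_id, os.getpid()))   # "this job is mine now"
            outqueue.put((job_id, func(*args)))

    if __name__ == '__main__':
        inqueue, ackqueue, outqueue = Queue(), Queue(), Queue()
        procs = [Process(target=worker, args=(inqueue, ackqueue, outqueue))
                 for _ in range(2)]
        for p in procs:
            p.start()
        inqueue.put((0, divmod, (7, 3)))
        job_id, pid = ackqueue.get()      # parent records the assignment
        assignments = {pid: {job_id}}
        job_id, result = outqueue.get()   # job finished; clear the record
        assignments[pid].discard(job_id)
        print(result)                     # (2, 1)
        for _ in procs:
            inqueue.put(None)             # sentinel: shut the workers down
        for p in procs:
            p.join()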
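And a correspondingly rough sketch of the failure side (again just illustrative, building on the names above): when the handler notices a dead child, every job still recorded against that pid gets completed with a RuntimeError instead of being left to hang:

    def reap_dead_workers(procs, assignments, outqueue):
        # If a child has exited, fail over every job still charged to it.
        for p in procs:
            if p.exitcode is not None:    # None means "still running"
                for job_id in assignments.pop(p.pid, set()):
                    err = RuntimeError('worker %d died with job %d assigned'
                                       % (p.pid, job_id))
                    outqueue.put((job_id, err))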
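Finally, the queue-lock hang I mentioned is easy to reproduce in miniature. This isn't the pool code, just a toy demonstration of the hazard, with a plain Lock standing in for the queue's internal lock: kill the holder, and every later acquire blocks forever (or, with a timeout, fails):

    import time
    from multiprocessing import Process, Lock

    def hold_lock(lock):
        lock.acquire()       # acquired and never released
        time.sleep(60)       # killed before this finishes

    if __name__ == '__main__':
        lock = Lock()
        p = Process(target=hold_lock, args=(lock,))
        p.start()
        time.sleep(0.5)      # give the child time to grab the lock
        p.terminate()        # child dies while holding the lock
        p.join()
        print(lock.acquire(timeout=1))   # False: the lock is stuck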