Greg Brockman <g...@ksplice.com> added the comment:

Before I forget, looks like we also need to deal with the result from a worker being un-unpickleable:

"""
#!/usr/bin/env python
import multiprocessing

def foo(x):
    global bar
    def bar(x):
        pass
    return bar

p = multiprocessing.Pool(1)
p.apply(foo, [1])
"""
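For what it's worth, the wrinkle in this example is that the function pickles fine inside the worker (where __main__.bar exists by the time the result is sent) but fails to unpickle in the parent, where no such global was ever defined. So the guard has to live on the receiving side. A rough sketch of the shape it might take (safe_get is a made-up name, not anything in multiprocessing today):

"""
def safe_get(result_queue):
    # An object that pickled cleanly in the worker can still fail to
    # unpickle here (e.g. a function defined only in the worker's
    # namespace), so catch the error and hand back a picklable,
    # descriptive exception instead of letting the result handler die
    # and wedge the pool.
    try:
        return result_queue.get()
    except Exception as exc:
        return RuntimeError("result could not be unpickled: %s" % exc)
"""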
This shouldn't require much more work, but I'll hold off on submitting a patch until we have a better idea of where we're going in this arena.

> Instead of restarting crashed worker processes it will simply bring down
> the pool, right?

Yep. Again, as things stand, once you've lost a worker, you've lost a task, and you can't really do much about it. I guess that depends on your application, though... is your use-case such that you can lose a task without it mattering? If tasks are idempotent, one could have the task handler resubmit them, etc. (see the sketch at the end of this message). But really, thinking about the failure modes I've seen (OOM kills/user-initiated interrupt), I'm not sure under what circumstances I'd like the pool to try to recover.

The idea of recording the mapping of tasks -> workers seems interesting. Getting all of the corner cases right could be hard (e.g. making the removal of a task from the queue and the recording of which worker removed it atomic, or detecting whether a worker crashed while still holding the queue lock), and doing this would require extra mechanism. This feature does seem useful for pools running many different jobs, because that way a crashed worker need only terminate one job.

Anyway, I'd be curious to know more about the kinds of crashes you've encountered from which you'd like to be able to recover. Is it just Unpickleable exceptions, or are there others?
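To make the idempotent-resubmission idea concrete, here's roughly what I mean, sketched at the application level rather than inside the pool. run_with_retries and the timeout-based loss detection are my invention, not an existing API; from outside the pool, a timeout is about the only signal you get that a worker died mid-task, since a crashed worker simply never replies:

"""
#!/usr/bin/env python
import multiprocessing

def work(x):
    # stand-in for an idempotent task
    return x * x

def run_with_retries(pool, func, args, retries=3, timeout=30):
    # Resubmit an idempotent task if its result never arrives,
    # e.g. because the worker was OOM-killed partway through.
    for attempt in range(retries):
        async_result = pool.apply_async(func, args)
        try:
            return async_result.get(timeout)
        except multiprocessing.TimeoutError:
            continue  # assume the task was lost; submit it again
    raise RuntimeError("task failed after %d attempts" % retries)

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    print(run_with_retries(pool, work, (7,)))
"""

Inside the pool proper, the same logic would presumably key off noticing that the worker holding the task has exited (which is where the tasks -> workers mapping would come in), rather than off a timeout.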