[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

Ask Solem Wed, 14 Jul 2010 01:01:51 -0700

Ask Solem <[email protected]> added the comment:

There's one more thing


     if exitcode is not None:
       cleaned = True
                if exitcode != 0 and not worker._termination_requested:
                    abnormal.append((worker.pid, exitcode))


Instead of restarting crashed worker processes it will simply bring down
the pool, right?

If so, then I think it's important to decide whether we want to keep
the supervisor functionality, and if so decide on a recovery strategy.

Some alternatives are:

A) Any missing worker brings down the pool.

B) Missing workers will be replaced one-by-one. A maximum-restart-frequency 
decides when the supervisor should give up trying to recover
the pool, and crash it.

C) Same as B, except that any process crashing when trying to get() will bring 
down the pool.

I think the supervisor is a good addition, so I would very much like to keep 
it. It's also a step closer to my goal of adding the enhancements added by 
Celery to multiprocessing.pool.

Using C is only a few changes away from this patch, but B would also be 
possible in combination with my accept_callback patch. It does pose some 
overhead, so it depends on the level of recovery we want to support.

accept_callback: this is a callback that is triggered when the job is reserved 
by a worker process. The acks are sent to an additional Queue, with an 
additional thread processing the acks (hence the mentioned overhead). This 
enables us to keep track of what the worker processes are doing, also get the 
PID of the worker processing any given job (besides from recovery, potential 
uses are monitoring and the ability to terminate a job 
(ApplyResult.terminate?). See 
http://github.com/ask/celery/blob/master/celery/concurrency/processes/pool.py

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue9205>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

Reply via email to