Ask Solem <a...@opera.com> added the comment:

There's one more thing:

    if exitcode is not None:
        cleaned = True
        if exitcode != 0 and not worker._termination_requested:
            abnormal.append((worker.pid, exitcode))

Instead of restarting crashed worker processes, it will simply bring down the pool, right?

If so, then I think it's important to decide whether we want to keep the supervisor functionality, and if so, decide on a recovery strategy.

Some alternatives are:

A) Any missing worker brings down the pool.

B) Missing workers will be replaced one-by-one. A maximum-restart-frequency decides when the supervisor should give up trying to recover the pool, and crash it.

C) Same as B, except that any process crashing when trying to get() will bring down the pool.

I think the supervisor is a good addition, so I would very much like to keep it. It's also a step closer to my goal of adding the enhancements added by Celery to multiprocessing.pool.

Using C is only a few changes away from this patch, but B would also be possible in combination with my accept_callback patch. The callback does add some overhead, so it depends on the level of recovery we want to support.

accept_callback: this is a callback that is triggered when the job is reserved by a worker process. The acks are sent to an additional Queue, with an additional thread processing the acks (hence the mentioned overhead). This enables us to keep track of what the worker processes are doing, and to get the PID of the worker processing any given job. Besides recovery, potential uses are monitoring and the ability to terminate a job (ApplyResult.terminate?).

See http://github.com/ask/celery/blob/master/celery/concurrency/processes/pool.py

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9205>
_______________________________________
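
To make alternative B more concrete, here is a rough sketch of how a maximum-restart-frequency check might look. It is not taken from the patch or from Celery: RestartLimiter, handle_abnormal_exits, replace_worker and crash_pool are hypothetical names, and the only thing assumed from the snippet above is that abnormal is a list of (pid, exitcode) tuples.

```python
import time
from collections import deque


class RestartLimiter:
    """Hypothetical helper: permit at most `max_restarts` restarts
    within any sliding window of `period` seconds."""

    def __init__(self, max_restarts, period, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.period = period
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # timestamps of recent restarts

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False
        self._stamps.append(now)
        return True


def handle_abnormal_exits(abnormal, limiter, replace_worker, crash_pool):
    """Replace crashed workers one-by-one until the limiter gives up.

    `abnormal` is a list of (pid, exitcode) tuples; `replace_worker`
    and `crash_pool` are hypothetical callbacks supplied by the pool.
    Returns True if every worker was replaced, False if the pool was
    brought down.
    """
    for pid, exitcode in abnormal:
        if not limiter.allow_restart():
            crash_pool(pid, exitcode)
            return False
        replace_worker(pid, exitcode)
    return True
```

With max_restarts=2 and period=10, two crashes inside ten seconds would each be replaced, and a third inside the same window would bring down the pool. Alternative C could presumably reuse the same limiter and only add an unconditional crash_pool call for a worker that dies while fetching a job.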