New submission from Dan O'Reilly:

This is essentially a dupe of issue9205, but it was suggested I open a new 
issue, since that one ended up being used to fix this same problem in 
concurrent.futures, and was subsequently closed.

Right now, should a worker process in a Pool unexpectedly get terminated while 
a blocking Pool method is running (e.g. apply, map), the method will hang 
forever. This isn't a normal occurrence, but it does occasionally happen 
(either because someone sends a SIGTERM, or because of a bug in the 
interpreter or a C-extension). It would be preferable for multiprocessing to 
follow the lead of concurrent.futures.ProcessPoolExecutor when this happens, 
and abort all running tasks and close down the Pool.
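For reference, the concurrent.futures behavior mentioned above can be sketched 
roughly as follows. This is only an illustration, not part of the attached 
patch: the helper names (die_abruptly, run_with_dying_worker) are made up, 
os._exit() stands in for a crash or an external SIGTERM, and the "fork" start 
method is assumed (POSIX only) so the snippet needs no __main__ guard.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly():
    # Hypothetical task: simulate a crashed worker by exiting without
    # any cleanup or exception reporting.
    os._exit(1)


def run_with_dying_worker():
    # Assumption: "fork" start method (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        future = executor.submit(die_abruptly)
        try:
            # The timeout is only a safety net for this sketch.
            future.result(timeout=30)
        except BrokenProcessPool:
            # The executor noticed the dead worker and failed the
            # pending future instead of blocking forever.
            return "broken"
    return "completed"
```

Calling run_with_dying_worker() should return "broken" rather than block, 
which is the behavior being requested for Pool.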

Attached is a patch that implements this behavior. Should a process in a Pool 
unexpectedly exit (meaning, *not* because of hitting the maxtasksperchild 
limit), the Pool will be closed/terminated and all cached/running tasks will 
raise a BrokenProcessPool exception. These changes also prevent the Pool from 
going into a bad state if the "initializer" function raises an exception 
(previously, the pool would end up infinitely starting new processes, which 
would immediately die because of the exception).
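The initializer case can be illustrated with ProcessPoolExecutor, which 
already marks the pool as broken when its initializer raises. This is a 
sketch of the analogous existing behavior, not of the patched Pool: the 
function names are hypothetical, and the "fork" start method is again 
assumed (POSIX only).

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def failing_initializer():
    # Hypothetical initializer that always fails in the worker.
    raise RuntimeError("initializer failed")


def identity(value):
    return value


def submit_with_failing_initializer():
    # Assumption: "fork" start method (POSIX only).
    ctx = multiprocessing.get_context("fork")
    executor = ProcessPoolExecutor(
        max_workers=1, mp_context=ctx, initializer=failing_initializer
    )
    try:
        try:
            future = executor.submit(identity, 1)
            future.result(timeout=30)
        except BrokenProcessPool:
            # The pending job fails; no replacement workers are
            # started in a loop.
            return "broken"
        return "completed"
    finally:
        executor.shutdown()
```

Here submit_with_failing_initializer() should return "broken", and the 
worker's traceback is logged to stderr, instead of the pool respawning 
workers indefinitely.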

One concern with the patch: The way timings are altered with these changes, the 
Pool seems to be particularly susceptible to issue6721 in certain cases. If 
processes in the Pool are being restarted due to maxtasksperchild just as the 
worker is being closed or joined, there is a chance the worker will be forked 
while some of the debug logging inside of Pool is running (and holding locks on 
either sys.stdout or sys.stderr). When this happens, the worker deadlocks on 
startup, which will hang the whole program. I believe the current 
implementation is susceptible to this as well, but I could reproduce it much 
more consistently with this patch. I think it's rare enough in practice that it 
shouldn't prevent the patch from being accepted, but thought I should point it 
out. 

(I do think issue6721 should be addressed, or at the very least internal I/O 
locks should always reset after forking.)
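As a rough sketch of what "reset after forking" could look like at the Python 
level, os.register_at_fork() can replace a lock in the child. The lock below 
is a hypothetical stand-in and is not the interpreter's real stdout/stderr 
buffer lock, so this only illustrates the idea and is not a fix for issue6721.

```python
import os
import threading

# Hypothetical stand-in for an internal I/O lock.
_log_lock = threading.Lock()


def _reset_lock_in_child():
    # Runs in the child right after fork(): discard the inherited
    # (possibly held) lock and start with a fresh, unlocked one.
    global _log_lock
    _log_lock = threading.Lock()


os.register_at_fork(after_in_child=_reset_lock_in_child)


def child_can_acquire_after_fork():
    # Fork while the parent holds the lock, the situation described
    # above for the debug logging inside Pool.
    with _log_lock:
        pid = os.fork()
        if pid == 0:
            # Child: the global now names the fresh lock, so this
            # acquire succeeds instead of deadlocking.
            acquired = _log_lock.acquire(timeout=5)
            os._exit(0 if acquired else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Without the registered hook, the child would inherit the lock in its held 
state and the acquire would time out.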

----------
components: Library (Lib)
files: multiproc_broken_pool.diff
keywords: patch
messages: 226805
nosy: dan.oreilly, jnoller, pitrou, sbt
priority: normal
severity: normal
status: open
title: multiprocessing.Pool shouldn't hang forever if a worker process dies 
unexpectedly
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file36603/multiproc_broken_pool.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22393>
_______________________________________