New submission from Charles-François Natali:
hanger.py
"""
from time import sleep
def hang(i):
sleep(i)
raise ValueError("x" * 1024**2)
"""
The following code will deadlock on pool.close():
"""
from multiprocessing import Pool
from time import sleep
from hanger import hang
with Pool() as pool:
try:
pool.map(hang, [0,1])
finally:
sleep(0.5)
pool.close()
pool.join()
"""
The problem is that when one of the tasks comprising a map result fails with an
exception, the corresponding MapResult is removed from the result cache:
def _set(self, i, success_result):
success, result = success_result
if success:
[snip]
else:
self._success = False
self._value = result
if self._error_callback:
self._error_callback(self._value)
<===
del self._cache[self._job]
self._event.set()
===>
Which means that when the pool is closed, the result handler thread terminates
right away, because it doesn't see any task left to wait for.
Which means that it doesn't drain the result queue, and if some worker process
is trying to write a large result to it (hence the large valuerrror to fill the
socket/pipe buffer), it will hang, and the pool won't shut down (unless you
call terminate()).
Although I can see the advantage of fail-fast behavior, I don't think it's
correct because it breaks the invariant where results won't be deleted from the
cache until they're actually done.
Also, the current fail-fast behavior breaks the semantics that the call only
returns when it has completed.
Returning while some jobs part of the map are still running is potentially very
bad, e.g. if the user call retries the same call, assuming that all the jobs
are done. Retrying jobs that are idempotent but not parallel execution-safe
would break with the current code.
The fix is trivial, use the same logic as in case of success to only signal
failure when all jobs are done.
I'll provide a patch if it seems sensible :-)
----------
components: Library (Lib)
messages: 241404
nosy: neologix, pitrou
priority: normal
severity: normal
status: open
title: multiprocessing: MapResult shouldn't fail fast upon exception
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23992>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com