[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2016-02-12 Thread Charles-François Natali

Changes by Charles-François Natali :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2016-02-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1ba0deb52223 by Charles-François Natali in branch 'default':
Issue #23992: multiprocessing: make MapResult not fail-fast upon exception.
https://hg.python.org/cpython/rev/1ba0deb52223

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-09-12 Thread Davin Potts

Davin Potts added the comment:

As an aside:  issue24948 seems to show there are others who would find the 
immediate-multiple-error_callback idea attractive.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-09-12 Thread Davin Potts

Davin Potts added the comment:

The patches make good sense to me -- I have no comments to add in a review.

I spent more time than I care to admit concerned with the idea that 
error_callback (exposed by map_async which map sits on top of) should perhaps 
be called not just once at the end but each time an exception occurs.  
Motivated by past jobs which failed overall to yield any results because one 
out of a million of the inputs triggered an error, I thought the idea very 
appealing and experimented with implementing it (with happy results).  Googling 
for it though, I found plenty of examples of people asking questions about how 
callback and error_callback are intended to work -- though the documentation is 
not explicit on this particular point, most of those search results correctly 
document in the wild that error_callback is called only once at the end just 
like callback.  I think it best to leave that functionality just as you have it 
now.

Thanks for creating the patch -- looks great to me.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-09-12 Thread Davin Potts

Changes by Davin Potts :


--
stage: needs patch -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-07-01 Thread Charles-François Natali

Charles-François Natali added the comment:

Barring any objections, I'll commit within the next few days.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-06-13 Thread Charles-François Natali

Changes by Charles-François Natali cf.nat...@gmail.com:


--
keywords: +needs review
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-22 Thread Charles-François Natali

Changes by Charles-François Natali cf.nat...@gmail.com:


Removed file: http://bugs.python.org/file39171/mp_map_fail_fast_default.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-22 Thread Charles-François Natali

Changes by Charles-François Natali cf.nat...@gmail.com:


Removed file: http://bugs.python.org/file39170/mp_map_fail_fast_27.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-22 Thread Charles-François Natali

Changes by Charles-François Natali cf.nat...@gmail.com:


Added file: http://bugs.python.org/file39172/mp_map_fail_fast_27.diff
Added file: http://bugs.python.org/file39173/mp_map_fail_fast_default.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-22 Thread Charles-François Natali

Charles-François Natali added the comment:

Patches for 2.7 and default.

--
keywords: +patch
Added file: http://bugs.python.org/file39170/mp_map_fail_fast_27.diff
Added file: http://bugs.python.org/file39171/mp_map_fail_fast_default.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___diff -r 5576c8240963 Lib/multiprocessing/pool.py
--- a/Lib/multiprocessing/pool.py   Wed Apr 15 19:30:38 2015 +0100
+++ b/Lib/multiprocessing/pool.py   Wed Apr 22 19:34:48 2015 +0100
@@ -599,10 +599,10 @@
 self._number_left = length//chunksize + bool(length % chunksize)
 
 def _set(self, i, success_result):
+self._number_left -= 1
 success, result = success_result
-if success:
+if success and self._success:
 self._value[i*self._chunksize:(i+1)*self._chunksize] = result
-self._number_left -= 1
 if self._number_left == 0:
 if self._callback:
 self._callback(self._value)
@@ -615,15 +615,17 @@
 self._cond.release()
 
 else:
-self._success = False
-self._value = result
-del self._cache[self._job]
-self._cond.acquire()
-try:
-self._ready = True
-self._cond.notify()
-finally:
-self._cond.release()
+if self._success:
+self._success = False
+self._value = result
+if self._number_left == 0:
+del self._cache[self._job]
+self._cond.acquire()
+try:
+self._ready = True
+self._cond.notify()
+finally:
+self._cond.release()
 
 #
 # Class whose instances are returned by `Pool.imap()`
diff -r 5576c8240963 Lib/test/test_multiprocessing.py
--- a/Lib/test/test_multiprocessing.py  Wed Apr 15 19:30:38 2015 +0100
+++ b/Lib/test/test_multiprocessing.py  Wed Apr 22 19:34:48 2015 +0100
@@ -1123,6 +1123,12 @@
 time.sleep(wait)
 return x*x
 
+
+def raise_large_valuerror(wait):
+time.sleep(wait)
+raise ValueError(x * 1024**2)
+
+
 class SayWhenError(ValueError): pass
 
 def exception_throwing_generator(total, when):
@@ -1262,6 +1268,27 @@
 p.close()
 p.join()
 
+def test_map_no_failfast(self):
+# Issue #23992: the fail-fast behaviour when an exception is raised
+# during map() would make Pool.join() deadlock, because a worker
+# process would fill the result queue (after the result handler thread
+# terminated, hence not draining it anymore).
+
+t_start = time.time()
+
+with self.assertRaises(ValueError):
+p = self.Pool(2)
+try:
+p.map(raise_large_valuerror, [0, 1])
+finally:
+time.sleep(0.5)
+p.close()
+p.join()
+
+# check that we indeed waited for all jobs
+self.assertGreater(time.time() - t_start, 0.9)
+
+
 def unpickleable_result():
 return lambda: 42
 
diff -r 21ae8abc1af3 Lib/multiprocessing/pool.py
--- a/Lib/multiprocessing/pool.py   Mon Apr 13 12:30:53 2015 -0500
+++ b/Lib/multiprocessing/pool.py   Wed Apr 22 19:35:31 2015 +0100
@@ -638,22 +638,24 @@
 self._number_left = length//chunksize + bool(length % chunksize)
 
 def _set(self, i, success_result):
+self._number_left -= 1
 success, result = success_result
 if success:
 self._value[i*self._chunksize:(i+1)*self._chunksize] = result
-self._number_left -= 1
 if self._number_left == 0:
 if self._callback:
 self._callback(self._value)
 del self._cache[self._job]
 self._event.set()
 else:
-self._success = False
-self._value = result
-if self._error_callback:
-self._error_callback(self._value)
-del self._cache[self._job]
-self._event.set()
+if self._success:
+self._success = False
+self._value = result
+if self._number_left == 0:
+if self._error_callback:
+self._error_callback(self._value)
+del self._cache[self._job]
+self._event.set()
 
 #
 # Class whose instances are returned by `Pool.imap()`
diff -r 21ae8abc1af3 Lib/test/_test_multiprocessing.py
--- a/Lib/test/_test_multiprocessing.py Mon Apr 13 12:30:53 2015 -0500
+++ b/Lib/test/_test_multiprocessing.py Wed Apr 22 19:35:31 2015 +0100
@@ -1660,6 +1660,10 @@
 def mul(x, y):
 return x*y
 
+def raise_large_valuerror(wait):
+time.sleep(wait)
+raise ValueError(x * 1024**2)
+
 class SayWhenError(ValueError): pass
 
 def 

[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-18 Thread Ned Deily

Changes by Ned Deily n...@acm.org:


--
nosy: +davin, sbt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-18 Thread Davin Potts

Davin Potts added the comment:

This is a nice example demonstrating what I agree is a problem with the current 
implementation of close.

A practical concern with what I believe is being proposed in your trivial fix:  
if the workers are engaged in very long-running tasks (and perhaps slowly 
writing their overly large results to the results queue) then we would have to 
wait for quite a long time for these other workers to reach their natural 
completion.

That said, I believe close should in fact behave just that way and have us 
subsequently wait for the others to be completed.  It is not close's job to 
attempt to address the general concern I bring up.

This change could be felt by people who have written their code to expect the 
result handler's immediate shutdown if there are no other visible results -- it 
is difficult to imagine what the impact would be.

This is my long-winded way of saying it seems very sensible and welcome to me 
if you took the time to prepare a patch.

--
stage:  - needs patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23992] multiprocessing: MapResult shouldn't fail fast upon exception

2015-04-18 Thread Charles-François Natali

New submission from Charles-François Natali:

hanger.py

from time import sleep


def hang(i):
sleep(i)
raise ValueError(x * 1024**2)



The following code will deadlock on pool.close():

from multiprocessing import Pool
from time import sleep

from hanger import hang


with Pool() as pool:
try:
pool.map(hang, [0,1])
finally:
sleep(0.5)
pool.close()
pool.join()


The problem is that when one of the tasks comprising a map result fails with an 
exception, the corresponding MapResult is removed from the result cache:

def _set(self, i, success_result):
success, result = success_result
if success:
[snip]
else:
self._success = False
self._value = result
if self._error_callback:
self._error_callback(self._value)
===
del self._cache[self._job]
self._event.set()
===

Which means that when the pool is closed, the result handler thread terminates 
right away, because it doesn't see any task left to wait for.
Which means that it doesn't drain the result queue, and if some worker process 
is trying to write a large result to it (hence the large valuerrror to fill the 
socket/pipe buffer), it will hang, and the pool won't shut down (unless you 
call terminate()).

Although I can see the advantage of fail-fast behavior, I don't think it's 
correct because it breaks the invariant where results won't be deleted from the 
cache until they're actually done.

Also, the current fail-fast behavior breaks the semantics that the call only 
returns when it has completed.
Returning while some jobs part of the map are still running is potentially very 
bad, e.g. if the user call retries the same call, assuming that all the jobs 
are done. Retrying jobs that are idempotent but not parallel execution-safe 
would break with the current code.


The fix is trivial, use the same logic as in case of success to only signal 
failure when all jobs are done.

I'll provide a patch if it seems sensible :-)

--
components: Library (Lib)
messages: 241404
nosy: neologix, pitrou
priority: normal
severity: normal
status: open
title: multiprocessing: MapResult shouldn't fail fast upon exception
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23992
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com