New submission from Luis: Hi,
I've seen odd behavior from multiprocessing Pool on Linux/macOS:

-----------------------------
import multiprocessing as mp
from sys import getsizeof
import numpy as np

def f_test(x):
    print('process has received argument %s' % x)
    r = x[:100]  # the return value is put in a queue by Pool; for objects > 4 GB, pickle complains
    return r

if __name__ == '__main__':
    # 2**28 runs OK, 2**29 or bigger breaks pickle
    big_param = np.random.random(2**29)

    # Process + big parameter: OK
    proc = mp.Process(target=f_test, args=(big_param,))
    res = proc.start()  # note: Process.start() returns None, hence the size of 16 below
    proc.join()
    print('size of process result', getsizeof(res))

    # Pool + big parameter: BREAKS
    pool = mp.Pool(1)
    res = pool.map(f_test, (big_param,))
    print('size of Pool result', getsizeof(res))
-----------------------------

$ python bug_mp.py
process has received argument [ 0.65282086  0.34977429  0.64148342 ...,  0.79902495  0.31427761  0.02678803]
size of process result 16
Traceback (most recent call last):
  File "bug_mp.py", line 26, in <module>
    res = pool.map(f_test, (big_param,))
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
-----------------------------

There is another flavor of error seen in a similar scenario:

...
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
-----------------------------

Tested in:
Python 3.4.2 |Anaconda 2.1.0 (64-bit)| (default, Oct 21 2014, 17:16:37) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
and in:
Python 3.4.3 (default, Apr 9 2015, 16:03:56) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
-----------------------------

Pool.map creates a task queue to manage its workers, and by doing this it forces any arguments passed to the workers to be pickled. Process works fine, since no queue is created: it just forks. My expectation was that since we are on POSIX and forking, we shouldn't have to worry about arguments being pickled; if this is expected behavior, it should be documented (I hope I haven't missed it in the docs). For small arguments, pickling/unpickling may not be an issue, but for big ones it is (I am aware of the Array and shared-memory options).

Has anybody seen something similar? Is this a hard requirement of Pool.map, or am I missing the point altogether?

----------
messages: 241289
nosy: kieleth
priority: normal
severity: normal
status: open
title: Multiprocessing Pool.map pickles arguments passed to workers
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23979>
_______________________________________