[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2021-04-01 Thread Socob


Change by Socob <206a8...@opayq.com>:


--
nosy: +Socob

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-09-20 Thread Davin Potts

Changes by Davin Potts :


--
status: open -> closed




[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-22 Thread Davin Potts

Davin Potts added the comment:

@Luis:  Before closing this bug, could you give a quick description of the fix 
you put in place?

Two reasons:
1) Someone else encountering this frustration with similar needs might well 
benefit from hearing what you did.
2) It provides evidence, demonstrated through workarounds, that there is demand 
for a more efficient means of communication.

Nothing fancy required -- not even code necessarily.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23979
___



[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-21 Thread Luis

Luis added the comment:

Thanks for information and explanations.

The option of writing a tweaked serialization mechanism in the Queue used by 
Pool and implementing shared memory sounds like fun. I'm not sure whether the 
pure copy-on-write behavior of forking can be achieved that way, though; it 
would be nice to know whether it is actually possible (the project mentioned in 
issue17560 still needs to dump the arrays to the filesystem).

As a quick fix for us, I've created a simple wrapper around Pool and its map: 
it creates a Queue for the results and uses Process to start the workers. This 
works just fine.
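For reference, a minimal sketch of such a wrapper (the names are mine, not the actual code used): it forks one Process per argument, so on POSIX the arguments are inherited rather than pickled, and only the results travel back through a Queue.

```python
import multiprocessing as mp

def fork_map(func, args_list):
    """Map-like helper: each argument is inherited by a forked worker
    instead of being pickled; only results pass through the queue."""
    ctx = mp.get_context("fork")  # relies on POSIX fork semantics
    queue = ctx.Queue()

    def worker(index, arg):
        queue.put((index, func(arg)))

    procs = [ctx.Process(target=worker, args=(i, a))
             for i, a in enumerate(args_list)]
    for p in procs:
        p.start()
    results = [None] * len(procs)
    for _ in procs:               # drain the queue before joining
        i, r = queue.get()
        results[i] = r
    for p in procs:
        p.join()
    return results
```

Note that the results themselves are still pickled on their way through the queue; only the (potentially huge) arguments avoid serialization.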

Simplicity and consistency are great, but I still believe that on Linux-based 
systems Pool, by serializing arguments, duplicates memory and works 
inefficiently, and that this could be avoided.

Obviously it's not me who makes the decisions, and I don't have the time to 
investigate this further, so, after this petty rant, should we close this bug?

--




[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-18 Thread Davin Potts

Davin Potts added the comment:

Though it's been discussed elsewhere, issue17560 is a good one on the matter of 
really big objects being communicated between processes via multiprocessing.  
In it, Richard shares some detail about the implementation in multiprocessing, 
its constraints and motivation.

That discussion also highlights that the pickler has been made pluggable in 
multiprocessing since Python 3.3.  That is, if you wish, you can use something 
other than Python's pickle to serialize objects and potentially communicate 
them in a completely new way (perhaps even via out-of-band communication, 
though that was not the intention and would arguably be extreme).
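As an illustration of that pluggability (a sketch; `Payload` and `reduce_payload` are hypothetical names): `multiprocessing.reduction.ForkingPickler.register` lets you supply a custom reduce function for a type, which multiprocessing then uses when transferring instances between processes.

```python
import pickle
from multiprocessing.reduction import ForkingPickler

class Payload:
    """Toy stand-in for a large object with a cheaper wire format."""
    def __init__(self, data):
        self.data = data

def reduce_payload(obj):
    # Returns a (callable, args) pair used to rebuild the object on the
    # other side; a real reducer might hand over a shared-memory handle.
    return (Payload, (obj.data,))

# Route Payload through the custom reducer for multiprocessing transfers.
ForkingPickler.register(Payload, reduce_payload)

# Round-trip through the same pickler multiprocessing uses internally.
buf = ForkingPickler.dumps(Payload([1, 2, 3]))
restored = pickle.loads(buf)
```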

I do not think Python's pickle is necessarily what we should expect 
multiprocessing to use to communicate objects between processes.  Just because 
pickle is Python's serialization strategy does not mean it must necessarily be 
used in such communications.  Thankfully, we have the convenience of using 
something other than pickle (or newer versions of pickle, since there have been 
versioned updates to pickle's format over time with some promising 
improvements).

@Luis:  To your specific question about the need for Queue, the benefit of 
consistent behavior and methodology, whether forking/spawning on one OS with 
its caveats or on another, is just that: simplicity.  The pluggable nature of 
the pickler opens up the opportunity for folks (perhaps such as yourself) to 
construct and plug in optimized solutions for scenarios or edge cases of 
particular interest.  The fact that we all start with consistent behavior 
across platforms and process-creation strategies is very appealing from a 
reduced-complexity point of view.

--
nosy: +davin




[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-17 Thread Luis

Luis added the comment:

Thanks for the answer, although I still don't think I've made myself fully 
understood here; allow me to paraphrase:
"...You need some means of transferring objects between processes, and pickling 
is the Python standard serialization method"

Yes, but the question that stands is why Pool has to use a multiprocessing.Queue 
to load and spin up the workers (thereby pickling/unpickling their arguments), 
when we could rely on inheritance at that moment and create a Queue only for 
the workers' return values.

This applies to the fork start method, not to spawn; I'm not sure about forkserver.

Plus, I'm not trying to avoid inheritance; I'm trying to use it with Pools and 
large arguments, as theoretically allowed by forking. Instead, at the moment 
I'm forced to use Processes with a Queue for the results, as shown in the code 
above.

"OverflowError: cannot serialize a bytes object larger than 4 GiB" is just what 
allows us to expose this behavior, because the Pool pickles the arguments 
without, in my opinion, having to do so.

--




[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-17 Thread Josh Rosenberg

Josh Rosenberg added the comment:

The Pool workers are created eagerly, not lazily. That is, the fork occurs 
before map is called, and Python can't know that the objects passed as 
arguments were inherited in the first place (since they could be created after 
the Pool was created). If you created worker tasks lazily, then sure, you could 
fork and use the objects that are inherited, but that's not how Pools work, and 
they can't work like that if the worker processes process more than one input 
without eagerly evaluating input sequences (which would introduce other 
problems).

--




[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-16 Thread Luis

New submission from Luis:

Hi,

I've seen an odd behavior for multiprocessing Pool in Linux/MacOS:

-
import multiprocessing as mp
from sys import getsizeof
import numpy as np


def f_test(x):
    print('process has received argument %s' % x)
    r = x[:100]  # the return value is put in a queue by Pool; for objects > 4 GiB, pickle complains
    return r

if __name__ == '__main__':
    # 2**28 runs ok, 2**29 or bigger breaks pickle
    big_param = np.random.random(2**29)

    # Process + big parameter OK:
    proc = mp.Process(target=f_test, args=(big_param,))
    res = proc.start()
    proc.join()
    print('size of process result', getsizeof(res))

    # Pool + big parameter BREAKS:
    pool = mp.Pool(1)
    res = pool.map(f_test, (big_param,))
    print('size of Pool result', getsizeof(res))

-
$ python bug_mp.py
process has received argument [ 0.65282086  0.34977429  0.64148342 ...,  0.79902495  0.31427761
  0.02678803]
size of process result 16
Traceback (most recent call last):
  File "bug_mp.py", line 26, in <module>
    res = pool.map(f_test, (big_param,))
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB

-
There's another flavor of error seen in a similar scenario:
...
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

-
Tested in:
Python 3.4.2 |Anaconda 2.1.0 (64-bit)| (default, Oct 21 2014, 17:16:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
And in:
Python 3.4.3 (default, Apr  9 2015, 16:03:56)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin

-

Pool.map creates a task queue to feed the workers, and I think that by doing 
this we force any arguments passed to the workers to be pickled.
Process works OK: since no queue is created, it just forks.

My expectation would be that since we are on POSIX and forking, we shouldn't 
have to worry about arguments being pickled; if this is expected behavior, it 
should be warned about/documented (I hope I haven't missed this in the docs).

For small arguments, pickling/unpickling may not be an issue, but for big ones 
it is (I am aware of the Array and shared-memory options).
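For completeness, a small sketch of the Array option mentioned above (my own toy example): a multiprocessing Array is backed by shared memory, so a forked child sees, and can mutate, the parent's buffer without the data itself being pickled.

```python
import multiprocessing as mp

def double_first(shared):
    # Runs in the child: writes go straight to the shared buffer.
    shared[0] *= 2

def shared_array_demo():
    ctx = mp.get_context("fork")
    shared = ctx.Array("d", [1.0, 2.0, 3.0])  # typecode 'd' = C double
    p = ctx.Process(target=double_first, args=(shared,))
    p.start()
    p.join()
    return shared[0]  # the child's write is visible in the parent
```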

Has anybody seen something similar? Is this perhaps a hard requirement of 
Pool.map, or am I completely missing the point altogether?

--
messages: 241289
nosy: kieleth
priority: normal
severity: normal
status: open
title: Multiprocessing Pool.map pickles arguments passed to workers
type: behavior
versions: Python 3.4




[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

2015-04-16 Thread Josh Rosenberg

Josh Rosenberg added the comment:

The nature of a Pool precludes assumptions about the availability of specific 
objects in a forked worker process (particularly now that there are alternative 
process start methods). Since the workers are spun up when the pool is created, 
objects created or modified after that point would have to be serialized by 
some mechanism anyway.

While the Pool documentation doesn't describe this explicitly, there are 
multiple references to this behavior (e.g. 
https://docs.python.org/3/library/multiprocessing.html#all-start-methods 
mentions inheriting as being more efficient than pickling/unpickling). You have 
to develop with inheritance in mind, though; pickling, particularly for 
task-dispatch approaches in the Futures model, can't be generalized as an 
inheritance problem when using a producer/consumer-based worker model.

Point is, this is expected behavior. You need some means of transferring 
objects between processes, and pickling is Python's standard serialization 
method. The inability to serialize a 4+ GiB bytes object is, I assume, a 
separate problem (I don't know if a bug exists for that), but pickling as the 
mechanism is the only obvious way to do it. If you want to avoid pickling, it's 
up to you to ensure the root process has created the necessary bytes object 
prior to creating the Pool, and to convey to the worker how to find it in its 
own memory (say, via a dict mapping int keys to your bytes object data).
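A sketch of that pattern (the registry name is mine, not from the stdlib): create the big object and stash it under a small key before the Pool forks; map() then pickles only the key, and the worker finds the data in its inherited (copy-on-write) memory.

```python
import multiprocessing as mp

BIG_OBJECTS = {}  # populated in the parent BEFORE the Pool is created

def crunch(key):
    # Worker: the bytes are already in inherited memory; nothing big
    # crossed the task queue -- only the small integer key did.
    return len(BIG_OBJECTS[key])

def pool_demo():
    ctx = mp.get_context("fork")        # the pattern relies on fork
    BIG_OBJECTS[0] = b"x" * (10 ** 6)   # stand-in for a multi-GB object
    with ctx.Pool(2) as pool:
        return pool.map(crunch, [0])
```

This only works with the fork start method; under spawn or forkserver the workers would see an empty BIG_OBJECTS.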

--
nosy: +josh.r
