On Apr 4, 2011, at 4:20 PM, John Ladasky wrote:

> I have been playing with multiprocessing for a while now, and I have
> some familiarity with Pool.  Apparently, arguments passed to a Pool
> subprocess must be able to be pickled.  

Hi John,
multiprocessing's use of pickle is not limited to Pool. For instance, objects 
put into a multiprocessing.Queue are also pickled, as are the args to a 
multiprocessing.Process. So if you're going to use multiprocessing, you're 
going to use pickle, and you need pickleable objects. 
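
A minimal sketch of how one might check ahead of time whether an object will survive the trip into a Pool, Queue, or Process. The helper name is_pickleable is hypothetical (it is not part of multiprocessing or pickle); it simply tries pickle.dumps, which is roughly the test multiprocessing applies when it ships an object to another process.

```python
import pickle


def is_pickleable(obj):
    """Return True if pickle.dumps accepts obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle raises PicklingError, TypeError or AttributeError
        # depending on what it could not serialize.
        return False
    return True


# Plain containers of numbers and strings pickle without any help.
task_args = ([0.1, 0.2, 0.3], {"epochs": 10})

# A lambda has no importable name, so pickle cannot reference it.
square = lambda x: x * x

args_ok = is_pickleable(task_args)
lambda_ok = is_pickleable(square)
print(args_ok, lambda_ok)
```

Running a check like this on the arguments before submitting work tends to give a clearer error than the one that surfaces from inside a worker process.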


> Pickling is still a pretty
> vague progress to me, but I can see that you have to write custom
> __reduce__ and __setstate__ methods for your objects.

Well, that's only if one's objects don't support pickle by default. A lot of 
classes do without any need for custom __reduce__ and __setstate__ methods. 
Since you're apparently not too familiar with pickle, I don't want you to get 
the false impression that it's a lot of trouble. I've used pickle a number of 
times and never had to write custom methods for it.
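
To illustrate the point, here is a small sketch of an ordinary class making a round trip through pickle with no __reduce__ or __setstate__ defined. The Network class is a hypothetical stand-in, not anyone's actual code; it assumes the class is defined at module level so pickle can find it by name.

```python
import pickle


class Network:
    """Hypothetical stand-in for a user-defined class; no pickle hooks."""

    def __init__(self, weights, name):
        self.weights = weights
        self.name = name


original = Network([[0.5, -1.2], [0.3, 0.8]], "demo")

# Default pickling saves the instance __dict__ and restores it on load.
restored = pickle.loads(pickle.dumps(original))

print(restored.name, restored.weights)
```

The restored object is a separate copy, which is the behavior one gets when passing such an object to a subprocess.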



> Now, I don't know that I actually HAVE to pass my neural network and
> input data as copies -- they're both READ-ONLY objects for the
> duration of an evaluate function (which can go on for quite a while).
> So, I have also started to investigate shared-memory approaches.  I
> don't know how a shared-memory object is referenced by a subprocess
> yet, but presumably you pass a reference to the object, rather than
> the whole object.   Also, it appears that subprocesses also acquire a
> temporary lock over a shared memory object, and thus one process may
> well spend time waiting for another (individual CPU caches may
> sidestep this problem?) Anyway, an implementation of a shared-memory
> ndarray is here:

There's no standard shared memory implementation for Python. The mmap module is 
as close as you get. I wrote & support the posix_ipc and sysv_ipc modules which 
give you IPC primitives (shared memory and semaphores) in Python. They work 
well (IMHO) but they're *nix-only and much lower level than multiprocessing. If 
multiprocessing is like a kitchen well stocked with appliances, posix_ipc (and 
sysv_ipc) is like a box of sharp knives.

Note that mmap and my IPC modules don't expose Python objects. They expose raw 
bytes in memory. You're still going to have to jump through some hoops (...like 
pickle) to turn your Python objects into a bytestream and vice versa.
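
As a rough sketch of those hoops, the following stores a pickled object in an anonymous mmap and reads it back. The 4-byte length prefix and the helper names write_object and read_object are assumptions made for this example, not part of mmap, posix_ipc, or sysv_ipc. It runs in a single process just to show the byte-level round trip; sharing the mapping between processes is a separate matter.

```python
import mmap
import pickle
import struct

# 4-byte little-endian length prefix; a layout chosen for this sketch only.
HEADER = struct.Struct("<I")


def write_object(buf, obj):
    """Pickle obj and store it at the start of buf, prefixed by its length."""
    payload = pickle.dumps(obj)
    buf.seek(0)
    buf.write(HEADER.pack(len(payload)))
    buf.write(payload)


def read_object(buf):
    """Read the length prefix, then unpickle that many bytes."""
    buf.seek(0)
    (size,) = HEADER.unpack(buf.read(HEADER.size))
    return pickle.loads(buf.read(size))


shared = mmap.mmap(-1, 4096)  # anonymous mapping, raw bytes only
write_object(shared, {"weights": [0.5, -1.2], "bias": 0.1})
result = read_object(shared)
shared.close()

print(result)
```

The memory itself never holds a Python object, only the bytes pickle produced, so every reader pays the cost of unpickling.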


What might be easier than fooling around with boxes of sharp knives is to 
convert your ndarray objects to Python lists. Lists are pickle-friendly and 
easy to turn back into ndarray objects once they've crossed the pickle 
boundary. 
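
A minimal sketch of that conversion, assuming NumPy is installed. Carrying the dtype string alongside the list is an addition for this example, so the rebuilt array keeps the original element type; the variable names are illustrative.

```python
import pickle

import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(2, 3)

# tolist() yields nested Python lists that mirror the array's shape.
as_list = arr.tolist()

# Stand-in for the pickle boundary that multiprocessing would impose.
wire = pickle.dumps((as_list, arr.dtype.str))
received_list, dtype_str = pickle.loads(wire)

# Rebuild the ndarray on the receiving side.
rebuilt = np.array(received_list, dtype=dtype_str)

print(rebuilt.shape, rebuilt.dtype)
```

For large arrays the list form takes noticeably more memory than the ndarray, so this is a convenience rather than an optimization.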


> When should one pickle and copy?  When to implement an object in
> shared memory?  Why is pickling apparently such a non-trivial process
> anyway?  And, given that multi-core CPU's are apparently here to stay,
> should it be so difficult to make use of them?

My answers to these questions:

1) Depends
2) In Python, almost never unless you're using a nice wrapper like shmarray.py
3) I don't think it's non-trivial =)
4) No, definitely not. Python will only get better at working with multiple 
cores/CPUs, but there's plenty of room for improvement on the status quo.

Hope this helps
Philip





-- 
http://mail.python.org/mailman/listinfo/python-list
