Oftentimes, if one needs to share numpy arrays for multiprocessing, I would imagine that it is because the array is huge, right? So, the pickling approach would copy that array for each process, which defeats the purpose, right?
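The copying concern can be checked directly. Below is a minimal sketch, assuming Python 3 and a recent NumPy. The pickle round trip stands in for what happens when an array is sent to a worker through a Queue or Pool. It does not model the exact internal path inside multiprocessing.

```python
import pickle
import numpy as np

# Stand-in for a "huge" array (about 8 MB here; real cases may be far larger).
a = np.arange(1_000_000, dtype=np.float64)

# Serialize in the "parent" and deserialize in the "child", roughly what
# happens to an array passed through a multiprocessing Queue or Pool.
payload = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
b = pickle.loads(payload)

print(len(payload) >= a.nbytes)  # True: the whole data buffer is in the payload
print(np.shares_memory(a, b))    # False: b is an independent copy
print(np.array_equal(a, b))      # True: same values, separate memory
```

Every receiver that gets the array this way pays for one full serialized copy plus one full deserialized copy.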
Ben Root

On Wed, May 11, 2016 at 2:01 PM, Allan Haldane <allanhald...@gmail.com> wrote:

> On 05/11/2016 04:29 AM, Sturla Molden wrote:
> > 4. The reason IPC appears expensive with NumPy is because multiprocessing
> > pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> > that the pickle overhead is an integral part of the IPC overhead, but I
> > will argue that it is not. The slowness of pickle is a separate problem
> > altogether.
>
> That's interesting. I've also used multiprocessing with numpy and didn't
> realize that. Is this true in python3 too?
>
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
> a = np.arange(40*40)
>
> %timeit pickle.dumps(a)
> 1000 loops, best of 3: 1.63 ms per loop
>
> %timeit cPickle.dumps(a)
> 1000 loops, best of 3: 1.56 ms per loop
>
> %timeit cPickle.dumps(a, protocol=2)
> 100000 loops, best of 3: 18.9 µs per loop
>
> Python 3 uses protocol 3 by default:
>
> %timeit pickle.dumps(a)
> 10000 loops, best of 3: 20 µs per loop
>
>
> > 5. Shared memory does not improve on the pickle overhead because NumPy
> > arrays with shared memory must also be pickled. Multiprocessing can bypass
> > pickling the RawArray object, but the rest of the NumPy array is pickled.
> > Using shared memory arrays has no speed advantage over normal NumPy arrays
> > when we use multiprocessing.
> >
> > 6. It is much easier to write concurrent code that uses queues for message
> > passing than anything else. That is why using a Queue object has been the
> > popular Pythonic approach to both multithreading and multiprocessing. I
> > would like this to continue.
> >
> > I am therefore focusing my effort on the multiprocessing.Queue object. If
> > you understand the six points I listed you will see where this is going:
> > What we really need is a specialized queue that has knowledge about NumPy
> > arrays and can bypass pickle. I am therefore focusing my efforts on
> > creating a NumPy aware queue object.
> >
> > We are not doing the users a favor by encouraging the use of shared memory
> > arrays. They help with nothing.
> >
> >
> > Sturla Molden
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
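The protocol timings quoted above were made under Python 2. Below is a rough Python 3 counterpart, sketched under the assumption of a current NumPy. It pickles the same 40*40 array with protocol 0, protocol 2 and `pickle.HIGHEST_PROTOCOL`, and checks that each result round-trips. It then prints the payload size and the average time per `dumps()` call. No particular timings are claimed here, because they depend on the machine and on the Python/NumPy versions.

```python
import pickle
import timeit
import numpy as np

a = np.arange(40 * 40)  # the same array used in the timings quoted above

sizes = {}
for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
    payload = pickle.dumps(a, protocol=proto)
    assert np.array_equal(pickle.loads(payload), a)  # each protocol round-trips
    sizes[proto] = len(payload)
    # Average wall-clock time per dumps() call; numbers depend on the machine.
    secs = timeit.timeit(lambda: pickle.dumps(a, protocol=proto), number=200) / 200
    print(f"protocol {proto}: {len(payload):6d} bytes, {secs * 1e6:8.1f} us per dumps()")
```

Protocol 0 is a text-based format, so the raw array bytes have to be escaped, which makes the payload noticeably larger. The binary protocols store the buffer almost verbatim. Note that `pickle.DEFAULT_PROTOCOL` has been 4 since Python 3.8; it was 3 when this thread was written.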
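Point 5 above notes that multiprocessing can skip pickling the shared buffer itself. Below is a minimal sketch of that idea using `multiprocessing.shared_memory`, which was added in Python 3.8 and so post-dates this thread. Only a small metadata tuple (block name, shape, dtype string) goes through pickle, and the receiver attaches to the same memory block instead of getting a copy. The helpers `create_shared` and `attach` are hypothetical names for illustration. To keep the example self-contained, the "worker" side is simulated in the same process.

```python
import pickle
import numpy as np
from multiprocessing import shared_memory  # requires Python 3.8+


def create_shared(arr):
    """Hypothetical helper: copy arr once into a new shared-memory block.

    Returns the SharedMemory handle, an ndarray view onto the block, and a
    small picklable tuple describing it.
    """
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[...] = arr  # the only full copy of the data
    meta = (shm.name, arr.shape, arr.dtype.str)
    return shm, view, meta


def attach(meta):
    """Hypothetical helper: what a worker would do with the received tuple."""
    name, shape, dtype = meta
    shm = shared_memory.SharedMemory(name=name)  # attach to the block, do not create
    return shm, np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf)


a = np.arange(1_000_000, dtype=np.float64)
shm, view, meta = create_shared(a)

# Only the metadata tuple is pickled, not the 8 MB of array data.
payload = pickle.dumps(meta)
print(len(payload), "bytes pickled vs", a.nbytes, "bytes of data")

# Simulate the worker side in this process; a real worker would receive
# `payload` through a Queue or Pool and call attach() there.
worker_shm, worker_view = attach(pickle.loads(payload))
worker_view[0] = -1.0   # write through the second handle ...
first = float(view[0])  # ... and the change is visible through the first
print(first)            # -1.0

# Array views must be released before close(); unlink() frees the block once.
del worker_view, view
worker_shm.close()
shm.close()
shm.unlink()
```

The trade-off is lifetime management. One process must call `unlink()`, and every process must drop its array views before calling `close()`. This is part of why a queue-based design like the one argued for above is simpler to use correctly.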