Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

Benjamin Root Wed, 11 May 2016 11:24:02 -0700

Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
imagine that it is because the array is huge, right? So, the pickling
approach would copy that array for each process, which defeats the purpose,
right?


Ben Root

On Wed, May 11, 2016 at 2:01 PM, Allan Haldane <allanhald...@gmail.com>
wrote:

> On 05/11/2016 04:29 AM, Sturla Molden wrote:
> > 4. The reason IPC appears expensive with NumPy is because multiprocessing
> > pickles the arrays. It is pickle that is slow, not the IPC. Some would
> say
> > that the pickle overhead is an integral part of the IPC ovearhead, but i
> > will argue that it is not. The slowness of pickle is a separate problem
> > alltogether.
>
> That's interesting. I've also used multiprocessing with numpy and didn't
> realize that. Is this true in python3 too?
>
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
> a = np.arange(40*40)
>
> %timeit pickle.dumps(a)
> 1000 loops, best of 3: 1.63 ms per loop
>
> %timeit cPickle.dumps(a)
> 1000 loops, best of 3: 1.56 ms per loop
>
> %timeit cPickle.dumps(a, protocol=2)
> 100000 loops, best of 3: 18.9 µs per loop
>
> Python 3 uses protocol 3 by default:
>
> %timeit pickle.dumps(a)
> 10000 loops, best of 3: 20 µs per loop
>
>
> > 5. Share memory does not improve on the pickle overhead because also
> NumPy
> > arrays with shared memory must be pickled. Multiprocessing can bypass
> > pickling the RawArray object, but the rest of the NumPy array is pickled.
> > Using shared memory arrays have no speed advantage over normal NumPy
> arrays
> > when we use multiprocessing.
> >
> > 6. It is much easier to write concurrent code that uses queues for
> message
> > passing than anything else. That is why using a Queue object has been the
> > popular Pythonic approach to both multitreading and multiprocessing. I
> > would like this to continue.
> >
> > I am therefore focusing my effort on the multiprocessing.Queue object. If
> > you understand the six points I listed you will see where this is going:
> > What we really need is a specialized queue that has knowledge about NumPy
> > arrays and can bypass pickle. I am therefore focusing my efforts on
> > creating a NumPy aware queue object.
> >
> > We are not doing the users a favor by encouraging the use of shared
> memory
> > arrays. They help with nothing.
> >
> >
> > Sturla Molden
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

Reply via email to