I did some work on this some years ago. I have more or less concluded that
it was a waste of effort. But first let me explain why the suggested
approach does not work. Because it uses memory mapping to create shared
memory (i.e. the shared segments are not named), the segments must be
created ahead of spawning processes. But if you really want this to work
smoothly, you want named shared memory (Sys V IPC or POSIX shm_open), so
that shared arrays can be created in the spawned processes and passed back.
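
Just to illustrate what named shared memory buys you, here is a minimal
sketch. It assumes a named shared memory API such as the
multiprocessing.shared_memory module (Python 3.8+) or a third-party
package like posix_ipc; the only point is that a segment created in the
spawned process can be handed back to the parent by name alone:

    import numpy as np
    from multiprocessing import Process, Queue, shared_memory

    def worker(name_q, done_q):
        # Create a *named* segment inside the spawned process; only its
        # name has to travel back to the parent.
        shm = shared_memory.SharedMemory(create=True, size=1000 * 8)
        a = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)
        a[:] = 42.0
        name_q.put(shm.name)
        done_q.get()      # keep the mapping alive until the parent attached
        shm.close()

    if __name__ == "__main__":
        name_q, done_q = Queue(), Queue()
        p = Process(target=worker, args=(name_q, done_q))
        p.start()
        shm = shared_memory.SharedMemory(name=name_q.get())  # attach by name
        a = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)
        print(a[:5])      # the child's data, no copy and no pickle
        done_q.put(None)
        p.join()
        shm.close()
        shm.unlink()      # finally remove the named segment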

Now for the reason I don't care about shared memory arrays anymore, and
what I am currently working on instead:

1. I have come across very few cases where threaded code cannot be used in
numerical computing. In fact, multithreading nearly always happens in the
code I write in pure C or Fortran anyway. Most often it happens in library
code that is already multithreaded (Intel MKL, Apple Accelerate Framework,
OpenBLAS, etc.), which means using it requires no extra effort from my
side. A multithreaded LAPACK library is not less multithreaded if I call
it from Python.

2. Getting shared memory right can be difficult because of hierarchical
memory and false sharing. False sharing happens when threads or processes
on different cores write to independent variables that happen to sit on
the same cache line, so every write forces the other cores to refetch that
line. You might not see it if you only have a multicore CPU with a shared
cache, but your code might not scale up on computers with more than one
physical processor. False sharing acts like the GIL, except it happens in
hardware and slows down your C code invisibly, without any explicit
locking you can pinpoint. This is also why MPI code tends to scale much
better than OpenMP code: if nothing is shared, there can be no false
sharing.

3. Raw C-level IPC is cheap – very, very cheap. Even if you use pipes or
sockets instead of shared memory, it is cheap. There are very few cases
where the IPC itself is the bottleneck.

4. IPC appears expensive with NumPy because multiprocessing pickles the
arrays. It is pickle that is slow, not the IPC (see the sketch after this
list). Some would say that the pickle overhead is an integral part of the
IPC overhead, but I will argue that it is not. The slowness of pickle is a
separate problem altogether.

5. Shared memory does not improve on the pickle overhead, because NumPy
arrays backed by shared memory must be pickled as well. Multiprocessing
can bypass pickling the RawArray object, but the rest of the NumPy array
is pickled. Shared memory arrays therefore have no speed advantage over
normal NumPy arrays when we use multiprocessing.

6. It is much easier to write concurrent code that uses queues for message
passing than anything else. That is why using a Queue object has been the
popular Pythonic approach to both multithreading and multiprocessing. I
would like this to continue.
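
To make points 3-5 concrete, here is a rough micro-benchmark sketch. It is
an illustration only, not a rigorous measurement, and the sizes and names
are mine: it times sending a large array through a pipe with pickling
versus sending the same bytes raw, and then shows that an ndarray built on
top of a shared RawArray is still pickled in full.

    import ctypes
    import pickle
    import time

    import numpy as np
    from multiprocessing import Pipe, Process, sharedctypes

    N = 10_000_000                      # ~80 MB of float64

    def drain(conn, how):
        # Receive one message and discard it so the sender can finish.
        if how == "pickled":
            conn.recv()                 # unpickles a full ndarray
        else:
            conn.recv_bytes()           # raw bytes, no unpickling

    def timed_send(a, how):
        parent_conn, child_conn = Pipe()
        p = Process(target=drain, args=(child_conn, how))
        p.start()
        t0 = time.perf_counter()
        if how == "pickled":
            parent_conn.send(a)                       # pickles the array
        else:
            parent_conn.send_bytes(a.view(np.uint8))  # reinterpret, no copy
        p.join()
        return time.perf_counter() - t0

    if __name__ == "__main__":
        a = np.random.rand(N)
        print("pickled transfer: %.3f s" % timed_send(a, "pickled"))
        print("raw transfer:     %.3f s" % timed_send(a, "raw"))

        # Point 5: an ndarray on top of shared memory is still pickled in
        # full; the pickle is as large as the data itself.
        raw = sharedctypes.RawArray(ctypes.c_double, N)
        shared = np.frombuffer(raw, dtype=np.float64)
        print("pickle of shared-memory array: %d MB"
              % (len(pickle.dumps(shared)) // 2**20))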

I am therefore focusing my effort on the multiprocessing.Queue object. If
you understand the six points I listed, you will see where this is going:
what we really need is a specialized queue that knows about NumPy arrays
and can bypass pickle. That is the NumPy-aware queue object I am working
on.
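
Just to give an idea of what I mean, here is a toy sketch, not the actual
implementation: a queue-like object that pickles only a tiny header and
ships the array payload as raw bytes over a pipe.

    import numpy as np
    from multiprocessing import Pipe

    class NumpyQueue:
        """Toy single-producer, single-consumer queue for NumPy arrays.

        Only the (shape, dtype) header goes through pickle; the payload
        is shipped as raw bytes.  Locking, sentinels and error handling
        are left out to keep the sketch short.
        """

        def __init__(self):
            # One-way pipe: the first connection receives, the second sends.
            self._recv_conn, self._send_conn = Pipe(duplex=False)

        def put(self, arr):
            arr = np.ascontiguousarray(arr)
            self._send_conn.send((arr.shape, arr.dtype.str))  # tiny, pickled
            self._send_conn.send_bytes(arr.view(np.uint8))    # raw payload

        def get(self):
            shape, dtype = self._recv_conn.recv()
            buf = self._recv_conn.recv_bytes()
            # frombuffer returns a read-only view of the received bytes;
            # call .copy() on the result if the consumer must write to it.
            return np.frombuffer(buf, dtype=dtype).reshape(shape)

A real implementation would also have to handle synchronization between
multiple producers and consumers, non-contiguous arrays, object dtypes and
so on, but the essential trick is simply to keep pickle away from the big
buffer.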

We are not doing the users a favor by encouraging the use of shared memory
arrays. They help with nothing.


Sturla Molden



Matěj Týč <matej....@gmail.com> wrote:
> Dear Numpy developers,
> I propose a pull request https://github.com/numpy/numpy/pull/7533 that
> features numpy arrays that can be shared among processes (with some
> effort).
> 
> Why:
> In CPython, multiprocessing is the only way to exploit multi-core CPUs
> if your parallel code can't avoid creating Python objects. In that case,
> CPython's GIL makes threads unusable. However, unlike with threading,
> sharing data among processes is non-trivial and platform-dependent.
> 
> Although numpy (and certainly some other packages) implement some
> operations in a way that the GIL is not a concern, consider another
> case: You have a large amount of data in the form of a numpy array and
> you want to pass it to a function of an arbitrary Python module that
> also expects a numpy array (e.g. a list of vertex coordinates as an
> input and an array of the corresponding polygon as an output). Here, it
> is clear that the GIL is an issue for you, and since you want a numpy
> array on both ends, you would have to copy your numpy array to a
> multiprocessing.Array (to pass the data) and then convert it back to an
> ndarray in the worker process.
> This contribution would streamline it a bit - you would create an
> array as you are used to, pass it to the subprocess as you would do
> with the multiprocessing.Array, and the process can work with a numpy
> array right away.
> 
> How:
> The idea is to create a numpy array in a buffer that can be shared
> among processes. Python has support for this in its standard library,
> so the current solution creates a multiprocessing.Array and then
> passes it as the "buffer" to ndarray.__new__. That would be it on
> Unixes, but on Windows there has to be a custom pickle method,
> otherwise the array "forgets" that its buffer is special and the
> sharing doesn't work.
> 
> Some of what has been said in the pull request & my answer to that:
> 
> * ... I do see some value in providing a canonical right way to
> construct shared memory arrays in NumPy, but I'm not very happy with
> this solution, ... terrible code organization (with the global
> variables):
> * I understand that, however this is a pattern of Python
> multiprocessing, and everybody who wants to use the Pool and shared
> data either is familiar with this approach or has to become familiar
> with it [2, 3]. A good compromise is to have a separate module for each
> parallel calculation, so global variables are not a problem.
> 
> * Can you explain why the ndarray subclass is needed? Subclasses can
> be rather annoying to get right, and also for other reasons.
> * The shmarray class needs the custom pickler (but only on Windows).
> 
> * If there's some way we can paper over the boilerplate such that
> users can use it without understanding the arcana of multiprocessing,
> then yes, that would be great. But otherwise I'm not sure there's
> anything to be gained by putting it in a library rather than referring
> users to the examples on StackOverflow [1] [2].
> * What about telling users: "You can use numpy with multiprocessing.
> Remember the multiprocessing.Value and multiprocessing.Array classes?
> numpy.shm works exactly the same way, which means that it shares their
> limitations. Refer to an example: <link to numpy doc>." Notice that
> although those SO links contain all of the information, it is very
> difficult for a newcomer (like me a few years ago) to get it up and
> running.
> 
> * This needs tests and justification for custom pickling methods,
> which are not used in any of the current examples. ...
> * I am sorry, but I don't fully understand that point. The custom
> pickling method of shmarray has to be there on Windows, but users
> don't have to know about it at all. As noted earlier, the global
> variable is the only way of using the standard Python
> multiprocessing.Pool with shared objects.
> 
> [1]:
> http://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing
> [2]:
> http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing
> [3]:
> http://stackoverflow.com/questions/1675766/how-to-combine-pool-map-with-array-shared-memory-in-python-multiprocessing
