On 30.09.2015 19:20, Nathaniel Smith wrote:
> The challenges to providing transparent multithreading in numpy
> generally are:
> - gcc + OpenMP on Linux still breaks multiprocessing. There's a patch
> to fix this but they still haven't applied it; alternatively there's a
> workaround you can use in multiprocessing (not using fork mode), but
> this requires every user to update their code and the workaround has
> other limitations. We're unlikely to use OpenMP while this is the case.
Ah, I didn't know this. Thanks.
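For anyone else who had not run into this: the workaround mentioned above (not using fork mode) can be sketched with the start-method contexts in the standard library's multiprocessing module, available since Python 3.4. This is only a minimal illustration. `square` and `parallel_squares` are made-up names, and `square` stands in for whatever NumPy-heavy work each worker would really do.

```python
import multiprocessing as mp


def square(x):
    # Hypothetical stand-in for the real per-item computation.
    return x * x


def parallel_squares(values, n_workers=2):
    # Ask for a "spawn" context instead of the default start method,
    # so workers are fresh interpreters rather than forked copies.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_workers) as pool:
        return pool.map(square, values)
```

Because spawn starts a fresh interpreter that re-imports the main module, the call to `parallel_squares` has to sit under an `if __name__ == "__main__":` guard in the calling script, which is part of why every user would have to touch their code.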
> - parallel code in general is not very composable. If someone is
> calling a numpy operation from one thread, great, transparently using
> multiple threads internally is a win. If they're exploiting some
> higher-level structure in their problem to break it into pieces and
> process each in parallel, and then using numpy on each piece, then
> numpy spawning threads internally will probably destroy performance.
> And numpy is too low-level to know which case it's in. This problem
> exists to some extent already with multi-threaded BLAS, so people use
> various BLAS-specific knobs to manage it in ad hoc ways, but this
> doesn't scale.
Very good point. I've had both kinds of use cases myself.
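The BLAS-specific knobs mentioned above are usually environment variables, and they typically need to be set before NumPy is imported. A rough sketch, assuming an OpenMP-, OpenBLAS- or MKL-backed build (which variable is honoured depends on how NumPy was built). `limit_blas_threads` is a hypothetical helper, not a NumPy function.

```python
import os


def limit_blas_threads(n_threads, environ=os.environ):
    # Set the commonly used thread-count variables. Each backend reads
    # only its own variable; the others are simply ignored.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        environ[var] = str(n_threads)


# Typical use when the caller already parallelises at a higher level:
# limit_blas_threads(1)   # must run before "import numpy"
```

This is exactly the ad hoc flavour described above: the caller has to know which backend is in use and has to act before the library loads.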
It would be nice if there were some way to tell NumPy to either use
additional threads or not, but that adds complexity. It's also not a
good solution, considering that any higher-level code building on NumPy,
if it is designed to be at all reusable, may find *itself* in either
role. Only the code that happens to form the top level at a given point
in the development of a software project has the required context...
Then again, the matter is further complicated by considering codes that
run on a single machine, versus codes that run on a cluster. Threads
being local to each node in a cluster, it may make sense in a solver
targeted for a cluster to split the problem at the process level,
distribute the processes across the network, and use the threading
capability to accelerate computation on each node.
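As a rough illustration of that per-node split, the number of threads given to each process can be derived from the cores on the node and the number of processes placed there. `threads_per_process` is a hypothetical helper, and the even split is an assumption. A real scheduler may pin things differently.

```python
import os


def threads_per_process(n_processes, n_cores=None):
    # Default to the core count reported for the local node.
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    # Split cores evenly, but never give a process fewer than one thread.
    return max(1, n_cores // n_processes)
```

For example, four processes on a 16-core node would each get `threads_per_process(4, 16)` threads, and oversubscribed nodes fall back to one thread per process.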
A complex issue with probably no easy solutions :)
-J
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion