On 30/09/15 11:27, Daπid wrote:
> Is there a nice way to ship both versions? After all, most
> implementations of BLAS and friends do spawn OpenMP threads, so I don't
> think it would be outrageous to take advantage of it in more places;
Some do, others don't.
ACML uses OpenMP.
GotoBLAS uses OpenMP.
Intel MKL uses both Intel TBB and OpenMP.
OpenBLAS will by default use an internal threadpool. It can be
configured to use OpenMP instead. (A short example of controlling these
pools at runtime follows this list.)
ATLAS uses its own threadpool.
Apple's Accelerate Framework uses Grand Central Dispatch (GCD), a
kernel-level thread pool. GCD is accessed through the libdispatch
library, which in turn uses kqueue.
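As an aside, the size of these pools can usually be capped at runtime
through environment variables, which is handy when combining BLAS with
other forms of parallelism. A minimal sketch; which variable applies
depends on which BLAS your NumPy is linked against:

import os

# Must be set before the BLAS library is loaded, i.e. before the first
# "import numpy" in a fresh interpreter.
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS internal pool
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP-based builds
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL

import numpy as np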
There are two principal problems with using OpenMP in NumPy:
One is that the GNU OpenMP threadpool is not fork-safe, which can break
multiprocessing on some platforms (e.g. Python 2.7 on Linux). Anything
that uses GCD has the same nasty effect on Apple and FreeBSD systems.
Note that the problem really lies in multiprocessing: it is not present
on Windows (there is no fork system call there), and it is avoidable
even on Linux with Python 3.4 or later. Note also that the default
builds of NumPy and SciPy on Mac OS X use GCD.
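To illustrate the Python 3.4+ workaround: select a start method other
than fork, so workers never inherit a threadpool's locks in an
inconsistent state. A minimal sketch (the function work is just an
example payload):

import multiprocessing as mp
import numpy as np

def work(n):
    a = np.random.rand(n, n)
    return np.linalg.norm(np.dot(a, a))  # may use BLAS threads in the worker

if __name__ == "__main__":
    # "forkserver" or "spawn" instead of the default "fork" on Linux
    ctx = mp.get_context("forkserver")
    with ctx.Pool(4) as pool:
        print(pool.map(work, [200, 300, 400]))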
The other problem is that NumPy, with its iterator objects and gufuncs,
is not particularly well suited to multithreading. There would be heavy
contention for the shared iterator object: without a major redesign, one
thread would do useful work while the others busy-wait on a spinlock,
fighting for access to the iterator. The iterator object would also be
subject to false sharing between processors, which would trash
performance on machines with more than one CPU, even compared to a
single thread. For multithreading to be useful in NumPy, the loops must
therefore be redesigned so that the work is shared out between the
threads before the iterators are created, i.e. one iterator per thread,
each iterating over its own slice of the original array. That is
feasible, but it cannot be done simply by adding OpenMP pragmas to the C
code.
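To make the idea concrete, here is a sketch in Python rather than C
(apply_parallel is a hypothetical helper, not a NumPy API): the work is
shared out before any iteration starts, so each thread touches only its
own slice and there is no shared iterator to fight over. This works with
Python threads because NumPy releases the GIL inside its inner loops.

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def apply_parallel(func, a, out, nthreads=4):
    # Partition along the first axis ahead of time: one slice per thread.
    bounds = np.linspace(0, a.shape[0], nthreads + 1).astype(int)
    def run(i):
        s = slice(bounds[i], bounds[i + 1])
        out[s] = func(a[s])  # NumPy releases the GIL in this loop
    with ThreadPoolExecutor(nthreads) as pool:
        list(pool.map(run, range(nthreads)))  # force completion, surface errors

a = np.random.rand(1000000)
out = np.empty_like(a)
apply_parallel(np.sin, a, out)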
Given the two problems mentioned above, it would likely be better to use
a fork-safe threadpool instead of OpenMP.
Sturla