Hello!

I recently got a new MacBook Pro with an M2 Pro CPU (ARM64). When I ran some 
numerical computations (ICA, to be precise), I was surprised by how slow it 
was: way slower than, for example, my almost 10-year-old Intel Mac. It turns 
out that the default OpenBLAS, which is what you get when installing the binary 
wheel with pip (i.e. "pip install numpy"), is the reason the computations are 
so slow.

When installing NumPy from source (by using "pip install --no-binary :all: 
--no-use-pep517 numpy"), it uses the Apple-provided Accelerate framework, which 
includes an optimized BLAS library. The difference is mind-boggling. I'd even 
say that NumPy is pretty much unusable with the default OpenBLAS backend (at 
least for the applications I tested).
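
For anyone who wants to check which BLAS their own NumPy build is linked 
against, here is a minimal sketch. It only relies on numpy.show_config(), which 
prints the build configuration; the helper name numpy_build_config is made up 
for this example. The exact wording of the output differs between NumPy 
versions, so the text is best read by eye (look for entries mentioning 
"openblas" or "accelerate") rather than parsed.

```python
import contextlib
import io

import numpy as np


def numpy_build_config() -> str:
    """Return the text that np.show_config() prints (hypothetical helper)."""
    buf = io.StringIO()
    # show_config() prints to stdout, so capture it in a string buffer.
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


config_text = numpy_build_config()
# Inspect the BLAS/LAPACK entries manually; their format is version dependent.
print(config_text)
```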

In my test with four different ICA algorithms, I got these runtimes with the 
default OpenBLAS:

- FastICA: 6.3s
- Picard: 26.3s
- Infomax: 0.8s
- Extended Infomax: 1.4s

The second algorithm in particular is far slower than on my 10-year-old Intel 
Mac using OpenBLAS.

Here are the times with Accelerate:

- FastICA: 0.4s
- Picard: 0.6s
- Infomax: 1.0s
- Extended Infomax: 1.3s
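
A rough way to compare two NumPy installations without the ICA code is to time 
a few dense linear algebra calls directly. The sketch below is not the 
benchmark that produced the numbers above; the matrix size, the choice of 
operations, and the helper name best_time are assumptions made for 
illustration. Running it once per environment (OpenBLAS wheel vs. Accelerate 
build) should give a first impression of the BLAS/LAPACK difference.

```python
import time

import numpy as np


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
n = 500  # assumed size; increase it for a more demanding test
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))
sym = a @ a.T  # symmetric matrix for the eigendecomposition

timings = {
    "matmul": best_time(lambda: a @ b),
    "eigh": best_time(lambda: np.linalg.eigh(sym)),
    "svd": best_time(lambda: np.linalg.svd(a)),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
```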

Given this huge performance difference, my question is whether you would 
consider distributing a binary wheel for ARM64-based Macs that links to 
Accelerate. Or are there caveats that make you not want to do that? I know 
that NumPy moved away from Accelerate years ago on Intel Macs, but maybe now is 
the time to reconsider this decision.

Thanks!

Clemens
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org