Re: [numpy @ bigsur: multithreading]

2022-01-03 Thread Chris Jones


> On 3 Jan 2022, at 5:18 pm, Maxim Abalenkov  wrote:
> 
> Dear all,
> 
> Thank you for all of your replies and suggestions! I have written my own 
> matrix multiplication script in order to test NumPy’s performance. Please 
> find it attached. I’m using the MKL variant of NumPy. Strangely enough the 
> `port variants py39-numpy` still returns:
> 
> port variants py39-numpy
> py39-numpy has the variants:
>   atlas: Use MacPorts ATLAS Libraries
> * conflicts with mkl openblas
>   gcc10: Build using the MacPorts gcc 10 compiler
> * conflicts with gcc11 gcc8 gcc9 gccdevel gfortran gfortran
>   gcc11: Build using the MacPorts gcc 11 compiler
> * conflicts with gcc10 gcc8 gcc9 gccdevel gfortran gfortran
>   gcc8: Build using the MacPorts gcc 8 compiler
> * conflicts with gcc10 gcc11 gcc9 gccdevel gfortran gfortran
>   gcc9: Build using the MacPorts gcc 9 compiler
> * conflicts with gcc10 gcc11 gcc8 gccdevel gfortran gfortran
>   gccdevel: Build using the MacPorts gcc devel compiler
> * conflicts with gcc10 gcc11 gcc8 gcc9 gfortran gfortran
> [+]gfortran: Build using the MacPorts gcc 11 Fortran compiler
> * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>   mkl: Use MacPorts MKL Libraries
> * conflicts with atlas openblas
> [+]openblas: Use MacPorts OpenBLAS Libraries
> * conflicts with atlas mkl
>   universal: Build for multiple architectures
> 
> Either I don’t understand the expected behaviour or my `port variants` 
> command returns something else. I would expect it to show [+]gfortran and 
> [+]mkl, not the [+]openblas.

No. The + sign indicates which variants are enabled by default, not the ones 
you actually installed. To see those, use `port installed`, as you do below.
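
To make the distinction concrete, here is a minimal Python sketch that reads the 
variants actually in use out of `port installed` output. The sample text is 
copied from the message quoted below. The helper name `active_variants` is made 
up for illustration, and the parsing assumes the `name @version+variant (active)` 
layout seen in this thread, nothing more general.

```python
import re

# Sample output copied from this thread; in practice it would come from
# running `port installed py39-numpy`.
PORT_INSTALLED = """\
The following ports are currently installed:
  py39-numpy @1.21.5_1+gfortran+mkl
  py39-numpy @1.22.0_0+gfortran+mkl (active)
"""

def active_variants(text):
    """Map each active port name to the list of variants it was built with.

    Hypothetical helper: assumes lines shaped like
    `  name @version+variant1+variant2 (active)`.
    """
    result = {}
    for line in text.splitlines():
        match = re.match(r"\s+(\S+) @([^\s+]+)((?:\+\w+)*) \(active\)$", line)
        if match:
            name, _version, variants = match.groups()
            # "+gfortran+mkl" splits into ["", "gfortran", "mkl"]; drop the empty head.
            result[name] = [v for v in variants.split("+") if v]
    return result

print(active_variants(PORT_INSTALLED))
```

Only the line marked `(active)` is reported, which is the build Python will 
actually load.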

> On the other hand, command `port installed py39-numpy` shows:
> 
> port installed py39-numpy
> The following ports are currently installed:
>  py39-numpy @1.21.5_1+gfortran+mkl
>  py39-numpy @1.22.0_0+gfortran+mkl (active)
> 
> Finally, I wasn’t able to specify 8 execution threads with `export 
> MKL_NUM_THREADS=8`. NumPy was still using 4, but the `htop` reported 350–380% 
> CPU load for the `/usr/bin/env python3 ./dgemm_numpy.py` process. I think 
> this is good news!
> 
> The `otool` command executed under 
> `/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core`
>  shows that MKL backend is being used.
> 
> otool -L _multiarray_umath.cpy
> _multiarray_umath.cpython-39-darwin.so:
>     /opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/libmkl_rt.2.dylib (compatibility version 0.0.0, current version 0.0.0)
>     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
> 
> I think I still need to experiment with OpenBLAS and compare the performance 
> numbers. Thank you for your help!
> 
> —
> Best wishes,
> Maxim
> 
#!/usr/bin/env python3

import numpy as np
import time

print(np.__version__)
np.show_config()  # prints the build configuration itself and returns None

m = 2
k = 2
n = 2

t0 = time.time()
alpha = np.random.rand()
beta  = np.random.rand()

A  = np.random.rand(m, k)
B  = np.random.rand(k, n)
C  = np.random.rand(m, n)
t1 = time.time()

t = t1-t0
print('Generation time: {0:f}'.format(t))
print('  alpha: {0:f},  beta: {1:f}'.format(alpha, beta))

t0 = time.time()
C  = alpha*np.matmul(A, B) + beta*C
t1 = time.time()
t  = t1-t0
print('Multiplication time: {0:f}'.format(t))

## @eof dgemm_numpy.py
> 
> 
>>> On 29 Dec 2021, at 13:33, Joshua Root  wrote:
>>> 
>>> Maxim Abalenkov wrote:
>>> 
>>> 
>>> Dear all,
>>> 
>>> I’m looking for guidance please. I would like to make sure, that I use all 
>>> eight of my CPU cores, when I run Python’s 3.9.9 NumPy on my macOS BigSur 
>>> 12.1. When I run my NumPy code, I see in ‘htop’, that only one ‘python’ 
>>> process is running and the core utilisation is 20–25%. I remember in the 
>>> past, stock MacPorts NumPy installation would use Apple’s Accelerate 
>>> library including the multithreaded BLAS and LAPACK (
>>> https://developer.apple.com/documentation/accelerate
>>> ). As I understand this is no longer the case.
>>> 
>>> I run Python code using a virtual environment located under
>>> 
>>> /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core
>>> 
>>> When I change there and issue
>>> 
>>> otool -L _multiarray_umath.cpython-39-darwin.so
>>> 
>>> _multiarray_umath.cpython-39-darwin.so:
>>>     @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
>>>     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
>>> 
>>> In other words, NumPy relies on openBLAS. Command `port variants openblas` 
>>> returns
>>> 
>>> OpenBLAS has the variants:
>>>    g95: Build using the g95 Fortran compiler
>>>  * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>>>    gcc10: Build using the MacPorts gcc 10 compiler
>>>  * conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
>>> [+]gcc11: Build using the MacPorts gcc 11 compiler

Re: [numpy @ bigsur: multithreading]

2022-01-03 Thread Bill Cole
On 2022-01-03 at 12:18:23 UTC-0500 (Mon, 3 Jan 2022 19:18:23 +0200)
Maxim Abalenkov 
is rumored to have said:

> Either I don’t understand the expected behaviour or my `port variants` 
> command returns something else. I would expect it to show [+]gfortran and 
> [+]mkl, not the [+]openblas.

As documented, the 'port variants' command returns the AVAILABLE variants, with 
the '[+]' added to the DEFAULT variants.
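
As an illustration of that reading, the sketch below separates the available 
variants from the default ones in an excerpt of the `port variants py39-numpy` 
output posted in this thread. The helper name `split_variants` is hypothetical, 
and the pattern only assumes the three-character `[+]` or blank prefix visible 
in that output.

```python
import re

# Excerpt of the `port variants py39-numpy` output posted in this thread.
PORT_VARIANTS = """\
py39-numpy has the variants:
   atlas: Use MacPorts ATLAS Libraries
 * conflicts with mkl openblas
[+]gfortran: Build using the MacPorts gcc 11 Fortran compiler
 * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
   mkl: Use MacPorts MKL Libraries
 * conflicts with atlas openblas
[+]openblas: Use MacPorts OpenBLAS Libraries
 * conflicts with atlas mkl
   universal: Build for multiple architectures
"""

def split_variants(text):
    """Return (available, default) variant names from `port variants` output."""
    available, default = [], []
    for line in text.splitlines():
        # A variant line starts with "[+]" (default) or three blanks (not default).
        match = re.match(r"(\[\+\]|\s{3})(\w+): ", line)
        if match:
            marker, name = match.groups()
            available.append(name)
            if marker == "[+]":
                default.append(name)
    return available, default

available, default = split_variants(PORT_VARIANTS)
print("available:", available)
print("default:  ", default)
```

Neither list says anything about what is installed on a given machine; that 
information comes from `port installed`.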


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: [numpy @ bigsur: multithreading]

2022-01-03 Thread Maxim Abalenkov
Dear all,

Thank you for all of your replies and suggestions! I have written my own matrix 
multiplication script to test NumPy’s performance. Please find it attached. I’m 
using the MKL variant of NumPy. Strangely enough, `port variants py39-numpy` 
still returns:

port variants py39-numpy
py39-numpy has the variants:
   atlas: Use MacPorts ATLAS Libraries
 * conflicts with mkl openblas
   gcc10: Build using the MacPorts gcc 10 compiler
 * conflicts with gcc11 gcc8 gcc9 gccdevel gfortran gfortran
   gcc11: Build using the MacPorts gcc 11 compiler
 * conflicts with gcc10 gcc8 gcc9 gccdevel gfortran gfortran
   gcc8: Build using the MacPorts gcc 8 compiler
 * conflicts with gcc10 gcc11 gcc9 gccdevel gfortran gfortran
   gcc9: Build using the MacPorts gcc 9 compiler
 * conflicts with gcc10 gcc11 gcc8 gccdevel gfortran gfortran
   gccdevel: Build using the MacPorts gcc devel compiler
 * conflicts with gcc10 gcc11 gcc8 gcc9 gfortran gfortran
[+]gfortran: Build using the MacPorts gcc 11 Fortran compiler
 * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
   mkl: Use MacPorts MKL Libraries
 * conflicts with atlas openblas
[+]openblas: Use MacPorts OpenBLAS Libraries
 * conflicts with atlas mkl
   universal: Build for multiple architectures

Either I don’t understand the expected behaviour, or my `port variants` command 
returns something unexpected. I would expect it to show [+]gfortran and [+]mkl, 
not [+]openblas. On the other hand, the command `port installed py39-numpy` 
shows:

port installed py39-numpy
The following ports are currently installed:
  py39-numpy @1.21.5_1+gfortran+mkl
  py39-numpy @1.22.0_0+gfortran+mkl (active)

Finally, I wasn’t able to request 8 execution threads with 
`export MKL_NUM_THREADS=8`: NumPy was still using 4. Even so, `htop` reported 
350–380% CPU load for the `/usr/bin/env python3 ./dgemm_numpy.py` process, so 
the multiplication does run multithreaded. I think this is good news!
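
One thing worth checking, though nothing in this thread confirms it applies 
here: BLAS backends typically read their thread-count variables once, when the 
library initialises, so the variables have to be in place before NumPy is 
imported. Intel's documentation also describes `MKL_DYNAMIC`, which is enabled 
by default and allows MKL to use fewer threads than requested. The sketch below 
is an illustration under those assumptions, not a diagnosis, and 
`request_blas_threads` is a hypothetical helper name.

```python
import os

def request_blas_threads(n, allow_dynamic=False):
    """Ask the common BLAS backends for n threads via environment variables.

    Hypothetical helper. It must run before `import numpy`, because the
    backends typically read these variables once, at initialisation.
    """
    for var in ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS"):
        os.environ[var] = str(n)
    # MKL_DYNAMIC=TRUE (MKL's default) lets MKL pick fewer threads than asked.
    os.environ["MKL_DYNAMIC"] = "TRUE" if allow_dynamic else "FALSE"

request_blas_threads(8)
# import numpy as np  # import only after the variables are in place
print({k: v for k, v in os.environ.items() if k.endswith("_NUM_THREADS")})
```

Whether 8 threads are actually faster than 4 on a machine with four physical 
cores is a separate question that only a measurement can answer.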

Running `otool` in 
`/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core` 
shows that the MKL backend is being used.

otool -L _multiarray_umath.cpy
_multiarray_umath.cpython-39-darwin.so:
    /opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/libmkl_rt.2.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

I think I still need to experiment with OpenBLAS and compare the performance 
numbers. Thank you for your help!

—
Best wishes,
Maxim

#!/usr/bin/env python3

import numpy as np
import time

print(np.__version__)
np.show_config()  # prints the build configuration itself and returns None

m = 2
k = 2
n = 2

t0 = time.time()
alpha = np.random.rand()
beta  = np.random.rand()

A  = np.random.rand(m, k)
B  = np.random.rand(k, n)
C  = np.random.rand(m, n)
t1 = time.time()

t = t1-t0
print('Generation time: {0:f}'.format(t))
print('  alpha: {0:f},  beta: {1:f}'.format(alpha, beta))

t0 = time.time()
C  = alpha*np.matmul(A, B) + beta*C
t1 = time.time()
t  = t1-t0
print('Multiplication time: {0:f}'.format(t))

## @eof dgemm_numpy.py
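
A single `time.time()` measurement of one multiplication is noisy, and for very 
small matrices it mostly measures call overhead. Below is a sketch of a more 
robust timing harness that uses only the standard library. The names `best_time` 
and `gemm_gflops` are illustrative, and the 2*m*n*k operation count is the usual 
estimate for a dense matrix product (it ignores the alpha and beta scaling).

```python
import time

def best_time(func, repeats=5):
    """Return the smallest wall-clock time (seconds) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)

def gemm_gflops(m, n, k, seconds):
    """Approximate GFLOP/s of an (m x k) by (k x n) matrix product."""
    return 2.0 * m * n * k / seconds / 1e9

# Stand-in workload so the sketch runs without NumPy installed.
elapsed = best_time(lambda: sum(i * i for i in range(100_000)))
print('Best of 5: {0:f} s'.format(elapsed))
```

With the script above, the workload would be 
`lambda: alpha * np.matmul(A, B) + beta * C`, presumably with `m`, `k` and `n` 
in the thousands so that the multiplication runs long enough for the BLAS 
threads to show up in `htop`.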


> On 29 Dec 2021, at 13:33, Joshua Root  wrote:
> 
> Maxim Abalenkov wrote:
> 
> 
>> Dear all,
>> 
>> I’m looking for guidance please. I would like to make sure, that I use all 
>> eight of my CPU cores, when I run Python’s 3.9.9 NumPy on my macOS BigSur 
>> 12.1. When I run my NumPy code, I see in ‘htop’, that only one ‘python’ 
>> process is running and the core utilisation is 20–25%. I remember in the 
>> past, stock MacPorts NumPy installation would use Apple’s Accelerate library 
>> including the multithreaded BLAS and LAPACK (
>> https://developer.apple.com/documentation/accelerate
>> ). As I understand this is no longer the case.
>> 
>> I run Python code using a virtual environment located under
>> 
>>  /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core
>> 
>> When I change there and issue
>> 
>>  otool -L _multiarray_umath.cpython-39-darwin.so
>> 
>> _multiarray_umath.cpython-39-darwin.so:
>>     @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
>>     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
>> 
>> In other words, NumPy relies on openBLAS. Command `port variants openblas` 
>> returns
>> 
>> OpenBLAS has the variants:
>>   g95: Build using the g95 Fortran compiler
>> * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>>   gcc10: Build using the MacPorts gcc 10 compiler
>> * conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
>> [+]gcc11: Build using the MacPorts gcc 11 compiler
>> * conflicts with g95 g95 gcc10 gcc8 gcc9 gccdevel
>>   gcc8: Build using the MacPorts gcc 8 compiler
>> * conflicts with g95 g95 gcc10 gcc11 gcc9 gccdevel
>>   gcc9: Build using the MacPorts gcc 9 compiler
>> * conflicts with g95 g95 gcc10 gcc11 gcc8 gccdevel
>>   gccdevel: Build using the MacPorts gcc devel compiler
>> * conflicts with g95 g

Re: [numpy @ bigsur: multithreading]

2021-12-29 Thread Joshua Root

Maxim Abalenkov wrote:


> Dear all,
> 
> I’m looking for guidance please. I would like to make sure, that I use all
> eight of my CPU cores, when I run Python’s 3.9.9 NumPy on my macOS BigSur 12.1.
> When I run my NumPy code, I see in ‘htop’, that only one ‘python’ process is
> running and the core utilisation is 20–25%. I remember in the past, stock
> MacPorts NumPy installation would use Apple’s Accelerate library including the
> multithreaded BLAS and LAPACK
> (https://developer.apple.com/documentation/accelerate). As I understand this is
> no longer the case.
> 
> I run Python code using a virtual environment located under
> 
>   /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core
> 
> When I change there and issue
> 
>   otool -L _multiarray_umath.cpython-39-darwin.so
> 
> _multiarray_umath.cpython-39-darwin.so:
>     @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
>     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
> 
> In other words, NumPy relies on openBLAS. Command `port variants openblas`
> returns
> 
> OpenBLAS has the variants:
>    g95: Build using the g95 Fortran compiler
>  * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>    gcc10: Build using the MacPorts gcc 10 compiler
>  * conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
> [+]gcc11: Build using the MacPorts gcc 11 compiler
>  * conflicts with g95 g95 gcc10 gcc8 gcc9 gccdevel
>    gcc8: Build using the MacPorts gcc 8 compiler
>  * conflicts with g95 g95 gcc10 gcc11 gcc9 gccdevel
>    gcc9: Build using the MacPorts gcc 9 compiler
>  * conflicts with g95 g95 gcc10 gcc11 gcc8 gccdevel
>    gccdevel: Build using the MacPorts gcc devel compiler
>  * conflicts with g95 g95 gcc10 gcc11 gcc8 gcc9
> [+]lapack: Add Lapack/CLapack support to the library
>    native: Force compilation on machine to get fully optimized library
>    universal: Build for multiple architectures
> 
> I tried installing the “native” variant of OpenBLAS port with `sudo port
> install openblas +native` and setting the environment variable
> `OMP_NUM_THREADS=8`, but I didn’t see any improvement when running my Python
> code. I would welcome your help and guidance on this subject.

I'm using py39-numpy with default variants:

% port installed py39-numpy openblas
The following ports are currently installed:
  OpenBLAS @0.3.19_0+gcc11+lapack (active)
  py39-numpy @1.21.5_1+gfortran+openblas (active)

I see Python using around 600% CPU on my 6-core machine when running 
this basic benchmark script: 



If you try that and see how many cores it uses, that will at least tell 
you if there is something different about your code. If it doesn't use 
all the cores for you, there are some other environment variables that 
OpenBLAS looks at that you could check: 



- Josh
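
The link given above is not preserved in this archive. As a rough guide, the 
OpenBLAS README lists three thread-count variables and states their priority as 
`OPENBLAS_NUM_THREADS` > `GOTO_NUM_THREADS` > `OMP_NUM_THREADS`; builds compiled 
with OpenMP support are documented to honour only `OMP_NUM_THREADS`. The sketch 
below, with the made-up helper `openblas_thread_setting`, reports which variable 
would win under that ordering for a non-OpenMP build. It does not query OpenBLAS 
itself.

```python
import os

# Priority order as stated in the OpenBLAS README (non-OpenMP builds).
OPENBLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")

def openblas_thread_setting(environ=None):
    """Return (variable, value) for the highest-priority variable that is set,
    or None if none of them is set."""
    environ = os.environ if environ is None else environ
    for var in OPENBLAS_THREAD_VARS:
        if environ.get(var):
            return var, environ[var]
    return None

# Current process environment, then a hand-made example.
print(openblas_thread_setting())
print(openblas_thread_setting({"OMP_NUM_THREADS": "8", "OPENBLAS_NUM_THREADS": "1"}))
```

In the second call, `OPENBLAS_NUM_THREADS=1` takes precedence, so a stray 
setting of that variable could pin the library to one thread even though 
`OMP_NUM_THREADS=8` is exported.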


[numpy @ bigsur: multithreading]

2021-12-28 Thread Maxim Abalenkov
Dear all,

I’m looking for guidance, please. I would like to make sure that I use all 
eight of my CPU cores when I run NumPy under Python 3.9.9 on my macOS BigSur 
12.1. When I run my NumPy code, ‘htop’ shows only one ‘python’ process running, 
with core utilisation at 20–25%. I remember that, in the past, the stock 
MacPorts NumPy installation would use Apple’s Accelerate library, including its 
multithreaded BLAS and LAPACK 
(https://developer.apple.com/documentation/accelerate). As I understand it, 
this is no longer the case.

I run Python code using a virtual environment located under

 /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core

When I change to that directory and issue

 otool -L _multiarray_umath.cpython-39-darwin.so

_multiarray_umath.cpython-39-darwin.so:
    @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
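
The `@loader_path/../.dylibs/` prefix suggests that this NumPy carries its own 
bundled copy of OpenBLAS inside the virtual environment, as pip wheels commonly 
do. If so, rebuilding the MacPorts OpenBLAS port would not affect it. That is an 
inference from the path, not something established in this thread. The following 
sketch, with the hypothetical helper `classify_blas`, sorts library paths printed 
by `otool -L` by BLAS backend and by where the library lives.

```python
# Two dependency paths as reported by `otool -L` in this thread.
VENV_LINE = "@loader_path/../.dylibs/libopenblas.0.dylib"
PORT_LINE = ("/opt/local/Library/Frameworks/Python.framework/"
             "Versions/3.9/lib/libmkl_rt.2.dylib")

def classify_blas(path):
    """Guess (backend, origin) from a library path printed by `otool -L`."""
    lowered = path.lower()
    if "openblas" in lowered:
        backend = "OpenBLAS"
    elif "mkl" in lowered:
        backend = "MKL"
    elif "accelerate" in lowered:
        backend = "Accelerate"
    else:
        backend = "unknown"
    if path.startswith("@loader_path"):
        origin = "bundled next to the module"
    elif path.startswith("/opt/local/"):  # default MacPorts prefix
        origin = "MacPorts prefix"
    else:
        origin = "system or other"
    return backend, origin

for line in (VENV_LINE, PORT_LINE):
    print(classify_blas(line))
```

The classification is a heuristic on file names only; it cannot tell how the 
library was built or how many threads it will use.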

In other words, NumPy relies on OpenBLAS. The command `port variants openblas` 
returns

OpenBLAS has the variants:
  g95: Build using the g95 Fortran compiler
* conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
  gcc10: Build using the MacPorts gcc 10 compiler
* conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
[+]gcc11: Build using the MacPorts gcc 11 compiler
* conflicts with g95 g95 gcc10 gcc8 gcc9 gccdevel
  gcc8: Build using the MacPorts gcc 8 compiler
* conflicts with g95 g95 gcc10 gcc11 gcc9 gccdevel
  gcc9: Build using the MacPorts gcc 9 compiler
* conflicts with g95 g95 gcc10 gcc11 gcc8 gccdevel
  gccdevel: Build using the MacPorts gcc devel compiler
* conflicts with g95 g95 gcc10 gcc11 gcc8 gcc9
[+]lapack: Add Lapack/CLapack support to the library
  native: Force compilation on machine to get fully optimized library
  universal: Build for multiple architectures

I tried installing the “native” variant of the OpenBLAS port with 
`sudo port install openblas +native` and setting the environment variable 
`OMP_NUM_THREADS=8`, but I saw no improvement when running my Python code. I 
would welcome your help and guidance on this subject.

—
Best wishes,
Maxim