I meant: maybe on your CPU the matrix coprocessor is bad at double-precision floats while the CPU itself is fine, and on my CPU they fixed the matrix coprocessor to handle double-precision floats well. That would explain why I saw a big difference from using it and you didn't.
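
One way to probe this would be to time the same product in single and double precision on both machines and compare the ratios. Below is a minimal sketch in Python, not something I have run on either machine. It assumes NumPy is installed; whether NumPy's BLAS backend actually reaches the coprocessor depends on how it was built, so that part is an assumption too. The names `best_time` and `compare_precisions` are made up for illustration, and the matrix shapes mirror the 1e3 x 2e3 by 2e3 x 3e3 benchmark quoted below.

```python
import time


def best_time(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time in seconds of fn() over `repeats`
    timed runs, after `warmup` untimed runs to get past any startup cost."""
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_precisions(m=1000, k=2000, n=3000):
    """Time an (m x k) @ (k x n) product in float64 and in float32."""
    # Imported here so best_time stays usable without NumPy.
    # Assumption: NumPy is installed; its BLAS backend varies by build.
    import numpy as np

    rng = np.random.default_rng(0)
    a64 = rng.random((m, k))  # float64 by default
    b64 = rng.random((k, n))
    a32 = a64.astype(np.float32)
    b32 = b64.astype(np.float32)

    t64 = best_time(lambda: a64 @ b64)
    t32 = best_time(lambda: a32 @ b32)
    return t64, t32


if __name__ == "__main__":
    t64, t32 = compare_precisions()
    print(f"float64: {t64:.4f} s  float32: {t32:.4f} s  ratio: {t64 / t32:.2f}")
```

If the float64/float32 ratio came out much larger on one machine than on the other, that would be consistent with the guess above, though it would not prove it, since core counts and power mode also differ between the two machines.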

On Thu, 31 Mar 2022, bill lam wrote:

The CPU cores in my MacBook Air M1 are 4+4, while those in your M1 Pro are 8+2.
Also, I run in low power mode.


On Thu, Mar 31, 2022, 4:49 AM Elijah Stone <[email protected]> wrote:

I wonder if the older hardware had bad double-prec performance?  If you
send me binaries I can compare them.

On Thu, 31 Mar 2022, bill lam wrote:

> On my MacBook Air M1, the time taken is about 0.3 sec.
> Applying your patch and then linking with the framework didn't show any
> improvement.
>
> If you want optimized gemm performance, you need to compile with
> USE_OPENMP=1
>
>
>
> On Wed, Mar 30, 2022 at 4:09 PM Elijah Stone <[email protected]> wrote:
>
>> Recent apple arm CPUs include a hardware coprocessor for matrix
>> multiplication.  This is nominally not directly accessible to user code
>> (though it has been partly reverse engineered), but must be accessed
>> through apple's blas implementation.  Attached trivial patch makes j use
>> this rather than its own routines for large matrix multiplication on
>> darwin/arm.  Performance delta is quite good.  Before:
>>
>>     a=. ?1e3 2e3$0
>>     b=. ?2e3 3e3$0
>>     100 timex 'a +/ . * b'
>> 0.103497
>>
>> after:
>>
>>     100 timex 'a +/ . * b'
>> 0.0274741
>>     0.103497%0.0274741
>> 3.76708
>>
>> Nearly 4x faster!
>>
>> There seems to be a warmup period (big buffers go brrr...), so the gemm
>> threshold should perhaps be tuned.  I did not take detailed measurements.
>>
>> (Fine print: benchmarks taken on a 14in macbook w/m1pro.)
>>
>> Also of note: on desktop (zen2), numpy is 3x faster than j.  I tried
>> swapping out j's mm microkernel for the newest from blis, and got only a
>> modest boost, so the problem is not there.  I think numpy is using
>> openblas.  (On arm, j and numpy are reasonably close, and the hardware
>> accelerator smokes both.)
>>
>>   -E
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>