Good segue into a couple of questions that have been on my mind. a) Any thoughts comments on a current Beta release for Raspbian?
b) Are there strong reasons to move to a 64 bit OS on the Raspberry Pi? - joey > On 2021May 24, at 13:54, bill lam <bbill....@gmail.com> wrote: > > Matrix multiplication on arm64 android should already be fully optimized, > including > Blas routine with arm64 asimd kernel > Openmp multithreading > > Optimized on desktop too, J runs as fast as other multithreaded optimized > blas lapack such as openblas. > > > > > > On Mon, May 24, 2021, 3:53 PM Ric Sherlock <tikk...@gmail.com> wrote: > >> Just to provide some context to Henry's statement that things have changed >> a bit since J8.05, below are the timings I get on my phone (Pixel 4a) using >> J902. >> >> ,.f"0]2^>:i.13 >> 0.024127 >> 1e_5 >> 2e_6 >> 3e_6 >> 3.4e_5 >> 0.000909 >> 0.000425 >> 0.012697 >> 0.020461 >> 0.139175 >> 1.00075 >> 6.6658 >> 56.7179 >> >> >> >> On Mon, 24 May 2021, 15:00 Henry Rich, <henryhr...@gmail.com> wrote: >> >>> J8.05 is very out-of-date for +/ . * . Since then I have rewritten the >>> JE code a couple of times: the current version is pretty fast and has >>> special code depending on matrix sizes. >>> >>> If you are doing performance measurement you need to get an up-to-date >>> J. Many primitives and combinations run 5-10x faster than they did in >>> 8.05. >>> >>> Henry Rich >>> >>> On 5/23/2021 10:32 PM, Imre Patyi wrote: >>>> Dear Programming in J, >>>> >>>> I made another test of numerical calculation in J, >>>> this time looking at multiplying two matrices using >>>> (+/ .*) and here is what I have found. It seems to >>>> me that J with (+/ .*) has acceptable speed only for >>>> matrices of order about 128 or below, after which order it >>>> quickly falls behind other standard numerical software such >>>> as python with numpy, and Octave. I also wrote a naive C >>>> program for matrix multiplication; for orders 256, 1024, >>>> ..., 8192 J tracks as 2 to 4 faster than the naive C program >>>> (which does not do SIMD or mind caching much). >>>> >>>> Numpy and Octave are able to use multiple threads and/or cores >>>> just by calling ordinary 'matmul', and they are about 7 to >>>> 25 times as fast as J in my experiment. As a primitive in J >>>> the command (+/ .*) could be just as fast as in any competent >>>> numerical program available in C for matrix multiplication. >>>> Even if you do not want multithreading in J, it seems to >>>> me that (+/ .*) has roughly 1/4 or 1/8 the speed of what should >>>> be possible for a single threaded program. It seems especially >>>> troubling that it becomes just as slow as a plain vanilla >>>> naive C program for larger sizes of the matrices. I am not sure >>>> why J does not seem to use BLAS or LAPACK for matrix multiplication. >>>> >>>> Yours sincerely, >>>> Imre Patyi >>>> >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> Here is the summary of timings. >>>> >>>> n time, C time, J time, python time, Octave (time, J)/(time, C) (time, >>>> J)/(time, python) (time, J)/(time, Octave) >>>> 256 0.0780 0.0073 0.0010 0.0007 0.0936 7.3047 9.8987 >>>> 512 0.2680 0.0671 0.0100 0.0050 0.2505 6.7137 13.4195 >>>> 1024 1.8400 0.7293 0.0479 0.0380 0.3964 15.2255 19.1919 >>>> 2048 14.0430 6.0432 0.2663 0.2851 0.4303 22.6938 21.1960 >>>> 4096 109.8290 54.4634 2.2739 2.1620 0.4959 23.9513 25.1917 >>>> 8192 874.8430 435.2600 17.1282 17.2197 0.4975 25.4120 25.2769 >>>> >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> File: example-of-matmul.ijs >>>> >>>> f=: 3 : 0 >>>> N=.y >>>> a=.2 o. ((4 : '(1234*x)+(5678*y)')"0 0)/~ (i.N) >>>> NB.smoutput(i.5){(i.5){a >>>> NB.smoutput'' >>>> t=.timex'b=:a(+/ .*)a' >>>> NB.smoutput(i.5){(i.5){b >>>> NB.t;(60 60#:t) >>>> t >>>> ) >>>> >>>> NB. Sample run. >>>> NB. ,.f"0]2^>:i.13 >>>> NB. 0.0135541 >>>> NB. 3.5e_6 >>>> NB. 2.9e_6 >>>> NB. 4e_6 >>>> NB. 1.77e_5 >>>> NB. 0.0001052 >>>> NB. 0.0008633 >>>> NB. 0.0072972 >>>> NB. 0.0671373 >>>> NB. 0.729313 >>>> NB. 6.04315 >>>> NB. 54.4634 >>>> NB. 435.26 >>>> >>>> >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> File: example-with-numpy.py >>>> >>>> import numpy, time >>>> def f(n): >>>> i=numpy.array(numpy.arange(n).reshape((1,n))) >>>> a=numpy.cos(numpy.array(1234*i+5678*i.T)) >>>> #print(a.shape) >>>> t0=time.time() >>>> b=numpy.matmul(a,a) >>>> return time.time()-t0 >>>> >>>> for i in range(1,1+13): >>>> print(f(2**i)) >>>> >>>> >>>> r""" Sample run. >>>> C:>py "example-with-numpy.py" >>>> 0.0020143985748291016 >>>> 0.0 >>>> 0.0 >>>> 0.0 >>>> 0.0 >>>> 0.0009746551513671875 >>>> 0.0 >>>> 0.0009989738464355469 >>>> 0.009999990463256836 >>>> 0.04790067672729492 >>>> 0.26629042625427246 >>>> 2.273921251296997 >>>> 17.128154277801514 >>>> """ >>>> >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> File: The command I used in Octave. >>>> >>>>>> for n=2.^(1:13) ; i=(0:n-1) ; a=cos(1234*i'+5678*i) ; tic,b=a*a;toc, >>> end >>>> Elapsed time is 1.3113e-05 seconds. >>>> Elapsed time is 1.90735e-05 seconds. >>>> Elapsed time is 1.38283e-05 seconds. >>>> Elapsed time is 1.3113e-05 seconds. >>>> Elapsed time is 2.09808e-05 seconds. >>>> Elapsed time is 4.88758e-05 seconds. >>>> Elapsed time is 0.000244141 seconds. >>>> Elapsed time is 0.00073719 seconds. >>>> Elapsed time is 0.00500298 seconds. >>>> Elapsed time is 0.0380011 seconds. >>>> Elapsed time is 0.285108 seconds. >>>> Elapsed time is 2.16196 seconds. >>>> Elapsed time is 17.2197 seconds. >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> File: example-of-naive-matmul.c >>>> >>>> #include <stdlib.h> >>>> #include <stdio.h> >>>> #include <math.h> >>>> >>>> int >>>> main(int argc, char **argv){ >>>> >>>> int N ; >>>> if(argc==0){ >>>> N=8192 ; >>>> } else { >>>> N=atoi(argv[1]) ; >>>> } >>>> >>>> double *a=(double*)calloc(N*N,sizeof(double)); >>>> double *aT=(double*)calloc(N*N,sizeof(double)); >>>> for(int i=0 ; i<N ; i++){ >>>> for(int j =0 ; j<N ; j++){ >>>> a[i+N*j]=aT[j+N*i]=cos(1234*i+5678*j) ; >>>> } >>>> } >>>> >>>> double *b=(double*)calloc(N*N,sizeof(double)); >>>> for(int i=0 ; i<N ; i++){ >>>> for(int j=0 ; j<N ; j++){ >>>> double bij=0.0 ; >>>> for(int k=0 ; k<N ; k++){ >>>> bij += aT[k+N*i]*a[k+N*j] ; >>>> } >>>> b[i+N*j]=bij ; >>>> } >>>> } >>>> printf("\n") ; >>>> /* >>>> for(int i=0 ; i<5 ; i++){ >>>> for(int j=0 ; j<5 ; j++){ >>>> printf("%f\t",a[i+N*j]) ; >>>> } >>>> printf("\n") ; >>>> } >>>> printf("\n") ; >>>> for(int i=0 ; i<5 ; i++){ >>>> for(int j=0 ; j<5 ; j++){ >>>> printf("%f\t",b[i+N*j]) ; >>>> } >>>> printf("\n") ; >>>> } >>>> */ >>>> } >>>> >>>> /* Sample run. >>>> $ cc -o example-of-naive-matmul{,.c} -O3 >>>> $ for i in {1..13}; do n=`echo 2^$i|bc`; echo $n ; time >>>> ./example-of-naive-matmul $n ; done >>>> 2 >>>> >>>> >>>> real 0m0.038s >>>> user 0m0.015s >>>> sys 0m0.000s >>>> 4 >>>> >>>> >>>> real 0m0.045s >>>> user 0m0.000s >>>> sys 0m0.030s >>>> 8 >>>> >>>> >>>> real 0m0.047s >>>> user 0m0.030s >>>> sys 0m0.000s >>>> 16 >>>> >>>> >>>> real 0m0.046s >>>> user 0m0.046s >>>> sys 0m0.015s >>>> 32 >>>> >>>> >>>> real 0m0.051s >>>> user 0m0.015s >>>> sys 0m0.000s >>>> 64 >>>> >>>> >>>> real 0m0.046s >>>> user 0m0.000s >>>> sys 0m0.030s >>>> 128 >>>> >>>> >>>> real 0m0.045s >>>> user 0m0.000s >>>> sys 0m0.046s >>>> 256 >>>> >>>> >>>> real 0m0.078s >>>> user 0m0.015s >>>> sys 0m0.030s >>>> 512 >>>> >>>> >>>> real 0m0.268s >>>> user 0m0.218s >>>> sys 0m0.030s >>>> 1024 >>>> >>>> >>>> real 0m1.840s >>>> user 0m1.811s >>>> sys 0m0.030s >>>> 2048 >>>> >>>> >>>> real 0m14.043s >>>> user 0m13.937s >>>> sys 0m0.062s >>>> 4096 >>>> >>>> >>>> real 1m49.829s >>>> user 1m49.578s >>>> sys 0m0.125s >>>> 8192 >>>> >>>> >>>> real 14m34.843s >>>> user 14m33.046s >>>> sys 0m0.874s >>>> >>>> */ >>>> >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> I ran all of the above on a lower midrange laptop with Windows 10, >>>> i5, 8GB RAM, 2 cores, 4 threads; I used J805, Anaconda python 3.5, >>>> Octave 5.2.0. >>>> ---------------------------------------------------------------------- >>>> For information about J forums see http://www.jsoftware.com/forums.htm >>> >>> >>> -- >>> This email has been checked for viruses by AVG. >>> https://www.avg.com >>> >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >>> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm