Certainly SIMD in AVX helps. Some more tests:

NB. non-avx j806
28.124 2.68438e8
1.12953 5.36873e8
1

NB. non-avx j806 (using sse2)
21.5087 2.68438e8
1.13762 5.36873e8
1

NB. avx j806
19.4048 2.68438e8
1.13984 5.36873e8
1

The non-avx j806 runs 4x faster than j602. The advantage of avx over
sse2 is only about 10%.

On Fri, 21 Apr 2017, Henry Rich wrote:
> Actually both AVX and cache are important: cache ordering is necessary to
> read the input faster, then AVX instructions to process them. It keeps a
> single CPU pretty busy, so BLAS must be doing something special.
>
> Henry Rich
>
> On 4/21/2017 12:10 PM, bill lam wrote:
> > The improvement coming from avx is not that important; the major
> > improvement in inner product is that the algorithm has been tuned to
> > keep the cache hot. Using sse2 can achieve a similar improvement factor
> > with the new code.
> >
> > blas is super optimized. It might be running in multiple threads, so
> > that on my 4-core cpu it runs 4x faster. I am still not sure where the
> > remaining 5x improvement comes from. Offloading to the gpu in the i7?
> >
> >
> > On 21 Apr, 2017 11:37 pm, "Xiao-Yong Jin" <jinxiaoy...@gmail.com> wrote:
> >
> > Thanks. It's interesting to see how far the avx has come along.
> > And the |: takes more than 20% of the time? I guess this is something that
> > could be improved.
> >
> > > On Apr 21, 2017, at 7:42 AM, bill lam <bbill....@gmail.com> wrote:
> > >
> > > Oops, the output from dgemm should be transposed to row major.
> > >
> > > dgemm=: 'liblapack.so.3 dgemm_ > n *c *c *i *i *i *d *d *i *d *i *d *d *i'&cd
> > > mm=: 4 : 0
> > > k=. ,{.$x
> > > c=. (k,k)$1.5-1.5
> > > dgemm (,'T');(,'T');k;k;k;(,2.5-1.5);x;k;y;k;(,1.5-1.5);c;k
> > > |:c
> > > )
> > >
> > > 'A B'=:0?@$~2,,~4096
> > > echo timespacex'c1=: A+/ .*B'
> > > echo timespacex'c2=: A mm B'
> > > echo c1-:c2
> > >
> > > NB. avx
> > > load'dgemm.ijs'
> > > 19.4683 2.68438e8
> > > 1.11488 5.36873e8
> > > 1
> > >
> > > NB. j602
> > > 167.99789 2.684384e8
> > > 1.224063 5.369056e8
> > > 1
> > >
> > > j806 version is already quite good.
> > >
> > > On Fri, 21 Apr 2017, bill lam wrote:
> > > > I tested with J calling lapack for matrix multiplication with the
> > > > following script,
> > > >
> > > > NB. extern dgemm_(char * transa, char * transb, int * m, int * n, int * k,
> > > > NB.               double * alpha, double * A, int * lda,
> > > > NB.               double * B, int * ldb, double * beta,
> > > > NB.               double * C, int * ldc);
> > > >
> > > > dgemm=: 'liblapack.so.3 dgemm_ > n *c *c *i *i *i *d *d *i *d *i *d *d *i'&cd
> > > > mm=: 4 : 0
> > > > k=. ,{.$x
> > > > c=. (k,k)$1.5-1.5
> > > > dgemm (,'T');(,'T');k;k;k;(,2.5-1.5);x;k;y;k;(,1.5-1.5);c;k
> > > > c
> > > > )
> > > >
> > > > 'A B'=:0?@$~2,,~4096
> > > > echo timespacex'A+/ .*B'
> > > > echo timespacex'A mm B'
> > > >
> > > > result was,
> > > > 19.3608 2.68437e8
> > > > 0.886447 2.68442e8
> > > >
> > > > Note it needs to use an optimized version of blas, not the
> > > > reference blas.
> > > >
> > > > Apparently the blas used in julia is sub-optimal.
> > > >
> > > > On Tue, 18 Apr 2017, bill lam wrote:
> > > > > I think julia just calls blas.
> > > > >
> > > > > On Mon, 17 Apr 2017, Xiao-Yong Jin wrote:
> > > > > > > On Apr 17, 2017, at 9:26 PM, Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > >
> > > > > > > If you have an implementation of +/ . * on double-precision floats
> > > > > > > that's faster than J 8.06, I would be obliged if you'd send me a
> > > > > > > copy of the source code.
> > > > > >
> > > > > > I'm sure your code is much faster than naive C loops. But somehow
> > > > > > the matrix-matrix multiplication is much slower (10x) than that in
> > > > > > julia (tested with a 3-year old version).
> > > > > > % julia
> > > > > >                _
> > > > > >    _       _ _(_)_     |  A fresh approach to technical computing
> > > > > >   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
> > > > > >    _ _   _| |_  __ _   |  Type "help()" to list help topics
> > > > > >   | | | | | | |/ _` |  |
> > > > > >   | | |_| | | | (_| |  |  Version 0.2.1 (2014-02-11 06:30 UTC)
> > > > > >  _/ |\__'_|_|_|\__'_|  |
> > > > > > |__/                   |  x86_64-linux-gnu
> > > > > >
> > > > > > julia> A=rand(4096,4096); B=rand(4096,4096);
> > > > > >
> > > > > > julia> @time A*B;
> > > > > > elapsed time: 2.260157127 seconds (149184640 bytes allocated)
> > > > > >
> > > > > > julia>
> > > > > > % jconsole
> > > > > >    JVERSION
> > > > > > Engine: j806/j64avx/linux
> > > > > > Beta-3: commercial/2017-04-10T17:51:14
> > > > > > Library: 8.06.02
> > > > > > Platform: Linux 64
> > > > > > Installer: J806 install
> > > > > > InstallPath: /nfs2/xjin/pkgs/j64-806
> > > > > > Contact: www.jsoftware.com
> > > > > >    'A B'=:0?@$~2,,~4096
> > > > > >    timespacex'A+/ .*B'
> > > > > > 23.8976 2.68437e8
> > > > > >    timespacex'A+/ .*B'
> > > > > >
> > > > > > ----------------------------------------------------------------------
> > > > > > For information about J forums see
> > > > > > http://www.jsoftware.com/forums.htm
> > > > > --
> > > > > regards,
> > > > > ====================================================
> > > > > GPG key 1024D/4434BAB3 2008-08-24
> > > > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
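
A note on the transpose discussed in this thread. dgemm assumes column-major storage while J arrays are row-major, which is why the wrapper passes 'T' flags and then applies |: to the result. A commonly used alternative is to swap the operands instead: a row-major buffer read as column-major is already the transpose, and (x*y)^T = y^T * x^T. The sketch below only checks that identity on small matrices in plain Python. gemm_colmajor is a hypothetical stand-in written for this illustration (alpha=1, beta=0 only), not the BLAS routine, and the function names are assumptions of this sketch. Nothing here has been timed against the J wrapper.

```python
def gemm_colmajor(transa, transb, m, n, k, a, lda, b, ldb):
    """Hypothetical stand-in for dgemm with alpha=1 and beta=0.

    a and b are flat lists read in column-major (Fortran) order.
    Returns op(a) * op(b) as a flat column-major m-by-n list.
    """
    def elem(buf, ld, trans, i, j):
        # Element (i, j) of op(X); X itself is stored column-major.
        return buf[j + i * ld] if trans == 'T' else buf[i + j * ld]

    c = [0.0] * (m * n)
    for j in range(n):
        for i in range(m):
            c[i + j * m] = sum(
                elem(a, lda, transa, i, p) * elem(b, ldb, transb, p, j)
                for p in range(k)
            )
    return c


def flatten(rows):
    """Row-major flattening, the layout J uses for its arrays."""
    return [v for row in rows for v in row]


def unflatten(buf, nrows, ncols):
    """Read a flat buffer back as a row-major nrows-by-ncols matrix."""
    return [buf[r * ncols:(r + 1) * ncols] for r in range(nrows)]


def matmul(x, y):
    """Plain product, used only as the reference answer."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def mm_transposed_output(x, y):
    """The thread's approach (square inputs): 'T','T' flags, then transpose."""
    k = len(x)
    c = gemm_colmajor('T', 'T', k, k, k, flatten(x), k, flatten(y), k)
    rows = unflatten(c, k, k)  # the row-major view holds (x*y) transposed
    return [list(col) for col in zip(*rows)]


def mm_swapped_operands(x, y):
    """Alternative: 'N','N' flags with y passed first; no transpose step."""
    m, k, n = len(x), len(y), len(y[0])
    c = gemm_colmajor('N', 'N', n, m, k, flatten(y), n, flatten(x), k)
    return unflatten(c, m, n)


X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[5.0, 6.0], [7.0, 8.0]]
print(mm_transposed_output(X, Y) == matmul(X, Y))
print(mm_swapped_operands(X, Y) == matmul(X, Y))
```

In the J wrapper this would presumably mean passing (,'N') for both flags, giving y before x, and returning c without |:. That is an untested assumption, and whether it recovers the time spent in |: would need measuring with the real dgemm.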