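Certainly the SIMD in AVX helps. Some more tests (each block shows time in seconds and space in bytes
for c1=: A+/ .*B, then the same for c2=: A mm B, then c1-:c2, using the script quoted below):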

NB. non-avx j806
28.124 2.68438e8
1.12953 5.36873e8
1

NB. non-avx j806 (using sse2)
21.5087 2.68438e8
1.13762 5.36873e8
1

NB. avx j806
19.4048 2.68438e8
1.13984 5.36873e8
1

The non-avx j806 runs about 6x faster than j602 (28.1 s vs 168.0 s).
The advantage of avx over sse2 is only about 10%.
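A quick J check of these ratios. The timings are copied from this message and from the j602 run quoted below. This assumes both runs were made on the same machine, and the variable names are only illustrative:

```j
NB. Seconds for A +/ .* B on 4096x4096 floats, copied from the runs in this thread
j602=: 167.99789   NB. j602
plain=: 28.124     NB. non-avx j806
sse2=: 21.5087     NB. non-avx j806 using sse2
avx=: 19.4048      NB. avx j806
echo j602 % plain, sse2, avx   NB. speed-up of each j806 build over j602 (about 6, 7.8, 8.7)
echo sse2 % avx                NB. avx over sse2, about 1.1, i.e. the ~10% above
```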

On Fri, 21 Apr 2017, Henry Rich wrote:
> Actually both AVX and cache are important: cache ordering is necessary to
> read the input faster, then AVX instructions to process them.  It keeps a
> single CPU pretty busy, so BLAS must be doing something special.
> 
> Henry Rich
> 
> On 4/21/2017 12:10 PM, bill lam wrote:
> > The improvement from AVX is not that important; the major improvement in
> > inner product is that the algorithm has been tuned to keep the cache hot.
> > SSE2 can achieve a similar improvement factor with the new code.
> > 
> > BLAS is super-optimized and might be running in multiple threads, so on my
> > 4-core CPU it runs 4x faster; I'm still not sure where the remaining 5x
> > improvement comes from. Offloading to the GPU in the i7?
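The arithmetic behind the 4x/5x split, using the first dgemm timings bill lam posted (quoted further down). The attribution of 4x to four threads is his conjecture, not a measurement, and the names below are illustrative:

```j
NB. Seconds for the 4096x4096 product, from the first dgemm test quoted below
jtime=: 19.3608      NB. A +/ .* B in j806
btime=: 0.886447     NB. A mm B through an optimized BLAS dgemm
echo jtime % btime           NB. overall BLAS advantage, about 21.8
echo (jtime % btime) % 4     NB. left over if 4 threads give 4x, about 5.5
```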
> > 
> > 
> > On 21 Apr, 2017 11:37 pm, "Xiao-Yong Jin" <jinxiaoy...@gmail.com> wrote:
> > 
> > Thanks.  It's interesting to see how far the avx has come along.
> > And the |: takes more than 20% of the time?  I guess this is something that
> > could be improved.
> > 
> > > On Apr 21, 2017, at 7:42 AM, bill lam <bbill....@gmail.com> wrote:
> > > 
> > > Oops, the output from dgemm should be transposed to row major.
> > > 
> > > dgemm=: 'liblapack.so.3 dgemm_ > n *c *c *i *i *i *d *d *i *d *i *d *d *i'&cd
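> > > mm=: 4 : 0
> > > k=. ,{.$x                 NB. order of the square matrices, as a 1-item list
> > > c=. (k,k)$1.5-1.5         NB. float zeros (a literal 0 would be boolean); dgemm fills this buffer
> > > dgemm (,'T');(,'T');k;k;k;(,2.5-1.5);x;k;y;k;(,1.5-1.5);c;k   NB. alpha=1.0, beta=0.0
> > > |:c                       NB. dgemm's result is column-major; transpose to J's row-major order
> > > )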
> > > 
> > > 'A B'=:0?@$~2,,~4096
> > > echo timespacex'c1=: A+/ .*B'
> > > echo timespacex'c2=: A mm B'
> > > echo c1-:c2
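Why the |: is needed, sketched in plain J without calling LAPACK. The small matrices and names below are illustrative assumptions. Fortran BLAS treats memory as column-major, so the buffer dgemm fills, viewed by J in row-major order, is the transpose of the product. The same identity, (X Y)^T = Y^T X^T, suggests that calling dgemm with 'N','N' and the operands swapped (y before x) would leave c already in row-major order and skip the |:. That variant was not tested in this thread.

```j
NB. Sketch (no LAPACK needed): simulate how J sees a column-major result buffer
X=: i. 3 3                    NB. small integer matrices, so every product is exact
Y=: 3 3 $ 2 7 1 8 2 8 1 8 2
P=: X +/ .* Y                 NB. the product J computes natively

NB. 'T','T' case: dgemm forms X*Y but writes it column by column
buf=: , |: P                  NB. column-major ravel of P, i.e. the raw memory
c=: 3 3 $ buf                 NB. J reads that memory row-major
echo c -: |: P                NB. J sees the transpose ...
echo P -: |: c                NB. ... so |: c recovers the product, as in the fixed mm

NB. 'N','N' with swapped operands: BLAS reads y as |: Y and x as |: X
Q=: (|: Y) +/ .* |: X         NB. equals |: P; dgemm would write it column-major
echo P -: 3 3 $ , |: Q        NB. J's row-major view is already P, no |: needed
```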
> > > 
> > > NB. avx
> > >    load'dgemm.ijs'
> > > 19.4683 2.68438e8
> > > 1.11488 5.36873e8
> > > 1
> > > 
> > > NB. j602
> > > 167.99789 2.684384e8
> > > 1.224063 5.369056e8
> > > 1
> > > 
> > > j806 version is already quite good.
> > > 
> > > On Fri, 21 Apr 2017, bill lam wrote:
> > > > I tested J calling LAPACK for matrix multiplication with the
> > > > following script:
> > > > 
> > > > NB. extern dgemm_(char * transa, char * transb, int * m, int * n, int * k,
> > > > NB.               double * alpha, double * A, int * lda,
> > > > NB.               double * B, int * ldb, double * beta,
> > > > NB.               double * C, int * ldc);
> > > > 
> > > > dgemm=: 'liblapack.so.3 dgemm_ > n *c *c *i *i *i *d *d *i *d *i *d *d *i'&cd
> > > > mm=: 4 : 0
> > > > k=. ,{.$x
> > > > c=. (k,k)$1.5-1.5
> > > > dgemm (,'T');(,'T');k;k;k;(,2.5-1.5);x;k;y;k;(,1.5-1.5);c;k
> > > > c
> > > > )
> > > > 
> > > > 'A B'=:0?@$~2,,~4096
> > > > echo timespacex'A+/ .*B'
> > > > echo timespacex'A mm B'
> > > > 
> > > > The result was:
> > > > 19.3608 2.68437e8
> > > > 0.886447 2.68442e8
> > > > 
> > > > Note that this needs an optimized BLAS, not the
> > > > reference BLAS.
> > > > 
> > > > Apparently the BLAS used by Julia is sub-optimal.
> > > > 
> > > > On Tue, 18 Apr 2017, bill lam wrote:
> > > > > I think julia just calls blas.
> > > > > 
> > > > > On Mon, 17 Apr 2017, Xiao-Yong Jin wrote:
> > > > > > > On Apr 17, 2017, at 9:26 PM, Henry Rich <henryhr...@gmail.com> 
> > > > > > > wrote:
> > > > > > > 
> > > > > > > If you have an implementation of +/ . * on double-precision floats
> > > > > > > that's faster than J 8.06, I would be obliged if you'd send me a copy
> > > > > > > of the source code.
> > > > > > I'm sure your code is much faster than naive C loops.  But somehow
> > > > > > the matrix-matrix multiplication is much slower (10x) than that in
> > > > > > Julia (tested with a 3-year-old version).
> > > > > > % julia
> > > > > >                _
> > > > > >    _       _ _(_)_     |  A fresh approach to technical computing
> > > > > >   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
> > > > > >    _ _   _| |_  __ _   |  Type "help()" to list help topics
> > > > > >   | | | | | | |/ _` |  |
> > > > > >   | | |_| | | | (_| |  |  Version 0.2.1 (2014-02-11 06:30 UTC)
> > > > > > _/ |\__'_|_|_|\__'_|  |
> > > > > > |__/                   |  x86_64-linux-gnu
> > > > > > 
> > > > > > julia> A=rand(4096,4096); B=rand(4096,4096);
> > > > > > 
> > > > > > julia> @time A*B;
> > > > > > elapsed time: 2.260157127 seconds (149184640 bytes allocated)
> > > > > > 
> > > > > > julia>
> > > > > > % jconsole
> > > > > >    JVERSION
> > > > > > Engine: j806/j64avx/linux
> > > > > > Beta-3: commercial/2017-04-10T17:51:14
> > > > > > Library: 8.06.02
> > > > > > Platform: Linux 64
> > > > > > Installer: J806 install
> > > > > > InstallPath: /nfs2/xjin/pkgs/j64-806
> > > > > > Contact: www.jsoftware.com
> > > > > >    'A B'=:0?@$~2,,~4096
> > > > > >    timespacex'A+/ .*B'
> > > > > > 23.8976 2.68437e8
> > > > > >    timespacex'A+/ .*B'
> > > > > > 
> > > > > > ----------------------------------------------------------------------
> > > > > > For information about J forums see 
> > > > > > http://www.jsoftware.com/forums.htm
> > > > > --
> > > > > regards,
> > > > > ====================================================
> > > > > GPG key 1024D/4434BAB3 2008-08-24
> > > > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3