Taking out the %., I get a 3.2x improvement on AMD:

   timespacex '( + / . *)~ aa'[ aa =.? 1200 1200 $ 0
0.85839 1.67796e7

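A note for readers comparing the timings in this thread: the sentence above builds its matrix with `$ 0`, which makes `?` return floating-point values, while the earlier runs quoted below used `$ 40000`, which yields integers. Per Henry's reply, only the floating-point matrix multiply benefits from AVX. A minimal sketch of that comparison follows. The names n, ai and af are illustrative assumptions, not from the thread, and no timings are claimed here.

```j
NB. Sketch only: n, ai and af are hypothetical names chosen for illustration.
n  =: 1200
ai =: ? (n,n) $ 40000     NB. integers drawn from i. 40000
af =: ? (n,n) $ 0         NB. ? applied to 0 draws floats from the interval (0,1)

datatype ai               NB. integer
datatype af               NB. floating

NB. timespacex reports seconds and bytes used to execute the sentence
timespacex 'ai +/ . * ai' NB. integer matrix product
timespacex 'af +/ . * af' NB. floating-point matrix product
```

Running both timespacex sentences on the same machine and J build should show whether the difference comes from the data type rather than from the matrix size.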
----- Original Message -----
From: Henry Rich <[email protected]>
To: [email protected]
Sent: Sunday, March 12, 2017 11:06 PM
Subject: Re: [Jprogramming] first 806 beta vailable

Oh yes - integer matrix multiply is not improved (there are no AVX
instructions for them). In fact, integer matrix multiply is slower than
8.04 (because we got rid of the assembler code in 8.05). But if you
convert to float, the floating matrix multiply on 8.06 is faster than
anything previous, float or integer.

Henry Rich

On 3/12/2017 8:34 PM, 'Pascal Jasmin' via Programming wrote:
> larger matrix is 2x faster with avx...
>
>    timespacex '( %. + / . * ] ) aa'[ aa =.? 1200 1200 $ 40000
> 3.67906 1.21639e8
>
> ----- Original Message -----
> From: 'Pascal Jasmin' via Programming <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, March 12, 2017 8:16 PM
> Subject: Re: [Jprogramming] first 806 beta vailable
>
>    10 timespacex '( %. + / . * ] ) aa'[ aa =.? 200 200 $ 40000
> 0.0308774 3.80518e6
>
> slightly slower on avx806 vs 805.
>
> 0.0286737 3.80518e6
>
> Exact memory match suggests maybe the processor doesn't support a specific
> avx feature, and the code aborts to "downhandle" the operation?
>
> ----- Original Message -----
> From: Henry Rich <[email protected]>
> To: [email protected]
> Sent: Sunday, March 12, 2017 7:26 PM
> Subject: Re: [Jprogramming] first 806 beta vailable
>
> Very surprising that there is no improvement in matrix multiply, when
> the processor uses AVX instructions. This processor has a large L2
> cache. How large were the matrices?
>
> Henry Rich
>
> On 3/12/2017 7:20 PM, 'Pascal Jasmin' via Programming wrote:
>> on older AMD A8-5500 most of the benchmark improvements (compared to 805) are
>> higher (minimally) than Eric reported. floating point a bit less, but
>> float!.0 a bit more. No improvement in matrix multiply.
>>
>> ----- Original Message -----
>> From: 'Mike Day' via Programming <[email protected]>
>> To: [email protected]
>> Sent: Sunday, March 12, 2017 1:44 PM
>> Subject: Re: [Jprogramming] first 806 beta vailable
>>
>> I didn't know I had avx available on this machine, an AMD A10-7300,
>> running Windows 10. Anyway, J806 JVERSION says I do!
>>
>> I'd have a go at running the benchmarks, but wonder if there's a script
>> available to save doing them "by hand"....
>>
>>    JVERSION
>> Engine: j806/j64avx/windows
>> Beta-1: commercial/2017-03-09T09:10:13
>> Library: 8.06.01
>> Qt IDE: 1.5.3/5.6.2
>> Platform: Win 64
>> Installer: J806 install
>> InstallPath: c:/d/j806
>> Contact: www.jsoftware.com
>>
>> Mike
>>
>> On 11/03/2017 22:44, Eric Iverson wrote:
>>> The first 806 beta is available.
>>>
>>> 806 will be primarily a performance release. This is the first J release
>>> where hardware features are directly used for performance. Previous
>>> releases depended on excellent code and smart algorithms. With Advanced
>>> Vector Extensions (AVX) Intel finally (first hardware released in 2011) has
>>> hardware that seems to have J, at least partially, in mind.
>>>
>>> A rough benchmark report is at the end of this message. Some of the results
>>> are already impressive and there may be more to come.
>>>
>>> Improvements in i. and related areas are important in J, but faster
>>> crunching is usually overwhelmed by all the housekeeping in an application.
>>> Some things run 10 times faster, but your application won't.
>>>
>>> It would be a shame to have non-trivial vector capabilities in the hardware
>>> and for J to not take advantage. AVX2 machines have just hit the shelves
>>> and there are more goodies there.
>>>
>>> It has been a long time since we've been able to brag of a factor of 10
>>> speedup in a primitive.
>>>
>>> Please get involved in the beta program, it helps make a better product for
>>> everyone.
>>>
>>> And give big thanks to Henry Rich for this core JE development!
>>>
>>> ***
>>> Follow web site download links to Installation/Beta. Do appropriate
>>> download from j806/install folder and then follow the Archive install
>>> instructions. These are 805 release instructions, so be careful to use 806
>>> as appropriate.
>>>
>>> The install contains a default non-avx JE binary as well as an avx JE
>>> binary. The launch icons will run the non-avx binary. Make sure the install
>>> is stable and when you are ready, switch to the avx binary with the
>>> following steps:
>>>
>>>    load'~addons/ide/jhs/installer.ijs'
>>>    avx'' NB. follow the instructions
>>>
>>> If your hardware/OS supports avx, then the next time you start 806 it will
>>> use the avx binary. Verify this by checking 9!:14'' (you will see avx in
>>> the string).
>>>
>>> *** preliminary benchmark report
>>>
>>> 2017 3 11 16 10
>>> j806/j64avx/linux/beta-1/commercial/www.jsoftware.com/2017-03-09T10:14:43
>>> i7-7700Q
>>>
>>> N in tables below indicate new avx JE runs N times faster than 805
>>> b=: (<.-:#a)+ c ?. c [ a=: C ?. C NB. intsr
>>> b=: (c?.#a){a [ a=: C ?@$ <:2^63 NB. intbr
>>> b=: (c?.#a){a [ a=: >,.~":each <"0 [C ?@$ <:2^63 NB. char
>>> b=: 0.1+(c?.#a){a [ a=: 0.1+C ?@$ <:2^63 NB. float
>>> intsr (small range) special code avoids hash - intbr (big range)
>>> float0 tests use !.0 where appropriate
>>>
>>> 'C c'=: 10000000 1000
>>> intsr  intbr   char  float float0  test
>>>   1.3    2.0    4.1    1.0    3.5  a i. a
>>>  12.8   10.4   25.5    2.1   20.2  a i. b
>>>   3.4    7.3    8.6    5.1   10.7  b i. a
>>>   5.2    8.0    9.0    5.3   12.9  a e. b
>>>   6.4   10.4   25.3    2.1   20.2  b e. a
>>>   5.3    8.8    9.5    5.1   13.1  a (+/@:e.) b
>>>   4.4    6.4    9.4   38.2   12.9  a (e. i. 1:) b
>>>   1.7    1.9    3.8    1.0    1.0  ~.a
>>>   1.6    2.1    4.1    1.0    1.0  ~:a
>>>   1.1    0.9    1.4    1.1    0.0  /:a
>>>   1.2    0.6    1.3    1.0    0.0  /:~a
>>>
>>> 'C c'=: 100000 1000
>>> intsr  intbr   char  float float0  test
>>>   1.5    3.5    5.3    1.1    4.6  a i. a
>>>   3.4    4.7    9.3    3.1    7.1  a i. b
>>>   3.9    8.1    8.7    5.0   12.1  b i. a
>>>   4.4    7.5    9.1    5.3   12.6  a e. b
>>>   2.6    4.8    9.2    3.1    6.8  b e. a
>>>   4.4    8.4    9.5    5.2   12.9  a (+/@:e.) b
>>>   1.5    4.3    7.8   20.7   12.7  a (e. i. 1:) b
>>>   1.0    3.3    4.7    1.1    1.1  ~.a
>>>   1.3    3.5    5.2    1.2    1.1  ~:a
>>>   1.7    1.3    1.3    1.2    0.0  /:a
>>>   1.6    1.4    1.3    1.2    0.0  /:~a
>>>
>>> matrix multiply
>>>   4.5  a +/ . * b
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
