I think Pascal is saying he saw improvements a bit higher than I reported in my benchmark. That is, quite a bit of overall improvement from your AVX work.

On Sun, Mar 12, 2017 at 7:26 PM, Henry Rich <[email protected]> wrote:

> Very surprising that there is no improvement in matrix multiply, when the
> processor uses AVX instructions. This processor has a large L2 cache. How
> large were the matrices?
>
> Henry Rich
>
> On 3/12/2017 7:20 PM, 'Pascal Jasmin' via Programming wrote:
>
>> On the older AMD A8-5500, most of the benchmark improvements (compared to
>> 805) are (minimally) higher than Eric reported. Floating point is a bit
>> less, but float !.0 is a bit more. No improvement in matrix multiply.
>>
>> ----- Original Message -----
>> From: 'Mike Day' via Programming <[email protected]>
>> To: [email protected]
>> Sent: Sunday, March 12, 2017 1:44 PM
>> Subject: Re: [Jprogramming] first 806 beta vailable
>>
>> I didn't know I had avx available on this machine, an AMD A10-7300,
>> running Windows 10. Anyway, J806 JVERSION says I do!
>>
>> I'd have a go at running the benchmarks, but wonder if there's a script
>> available to save doing them "by hand"....
>>
>>    JVERSION
>> Engine: j806/j64avx/windows
>> Beta-1: commercial/2017-03-09T09:10:13
>> Library: 8.06.01
>> Qt IDE: 1.5.3/5.6.2
>> Platform: Win 64
>> Installer: J806 install
>> InstallPath: c:/d/j806
>> Contact: www.jsoftware.com
>>
>> Mike
>>
>> On 11/03/2017 22:44, Eric Iverson wrote:
>>
>>> The first 806 beta is available.
>>>
>>> 806 will be primarily a performance release. This is the first J release
>>> where hardware features are directly used for performance. Previous
>>> releases depended on excellent code and smart algorithms. With Advanced
>>> Vector Extensions (AVX) Intel finally (first hardware released in 2011)
>>> has hardware that seems to have J, at least partially, in mind.
>>>
>>> A rough benchmark report is at the end of this message. Some of the
>>> results are already impressive and there may be more to come.
>>>
>>> Improvements in i. and related areas are important in J, but faster
>>> crunching is usually overwhelmed by all the housekeeping in an
>>> application. Some things run 10 times faster, but your application won't.
>>>
>>> It would be a shame to have non-trivial vector capabilities in the
>>> hardware and for J to not take advantage. AVX2 machines have just hit the
>>> shelves and there are more goodies there.
>>>
>>> It has been a long time since we've been able to brag of a factor of 10
>>> speedup in a primitive.
>>>
>>> Please get involved in the beta program; it helps make a better product
>>> for everyone.
>>>
>>> And give big thanks to Henry Rich for this core JE development!
>>>
>>> ***
>>> Follow web site download links to Installation/Beta. Do appropriate
>>> download from j806/install folder and then follow the Archive install
>>> instructions. These are 805 release instructions, so be careful to use
>>> 806 as appropriate.
>>>
>>> The install contains a default non-avx JE binary as well as an avx JE
>>> binary. The launch icons will run the non-avx binary. Make sure the
>>> install is stable and when you are ready, switch to the avx binary with
>>> the following steps:
>>>
>>>    load'~addons/ide/jhs/installer.ijs'
>>>    avx'' NB. follow the instructions
>>>
>>> If your hardware/OS supports avx, then the next time you start 806 it
>>> will use the avx binary. Verify this by checking 9!:14'' (you will see
>>> avx in the string).
>>>
>>> *** preliminary benchmark report
>>>
>>> 2017 3 11 16 10
>>> j806/j64avx/linux/beta-1/commercial/www.jsoftware.com/2017-03-09T10:14:43
>>> i7-7700Q
>>>
>>> N in the tables below indicates that the new avx JE runs N times faster
>>> than 805.
>>>
>>> b=: (<.-:#a)+ c ?. c [ a=: C ?. C   NB. intsr
>>> b=: (c?.#a){a [ a=: C ?@$ <:2^63   NB. intbr
>>> b=: (c?.#a){a [ a=: >,.~":each <"0 [C ?@$ <:2^63   NB. char
>>> b=: 0.1+(c?.#a){a [ a=: 0.1+C ?@$ <:2^63   NB. float
>>>
>>> intsr (small range) special code avoids hash - intbr (big range)
>>> float0 tests use !.0 where appropriate
>>>
>>> 'C c'=: 10000000 1000
>>>  intsr intbr  char float float0  test
>>>    1.3   2.0   4.1   1.0    3.5  a i. a
>>>   12.8  10.4  25.5   2.1   20.2  a i. b
>>>    3.4   7.3   8.6   5.1   10.7  b i. a
>>>    5.2   8.0   9.0   5.3   12.9  a e. b
>>>    6.4  10.4  25.3   2.1   20.2  b e. a
>>>    5.3   8.8   9.5   5.1   13.1  a (+/@:e.) b
>>>    4.4   6.4   9.4  38.2   12.9  a (e. i. 1:) b
>>>    1.7   1.9   3.8   1.0    1.0  ~.a
>>>    1.6   2.1   4.1   1.0    1.0  ~:a
>>>    1.1   0.9   1.4   1.1    0.0  /:a
>>>    1.2   0.6   1.3   1.0    0.0  /:~a
>>>
>>> 'C c'=: 100000 1000
>>>  intsr intbr  char float float0  test
>>>    1.5   3.5   5.3   1.1    4.6  a i. a
>>>    3.4   4.7   9.3   3.1    7.1  a i. b
>>>    3.9   8.1   8.7   5.0   12.1  b i. a
>>>    4.4   7.5   9.1   5.3   12.6  a e. b
>>>    2.6   4.8   9.2   3.1    6.8  b e. a
>>>    4.4   8.4   9.5   5.2   12.9  a (+/@:e.) b
>>>    1.5   4.3   7.8  20.7   12.7  a (e. i. 1:) b
>>>    1.0   3.3   4.7   1.1    1.1  ~.a
>>>    1.3   3.5   5.2   1.2    1.1  ~:a
>>>    1.7   1.3   1.3   1.2    0.0  /:a
>>>    1.6   1.4   1.3   1.2    0.0  /:~a
>>>
>>> matrix multiply
>>>    4.5  a +/ . * b
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
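
For anyone repeating one row of the quoted report by hand, here is a minimal sketch of timing `a i. b` on big-range integers. It is not the script behind the report: the smaller sizes, the random range 1000000000, the name `t`, and the repeat count of 10 are all assumptions chosen so it runs quickly. `6!:2` is J's execution-time foreign, and its dyadic form averages over the given number of runs.

```j
NB. Sketch only: sizes and range are assumptions, smaller than in the report.
'C c'=: 1000000 1000
a=: C ?@$ 1000000000        NB. C random integers (assumed range, not <:2^63)
b=: (c ?. #a) { a           NB. c items picked from a at fixed-seed positions
t=: 10 (6!:2) 'a i. b'      NB. mean seconds over 10 runs of a i. b
9!:14 ''                    NB. shows which JE (avx or not) produced the timing
```

Running the same lines under 805 and dividing that time by `t` should give a ratio comparable to the N values in the tables, assuming the same data sizes and hardware.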
