If you use j807 with an AVX-capable CPU, it should call optimized BLAS for that pattern; you can compare with previous versions of J. If you want it even faster, you can build j.dll/libj.so from source to enable multiple-core support, and performance will scale up with the number of cores used.
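A quick way to check whether a given build takes the fast path is to time the matrix product directly and compare across interpreter versions, as the post suggests. A minimal sketch, assuming the standard timespacex utility; the names A, B, dot and the 1000x1000 size are illustrative, not from the thread:

   dot =: +/ . *
   A =: ? 1000 1000 $ 0       NB. random 1000x1000 matrices
   B =: ? 1000 1000 $ 0
   10 timespacex 'A dot B'    NB. average time and space over 10 runs

Running the same line under j806 and j807 on the same machine should show whatever difference the BLAS/AVX path makes.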
On Fri, May 17, 2019, 7:13 AM 'Mike Day' via Programming <[email protected]> wrote:

> (Also answers Bill's post, just in)
>
> I think I misled you. Brian's "dot" is more correctly the matrix
> product, such as
>       2 3 (+/ . *)&i. 3 4
>    20 23 26 29
>    56 68 80 92
> so we're talking about  dot =: +/ . *
>
> In some cases, Brian needs to multiply an m x n matrix A by a k x n
> matrix B for an m x k result,
>       A dot |: B
> In others, he needs C, shape m x n, by D, shape m x k, for an n x k
> result,
>       (|: C) dot D
> and of course, some are straight matrix multiplications.
>
> I defined  Tdot =: |:@:[ +/ . * ]  and  dotT =: dot |:
>
> Are matrix multiplications going to be enhanced? And what about such
> variants as these?
>
> Thanks,
>
> Mike
>
> Sent from my iPad
>
>> On 16 May 2019, at 18:43, Henry Rich <[email protected]> wrote:
>>
>> In the next beta, +/@:*"1 uses 256-bit instructions, which should
>> help with dot products.
>>
>> Henry Rich
>>
>>> On 5/16/2019 1:27 PM, 'Mike Day' via Programming wrote:
>>> I've tried various timings and tweaks - the dot products seem to
>>> consume the most time; it's marginally worth dividing by
>>> "num_examples" after summing "correct_logprobs" rather than summing
>>> the quotient, "correct_logprobs % num_examples".
>>>
>>> I added a couple of dot fns,  Tdot =: |:@[ dot ]  and  dotT =: dot |:
>>> to neaten up the code a bit. Those transposes seem unavoidable.
>>>
>>> In a practical application, you'd probably run cycles until either a
>>> suitable level of convergence is achieved - or until it's obvious
>>> that the process is divergent.
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>>> On 16/05/2019 15:20, Brian Schott wrote:
>>>> Mike,
>>>>
>>>> Yes, I knew the reason that the calculation was done, but was
>>>> surprised by the manner in which these authors applied the
>>>> calculation (without the multiplication), and I applied the Amend
>>>> incorrectly, by not remembering that it was being applied to an
>>>> array.
>>>>
>>>> And you are correct that the Amend approach is slower and more
>>>> space-consuming than the Product approach. I re-applied -- correctly,
>>>> this time, finally🤞 -- the Amend approach on a 'dbstopped' version
>>>> of `train` and got the following timings. In retrospect, both
>>>> methods require the condition check, and then multiplying by 0 and 1
>>>> may be very fast relative to Amend's needs.
>>>>
>>>>       mnd =: 0:`(I.@(0&>:)@[)`]}"1
>>>>       ((hidden_layer>0)*dscores dot|:W2)-:hidden_layer mnd dscores dot|:W2
>>>>    1
>>>>       10 timespacex'(hidden_layer>0)*dscores dot|:W2'
>>>>    0.0004102 301568
>>>>       10 timespacex'hidden_layer mnd dscores dot|:W2'
>>>>    0.0006501 535360
>>>>
>>>> And btw,  mnd1 =: 0:`(I.@(0>:[))`]}"1  using a fork is very slightly
>>>> faster than mnd.
>>>>
>>>> Thanks, again,
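Brian's mnd above can be sanity-checked in isolation. A minimal sketch, where h and g are invented stand-ins for hidden_layer and dscores dot |:W2:

   mnd =: 0:`(I.@(0&>:)@[)`]}"1
   h =: 2 4 $ 0.5 _1 0 2 _3 1 0.1 0   NB. stand-in for hidden_layer
   g =: 2 4 $ 1 2 3 4 5 6 7 8         NB. stand-in for the incoming gradient
   ((h > 0) * g) -: h mnd g           NB. the two filters agree
1

Both zero out g wherever h is not strictly positive; the timings quoted above suggest the mask-and-multiply form is the cheaper of the two.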
>>>> On Thu, May 16, 2019 at 5:32 AM 'Mike Day' via Programming
>>>> <[email protected]> wrote:
>>>>
>>>>> The Python authors' comments here explain (well, they assert) why
>>>>> we're doing that filtering for hidden_layer > 0:
>>>>>
>>>>> "Now we have the gradient on the outputs of the hidden layer. Next,
>>>>> we have to backpropagate the ReLU non-linearity. This turns out to
>>>>> be easy because ReLU during the backward pass is effectively a
>>>>> switch. Since r = max(0,x), we have that dr/dx = 1 (x>0). Combined
>>>>> with the chain rule, we see that the ReLU unit lets the gradient
>>>>> pass through unchanged if its input was greater than 0, but kills
>>>>> it if its input was less than zero [or equal to zero - Mike's edit]
>>>>> during the forward pass."
>>>>>
>>>>> Isn't it curious that the J way of doing it,
>>>>>
>>>>>    if. # ilow=. (<"1@:($ #: I.@:(0 >: ,))) hidden_layer do.  NB. find indices of elements <: 0
>>>>>      dhidden =. 0 ilow } dhidden
>>>>>    end.
>>>>>
>>>>> is much slower than the naive
>>>>>
>>>>>    dhidden =. (hidden_layer > 0) * dscores dotT W2
>>>>>
>>>>> ?
>>>>>
>>>>> Mike
>>>>
>>>> --
>>>> (B=)
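That last comparison can be packaged so it runs standalone. A sketch under invented sizes and data (300 x 100 here; amendF is a hypothetical wrapper around the if. block above, and g stands in for dscores dotT W2):

   dot =: +/ . *
   hidden_layer =: <: +: ? 300 100 $ 0   NB. illustrative values in (_1,1)
   g =: ? 300 100 $ 0                    NB. stand-in for dscores dotT W2
   amendF =: 4 : 0
     d =. y
     if. # ilow =. (<"1@:($ #: I.@:(0 >: ,))) x do.  NB. boxed indices where x <: 0
       d =. 0 ilow} d
     end.
     d
   )
   (hidden_layer amendF g) -: (hidden_layer > 0) * g
   10 timespacex 'hidden_layer amendF g'
   10 timespacex '(hidden_layer > 0) * g'

The equivalence check should yield 1, and the timings should reproduce the pattern reported in the thread: building and boxing the index list costs more than the branch-free mask-and-multiply.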
