Mike,

Yes, I knew the reason the calculation was done, but I was surprised by
the manner in which these authors applied it (without the
multiplication). I also applied the Amend incorrectly, forgetting
that it was being applied to an array.

And you are correct that the Amend approach is slower and uses more space
than the Product approach. I re-applied -- correctly this time,
finally🤞 -- the Amend approach on a 'dbstopped' version of `train` and
got the following timings. In retrospect, both methods require the
condition check, and multiplying by 0 and 1 may simply be very fast
relative to what Amend needs.

      mnd =: 0:`(I.@(0&>:)@[)`]}"1
      ((hidden_layer>0)*dscores dot|:W2)-:hidden_layer mnd dscores dot|:W2
1
      10 timespacex'(hidden_layer>0)*dscores dot|:W2'
0.0004102 301568
      10 timespacex'hidden_layer mnd dscores dot|:W2'
0.0006501 535360

And btw, mnd1 =: 0:`(I.@(0>:[))`]}"1  using a fork is very slightly faster
than mnd.
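
For anyone following along, here is a small sketch of the comparison on
made-up data. The names h, d and prod are hypothetical stand-ins (h for
hidden_layer, d for dscores dot|:W2), not the real arrays from `train`;
mnd and mnd1 are exactly as defined above.

```j
NB. h and d are made-up stand-ins for hidden_layer and dscores dot |:W2
h =: 2 3 $ 1 0 _2 3 _1 4
d =: 2 3 $ 1 2 3 4 5 6

NB. Product approach: multiply the gradient by a 0/1 mask of h > 0
prod =: (h > 0) * d

NB. Amend approach: in each row, write 0 at the indices where h is <: 0
mnd  =: 0:`(I.@(0&>:)@[)`]}"1
mnd1 =: 0:`(I.@(0>:[))`]}"1     NB. same verb, index part written as a fork

echo prod
echo prod -: h mnd d
echo prod -: h mnd1 d
```

On this data prod is the 2-by-3 table 1 0 0 , 4 0 6 and both matches
give 1. The array is far too small for timings to mean anything; it only
shows that the three forms agree.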


Thanks again,

On Thu, May 16, 2019 at 5:32 AM 'Mike Day' via Programming <
[email protected]> wrote:

> The Python authors' comments here explain (well, they assert) why we're
> doing that filtering for hidden_layer > 0:
>
> " Now we have the gradient on the outputs of the hidden layer. Next, we
> have to backpropagate the ReLU non-linearity. This turns out to be easy
> because ReLU during the backward pass is effectively a switch. Since
> r=max(0,x) , we have that dr/dx = 1 (x>0) . Combined with the chain
> rule, we see that the ReLU unit lets the gradient pass through unchanged
> if its input was greater than 0, but kills it if its input was less than
> zero [or equal to zero - Mike's edit] during the forward pass."
>
> Isn't it curious that the J-way of doing it,
>
>      NB. find indices of elements <: 0
>      if. # ilow=. (<"1@:($ #: I.@:(0 >: ,))) hidden_layer do.
>         dhidden =. 0 ilow } dhidden
>      end.
>
> is much slower than the naive
>
>      dhidden =. (hidden_layer >0) * dscores dotT  W2
> ?
>
> Mike
>
>
> --
(B=)
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm