Jon,

I have been studying your simple_conv_test.ijs example and trying to
compare it to the *dynamic* example at
http://cs231n.github.io/convolutional-networks/#conv, where only 2 biases
are used with a stride of 2 and 2 output kernels of shape 3x3. (I believe
they have 2 biases because there are 2 output kernels.) In contrast,
according to my manual reconstruction of your verb preRun in conv2d.ijs, I
get a whopping 90 biases (a 10x3x3 array): one for each of the 10 output
kernels at each of the 3x3 output positions on the 8x8 image.

My confusion is that, based on the cs231n example, I would have guessed
you would have only 10 biases (one per output kernel), not 90. Can you
comment on that, please?

[Of course, in my example below the `filter` values are ridiculous, and
I have not adjusted for epochs and batches. But I hope the shape of
`filter`, the stride of 2, and the ks are consistent with your simple
example.]


   filter =: i.  10 3 4 4        NB. 10 kernels, each of shape 3 4 4
   ks =: 2 3$2 2 2 3 4 4         NB. ;._3 spec: movement 2 2 2, window 3 4 4
   $A,B,C
15 3 8 8
   cf=: [: |:"2 [: |: [: +/ ks filter&(convFunc"3 3);._3 ]
   $n=: cf"3 A,B,C
15 10 3 3
   $2 %~ +: 0.5-~ ? ( }. $ n) $ 0     NB. 90 biases
10 3 3
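
For contrast, here is the shape I would have expected under the cs231n
convention of one bias per output kernel, broadcast over the 3x3 output
positions. This is only my own sketch reusing n from above; the name b and
the rank phrase are mine, not anything from conv2d.ijs:

   b =: 2 %~ +: 0.5-~ ? 10 $ 0       NB. 10 biases, one per output kernel
   $ b
10
   $ n +"2 0"3 1 b                   NB. each 3x3 plane gets its kernel's one bias
15 10 3 3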

Actually, in my own development of a convnet I have been tempted to do the
same as I believe you have done, but I have been unsuccessful in the
backprop step. Conceptually, how do you combine each group of 3x3 dW's to
update their common single W/kernel (for example, by summation, mean, or
max)?
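
My best guess so far has been plain summation: each position is one use of
the same shared kernel, so its gradient should be the sum of the
per-position gradients. Below is the kind of reduction I have been trying,
though I have not gotten it working in my net; the names dWpos, dW, and lr
and the 10x3x3x3x4x4 layout are my own stand-ins, not from conv2d.ijs:

   NB. stand-in per-position gradients: one 3 4 4 dW for each of the
   NB. 10 kernels at each of the 3x3 output positions
   dWpos =: 0.5-~ ? 10 3 3 3 4 4 $ 0
   NB. weight sharing: sum each kernel's 3x3 group of dW's over both
   NB. position axes to get one shared update per kernel
   $ dW =: +/"4 +/"5 dWpos           NB. same shape as filter
10 3 4 4
   NB. then something like  filter =: filter - lr * dW  (batching aside)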

Thanks,

(B=)