Jon, I have been studying your simple_conv_test.ijs example and comparing it to the *dynamic* example at http://cs231n.github.io/convolutional-networks/#conv, where, with a stride of 2 and 2 output kernels of shape 3x3, only 2 biases are used. (I believe they have 2 biases because of the 2 output kernels.) In contrast, according to my manual reconstruction of your verb preRun in conv2d.ijs, I get a whopping 90 biases (a 10x3x3 array): one for each of the 10 output kernels at each of its 3x3 output positions on the 8x8 image.
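
To make the contrast concrete, here is a tiny hypothetical sketch of the two bias shapes I have in mind (b10, b90, and act are names I made up, and the random values are only placeholders):

   b10 =: ? 10 $ 0        NB. what I expected: one bias per output kernel
   b90 =: ? 10 3 3 $ 0    NB. what I reconstruct from preRun: one bias per kernel per output position
   act =: ? 10 3 3 $ 0    NB. stand-in for one image's 10x3x3 conv output
   $ act + b90            NB. per-position biases add elementwise
10 3 3
   $ act +"2 0 b10        NB. per-kernel biases broadcast each scalar over its own 3x3 map
10 3 3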
My confusion is that, based on the cs231n example, I would have guessed you would have only 10 biases, not 90. Can you comment on that, please?

[Of course in my example below my `filter` values are ridiculous, and I have not adjusted for epochs and batches. But I hope the shape of `filter`, the stride of 2, and the ks are consistent with your simple example.]

   filter =: i. 10 3 4 4
   ks =: 2 3 $ 2 2 2 3 4 4
   $ A,B,C
15 3 8 8
   cf =: [: |:"2 [: |: [: +/ ks filter&(convFunc"3 3);._3 ]
   $ n =: cf"3 A,B,C
15 10 3 3
   $ 2 %~ +: 0.5 -~ ? (}. $ n) $ 0   NB. 90 biases
10 3 3

Actually, in my own development of a convnet I have been tempted to do what I believe you have done, but I have not been able to make the backprop step work. Conceptually, how do you combine each group of 3x3 dW's to update their common single W/kernel (for example, with summation, mean, or max)?

Thanks,
(B=)
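
P.S. In case it helps clarify that last question, here is the kind of combination I have been attempting (summing each group of per-position dW's down to a single gradient per kernel); dWs is a made-up name for a hypothetical array of per-position gradients, and the random values are only placeholders:

   dWs =: ? 3 3 10 3 4 4 $ 0   NB. hypothetical: one full 10 3 4 4 gradient per 3x3 output position
   dW  =: +/ +/ dWs            NB. my current guess: sum over the 3x3 positions
   $ dW                        NB. matches the shape of filter, ready to update W
10 3 4 4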
