I think you may be right. Thanks for pointing this out. However, since my 
networks mostly work, I am going to assume that having too many biases doesn't 
negatively impact the results, except for adding "useless" calculations. If you 
are correct, I should fix this.

I have edited the source on a new branch so that the bias is only 2-d shaped (see:
https://github.com/jonghough/jlearn/blob/feature/conv2d_layer_fix/adv/conv2d.ijs).
It is not on the master branch yet. I am not 100% convinced this is correct, so I
am going to think about it.
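
For illustration only (this is not the code on that branch): a smaller bias can
still reach every output position, because J's frame agreement replicates it over
the surplus axes. For example, with one bias value per output kernel and a
10 3 3 stack of output maps:

   NB. illustration: a rank-1 bias broadcasts over each kernel's 3 3 output map
   $ (10 3 3 $ 0) + ? 10 $ 0
10 3 3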

I did, however, test it on the MNIST dataset and got about 90% accuracy on the
test data after 2 epochs (it takes a couple of hours to run on a PC). MNIST is
not particularly challenging, though. I would test it on CIFAR-10 if I had some
more time, but I don't at the moment.

The MNIST conv net is:

NB. =================================================


PATHTOTRAIN=: '/path/on/my/pc/to/mnist/train/input'
PATHTOTEST=: '/path/on/my/pc/to/mnist/test/input'
PATHTOTRAINLABELS=:'/path/on/my/pc/to/mnist/train/labels'
PATHTOTESTLABELS=: '/path/on/my/pc/to/mnist/test/labels'
rf=: 1!:1 
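NB. The raw MNIST (IDX) image files carry a 16-byte header and the label files
NB. an 8-byte header (dropped with }. below); pixel values are scaled to 0..1.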
data=: a.i. toJ dltb , rf < PATHTOTRAIN
TRAININPUT =: 255 %~ [ 60000 1 28 28 $, 16}. data

data=: a.i. toJ dltb , rf < PATHTOTEST
TESTINPUT =: 255 %~ [ 10000 1 28 28 $, 16}. data


data=: a.i. toJ dltb , rf < PATHTOTRAINLABELS
TRAINLABELS =: 60000 10 $ , #: 2^ 8}. data

data=: a.i. toJ dltb , rf < PATHTOTESTLABELS
TESTLABELS =: 10000 10 $ , #: 2^ 8}. data

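NB. Network: two Conv2D + BatchNorm2D + ReLU blocks, a pooling layer, a flatten
NB. layer, then two fully connected (SimpleLayer) layers with batch norm and
NB. tanh / softmax activations.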
pipe=: (100;20;'softmax';1; 'l2';0.0001) conew 'NNPipeline'
c1=: ((50 1 4 4);3;'relu';'adam';0.01;0) conew 'Conv2D'
b1 =: (0; 1 ;1e_4;50;0.001) conew 'BatchNorm2D'
a1 =: 'relu' conew 'Activation'
c2=: ((64 50 5 5);4;'relu';'adam';0.01;0) conew 'Conv2D'
b2 =: (0; 1 ;1e_4;64;0.001) conew 'BatchNorm2D'
a2 =: 'relu' conew 'Activation'
p1=: 2 conew 'PoolLayer'
fl=: 1 conew 'FlattenLayer'
fc1=: (64;34;'tanh';'adam';0.01) conew 'SimpleLayer'
b3 =: (0; 1 ;1e_4;34;0.001) conew 'BatchNorm'
a3 =: 'tanh' conew 'Activation'
fc2=: (34;10;'softmax';'adam';0.01) conew 'SimpleLayer'
b4 =: (0; 1 ;1e_4;10;0.001) conew 'BatchNorm'
a4 =: 'softmax' conew 'Activation'

addLayer__pipe c1
addLayer__pipe b1
addLayer__pipe a1
addLayer__pipe c2
addLayer__pipe b2
addLayer__pipe a2
addLayer__pipe p1
addLayer__pipe fl
addLayer__pipe fc1
addLayer__pipe b3
addLayer__pipe a3
addLayer__pipe fc2
addLayer__pipe b4
addLayer__pipe a4



TRAINLABELS fit__pipe TRAININPUT

NB. f=: 3 : '+/ ((y + i. 100){TESTLABELS) -:"1 1 (=>./)"1 >{:predict__pipe (y+i.100){TESTINPUT'
NB. Run f"0 [ 100*i.100 to run prediction on the whole test set (in batches of
NB. size 100); average the results to get the accuracy.
NB. =================================================

As I said, I am going to go back and look at my notes (I don't have them to
hand). I am sure you are correct, but I am not yet 100% convinced that my new
bias shape is correct either. After thinking it through I will probably merge
the fix.

About backprop for the bias: I simply took the ntd (next-layer training deltas),
averaged them across the first dimension, multiplied by the learning rate, and
subtracted the result from the current bias. This was, admittedly, a fudge on my
part. Why average? To make the shapes fit. The bias is shared between neurons,
so it seems reasonable to average the deltas it contributes to. As I am sure you
have noticed, the actual implementation of convnet backprop is the trickiest
part, and also the least written about. I have a copy of Goodfellow and Bengio's
Deep Learning book, which is mostly excellent, but even it only skims over
backprop for convnets, or gives it a very abstract mathematical treatment; the
actual nitty-gritty details are left to the reader. So my own interpretation of
the correct implementation may be wrong in places (but then again, how wrong can
it be if it gets correct answers?).
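
In J terms, the update is roughly the following (just a sketch, not the actual
jlearn verb; lr, mean, and updateBias are placeholder names, and I am treating
the leading axis of ntd as the batch axis):

NB. Sketch only, not the jlearn verb: update the bias from the next-layer
NB. training deltas ntd, whose leading axis is the batch axis.
lr =: 0.01                            NB. assumed learning rate
mean =: +/ % #                        NB. mean over the leading axis
updateBias =: 4 : 'x - lr * mean y'   NB. use as: bias updateBias ntd
NB. e.g. with a 10 3 3 bias and a batch of 100 deltas, the shapes fit:
NB.    $ (? 10 3 3 $ 0) updateBias ? 100 10 3 3 $ 0
NB. 10 3 3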

On Sunday, April 28, 2019, 3:34:57 PM GMT+9, Brian Schott <[email protected]> wrote:

 Jon,

I have been studying your simple_conv_test.ijs example and trying to
compare it to the *dynamic* example at
http://cs231n.github.io/convolutional-networks/#conv where only 2 biases
are used with their stride of 2 and 2 output kernels of shape 3x3. (I
believe they have 2 biases because of the 2 output kernels.) In contrast,
according to my manual reconstruction of your verb preRun in conv2d.ijs I
get a whopping 90 biases (a 10x3x3 array), one for each of the 10 output
kernels in each of its 3x3 positions on the 8x8 image.

My confusion is that based on the cs231n example, I would have guessed
that you would have had only 10 biases, not 90. Can you comment on that,
please?

[Of course in my example below, my `filter` values are ridiculous.
And I have not adjusted for epochs and batches.
But I hope the shape of `filter` and the stride of 2 and the ks are
consistent with your simple example.]


  filter =: i.  10 3 4 4
  ks =: 2 3$2 2 2 3 4 4
  $A,B,C
15 3 8 8
  cf=: [: |:"2 [: |: [: +/ ks filter&(convFunc"3 3);._3 ]
  $n=: cf"3 A,B,C
15 10 3 3
  $2 %~ +: 0.5-~ ? ( }. $ n) $ 0    NB. 90 biases
10 3 3

Actually, in my own development of a convnet I have been tempted to do as I
believe you have done, but have been unsuccessful in the backprop step.
Conceptually how do you combine each group of 3x3 dW's to update their
common single W/kernel (for example, with summation or mean or max)?

Thanks,

(B=)
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
  