Trying to de-loop "fit", here is something I came up with. It's probably
missing something important, as I don't understand the code well enough to
know otherwise, but I think it's a step in the right direction.
fit__pipe=: 4 : 0
NB. test output
l=: {: layers
if. type__l -: 'SimpleLayer' do.
'Pipeline output must match last layer''s activation.' assert (tolower output)-: activation__l
end.
pe=: >. (# y) % bs
ctr=: 0
".&.>(<'_ '''''),~&.>(<'onBeginFit_'),&.>layers
NB. for_j. i.#layers do.
NB. l=: j{layers
NB. onBeginFit__l ''
NB. end.
NB.preRun y{~ bs ?#y
while. ctr < epochs do.
NB. ectr=: 0
NB. while. ectr < pe do.
NB. ectr=: ectr+1
".&.>(<'_ '''''),~&.>(<'onBeginFit_'),&.>layers
NB. for_j. i.#layers do.
NB. l=: j{layers
NB. onBeginFit__l ''
NB. end.
ixs=. (bs*pe)?#y NB. choose random row(s)
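NB. note: m ? n requires m <= n, so bs*pe must not exceed #y (pe is a ceiling, so bs should divide #y evenly)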
NB. index=. bs ?# y NB. choose random row(s)
NB. X=. index { y NB. get the random sample(s)
NB. Y=. index { x NB. corresponding target value(s)
((-bs) <\ ixs{x) fit1&.> (-bs) <\ ixs{y NB. pe batches of bs rows: targets (from x) on the left, samples (from y) on the right
total=: total + pe
smoutput 'Iterations complete: ',(":ctr),', total: ',":total
NB. wd^:1 'msgs'
NB. end.
ctr=: ctr+1
end.
)
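
For what it's worth, that ".&.> line builds the string onBeginFit_nnn_ '' for
each layer locale and then executes them all, which is what replaces the
explicit for_j. loop. A minimal sketch of the idiom (the locale names here are
made up):

   layers=. '12';'34';'56'                      NB. stand-ins for the boxed locale names held by the pipeline
   (<'_ '''''),~&.>(<'onBeginFit_'),&.>layers   NB. boxes holding  onBeginFit_12_ ''  etc.
   NB. ".&.> then executes each string, i.e. calls onBeginFit in every layer's locale
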
On Sun, Apr 28, 2019 at 2:09 PM Devon McCormick <[email protected]> wrote:
> Hi - trying to run the "fit__pipe" function, I encountered a value error
> on this line:
> wd^:1 'msgs'
> so I commented it out, on the assumption that this is the "wd" defined in JQt
> and that it is some sort of progress message. Is this correct?
> Thanks,
> Devon
>
> On Sun, Apr 28, 2019 at 9:20 AM jonghough via Programming <
> [email protected]> wrote:
>
>> I think you may be right. Thanks for pointing this out. However, since
>> my networks mostly work, I am going to assume that having too many biases
>> doesn't negatively impact the results, except for adding "useless"
>> calculations. If you are correct, I should fix this.
>>
>> I have edited the source on a new branch to only have a 2d shaped bias.
>> (see:
>> https://github.com/jonghough/jlearn/blob/feature/conv2d_layer_fix/adv/conv2d.ijs
>> )
>> This is not on the master branch, but on a new branch. I am not 100%
>> convinced this is correct, and so am going to think about it.
>>
>> I did, however, test it on the MNIST dataset and got about 90% accuracy on
>> test data after 2 epochs (it takes a couple of hours to run on a PC). MNIST
>> data is not particularly challenging, though. I would test it on CIFAR-10 if I
>> had some more time, but don't at the moment.
>>
>> The MNIST conv net is:
>>
>> NB. =================================================
>>
>>
>> PATHTOTRAIN=: '/path/on/my/pc/to/mnist/train/input'
>> PATHTOTEST=: '/path/on/my/pc/to/mnist/test/input'
>> PATHTOTRAINLABELS=:'/path/on/my/pc/to/mnist/train/labels'
>> PATHTOTESTLABELS=: '/path/on/my/pc/to/mnist/test/labels'
>> rf=: 1!:1
>> data=: a.i. toJ dltb , rf < PATHTOTRAIN
>> TRAININPUT =: 255 %~ [ 60000 1 28 28 $, 16}. data
>>
>> data=: a.i. toJ dltb , rf < PATHTOTEST
>> TESTINPUT =: 255 %~ [ 10000 1 28 28 $, 16}. data
>>
>>
>> data=: a.i. toJ dltb , rf < PATHTOTRAINLABELS
>> TRAINLABELS =: 60000 10 $ , #: 2^ 8}. data
>>
>> data=: a.i. toJ dltb , rf < PATHTOTESTLABELS
>> TESTLABELS =: 10000 10 $ , #: 2^ 8}. data
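>> NB. #: 2^label turns each label 0..9 into a 10-digit binary row (a reversed
>> NB. one-hot); applied consistently to train and test labels, so it is fine
>> NB. for classification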
>>
>> pipe=: (100;20;'softmax';1; 'l2';0.0001) conew 'NNPipeline'
>> c1=: ((50 1 4 4);3;'relu';'adam';0.01;0) conew 'Conv2D'
>> b1 =: (0; 1 ;1e_4;50;0.001) conew 'BatchNorm2D'
>> a1 =: 'relu' conew 'Activation'
>> c2=: ((64 50 5 5);4;'relu';'adam';0.01;0) conew 'Conv2D'
>> b2 =: (0; 1 ;1e_4;64;0.001) conew 'BatchNorm2D'
>> a2 =: 'relu' conew 'Activation'
>> p1=: 2 conew 'PoolLayer'
>> fl=: 1 conew 'FlattenLayer'
>> fc1=: (64;34;'tanh';'adam';0.01) conew 'SimpleLayer'
>> b3 =: (0; 1 ;1e_4;34;0.001) conew 'BatchNorm'
>> a3 =: 'tanh' conew 'Activation'
>> fc2=: (34;10;'softmax';'adam';0.01) conew 'SimpleLayer'
>> b4 =: (0; 1 ;1e_4;10;0.001) conew 'BatchNorm'
>> a4 =: 'softmax' conew 'Activation'
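>>
>> NB. two Conv2D -> BatchNorm2D -> relu blocks, then pooling and flattening,
>> NB. followed by two SimpleLayer (dense) -> BatchNorm -> Activation blocks;
>> NB. the layers are added to the pipeline in that order below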
>>
>> addLayer__pipe c1
>> addLayer__pipe b1
>> addLayer__pipe a1
>> addLayer__pipe c2
>> addLayer__pipe b2
>> addLayer__pipe a2
>> addLayer__pipe p1
>> addLayer__pipe fl
>> addLayer__pipe fc1
>> addLayer__pipe b3
>> addLayer__pipe a3
>> addLayer__pipe fc2
>> addLayer__pipe b4
>> addLayer__pipe a4
>>
>>
>>
>> TRAINLABELS fit__pipe TRAININPUT
>>
>> NB. f=: 3 : '+/ ((y + i. 100){TESTLABELS) -:"1 1 (=>./)"1 >{:predict__pipe (y+i.100){TESTINPUT'
>> NB. run f"0[100*i.100 to run prediction on the whole test set (in batches of size 100). Avg the result to get accuracy.
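>> NB. for example, roughly (with f as defined above):
>> NB.   acc=: (+/ % #) f"0 ] 100 * i. 100   NB. mean correct per batch of 100 = accuracy in percent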
>> NB. =================================================
>>
>> As I said, I am going to go back and look at my notes (I don't have them at
>> hand). I am sure you are correct, but then again, I am not 100% convinced that
>> my new bias shape is correct. After thinking it through I will probably merge
>> the fix.
>>
>> About backprop for the bias, I simply took the ntd (next layer training
>> deltas), averaged them across the first dimension, multiplied by the learning
>> rate, and subtracted the result from the current bias. This was, admittedly, a
>> fudge on my part. Why average? To make the shapes fit. Biases are shared
>> between neurons, so it makes sense to average the deltas that the bias
>> contributes to.
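>> In J terms, the update amounts to something like this (just a sketch; ntd, lr
>> and bias here are illustrative names, not the exact variables in conv2d.ijs):
>>
>>    db=. (+/ % #) ntd        NB. average the deltas over the first (batch) axis
>>    bias=. bias - lr * db    NB. scale by the learning rate and subtract from the bias
>>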
>> As I am sure you have noticed, the actual implementation of convnet backprop
>> is the trickiest part, and also the least written about. I have a copy of
>> Goodfellow and Bengio's Deep Learning book, which is mostly excellent, but even
>> that just skims over backprop for convnets, or gives it a very abstract
>> mathematical treatment; the actual nitty-gritty details are left to the reader.
>> So my own interpretation of the actual correct implementation may be wrong in
>> places (but then again, how wrong can it be, if it gets correct answers?).
>>
>> On Sunday, April 28, 2019, 3:34:57 PM GMT+9, Brian Schott <[email protected]> wrote:
>>
>> Jon,
>>
>> I have been studying your simple_conv_test.ijs example and trying to
>> compare it to the *dynamic* example at
>> http://cs231n.github.io/convolutional-networks/#conv where only 2 biases
>> are used with their stride of 2 and 2 output kernels of shape 3x3. (I
>> believe they have 2 biases because of the 2 output kernels.) In contrast,
>> according to my manual reconstruction of your verb preRun in conv2d.ijs, I
>> get a whopping 90 biases (a 10x3x3 array), one for each of the 10 output
>> kernels in each of its 3x3 positions on the 8x8 image.
>>
>> My confusion is that based on the cs231n example, I would have guessed
>> that you would have had only 10 biases, not 90. Can you comment on that,
>> please?
>>
>> [Of course in my example below, my `filter` values are ridiculous.
>> And I have not adjusted for epochs and batches.
>> But I hope the shape of `filter` and the stride of 2 and the ks are
>> consistent with your simple example.]
>>
>>
>> filter =: i. 10 3 4 4
>> ks =: 2 3$2 2 2 3 4 4
>> $A,B,C
>> 15 3 8 8
>> cf=: [: |:"2 [: |: [: +/ ks filter&(convFunc"3 3);._3 ]
>> $n=: cf"3 A,B,C
>> 15 10 3 3
>> $2 %~ +: 0.5-~ ? ( }. $ n) $ 0 NB. 90 biases
>> 10 3 3
>>
>> Actually, in my own development of a convnet I have been tempted to do as
>> I
>> believe you have done, but have been unsuccessful in the backprop step.
>> Conceptually, how do you combine each group of 3x3 dW's to update their
>> common single W/kernel (for example, with summation or mean or max)?
>>
>> Thanks,
>>
>> (B=)
>>
>
>
--
Devon McCormick, CFA
Quantitative Consultant
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm