I have been developing a toy convolutional neural network patterned after the toy non-convolutional nn discussed in this thread. The current status of the toy is reproduced below for any comments.
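One note on the data definitions before the listing, with a small illustration of my own (not part of the script): each image is built by converting a digit string to a bit list with "."0 and reshaping it to 8x8 with $. When the string is shorter than 64 digits, $ recycles it cyclically, which is why the shorter strings for A2, A3, and A4 still fill an 8x8 array. For example:

   2 4 $ "."0 '110010'
1 1 0 0
1 0 1 1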
NB. simpler_conv_test.ijs
NB. 5/19/19
NB. based on Jon Hough's simple_conv_test.ijs,
NB. especially Jon's conv, and convFunc in backprop,
NB. and patterned after the toy nn case study at
NB. http://cs231n.github.io/neural-networks-case-study/#net
NB. This example demonstrates a 3-layer neural net:
NB. an input layer receives 8x8 black-and-white (binary) images,
NB. a hidden convolutional layer has 2 4x4 filters
NB. which stride 2 over the 8x8 input images with no zero-padding,
NB. and the softmax-activated output layer has 3 classes.
NB. These exact choices of filter size, image size, stride,
NB. and lack of zero-padding produce 9 readings by each filter.
NB. The names W and W2 for filters/weights are kept from the
NB. original toy case study, which had only fully connected
NB. layers and no convolutional layer.
NB. ==============================

NB. revised data to use only a single image channel instead of 3 channels
A1=: 8 8$"."0'1111111100000000000000001111111100000000111111111111111100000000'
A2=: 8 8$"."0'1111111100000000111111111111111100000000'
A3=: 8 8$"."0'111111110000000000000000'
A4=: 8 8$"."0'111111110000000000000000111111111111111100000000'
A5=: 2 |. A4
B1=: |:"2 A1
B2=: |:"2 A2
B3=: |:"2 A3
B4=: |:"2 A4
B5=: |:"2 A5
C1=: 8 8$"."0'1000000101000010001001000001100000011000001001000100001010000001'
C2=: 8 8$"."0'1000000001000000001000000001000000001000000001000000001000000001'
C3=: 8 8$"."0'1010100001010100001010100001010110001010010001011010001001010001'
C4=: |."1 C3
C5=: 8 8$"."0'1111000000111100000011111100001111110000001111000000111111000011'
A=: 5 8 8 $, A1, A2, A3, A4, A5
B=: 5 8 8 $, B1, B2, B3, B4, B5
C=: 5 8 8 $, C1, C2, C3, C4, C5
X =: INPUT =: A,B,C
Y =: 5#0 1 2    NB. specific target values for this case

NB. some utility verbs
dot =: +/ . *
probs =: (%+/)@:^"1    NB. used in softmax activation
amnd =: (1-~{)`[`]}    NB. amend: subtract 1 at the indexed position
mean =: +/ % #
rand01 =: ?.@$ 0:      NB. ? replaced with ?. for demo purposes only
NB. Box-Muller-style normal deviates from uniform rand01
normalrand =: (2 o. [: +: [: o. rand01) * [: %: [: - [: +: [: ^. rand01
NB. Hough's backprop magic verb
deconv =: 4 : '(1 1,:5 5) x&(conv"2 _) tesl y'
conv =: +/@:,@:*
NB. Jon Hough's tessellation adverb for conv
tesl =: ;._3
NB. backprop requires some zero-padding, thus msk,
NB. which interleaves zeros into a 3x3 array (3x3 -> 5x5)
msk =: (1j1 1j1 1)&#
NB. tessellation requires and produces rectangular data,
NB. so collapse and sqr rework and create such data
collapse =: (,"2)&(1 2&|:)      NB. list from internal array
sqr =: $&,~}:@$,2&$@:%:@{:@$    NB. square array from list
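Note 'added sanity checks'
NB. These session examples are my additions for readers, not part
NB. of Jon's code or the case study; they check the geometry
NB. claims and the utility verbs above.
NB. The header claims each filter takes 9 readings: a 4x4 window
NB. moved with stride 2 over an 8x8 image gives a 3x3 grid of windows.
   $ (2 2,:4 4) ];._3 i.8 8
3 3 4 4
NB. Backprop tessellates with 5x5 windows at stride 1, which
NB. recover the 4x4 shape of each filter's gradient.
   $ (1 1,:5 5) ];._3 i.8 8
4 4 5 5
NB. probs is a row-wise softmax: exponentiate, then normalize.
   probs 0 0 0
0.333333 0.333333 0.333333
NB. amnd subtracts 1 at the index of the correct class, turning
NB. softmax probabilities into the gradient dscores.
   1 amnd 0.2 0.5 0.3
0.2 _0.5 0.3
NB. msk"1 msk"2 dilates a 3x3 gradient to 5x5 by interleaving
NB. zeros, undoing the stride of 2 before deconv.
   msk"1 msk"2 3 3$1
1 0 1 0 1
0 0 0 0 0
1 0 1 0 1
0 0 0 0 0
1 0 1 0 1
)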
NB. some training parameters
step_size =: 1e_0
reg =: 1e_3
N =: 5       NB. number of points per class
D =: 2       NB. dimensionality
K =: 3       NB. number of classes
bias =: 0

train =: dyad define
  'W b W2 b2' =. x    NB. weights and biases
  num_examples =. #X
  for_i. 1+i. y do.   NB. y is number of batches
    NB. forward pass: convolve, add bias, rectify (0>.)
    hidden_layer =. 0>.b +"1 (2 2,:4 4) W&(conv"2 _) tesl"3 2 X
    c_hidden_layer =. collapse hidden_layer
    scores =. mean 0>.b2 +"1 (1 0 2|:c_hidden_layer) dot"2 W2
    prob =. probs scores
    correct_logprobs =. -^.Y}|:prob
    data_loss =. (+/correct_logprobs)%num_examples
    reg_loss =. (0.5*reg*+/,W*W) + 0.5*reg*+/,W2*W2
    loss =. data_loss + reg_loss
    if. 0=(y%10)|i do. smoutput i,loss end.
    NB. backward pass: softmax gradient, then W2, then W
    dscores =. prob
    dscores =. Y amnd"0 1 dscores
    dscores =. dscores%num_examples
    dW2 =. 1 0 2|:(|:collapse hidden_layer)dot dscores
    db2 =. mean dscores
    NB. Mike Day fixed next line 5/16/19
    dhidden =. (0<|:"2 c_hidden_layer)*dscores dot|:W2
    padded =. msk"1 msk"2 sqr |:"2 dhidden
    dW =. mean 0 3 1 2|: padded deconv"3 2 X
    db =. mean mean dhidden
    NB. add regularization gradient and update parameters
    dW2 =. dW2 + reg*W2
    dW =. dW + reg*W
    W =. W-step_size * dW
    b =. b-step_size * db
    W2 =. W2-step_size * dW2
    b2 =. b2-step_size * db2
  end.
  W;b;W2;b2
)

Note 'example start of training'
NB. 2 4 4 is the shape of W.
NB. That shape reflects 2 filters of w x h = 4x4
NB. over the 8x8 image layer.
NB. The hidden layer is convolutional; W defines the
NB. weights (filters) between the input and hidden layers.
   $xW =. 0.026*normalrand 2 4 4    NB. 0.026 is a wild-ass guess
2 4 4
NB. 2 9 3 is the shape of W2.
NB. W2 holds the weights that receive the 2 conv filters and
NB. produce the 3 output classes; each 4x4 filter takes
NB. 9 readings over an 8x8 image using a stride of 2.
NB. The output (scores) layer is fully connected; W2
NB. defines the weights between the hidden and output layers.
   $xW2 =. 0.026*normalrand 2 9 3
2 9 3
   cc =. (xW;0 0;xW2;0 0 0) train 200    NB. start a run like this
NB. then continue with 200 more batches with the following
   cx =. cc train 200
)

Note 'check prediction proportion'
   $h_l =. 0>.(>1{cc) +"1 (2 2,:4 4) (>0{cc)&(conv"2 _) tesl"3 2 X
   $c_h_l =. collapse h_l
   $sc =. mean 0>.(>3{cc) +"1 (1 0 2|:c_h_l) dot"2 (>2{cc)
   $predicted_class =. (i.>./)"1 sc
   mean predicted_class = Y
)

-- 
(B=) <-----my sig
Brian Schott
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
