I have been working on a convolutional neural net application and have
learned from Jon's conv2d system, especially about his verb `deconv` in the
backprop step. That was a real eye-opener for me. But when I use the
softmax activation, I am still having trouble getting well-trained nets --
even for NON-convolutional systems. So I have attempted to code a J version
of the standard toy Python example at
http://cs231n.github.io/neural-networks-case-study/#net .
Below are my script and a **sample run**. I would appreciate any advice.
The disappointment is that I can get only about 51% prediction accuracy
(as documented at the bottom of this post), and yet the authors get 98%
accuracy. I cannot see what is different between my code and the Python
code, but there must be some difference.
Notice that I have used Roll with a fixed seed (?.) in both of the verbs
`rand1` and `rand2` below, so you should be able to reproduce exactly the
results I get.
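As I understand it, ?. draws from a generator initialized with a fixed
seed, so a fresh session replays the same stream; for example

   ?. 4 $ 0   NB. four uniform draws; the same values in every fresh session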
My plot of the data looks very similar to their plot.
load'trig'
load'stats'
load'plot'
thetas =: 0 4 8+/steps 0 4 99  NB. 3 x 100 angles: class j spans j*4 to (j+1)*4
radii =: 100%~steps 1 100 99   NB. 100 radii from 0.01 to 1
rand2 =: 1.73205*0.2 * _2 ([ * <:@+:@])/\ 0 2 ?.@$~ +:  NB. y fixed-seed noise values, mean 0, sd 0.2 (stands in for 0.2*np.random.randn)
rand1 =: 1.73205*0.01 * _2 ([ * <:@+:@])/\ 0 2 ?.@$~ +: NB. as above, scaled to sd 0.01
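NB. How the noise verbs work: _2 ([ * <:@+:@])/\ v takes non-overlapping
NB. pairs (u,b) of v and yields u * (2b-1), i.e. u with a random sign;
NB. 1.73205 (%:3) rescales the uniform draws to unit standard deviation.
NB.    _2 ([ * <:@+:@])/\ 0.5 1 0.25 0   NB. -> 0.5 _0.25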
dot =: +/ . *
probs =: (%+/)@:^"1  NB. softmax of each row
amnd =: (1-~{)`[`]}  NB. x amnd y: subtract 1 from y at index x
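NB. Quick checks of those two:
NB.    probs 0 0      -> 0.5 0.5   (equal scores, equal probabilities)
NB.    1 amnd 5 5 5   -> 5 4 5     (decrement at the class index)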
mean =: +/ % #
X =: |:,"2|:radii*|:(sin,:cos)(_100]\rand2 300)+thetas
classes =: 100#i. 3
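NB. Shape sanity checks:
NB.    $X        -> 300 2   (300 points in the plane)
NB.    $classes  -> 300     (one label per point)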
NB. 'dot' plot ;/|:X
step_size =: 1e_0  NB. i.e. 1, written to match the Python's 1e-0
reg =: 1e_3        NB. regularization strength
N =: 100 NB. number of points per class
D =: 2 NB. dimensionality
K =: 3 NB. number of classes
train =: verb define
h =. 100 NB. size of hidden layer
W =. ($rand1@*/) D,h  NB. D x h matrix of small fixed-seed randoms
b =. h#0
W2 =. ($rand1@*/) h,K NB. h x K
b2 =. K#0
(W;b;W2;b2) train y   NB. initialize, then run the dyad for y iterations
:
'W b W2 b2' =. x
num_examples =. #X
for_i. 1+i. y do.
hidden_layer =. 0>.b +"1 X dot W      NB. ReLU: 0 >. XW+b , shape 300 100
scores =. b2 +"1 hidden_layer dot W2  NB. raw class scores, shape 300 3
NB. exp_scores =. ^scores
prob =. probs scores  NB. row-wise softmax, shape 300 3
correct_logprobs =. -^.classes}|:prob  NB. composite item: -log of prob at (j, classes_j)
data_loss =. +/correct_logprobs%num_examples
reg_loss =. 0.5*reg*+/,W*W
loss =. data_loss + reg_loss  NB. mean cross-entropy plus L2 penalty
if. 0=(y%10)|i do.  NB. report the loss ten times over the run
smoutput i,loss
end.
dscores =. prob
dscores =. classes amnd"0 1 dscores  NB. subtract 1 at the correct class in each row
dscores =. dscores%num_examples
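NB. Backprop by the chain rule, as in the cs231n notes (shapes in parens):
NB.    dW2 = H^T dot dS          (100 3)
NB.    db2 = column sums of dS   (3)
NB.    dH  = dS dot W2^T, zeroed where the ReLU was inactive  (300 100)
NB.    dW  = X^T dot dH          (2 100)
NB.    db  = column sums of dH   (100)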
dW2 =. (|:hidden_layer) dot dscores
db2 =. +/dscores
dhidden =. dscores dot |:W2
indx =. I. hidden_layer <: 0  NB. where the ReLU clamped the activation
dhidden =. 0 indx}dhidden     NB. no gradient flows through those units
dW =. (|:X) dot dhidden
db =. +/dhidden
dW2 =. dW2 + reg*W2  NB. add the gradient of the regularization term
dW =. dW + reg*W
W =. W-step_size * dW  NB. gradient-descent update of all parameters
b =. b-step_size * db
W2 =. W2-step_size * dW2
b2 =. b2-step_size * db2
end.
NB. (W;b);<W2;b2
W;b;W2;b2
)
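NB. Convenience verb (added for clarity; it is equivalent to the session
NB. lines in the Note below): rerun the forward pass with the trained
NB. weights and take the argmax class of each row.
predict =: verb define
'W b W2 b2' =. y
hl =. 0>.b +"1 X dot W       NB. ReLU hidden layer, 300 100
(i.>./)"1 b2 +"1 hl dot W2   NB. index of the largest of the 3 scores
)
NB. usage:  mean classes = predict cc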
Note 'sample results'
   cc =: train 10000
1000 8.40634
2000 4.71528
3000 5.8607
4000 11.6474
5000 10.9213
6000 4.65864
7000 2.1275
8000 1.44098
9000 1.28587
10000 1.25434
   $h_l =. 0>.(>1{cc) +"1 X dot >0{cc
300 100
   $sc =. (>3{cc) +"1 h_l dot >2{cc
300 3
   $predicted_class =. (i.>./)"1 sc
300
   mean predicted_class = classes
0.51
   JVERSION
Engine: j807/j64/darwin
Release-c: commercial/2019-02-24T10:50:40
Library: 8.07.25
Qt IDE: 1.7.9/5.9.7
Platform: Darwin 64
Installer: J807 install
InstallPath: /users/brian/j64-807
Contact: www.jsoftware.com
)
--
(B=)