I've had a look at your example and the source you cite. You differ
from the source in seeming to need explicit handling of a hidden layer,
with both W & b AND W2 & b2, which I can't understand right now.
Ah - I've just found a second listing, lower down the page, which does
have W2 and b2 and a hidden layer!
I found, at least on Windows 10, that 'dot' plot ... shows more or less
white space; 'symbol' plot is better.
Anyway, when I first ran train, I got:
train 100
|nonce error: train
| dhidden=.0 indx}dhidden
The trouble arose from this triplet of lines:
dhidden =. dscores dot |:W2
indx =. I. hidden_layer <: 0
dhidden =. 0 indx}dhidden
Since you seem to be restricting dhidden to be non-negative, I replaced
these three with:
dhidden =. 0 >. dscores dot |:W2 NB. is this what you meant?
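For comparison, the Python listing on that page masks the gradient rather
than clamping it (dhidden[hidden_layer <= 0] = 0), so if that is what you
meant, a closer J rendering - untested, just a sketch - would be:
dhidden =. (hidden_layer > 0) * dscores dot |:W2
which also sidesteps the indexed amend that raised the nonce error.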
I've also changed the loop so that we get a report for the first cycle,
as in Python:
for_i. i. >: y do.
and added this line after smoutput i,loss - it might not be necessary on
Darwin...
wd'msgs'
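Putting the three changes together, the reporting skeleton of train, as I
ran it, looks like this:
for_i. i. >: y do.
  NB. ... forward and backward passes as before ...
  if. 0=(y%10)|i do.
    smoutput i,loss
    wd'msgs'  NB. flush the session so reports appear as training runs
  end.
end.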
With these changes, train ran as follows:
cc =: train 10000 NB. loss starts OK, increases slightly - still unlike the Python example!
0 1.09856
1000 1.10522
2000 1.10218
3000 1.0997
4000 1.09887
5000 1.09867
6000 1.09862
7000 1.09861
8000 1.09861
9000 1.09861
$h_l =. 0>.(>1{cc) +"1 X dot >0{cc
300 100
$sc =. (>3{cc) +"1 h_l dot >2{cc
300 3
$predicted_class =. (i.>./)"1 sc
300
mean predicted_class = classes
0.333333
Why are the cycle-0 losses different, if only slightly? They report
1.098744 cf. your 1.09856.
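Both are close to ^.3 :
   ^.3
1.09861
which is what you'd expect when the initial weights are tiny and the
softmax outputs are near-uniform over the K=3 classes; the small remaining
difference presumably just reflects the different random initial weights.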
Sorry - only minor problems found, and they don't explain why you don't
reproduce their results more closely.
Mike
On 15/05/2019 19:42, Brian Schott wrote:
I have been working on a convolutional neural net application and have
learned from Jon's conv2d system, especially about his verb `deconv` in the
backprop step. That was a real eye-opener for me. But when I use the
softmax activation, I am still having trouble getting well-trained nets --
even for NON-convolutional systems. So I have attempted to code a J version
of the toy, standard Python example at
http://cs231n.github.io/neural-networks-case-study/#net .
Below I have produced my script and a **sample run**. I would appreciate
any advice.
The disappointment is that I get only about 51% accurate prediction
(as I have documented at the bottom of this post), while the authors get
98% accuracy. I cannot see what is different between my code and the Python
code, but there must be some difference.
Notice that I have used Roll with a fixed seed in both the verbs `rand1`
and `rand2` below, so likely you can get the same result that I get.
My plot of the data looks very similar to their plot.
load'trig'
load'stats'
load'plot'
thetas =: 0 4 8+/steps 0 4 99 NB. 3 rows of 100 angles, offset per class
radii =: 100%~steps 1 100 99 NB. 100 radii from 0.01 to 1
rand2 =: 1.73205*0.2 * _2 ([ * <:@+:@])/\ 0 2 ?.@$~ +: NB. y signed uniforms, sd ~0.2 (randn stand-in), fixed seed
rand1 =: 1.73205*0.01 * _2 ([ * <:@+:@])/\ 0 2 ?.@$~ +: NB. as rand2 but sd ~0.01, for weight init
dot =: +/ . * NB. matrix product
probs =: (%+/)@:^"1 NB. softmax of each row
amnd =: (1-~{)`[`]} NB. x amnd y : y with 1 subtracted at index x
NB. e.g. 1 amnd 0.2 0.5 0.3 -> 0.2 _0.5 0.3
mean =: +/ % #
X =: |:,"2|:radii*|:(sin,:cos)(_100]\rand2 300)+thetas NB. 300 2 array of spiral points
classes =: 100#i. 3 NB. class label per row of X
NB. 'dot' plot ;/|:X
step_size =: 1e_0 NB. learning rate (1e_0 is 1)
reg =: 1e_3 NB. regularisation strength
N =: 100 NB. number of points per class
D =: 2 NB. dimensionality
K =: 3 NB. number of classes
train =: verb define
h =. 100 NB. size of hidden layer
W =. ($rand1@*/) D,h
b =. h#0
W2 =. ($rand1@*/) h,K
b2 =. K#0
(W;b;W2;b2) train y
:
'W b W2 b2' =. x
num_examples =. #X
for_i. 1+i. y do.
hidden_layer =. 0>.b +"1 X dot W NB. ReLU(X dot W + b)
scores =. b2 +"1 hidden_layer dot W2
NB. exp_scores =. ^scores
prob =. probs scores NB. softmax of each row of scores
correct_logprobs =. -^.classes}|:prob NB. -log p(correct class) per example
data_loss =. +/correct_logprobs%num_examples
reg_loss =. 0.5*reg*+/,W*W
loss =. data_loss + reg_loss
if. 0=(y%10)|i do.
smoutput i,loss
end.
dscores =. prob
dscores =. classes amnd"0 1 dscores NB. subtract 1 at each correct class
dscores =. dscores%num_examples
dW2 =. (|:hidden_layer) dot dscores
db2 =. +/dscores
dhidden =. dscores dot |:W2
indx =. I. hidden_layer <: 0
dhidden =. 0 indx}dhidden
dW =. (|:X) dot dhidden
db =. +/dhidden
dW2 =. dW2 + reg*W2
dW =. dW + reg*W
W =. W-step_size * dW
b =. b-step_size * db
W2 =. W2-step_size * dW2
b2 =. b2-step_size * db2
end.
NB. (W;b);<W2;b2
W;b;W2;b2
)
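For convenience, the prediction steps at the bottom of the sample run could
be wrapped up as a verb - an untested sketch, reusing the global X:
predict =: verb define
'W b W2 b2' =. y
hl =. 0 >. b +"1 X dot W NB. ReLU forward pass, as in train
(i. >./)"1 b2 +"1 hl dot W2 NB. index of the largest score per row
)
so that mean classes = predict cc gives the accuracy directly.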
Note 'sample results'
cc =: train 10000
1000 8.40634
2000 4.71528
3000 5.8607
4000 11.6474
5000 10.9213
6000 4.65864
7000 2.1275
8000 1.44098
9000 1.28587
10000 1.25434
$h_l =. 0>.(>1{cc) +"1 X dot >0{cc
300 100
$sc =. (>3{cc) +"1 h_l dot >2{cc
300 3
$predicted_class =. (i.>./)"1 sc
300
mean predicted_class = classes
0.51
JVERSION
Engine: j807/j64/darwin
Release-c: commercial/2019-02-24T10:50:40
Library: 8.07.25
Qt IDE: 1.7.9/5.9.7
Platform: Darwin 64
Installer: J807 install
InstallPath: /users/brian/j64-807
Contact: www.jsoftware.com
)