I've had a look at your example and the source you cite. You differ from the source in seeming to need explicit handling of a hidden layer, with both W & b AND W2 & b2, which I can't understand right now.

Ah - I've just found a second listing, lower down the page, which does have W2 and b2 and a hidden layer!

I found, at least on Windows 10, that the 'dot' plot shows little more than white space; the 'symbol' plot is better.

Anyway, when I first ran train, I got:

   train 100
|nonce error: train
|   dhidden=.0     indx}dhidden

The trouble arose from this triplet of lines:

    dhidden =. dscores dot |:W2
    indx =. I. hidden_layer <: 0
    dhidden =. 0 indx}dhidden

Since you seem to be restricting dhidden to be non-negative, I replaced these three with:

    dhidden =. 0 >. dscores dot |:W2   NB. is this what you meant?
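
For comparison, here is a minimal sketch of the two readings side by side (using the array names from your script, where hidden_layer and dhidden are both 300 by 100):

    NB. mask: zero the gradient wherever the forward activation was <= 0
    NB. (which is, I think, what the Python page's ReLU backprop does)
    dhidden =. (hidden_layer > 0) * dscores dot |:W2
    NB. clamp: force the gradient itself to be non-negative (my one-liner above)
    dhidden =. 0 >. dscores dot |:W2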

I've also changed the loop so that we get a report for the first cycle, as in Python:

for_i. i. >: y do.

and added this line after smoutput i,loss - it might not be necessary on Darwin...

wd'msgs'

With these changes, train ran as follows:

cc =: train 10000    NB. loss starts ok,  increases slightly, still unlike the Python ex!

0 1.09856
1000 1.10522
2000 1.10218
3000 1.0997
4000 1.09887
5000 1.09867
6000 1.09862
7000 1.09861
8000 1.09861
9000 1.09861

   $h_l =. 0>.(>1{cc) +"1 X dot >0{cc
300 100
   $sc =. (>3{cc) +"1 h_l dot >2{cc
300 3
   $predicted_class =. (i.>./)"1 sc
300
   mean predicted_class = classes
0.333333

Why are the cycle-0 losses different, if only slightly? The Python page reports 1.098744 cf. your 1.09856.
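
If I'm reading it right, both figures are essentially ^.3, the loss you'd expect when the initial probabilities are close to uniform over the three classes:

   ^. 3   NB. cross-entropy loss for uniform predictions over 3 classes
1.09861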

Sorry - only minor problems found - they don't explain why you don't reproduce their results more closely.

Mike

On 15/05/2019 19:42, Brian Schott wrote:
I have been working on a convolutional neural net application and have
learned from Jon's conv2d system, especially about his verb `deconv` in the
backprop step. That was a real eye-opener for me. But when I use the
softmax activation, I am still having trouble getting well-trained nets --
even for NON-convolutional systems. So I have attempted to code a J version
of the toy, standard Python example at
http://cs231n.github.io/neural-networks-case-study/#net .

Below I have produced my script and a **sample run**. I would appreciate
any advice.

The disappointment is that I can get only about 51% accurate prediction
(as I have documented at the bottom of this post) and yet the authors get
98% accuracy. I cannot see what is different between my code and the Python
code, but there must be some difference.
Notice that I have used Roll with a fixed seed in both the verbs `rand1`
and `rand2` below, so likely you can get the same result that I get.
My plot of the data looks very similar to their plot.


load'trig'
load'stats'
load'plot'
thetas =: 0 4 8+/steps 0 4 99
radii =: 100%~steps 1 100 99
rand2 =: 1.73205*0.2 * _2 ([ * <:@+:@])/\ 0 2 ?.@$~ +:
rand1 =: 1.73205*0.01 * _2 ([ * <:@+:@])/\ 0 2 ?.@$~ +:
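NB. rand1/rand2 use ?. (fixed-seed Roll), so the same numbers come out on
NB. every fresh run, which is what makes the example reproducible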
dot =: +/ . *         NB. matrix (inner) product
probs =: (%+/)@:^"1   NB. softmax applied to each row
amnd =: (1-~{)`[`]}   NB. x amnd y: subtract 1 from y at index x
mean =: +/ % #

X =: |:,"2|:radii*|:(sin,:cos)(_100]\rand2 300)+thetas
classes =: 100#i. 3
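NB. X: 300 rows of 2-D points (three spiral arms, 100 points per class)
NB. classes: the true class label (0, 1 or 2) for each row of X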
NB. 'dot' plot ;/|:X

step_size =: 1e_0 NB. gradient descent step size (learning rate)
reg =: 1e_3 NB. regularization strength
N =: 100 NB. number of points per class
D =: 2 NB. dimensionality
K =: 3 NB. number of classes

train =: verb define
h =. 100 NB. size of hidden layer
W =. ($rand1@*/) D,h
b =. h#0
W2 =. ($rand1@*/) h,K
b2 =. K#0
(W;b;W2;b2) train y
:
'W b W2 b2' =. x
num_examples =. #X
for_i. 1+i. y do.
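     NB. forward pass: ReLU hidden layer, then linear class scores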
     hidden_layer =. 0>.b +"1 X dot W
     scores =. b2 +"1 hidden_layer dot W2
     NB. exp_scores =. ^scores

     prob =. probs scores
     correct_logprobs =. -^.classes}|:prob

     data_loss =. +/correct_logprobs%num_examples
     reg_loss =. 0.5*reg*+/,W*W
     loss =. data_loss + reg_loss
     if. 0=(y%10)|i do.
         smoutput i,loss
     end.
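     NB. backprop: the softmax cross-entropy gradient w.r.t. scores is the
     NB. probabilities with 1 subtracted at each true class, averaged over examples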
     dscores =. prob
     dscores =. classes amnd"0 1 dscores
     dscores =. dscores%num_examples

     dW2 =. (|:hidden_layer) dot dscores
     db2 =. +/dscores

     dhidden =. dscores dot |:W2
     indx =. I. hidden_layer <: 0
     dhidden =. 0 indx}dhidden

     dW =. (|:X) dot dhidden
     db =. +/dhidden

     dW2 =. dW2 + reg*W2
     dW =. dW + reg*W

     W =. W-step_size * dW
     b =. b-step_size * db

     W2 =. W2-step_size * dW2
     b2 =. b2-step_size * db2
end.
NB. (W;b);<W2;b2
W;b;W2;b2

)

Note 'sample results'
    cc =: train 10000
1000 8.40634
2000 4.71528
3000 5.8607
4000 11.6474
5000 10.9213
6000 4.65864
7000 2.1275
8000 1.44098
9000 1.28587
10000 1.25434
    $h_l =. 0>.(>1{cc) +"1 X dot >0{cc
300 100
    $sc =. (>3{cc) +"1 h_l dot >2{cc
300 3
    $predicted_class =. (i.>./)"1 sc
300
    mean predicted_class = classes
0.51
    JVERSION
Engine: j807/j64/darwin
Release-c: commercial/2019-02-24T10:50:40
Library: 8.07.25
Qt IDE: 1.7.9/5.9.7
Platform: Darwin 64
Installer: J807 install
InstallPath: /users/brian/j64-807
Contact: www.jsoftware.com
)



