You can test my convnet on the CIFAR-10 dataset (you need to download it to your 
PC first) and get all the values into memory like this:

NB. ===================================================================================
pathToCifar10=:'/path/to/cifar10'
rf=: 1 !: 1   NB. file read verb
data=: a.i. , rf < pathToCifar10,'/data_batch_1.bin'
data=: 10000 3073 $, data   NB. 10000 records: 1 label byte + 3072 (3*32*32) pixel bytes
TRAINLABELS1=: {."1 data
TRAINDATA1=: 10000 3 32 32 $, }."1 data

data=: a.i. , rf < pathToCifar10,'/data_batch_2.bin'
data=: 10000 3073 $, data
TRAINLABELS2=: {."1 data
TRAINDATA2=: 10000 3 32 32 $, }."1 data

data=: a.i. , rf < pathToCifar10,'/data_batch_3.bin'
data=: 10000 3073 $, data
TRAINLABELS3=: {."1 data
TRAINDATA3=: 10000 3 32 32 $, }."1 data

data=: a.i. , rf < pathToCifar10,'/data_batch_4.bin'
data=: 10000 3073 $, data
TRAINLABELS4=: {."1 data
TRAINDATA4=: 10000 3 32 32 $, }."1 data

data=: a.i. , rf < pathToCifar10,'/data_batch_5.bin'
data=: 10000 3073 $, data
TRAINLABELS5=: {."1 data
TRAINDATA5=: 10000 3 32 32 $, }."1 data

data=: a.i. , rf < pathToCifar10,'/test_batch.bin'
data=: 10000 3073 $, data
TESTLABELS=: #: 2^{."1 data   NB. one-hot encode the labels
TESTDATA=: 255 %~ 10000 3 32 32 $, }."1 data   NB. scale pixel values to 0..1

TRAINLABELS=: TRAINLABELS1, TRAINLABELS2, TRAINLABELS3, TRAINLABELS4, TRAINLABELS5
TRAINLABELS=: |: ,: TRAINLABELS
TRAINLABELS=: ,/ #: 2^ TRAINLABELS

TRAINDATA=: (,TRAINDATA1), (,TRAINDATA2), (,TRAINDATA3), (,TRAINDATA4), (,TRAINDATA5)
TRAINDATA1=: TRAINDATA2=: TRAINDATA3=: TRAINDATA4=: TRAINDATA5=: ''
TRAINDATA =: 255%~ 50000 3 32 32 $ TRAINDATA


V=: 500 ? 10000   NB. indices of a random 500-image subset of the test set
VD=: V{TESTDATA
VL=: V{TESTLABELS
f=: 3 : '+/ ((y + i. 100){TESTLABELS) -:"1 1 (=>./)"1 >{:predict__pipe (y+i.100){TESTDATA'
g=: 3 : '+/ ((y + i. 100){VL) -:"1 1 (=>./)"1 >{:predict__pipe (y+i.100){VD'
h=: 3 : '+/ ((y + i. 100){TRAINLABELS) -:"1 1 (=>./)"1 >{:predict__pipe (y+i.100){TRAINDATA'

NB. f, g and h are used for testing the net, during training, on batches of 100 images:
NB. f tests on the test set, g on a random 500-image subset of the test set
NB. (a quasi-validation set), and h on the training set.
NB. example: h"0[100*i.25
NB. tests the first 25 * 100 images of the training set and gives the number of
NB. correct predictions in each batch of 100.
NB. f"0[100*i.100
NB. gives prediction accuracy over the entire test set.

load 'plot'
mean=: +/ % #
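NB. for an overall figure, average the per-batch counts, e.g. (after the model
NB. below has been built and trained):
NB.   mean f"0 [ 100 * i. 100    NB. mean correct per 100 = % accuracy on the test set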

NB. ===================================================================================
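
Incidentally, the five training batches can also be read in a short loop rather
than block by block. A minimal sketch (loadBatch and bs are names made up here;
it should produce the same TRAINLABELS and TRAINDATA as above):

loadBatch=: 3 : 0
 d=. 10000 3073 $ a. i. rf < pathToCifar10, '/data_batch_', (": y), '.bin'
 ({."1 d) ; 10000 3 32 32 $ , }."1 d
)
bs=: loadBatch"0 >: i. 5          NB. 5 2 boxed table of labels;images
TRAINLABELS=: #: 2 ^ ; {."1 bs    NB. one-hot labels, shape 50000 10
TRAINDATA=: 255 %~ ; {:"1 bs      NB. shape 50000 3 32 32, scaled to 0..1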

You will need quite a lot of RAM to hold all of this and then build a model.

Here is an example model I made:

NB. ===================================================================================

lr=: 0.001
pipe=: (100;25;'softmax';1;'l2';1e_3) conew 'NNPipeline'
c1=: (( 40 3 3 3);1;'relu';'adam';lr;1) conew 'Conv2D' 
a1=: 'relu' conew 'Activation' 
p1=: 2 conew 'PoolLayer' 

c2=: ((50 40 3 3);2;'relu';'adam';lr;1) conew 'Conv2D' 
a2=: 'relu' conew 'Activation' 

c3=: ((70 50 4 4);1;'relu';'adam';lr;1) conew 'Conv2D' 
a3=: 'relu' conew 'Activation'

c4=: ((80 70 3 3);1;'relu';'adam';lr;1) conew 'Conv2D' 
a4=: 'relu' conew 'Activation'
p2=: 2 conew 'PoolLayer' 

fl=: 1 conew 'FlattenLayer'

fc1=: (80;90;'relu';'adam';lr) conew 'SimpleLayer' 
a5=: 'relu' conew 'Activation'
d1=: 0.8 conew 'Dropout'
fc2=: (90;100 ;'relu';'adam';lr) conew 'SimpleLayer'
a6=: 'relu' conew 'Activation' 
d2=: 0.8 conew 'Dropout'
fc3=: (100;60;'relu';'adam';lr) conew 'SimpleLayer' 
a7=: 'relu' conew 'Activation' 
d3=: 0.8 conew 'Dropout'
fc4=: (60;10;'softmax';'adam';lr) conew 'SimpleLayer' 
a8=: 'softmax' conew 'Activation' 

addLayer__pipe c1 
addLayer__pipe a1
addLayer__pipe p1 
 
addLayer__pipe c2
addLayer__pipe a2 

addLayer__pipe c3
addLayer__pipe a3
 
addLayer__pipe c4
addLayer__pipe a4 
addLayer__pipe p2

addLayer__pipe fl

addLayer__pipe fc1 
addLayer__pipe a5
NB. addLayer__pipe d1 
addLayer__pipe fc2 
addLayer__pipe a6
NB. addLayer__pipe d2
addLayer__pipe fc3 
addLayer__pipe a7 
addLayer__pipe d3 
addLayer__pipe fc4 
addLayer__pipe a8 

NB. TRAINLABELS fit__pipe TRAINDATA  <----- begin training; stop it with J's "break" facility.
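NB. once (partially) trained, predictions can be inspected directly, e.g.
NB.   >{: predict__pipe 5 {. TESTDATA    NB. class scores for the first 5 test images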

NB. ===================================================================================

Here's the thing, though: training for anything like 40 epochs will take several 
days of compute time. With a slightly modified version of this model I got 89% 
accuracy on the test set, but that took several days of computing on my home PC.
I can send you the model if you want (serialized as a mix of text/binary in a 
file of about 800 MB - yes, that big!).
I wrote a serializer/deserializer so I could turn my PC off and restart at a 
later date, so the training was not days of non-stop running of the algorithm; 
I took several breaks in between.

I think in the readme I also wrote a description of how to train on the MNIST 
dataset. That is much easier, as it is only greyscale and is an easy dataset to 
train on. I think I could get well over 90% accuracy on that.

One thing I found is that it is very difficult to know whether the code is 
correct, and I couldn't spend my life testing it on these huge datasets, so I 
wrote a very simple test script, made of horizontal-line and vertical-line 
images (well, matrices of 1s and 0s). I think the below is the script:

 
NB. ===================================================================================

NB. The As and Bs are different kinds of horizontal and vertical lines, so we want
NB. to be able to recognize horizontal images and vertical images, i.e.
NB. differentiate the A class from the B class.
NB. There is no training set or test set, just these 10 images. This is only to
NB. test that the algorithms are conceivably working correctly.

runConv=: 3 : 0
A1=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0
A2=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 1 1 1 1 1 1 1 1, 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0
A3=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 0 0 0 0 0 0 0 0
A4=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 0 0 0 0 0 0 0 0, 1 1 1 1 1 1 1 1, 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0
A5=: 2 |. A4

B1=: |:"2 A1
B2=: |:"2 A2
B3=: |:"2 A3
B4=: |:"2 A4
B5=: |:"2 A5

A=: 5 3 8 8 $, A1, A2, A3, A4, A5
B=: 5 3 8 8 $, B1, B2, B3, B4, B5
INPUT=: A,B
OUTPUT=: 10 2 $ 1 0 1 0 1 0 1 0 1 0, 0 1 0 1 0 1 0 1 0 1

pipe=: (5;1000;'softmax';1) conew 'NNPipeline'
c1=: ((10 3 4 4);2;'relu';'adam';0.001;0;1) conew 'Conv2D'
c2=: ((5 10 2 2); 1;'relu';'adam';0.001;0;1) conew 'Conv2D'
p1=: 2 conew 'PoolLayer'
fc=: (5;2;'softmax';'sgd';0.001;1) conew 'SimpleLayer'
fl=: 3 conew 'FlattenLayer'
addLayer__pipe c1
addLayer__pipe p1
addLayer__pipe c2
addLayer__pipe fl
addLayer__pipe fc

NB. returns model, input and output
NB. can train using OUTPUT train__pipe INPUT
pipe; INPUT;OUTPUT
)
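
NB. example usage: runConv defines pipe, INPUT and OUTPUT globally, so
NB.   runConv ''
NB.   OUTPUT train__pipe INPUT
NB. will train the toy model on the 10 line images.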


runConv2=: 3 : 0
pipe=: (10;20;'softmax';1) conew 'NNPipeline'
c1=: ((40 1 4 4);3;'relu';'adam';0.001;0;1) conew 'Conv2D'
c2=: ((60 40 5 5);4;'relu';'adam';0.001;0;1) conew 'Conv2D'
p1=: 2 conew 'PoolLayer'
fl=: 1 conew 'FlattenLayer'
fc1=: (60;30;'tanh';'adam';0.001;1) conew 'SimpleLayer'
fc2=: (30;10;'softmax';'adam';0.001;1) conew 'SimpleLayer'

addLayer__pipe c1
addLayer__pipe c2
addLayer__pipe p1
addLayer__pipe fl
addLayer__pipe fc1
addLayer__pipe fc2

pipe
)

NB. ===================================================================================



Regarding your questions and comments. I am going to have to go back and look 
at the source code. I haven't done much to this in several months, and need to 
refresh my brain.
The code is not optimized at all, I make a lot of errors (programming errors, 
and writing poorly performing J), and this is really just a reference project 
for my own personal use. For any serious use of a convnet there is no reason 
not to use TensorFlow or similar, since they also have GPU support.

About undefined functions: this is another problem of mine. I forget to write 
"require" and "load" sometimes. Just run the init.ijs script before anything 
else and all verbs will be defined.
bmt_jLearnUtil_ is the Box-Muller transform, for sampling from a normal 
distribution. It is located in utils/utils.ijs.
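For reference, the transform itself is roughly the following (a minimal sketch
only, not the actual library verb):

NB. Box-Muller: turn pairs of uniform(0,1) samples into standard normal samples
boxMuller=: 3 : 0
 u1=. ? y # 0
 u2=. ? y # 0
 (%: _2 * ^. u1) * 2 o. 2p1 * u2
)
   boxMuller 5    NB. five N(0,1) samples
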
I think setSolver is located in adv/advnn.ijs, defined as

NB. ===================================================================================

setSolver=: 4 : 0

if. y -: 'adam' do.
 x conew 'AdamSolver'
elseif. y -: 'sgd' do.
 0.1 conew 'SGDSolver'
elseif. y -: 'sgdm' do.
 (0.1;0.6) conew 'SGDMSolver'
end.
)

NB. ===================================================================================
This just sets the solver: either standard SGD, SGD with momentum, or Adam.
Solver verbs are defined in mlp/mlpopt.ijs and are basically implementations of 
the various solvers defined here: 
http://www.deeplearningbook.org/contents/optimization.html
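
For example, the SGD-with-momentum update from that chapter looks roughly like
this in J (a sketch for illustration, not the library's actual SGDMSolver):

NB. one momentum step; x is lr;mu and y is weights;velocity;gradient
sgdmStep=: 4 : 0
 'lr mu'=. x
 'w v g'=. y
 v=. (mu * v) - lr * g     NB. update velocity
 (w + v) ; v               NB. return new weights and velocity
)
NB. e.g.  'w vel'=: (0.01;0.9) sgdmStep w;vel;grad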

The reason there is a folder called mlp (multi-layer perceptron) is that I 
actually implemented two kinds of MLP models. The monolithic kind, mainly based 
on scikit-learn, is in the mlp folder. Then I realized it makes more sense in 
the long run to make the models more modular, so I created the classes in the 
adv/ folder, very loosely based on TensorFlow and PyTorch, which allow adding 
layers at will.

I will need to relearn a lot about conv nets to figure out why I used 1 0 2 3 
|:, because, to be honest, I can't remember. I do remember that doing back prop 
over these 4-D matrices is very tricky, and I remember drawing lots of pics to 
figure it out, and that is what I came up with. It may be worth trying your 
way, to see what happens.

As you rightly mentioned, there are more efficient ways of calculating the 
convolutions, but as I said, I have not yet gone back to think about that. I 
should probably try to implement it as TensorFlow or PyTorch do, but I was more 
interested in understanding the theory of how it works than in making it run fast. 
Perhaps the next stage will be to optimize this, if I have time.

One more thing about my Conv2D implementation: the stride of the filter kernel 
cannot be any old value. The convolution algorithm needs to be EXACT (see 
https://arxiv.org/pdf/1603.07285.pdf, section 2.3).
If I remember correctly this means that the input size minus the kernel size 
must be a multiple of the stride, 
i.e. input size 7
kernel size 3
stride 2
7 - 3 = 4 is divisible by 2, so that is OK.
This limits what values can go into each conv layer. TensorFlow allows 
non-EXACT convolutions, and it probably wouldn't be too much bother to 
implement that (mainly the back prop would need to be refactored), but it is 
not really on my todo list.
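
A quick sketch of that rule (and the resulting output size) in J:

exactOK=: 3 : '0 = ({: y) | -/ 2 {. y'   NB. y is input size, kernel size, stride
outSize=: 3 : '1 + ({: y) %~ -/ 2 {. y'
   exactOK 7 3 2   NB. 1, i.e. ok
   outSize 7 3 2   NB. 3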

Regards,
Jon

On Thursday, April 18, 2019, 11:36:35 AM GMT+9, Brian Schott 
<[email protected]> wrote:
I have renamed this message because the topic has changed, but considered
moving it to jchat as well. However I settled on jprogramming because there
are definitely some J programming issues to discuss.

Jon,

Your script code is beautifully commented and very valuable, imho. The lack
of an example has slowed down my study of the script, but now I have some
questions and comments.

I gather from your comments that the word tensor is used to designate a 4
dimensional array. That's new to me, but it is very logical.

Your definition convFunc=: +/@:,@:* works very well. However, for some
reason I wish I could think of a way to define convFunc in terms of X=:
dot=: +/ . * .

The main insight I have gained from your code is that (x u;._3 y) can be
used with x of shape 2 n where n>2 (and not just 2 2). This is great
information. And that you built the convFunc directly into cf is also very
enlightening.

I have created a couple of examples of the use of your function `cf` to
better understand how it works. [The data is borrowed from the fine example
at http://cs231n.github.io/convolutional-networks/#conv . Beware that the
dynamic example seen at the link changes every time the page is refreshed,
so you will not see the exact data I present, but the shapes of the data
are constant.]

Notice that in my first experiments both `filter` and the RHA of cf"3 are
arrays and not tensors. Consequently(?) the result is an array, not a
tensor, either.

  i=: _7]\".;._2 (0 : 0)
0 0 0 0 0 0 0
0 0 0 1 2 2 0
0 0 0 2 1 0 0
0 0 0 1 2 2 0
0 0 0 0 2 0 0
0 0 0 2 2 2 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 2 1 2 2 2 0
0 0 1 0 2 0 0
0 1 1 1 1 1 0
0 2 0 0 0 2 0
0 0 0 2 2 2 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 2 1 0
0 1 1 0 0 0 0
0 2 1 2 0 2 0
0 1 0 0 2 2 0
0 1 0 1 2 2 0
0 0 0 0 0 0 0
)

  k =: _3]\".;._2(0 :0)
1  0 0
1 _1 0
_1 _1 1
0 _1 1
0  0 1
0 _1 1
1  0 1
0 _1 0
0 _1 0
)

  $i NB. 3 7 7
  $k NB.  3 3 3

  filter =: k
  convFunc=: +/@:,@:*

  cf=: 4 :  '|:"2 |: +/ x filter&(convFunc"3 3);._3 y'
  (1 2 2,:3 3 3) cf"3 i NB. 3 3$1 1 _2 _2 3 _7 _3 1  0

My next example makes both the `filter` and the RHA into tensors. And
notice the shape of the result shows it is a tensor, also.

  filter2 =: filter,:_1+filter
  cf2=: 4 :  '|:"2 |: +/ x filter2&(convFunc"3 3);._3 y'
  $ (1 2 2,:3 3 3) cf2"3 i,:5+i NB. 2 2 3 3

Much of my effort regarding CNN has been studying the literature that
discusses efficient ways of computing these convolutions by translating the
filters and the image data into flattened (and somewhat sparse) forms that
can be restated in matrix formats. These matrices accomplish the
convolution and deconvolution as *efficient* matrix products. Your
demonstration of the way that J's ;._3 can be so effective challenges the
need for such efficiencies.

On the other hand, I could use some help understanding how the 1 0 2 3 |:
transpose you apply to `filter` is effective in the backpropagation stage.
Part of my confusion is that I would have thought the transpose would have
been 0 1 3 2 |:, instead. Can you say more about that?

I have yet to try to understand your verbs `forward` and `backward`, but I
look forward to doing so.

I could not find definitions for the following functions and wonder if you
can say more about them, please?

bmt_jLearnUtil_
setSolver

I noticed that your definitions of relu and derivRelu were more complicated
than mine, so I attempted to test yours out against mine as follows.



  relu    =: 0&>.
  derivRelu =: 0&<
  (relu -: 0:`[@.>&0) i: 4
1
  (derivRelu -: 0:`1:@.>&0) i: 4
1




On Sun, Apr 14, 2019 at 8:31 AM jonghough via Programming <
[email protected]> wrote:

>  I had a go writing conv nets in J.
> See
> https://github.com/jonghough/jlearn/blob/master/adv/conv2d.ijs
>
> This uses ;._3 to do the convolutions. Using a version of this, with a
> couple of fixes, I managed to get 88% accuracy on the cifar-10 imageset.
> Took several days to run, as my algorithms are not optimized in any way,
> and no gpu was used.
> If you look at the references in the above link, you may get some ideas.
>
> the convolution verb is defined as:
> cf=: 4 : 0
> |:"2 |: +/ x filter&(convFunc"3 3);._3 y
> )
>
> Note that since the input is a batch of images, each 3-d (width, height,
> channels), we are actually doing the whole forward pass over a 4d array,
> and outputting another 4d array of different shape, depending on output
> channels, filter width, and filter height.
>
> Thanks,
> Jon
>

Thank you,

-- 
(B=)
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm  