The error is in the L-BFGS test, which is unrelated to conv nets. In fact, just remove all tests from the init.ijs file and that error can be ignored. These lines at the bottom of init.ijs

   NB. test
   require jpath '~Projects/jlearn/test/testbase.ijs'
   require jpath '~Projects/jlearn/test/testgmm.ijs'
   require jpath '~Projects/jlearn/test/testknn.ijs'
   require jpath '~Projects/jlearn/test/testnnopt.ijs'
   require jpath '~Projects/jlearn/test/testoptimize.ijs'
   require jpath '~Projects/jlearn/test/testrunner.ijs'
can be deleted.

On Friday, April 26, 2019, 10:39:32 PM GMT+9, Devon McCormick <[email protected]> wrote:

Hi Jon -

I came up with a work-around by changing the "beta" parameter to 1e_7 instead of 0.0001 (in the call to "minBFGS" in "test4"), but have no idea what this means in the larger scheme of things. Your helpful error message gave me the clue I needed:

   Error attempting line search. The input values to the function may be too extreme.
   Function input value (xk): _9.65977e27 _6.73606e6
   Search direction value (pk): 2.06573e27 3.10187e6
   A possible solution is to reduce the size of the initial inverse hessian scale, beta.
   Beta is currently set to 0.0001, which may be too large/small.
   |uncaught throw.: minBFGS_BFGS_
   |minBFGS_BFGS_[0]

Thanks again. I look forward to exploring this code.

Regards,
Devon

On Fri, Apr 26, 2019 at 9:16 AM Devon McCormick <[email protected]> wrote:

Hi Jon -

I got your example CIFAR-10 code running on one of my machines but got the following error when running "init.ijs" on another one (perhaps with a different version of J 8.07):

   Test success AdaGrad Optimizer test 1
   1
   Test success 1
   1
   Test success 2
   1
   Test success 3
   |value error: minBFGS_BFGS_
   |      k=. u y
   |assertThrow[2]

It looks like "minBFGS_BFGS_" was not defined, so I pasted in the definition before loading "init.ijs" and got a little further, only to hit this error:

   Test success 3
   |NaN error: dot
   |dot[:0]
      13!:1''
   |NaN error
   *dot[:0]
   |   Hk=.(rhok*(|:sk)dot sk)+(I-rhok*(|:sk)dot yk)dot Hk dot(I-rhok*(|:yk)dot sk)
   |minBFGS_BFGS_[0]
   |   k=. u y
   |assertThrow[2]
   |   ( minBFGS_BFGS_ assertThrow(f f.`'');(fp f.`'');(4 3);100000;0.0001;0.0001)
   |test4[2]
   |   res=. u''
   |testWrapper[0]
   |   test4 testWrapper 4
   |run__to[5]
   |   run__to''
   |[-180] c:\users\devon_mccormick\j64-807-user\projects\jlearn\test\testoptimize.ijs
   |   0!:0 y[4!:55<'y'
   |script[0]
   |fn[0]
   |   fn fl
   |load[:7]
   |   0 load y
   |load[0]
   |   load fls
   |require[1]
   |   require jpath'~Projects/jlearn/test/testoptimize.ijs'
   |[-39] c:\Users\devon_mccormick\j64-807-user\projects\jlearn\init.ijs
   |   0!:0 y[4!:55<'y'
   |script[0]
   |fn[0]
   |   fn fl
   |load[:7]
   |   0 load y
   |load[0]
   |   load'c:\Users\devon_mccormick\j64-807-user\projects\jlearn\init.ijs'

The arguments to "dot" appear to be extreme values:

   (I-rhok*(|:yk)dot sk)
   1 0
   _ 0
   Hk
   _ _4.62371e38
   _4.62371e38 2.76904e_78

Any idea what might cause this?

Thanks,
Devon
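(The assignment in that trace is the standard BFGS inverse-Hessian update, with rhok presumably the reciprocal of (|:yk) dot sk. With xk values around 1e27, that denominator can overflow or cancel to 0, making rhok infinite, which is exactly how the _ entries got into the (I-rhok*(|:yk)dot sk) and Hk arrays shown above. A minimal sketch of the failure mode, with made-up values rather than the library's:

   dot =: +/ . *
   % 0                     NB. _ : rhok when (|:yk) dot sk cancels to 0
   Hk =: 2 2 $ _ __ 1 0    NB. the update then leaves infinities in Hk
   Hk dot 2 1 $ 1 1        NB. |NaN error: dot  ( _ + __ has no value)

Since beta scales the initial inverse Hessian, shrinking it to 1e_7 shrinks the first steps, which would explain why the smaller value keeps xk from exploding.)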
On Thu, Apr 25, 2019 at 9:55 PM Devon McCormick <[email protected]> wrote:

That looks like it did the trick - thanks!

On Thu, Apr 25, 2019 at 9:23 PM jonghough via Programming <[email protected]> wrote:

Hi Devon.
Did you run the init.ijs script? If you run that initially, everything should be set up, and you should have no problems.

On Friday, April 26, 2019, 9:54:55 AM GMT+9, Devon McCormick <[email protected]> wrote:

Hi - so I tried running the code at
https://github.com/jonghough/jlearn/blob/master/adv/conv2d.ijs
but get numerous value errors.

Is there another package somewhere that I need to load?

Thanks,
Devon

On Fri, Apr 19, 2019 at 10:57 AM Raul Miller <[email protected]> wrote:

That's the same thing as a dot product on ravels, unless the ranks of your arguments are ever different.

Thanks,
-- Raul

On Thu, Apr 18, 2019 at 8:13 PM jonghough via Programming <[email protected]> wrote:

The convolution kernel function is just a straight-up elementwise multiply and then sum all; it is not a dot product or matrix product. A nice illustration is found here: https://mlnotebook.github.io/post/CNN1/

So +/@:,@:* works. I don't know if there is a faster way to do it.

Thanks,
Jon

On Friday, April 19, 2019, 5:54:24 AM GMT+9, Raul Miller <[email protected]> wrote:

They're also not equivalent.

For example:

   (i.2 3 4) +/@:,@:* i.2 3
970
   (i.2 3 4) +/ .* i.2 3
|length error

I haven't studied the possibilities of this code base enough to know how relevant this might be, but if you're working with rank 4 arrays, this kind of thing might matter.

On the other hand, if the arrays handled by +/@:,@:* are the same shape, then +/ .*&, might be what you want. (Then again, any change introduced on "performance" grounds should get at least enough testing to show that there's a current machine where that change provides significant benefit for plausible data.)

Thanks,
-- Raul

On Thu, Apr 18, 2019 at 4:16 PM Henry Rich <[email protected]> wrote:

FYI: +/@:*"1 and +/ . * are two ways of doing dot-products fast. +/@:,@:* is not as fast.

Henry Rich
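Raul's same-shape equivalence is quick to verify; a minimal sketch with made-up integer arrays:

   convFunc =: +/@:,@:*
   a =: i. 3 4 4
   b =: 100 + i. 3 4 4
   (a convFunc b) = a +/ .*&, b   NB. 1: identical results when shapes match
   (i.2 3 4) +/ .*&, i.2 3        NB. |length error: unlike +/@:,@:*, the ravels must match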
On 4/18/2019 10:38 AM, jonghough via Programming wrote:

Regarding the test network I sent in the previous email, it will not work. This one should:

NB. =========================================================================

NB. 3 classes:
NB. horizontal lines (A), vertical lines (B), diagonal lines (C).
NB. Each class is a 3-channel matrix, 3x8x8.

A1=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 0 0 0 0 0 0 0 0, 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 1 1 1 1 1 1 1 1, 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 0 0 0 0 0 0 0 0
A2=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 1 1 1 1 1 1 1 1, 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0
A3=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 0 0 0 0 0 0 0 0
A4=: 3 8 8 $ 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0, 0 0 0 0 0 0 0 0, 1 1 1 1 1 1 1 1, 1 1 1 1 1 1 1 1, 0 0 0 0 0 0 0 0
A5=: 2 |. A4

B1=: |:"2 A1
B2=: |:"2 A2
B3=: |:"2 A3
B4=: |:"2 A4
B5=: |:"2 A5

C1=: 3 8 8 $ 1 0 0 0 0 0 0 1, 0 1 0 0 0 0 1 0, 0 0 1 0 0 1 0 0, 0 0 0 1 1 0 0 0, 0 0 0 1 1 0 0 0, 0 0 1 0 0 1 0 0, 0 1 0 0 0 0 1 0, 1 0 0 0 0 0 0 1
C2=: 3 8 8 $ 1 0 0 0 0 0 0 0, 0 1 0 0 0 0 0 0, 0 0 1 0 0 0 0 0, 0 0 0 1 0 0 0 0, 0 0 0 0 1 0 0 0, 0 0 0 0 0 1 0 0, 0 0 0 0 0 0 1 0, 0 0 0 0 0 0 0 1
C3=: 3 8 8 $ 1 0 1 0 1 0 0 0, 0 1 0 1 0 1 0 0, 0 0 1 0 1 0 1 0, 0 0 0 1 0 1 0 1, 1 0 0 0 1 0 1 0, 0 1 0 0 0 1 0 1, 1 0 1 0 0 0 1 0, 0 1 0 1 0 0 0 1
C4=: |."1 C3
C5=: 3 8 8 $ 1 1 1 1 0 0 0 0, 0 0 1 1 1 1 0 0, 0 0 0 0 1 1 1 1, 1 1 0 0 0 0 1 1, 1 1 1 1 0 0 0 0, 0 0 1 1 1 1 0 0, 0 0 0 0 1 1 1 1, 1 1 0 0 0 0 1 1

A=: 5 3 8 8 $, A1, A2, A3, A4, A5
B=: 5 3 8 8 $, B1, B2, B3, B4, B5
C=: 5 3 8 8 $, C1, C2, C3, C4, C5

INPUT=: A,B,C
OUTPUT=: 15 3 $ 1 0 0, 1 0 0, 1 0 0, 1 0 0, 1 0 0, 0 1 0, 0 1 0, 0 1 0, 0 1 0, 0 1 0, 0 0 1, 0 0 1, 0 0 1, 0 0 1, 0 0 1

pipe=: (10;10;'softmax';1;'l2';0.0001) conew 'NNPipeline'

c1=: ((10 3 4 4);2;'relu';'adam';0.01;0) conew 'Conv2D'
b1=: (0; 1 ;0.0001;10;0.01) conew 'BatchNorm2D'
a1=: 'relu' conew 'Activation'

c2=: ((12 10 2 2); 1;'relu';'adam';0.01;0) conew 'Conv2D'
b2=: (0; 1 ;0.0001;5;0.01) conew 'BatchNorm2D'
a2=: 'relu' conew 'Activation'
p1=: 2 conew 'PoolLayer'

fl=: 3 conew 'FlattenLayer'
fc=: (12;3;'softmax';'adam';0.01) conew 'SimpleLayer'
b3=: (0; 1 ;0.0001;2;0.01) conew 'BatchNorm'
a3=: 'softmax' conew 'Activation'

addLayer__pipe c1
addLayer__pipe p1
NB.addLayer__pipe b1
addLayer__pipe a1
addLayer__pipe c2
NB.addLayer__pipe b2
addLayer__pipe a2
addLayer__pipe fl
addLayer__pipe fc
NB.addLayer__pipe b3
addLayer__pipe a3

require 'plot viewmat'
NB. check the input images (per channel)
NB. viewmat"2 A1
NB. viewmat"2 B1
NB. viewmat"2 C1

OUTPUT fit__pipe INPUT NB. <--- should get 100%ish accuracy after only a few iterations
NB. =========================================================================

Running the above doesn't prove much, as there is no training / testing set split. It is just to see *if* the training will push the network's parameters in the correct direction. Getting accurate predictions on all the A, B, C images will at least show that the network is not doing anything completely wrong. It is also useful as a playground to see if different ideas work.

You can test the accuracy with

   OUTPUT -:"1 1 (=>./)"1 >{: predict__pipe INPUT
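To reduce that check to a single number, the mean of the per-row matches gives the fraction of the 15 samples classified correctly (a small extension of the same line; 1 means 100% accuracy):

   (+/ % #) OUTPUT -:"1 1 (=>./)"1 >{: predict__pipe INPUT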
On Thursday, April 18, 2019, 11:36:35 AM GMT+9, Brian Schott <[email protected]> wrote:

I have renamed this message because the topic has changed, but considered moving it to jchat as well. However, I settled on jprogramming because there are definitely some J programming issues to discuss.

Jon,

Your script code is beautifully commented and very valuable, imho. The lack of an example has slowed down my study of the script, but now I have some questions and comments.

I gather from your comments that the word tensor is used to designate a 4-dimensional array. That's new to me, but it is very logical.

Your definition convFunc=: +/@:,@:* works very well. However, for some reason I wish I could think of a way to define convFunc in terms of dot=: +/ . * .

The main insight I have gained from your code is that (x u;._3 y) can be used with x of shape 2 n where n>2 (and not just 2 2). This is great information. And that you built the convFunc directly into cf is also very enlightening.

I have created a couple of examples of the use of your function `cf` to better understand how it works. [The data is borrowed from the fine example at http://cs231n.github.io/convolutional-networks/#conv . Beware that the dynamic example seen at the link changes every time the page is refreshed, so you will not see the exact data I present, but the shapes of the data are constant.]

Notice that in my first experiments both `filter` and the RHA of cf"3 are arrays and not tensors. Consequently(?) the result is an array, not a tensor, either.

i=: _7]\".;._2 (0 : 0)
0 0 0 0 0 0 0
0 0 0 1 2 2 0
0 0 0 2 1 0 0
0 0 0 1 2 2 0
0 0 0 0 2 0 0
0 0 0 2 2 2 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 2 1 2 2 2 0
0 0 1 0 2 0 0
0 1 1 1 1 1 0
0 2 0 0 0 2 0
0 0 0 2 2 2 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 2 1 0
0 1 1 0 0 0 0
0 2 1 2 0 2 0
0 1 0 0 2 2 0
0 1 0 1 2 2 0
0 0 0 0 0 0 0
)

k =: _3]\".;._2(0 :0)
1 0 0
1 _1 0
_1 _1 1
0 _1 1
0 0 1
0 _1 1
1 0 1
0 _1 0
0 _1 0
)

$i NB. 3 7 7
$k NB. 3 3 3

filter =: k
convFunc=: +/@:,@:*

cf=: 4 : '|:"2 |: +/ x filter&(convFunc"3 3);._3 y'
(1 2 2,:3 3 3) cf"3 i NB. 3 3 $ 1 1 _2 _2 3 _7 _3 1 0
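The shape-2 n left argument is easy to see in isolation; a minimal sketch reusing the i defined above (the boxes are only for display):

   $ (1 2 2 ,: 3 3 3) <;._3 i     NB. 1 3 3 frame of complete 3x3x3 windows
   $ > (1 2 2 ,: 3 3 3) <;._3 i   NB. 1 3 3 3 3 3 once the boxes are opened

cf applies convFunc against the filter inside each such window, then sums away the leading length-1 axis.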
My next example makes both the `filter` and the RHA into tensors. And notice the shape of the result shows it is a tensor, also.

filter2 =: filter,:_1+filter
cf2=: 4 : '|:"2 |: +/ x filter2&(convFunc"3 3);._3 y'
$ (1 2 2,:3 3 3) cf2"3 i,:5+i NB. 2 2 3 3

Much of my effort regarding CNNs has been studying the literature that discusses efficient ways of computing these convolutions by translating the filters and the image data into flattened (and somewhat sparse) forms that can be restated in matrix formats. These matrices accomplish the convolution and deconvolution as *efficient* matrix products. Your demonstration of the way that J's ;._3 can be so effective challenges the need for such efficiencies.

On the other hand, I could use some help understanding how the 1 0 2 3 |: transpose you apply to `filter` is effective in the backpropagation stage. Part of my confusion is that I would have thought the transpose would have been 0 1 3 2 |:, instead. Can you say more about that?

I have yet to try to understand your verbs `forward` and `backward`, but I look forward to doing so.

I could not find definitions for the following functions and wonder if you can say more about them, please?

   bmt_jLearnUtil_
   setSolver

I noticed that your definitions of relu and derivRelu were more complicated than mine, so I attempted to test yours against mine as follows.

   relu =: 0&>.
   derivRelu =: 0&<
   (relu -: 0:`[@.>&0) i: 4
1
   (derivRelu -: 0:`1:@.>&0) i: 4
1

Thank you,

On Sun, Apr 14, 2019 at 8:31 AM jonghough via Programming <[email protected]> wrote:

I had a go at writing conv nets in J. See
https://github.com/jonghough/jlearn/blob/master/adv/conv2d.ijs

This uses ;._3 to do the convolutions. Using a version of this, with a couple of fixes, I managed to get 88% accuracy on the CIFAR-10 image set. It took several days to run, as my algorithms are not optimized in any way, and no GPU was used. If you look at the references in the above link, you may get some ideas.

The convolution verb is defined as:

cf=: 4 : 0
|:"2 |: +/ x filter&(convFunc"3 3);._3 y
)

Note that since the input is a batch of images, each 3-d (width, height, channels), we are actually doing the whole forward pass over a 4d array, and outputting another 4d array of different shape, depending on output channels, filter width, and filter height.

Thanks,
Jon
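The 4d-in, 4d-out shape is quick to check with the pieces above; a minimal sketch with made-up weights and data, mirroring the c1 layer of the test network (10 filters of shape 3 4 4 at stride 2, on a batch of five 3x8x8 images):

   convFunc =: +/@:,@:*
   filters =: ? 10 3 4 4 $ 0    NB. made-up filter bank: 10 filters, 3 channels, 4x4
   conv1 =: 4 : '|:"2 |: +/ x filters&(convFunc"3 3);._3 y'
   batch =: ? 5 3 8 8 $ 0       NB. made-up batch of five 3-channel 8x8 images
   $ (1 2 2 ,: 3 4 4) conv1"3 batch   NB. 5 10 3 3: batch, out channels, (8-4)%2+1 spatial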
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
