Actually, I don't mind sharing the current J version. It is immature and little more than a transliteration of the basic C code. I hope eventually to make it all loopless, etc., with proper use of rank and tacit style. I have both Prolog and Haskell versions, but J is somewhat different from Haskell - in all the right ways.
Sent from my iPhone

> On Apr 18, 2017, at 11:31 AM, Xiao-Yong Jin <[email protected]> wrote:
>
>> On Apr 18, 2017, at 9:23 AM, Michael Goodrich <[email protected]> wrote:
>>
>> Hi Henry,
>>
>> Thanks for your interest. I owe you some better information.
>>
>> First off, it's not really an apples-to-apples comparison, as the C version is
>> very mature, with some performance tricks designed to reduce calculations to
>> the bare minimum (e.g., do not recalculate matrices but instead do selected
>> in-place updates as necessary). This gave me a 5-6X speed improvement in
>> the C version.
>
> If you are only optimizing away unnecessary computations in C, your code will
> not really beat well-written J code. There is still a lot more you can do in C.
>
>> When I attempted to put the same in the J code, it ran SLOWER
>> than simply recalculating the entire matrix, although in many cases only a
>> column was actually updated, so I backed those changes out.
>
> J does reasonable in-place updates (avoiding allocating and copying).
> Look them up and see if you can employ them better. In an interpreted,
> mostly functional language like J, you want to minimize reallocating arrays
> and moving whole arrays. Too much copying hurts more than extra
> floating-point operations.
>
>> This is a Markov Chain Monte Carlo Bayesian Artificial Neural Network
>> (three-layer perceptron) application that in the test problem produces
>> about 1e5 chain states (not saving them but streaming them to another C
>> program), using a half dozen matrices, the largest of which (for this test
>> problem) is about 200x5.
>
> That's really small and fits in L1 cache. Naive C loops would have no problem.
> If you go larger than L2 cache, you will need to call dgemm or some
> cache-friendly block multiplication algorithm for better performance.
>
>> Another curiosity is that in the C version, using a (user-defined) sigmoid
>> instead of 'tanh' as the nonlinear activation (on all matrix elements) expands
>> the run time by 1.75X. In the J version the same choice *reduces* run time
>> by about 20% relative to the 'tanh' J primitive (?).
>>
>> sgmd =. monad : '1%(1+^-y)'
>
> In any interpreted language, primitives are always going to outperform
> composite functions.
>
>> As far as releasing code goes, this is the outgrowth of my dissertation work,
>> and I hope someday to commercialize it, so I am reluctant to release it.
>> Pity - I know you can't be sure what I'm doing in order to diagnose the
>> situation, but perhaps we can find a way to accomplish what you want
>> anyway by pursuing this together.
>
> I don't see a good MCMC code in J, so I'm also developing my own. Perhaps we
> can share that part at some point so you don't have to give away your baby
> neural network.
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
