Actually I don't mind sharing the current J version.  It is immature and little 
more than a transliteration of the basic C code.  I hope eventually to have it 
all loopless, etc., with proper rank and tacit style.  I have both Prolog and 
Haskell versions, but J is somewhat different from Haskell - in all the right 
ways.

Sent from my iPhone

> On Apr 18, 2017, at 11:31 AM, Xiao-Yong Jin <[email protected]> wrote:
> 
> 
>> On Apr 18, 2017, at 9:23 AM, Michael Goodrich <[email protected]> 
>> wrote:
>> 
>> Hi Henry,
>> 
>> Thanks for your interest.  I owe you some better information.
>> 
>> First off, it's not really an apples-to-apples comparison, as the C version is
>> very mature, with some performance tricks designed to reduce calculations to
>> the bare minimum (e.g., do not recalculate matrices but instead do selected
>> in-place updates as necessary).  This gave me a 5-6X speed improvement in
>> the C version.
> 
> If you are only optimizing away unnecessary computations in C, your code will
> not really beat well-written J code.  There is still a lot more you can do in C.
> 
>> When I attempted to do the same in the J code, it ran SLOWER
>> than simply recalculating the entire matrix, even though in many cases only a
>> column was actually updated, so I backed those changes out.
> 
> J does reasonable in-place updates (avoiding allocating and copying).
> Look them up and see if you can better employ them.  In an interpreted,
> mostly functional language like J, you want to minimize reallocating arrays
> and moving whole arrays around.  Too much copying hurts more than extra
> floating-point operations.
> 
>> This is a Markov chain Monte Carlo Bayesian artificial neural network
>> (three-layer perceptron) application that, in the test problem, produces
>> about 1e5 chain states (not saving them but streaming them to another C
>> program) using a half dozen matrices, the largest of which (for this test
>> problem) is about 200x5.
> 
> It's really small and fits in L1 cache.  Naive C loops would have no problem.
> If you go larger than L2 cache, you will need to call dgemm or another
> cache-friendly blocked multiplication algorithm for better performance.
> 
>> 
>> Another curiosity is that in the C version, using a (user-defined) sigmoid
>> instead of 'tanh' as the nonlinear activation (on all matrix elements) expands
>> the run time by 1.75X.  In the J version the same choice *reduces* run time
>> by about 20% relative to the 'tanh' J primitive (?).
>> 
>> sgmd =. monad : '1%(1+^-y)'
> 
> In any interpreted language, primitives are always going to outperform
> composite functions.
> 
>> As far as releasing code goes, this is the outgrowth of my dissertation work,
>> and I hope someday to commercialize it, so I am reluctant to release it.  Pity
>> - I know you can't be sure what I am doing in order to diagnose the
>> situation, but perhaps we can find a way to accomplish what you want anyway
>> by pursuing this together.
> 
> I don't see a good MCMC code in J, so I'm also developing my own.  Perhaps we
> can share that part at some point so you don't have to give away your baby
> neural network.
> 
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm