Dan Bron wrote in the "Learning J language" thread:

"If you post your explicit (that's what we call the 3 : 'x stuff y other 
stuff' style of code), I can take a whack at showing you how to translate it 
to tacit code (that's what we call the cartoon-characters-cursing style of 
code).  " 

Thank you for your kind offer of education for the n00b. I have the basic 
pieces working to my satisfaction this evening. It ties out nicely with the R 
version from the "infotheo" package. What's more, it runs *much* faster: I 
haven't done a proper benchmark yet, but it looks like a factor of 100 or more 
on big problems. This is extremely impressive to me, since 1) the R version is 
written in C++ (albeit imperfect C++), and 2) I barely know what I'm doing in 
J. I think the histogram verb isn't quite right, but it's close enough for this 
to be useful. 


I'm going to attempt to make this more tacit myself right after I post this 
(mmdow is pretty obvious), but looking at how a J ninja does this should be 
super helpful. 


The functions:

NB. obtained from the phrase book:

histogram=: <: @ (#/.~) @ (i.@#@[ , I.)
Round =: (%&) (<.@:(1r2+])&.:) 

NB. need a discretization/histogram
discretize=: 3 : 0
nbins=. 1 Round (<.#y)^0.5
nbins discretize (2.2-2.2+y)
:
max=.>./y
min=.<./y
width=. (max-min) % x
binner =. min+(i.x)*width
binner histogram y
) 
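
For anyone who wants to sanity-check the binning outside J, here is a rough
Python paraphrase of what discretize is doing (names and structure are my own,
not the infotheo code): equal-width bins from min to max, defaulting to about
sqrt(N) bins.

```python
import math

def discretize(y, nbins=None):
    """Bin counts for y under equal-width binning; ~sqrt(N) bins by default.
    A sketch of the same idea as the J verb, not a literal translation."""
    if nbins is None:
        nbins = round(math.sqrt(len(y)))
    lo, hi = min(y), max(y)
    width = (hi - lo) / nbins
    if width == 0:                       # constant data: everything in one bin
        return [len(y)] + [0] * (nbins - 1)
    counts = [0] * nbins
    for v in y:
        i = min(int((v - lo) / width), nbins - 1)  # clamp the max into the last bin
        counts[i] += 1
    return counts
```

Note the clamp: the maximum value sits exactly on the last edge, which I
suspect is related to the off-by-one I mentioned in the J histogram verb.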

NB.  entropy of histograms
entropy=: 3 : 0
nout=. +/ y
en=. +/  _1*y * ^. y
(en%nout) + ^.nout
)
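
The entropy verb works on raw counts rather than probabilities: with
c_i the bin counts and N their sum, (-sum c_i ln c_i)/N + ln N is algebraically
the same as -sum p_i ln p_i with p_i = c_i/N. A hypothetical Python cross-check
of that identity (my own names):

```python
import math

def entropy_counts(counts):
    """Plug-in (maximum-likelihood) entropy in nats from bin counts,
    written the same way as the J verb: (-sum c*ln c)/N + ln N."""
    n = sum(counts)
    en = -sum(c * math.log(c) for c in counts if c > 0)
    return en / n + math.log(n)

def entropy_probs(counts):
    """Same quantity written directly over probabilities p = c/N."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)
```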

NB. now, need miller-madow entropy
mmdow=: 3 : 0
en =. entropy y
m=. # y
nout =. +/ y
en + (m -1) % nout * 2
)
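
Same exercise for the Miller-Madow correction: plug-in entropy plus
(m - 1)/(2N). One caveat worth flagging: as I understand it, the textbook
correction takes m to be the number of *nonempty* bins, whereas #y in the J
verb counts all bins; the two agree when every bin is occupied. A Python
sketch mirroring the J code (names are mine):

```python
import math

def mmdow(counts):
    """Miller-Madow corrected entropy in nats: plug-in + (m-1)/(2N).
    Mirrors the J verb: m = len(counts) counts ALL bins, occupied or not;
    the usual statement of the correction counts only nonempty bins."""
    n = sum(counts)
    m = len(counts)
    en = -sum(c * math.log(c) for c in counts if c > 0)
    return en / n + math.log(n) + (m - 1) / (2 * n)
```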


usage:
mmdow@discretize (random stuff)

In case anyone is curious about this:

There are histogram approximations to information theoretic quantities. 
"entropy" produces an approximation to Shannon information measured in nats 
(divide by ln 2 to convert to bits). "mmdow" applies the Miller-Madow bias 
correction. These work best when the histogram is well populated (i.e., 
something like sqrt(N) bins for N points).


What is it good for? Information theory allows us to calculate all kinds of 
interesting quantities. I use "mutual information" as an estimator for 
nonlinear correlations. It is also a useful quantity for weighting input 
features to machine learning algorithms, or for feature selection/reduction.
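
For the curious, mutual information drops straight out of the entropy
machinery above via I(X;Y) = H(X) + H(Y) - H(X,Y). A minimal Python sketch
over already-discretized sequences (my own names, not part of the code above):

```python
import math
from collections import Counter

def H(labels):
    """Plug-in entropy in nats of a discrete sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); the joint is the sequence of pairs."""
    return H(x) + H(y) - H(list(zip(x, y)))
```

Identical sequences give I(X;X) = H(X); independent ones give zero (up to
estimation bias, which is where Miller-Madow earns its keep).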


This is the immediate use I will put the code to: I have a nice little 
"instance learning" gizmo I will be putting up on github, once all the parts 
work. The inherent speed of J (and a few design choices I made) will make it 
the fastest and one of the most powerful instance learning packages available 
in any interpreter I know of. Instance learning is not presently fashionable, 
but I have found it does very well on most problems, and beats the pants off 
more "modern" learners like SVM and RandomForests when the data is noisy and 
the number of relevant features is reasonable.


-Scott
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm