entropy =: +/@:(* ^.) (^.@] - %) +/

or

entropy =: -@(+/)@:(* ^.)@(% +/)
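Both forms give the same number: with p =: y % +/y, the second version's -+/p*^.p expands to (^. +/y) - (+/y * ^.y) % +/y, which is exactly the fork in the first. A quick Python sketch of that identity (mine, not part of the thread):

```python
import math

def entropy_norm(counts):
    # -sum(p * log p) over normalized probabilities p = c / N
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def entropy_fork(counts):
    # log N - (sum c * log c) / N, matching the fork form
    n = sum(counts)
    return math.log(n) - sum(c * math.log(c) for c in counts if c > 0) / n
```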

Henry Rich

On 3/21/2013 6:29 AM, Scott Locklin wrote:


Dan Bron wrote in the "Learning J language" thread:

"If you post your explicit (that's what we call the 3 : 'x stuff y other
stuff' style of code), I can take a whack at showing you how to translate it
to tacit code (that's what we call the cartoon-characters-cursing style of
code).  "

Thank you for your kind offer of education for the n00b. I have the basic pieces working 
to my satisfaction this evening. It ties out nicely with the R version from the package 
"infotheo." What is more, it works *much* faster: I haven't done a proper 
benchmark yet, but it looks like a factor of 100 or more on big problems. This is 
extremely impressive to me, since 1) the R version is written in C++ (albeit imperfect 
C++), and 2) I barely know what I'm doing in J. I think the histogram verb isn't quite 
right, but it's close enough for this to be useful.


I'm going to attempt making this more tacit myself right after I post this 
(mmdow is pretty obvious), but looking at how a J ninja does this should be 
super helpful.


The functions:

NB. obtained from the phrase book:

histogram=: <: @ (#/.~) @ (i.@#@[ , I.)
Round =: (%&) (<.@:(1r2+])&.:)
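Here x Round y gives the nearest multiple of x, i.e. x * <. 1r2 + y % x. A one-line Python equivalent of that formula (my gloss on the phrase-book adverb):

```python
import math

def round_to(x, y):
    # nearest multiple of x: x * floor(1/2 + y/x), matching the J adverb
    return x * math.floor(0.5 + y / x)
```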

NB. need a discretization/histogram
discretize=: 3 : 0
NB. monad: default to roughly sqrt(N) bins
nbins=. 1 Round (#y)^0.5
nbins discretize y
:
NB. dyad: x equal-width bins spanning the range of y
max=. >./y
min=. <./y
width=. (max-min) % x
binner=. min+(i.x)*width
binner histogram y
)
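For readers who don't speak J yet, here is a rough Python analogue of the binning step (my sketch; the edge handling differs slightly from the `histogram` phrase, which Scott notes isn't quite right anyway):

```python
def discretize(y, nbins=None):
    # equal-width histogram counts over the range of y
    if nbins is None:
        nbins = round(len(y) ** 0.5)  # ~sqrt(N) bins, as in the monadic case
    lo, hi = min(y), max(y)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in y:
        i = min(int((v - lo) / width), nbins - 1)  # clamp the max into the last bin
        counts[i] += 1
    return counts
```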

NB.  entropy of histograms
entropy=: 3 : 0
NB. plug-in entropy of histogram counts y, in nats
nout=. +/ y                NB. total count N
en=. +/ _1 * y * ^. y      NB. - +/ c * log c  (0 * ^. 0 is 0 in J)
(en%nout) + ^.nout         NB. H = log N - (+/ c * log c) % N
)

NB. now, need miller-madow entropy
mmdow=: 3 : 0
NB. Miller-Madow bias correction: H + (m-1) % 2*N, m = number of bins
en=. entropy y
m=. # y
nout=. +/ y
en + (m-1) % nout*2
)
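In Python terms, the two verbs above amount to the following (a sketch; note the bin count m mirrors the J `# y`, so it includes empty bins, whereas some formulations of Miller-Madow count only occupied ones):

```python
import math

def entropy(counts):
    # plug-in entropy of a histogram, in nats:
    # H = log N - (sum c * log c) / N, with N = total count
    n = sum(counts)
    en = -sum(c * math.log(c) for c in counts if c > 0)  # empty bins contribute 0
    return en / n + math.log(n)

def mmdow(counts):
    # Miller-Madow bias correction: add (m - 1) / (2N)
    m = len(counts)
    n = sum(counts)
    return entropy(counts) + (m - 1) / (2 * n)
```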


usage:
mmdow@discretize (random stuff)

In case anyone is curious about this:

There are histogram approximations to information theoretic quantities. "entropy" 
produces an approximation to Shannon entropy measured in nats (divide by ln 2 to convert to bits). 
"mmdow" applies the Miller-Madow bias correction. These work best when the histogram is 
well populated (i.e., something like sqrt(N) bins for N points).


What is it good for? Information theory lets us calculate all kinds of interesting 
quantities. I use "mutual information" as an estimator for nonlinear 
correlations. It is also a useful quantity for weighting input features to machine learning 
algorithms, or for feature selection/reduction.
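The mutual-information estimator falls out of the same histogram entropies via I(X;Y) = H(X) + H(Y) - H(X,Y), computed from the 1-D and 2-D (joint) histograms. A hypothetical Python sketch (the function names are mine, not from Scott's code):

```python
import math
from collections import Counter

def hist_entropy(counts):
    # plug-in entropy (nats) of histogram counts
    n = sum(counts)
    return math.log(n) - sum(c * math.log(c) for c in counts if c > 0) / n

def mutual_information(pairs):
    # pairs: list of (i, j) bin-index pairs, one per sample
    hx = hist_entropy(Counter(i for i, _ in pairs).values())
    hy = hist_entropy(Counter(j for _, j in pairs).values())
    hxy = hist_entropy(Counter(pairs).values())
    return hx + hy - hxy  # 0 for independent bins (up to estimation bias)
```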


This is the immediate use I will put the code to: I have a nice little "instance 
learning" gizmo I will be putting up on github once all the parts work. The inherent speed of 
J (and a few design choices I made) will make it the fastest and one of the most powerful instance-learning 
packages available in any interpreter I know of. Instance learning is not presently 
fashionable, but I have found it does very well on most problems, and beats the pants off more 
"modern" learners like SVMs and random forests when the data is noisy and the number of 
relevant features is reasonable.


-Scott
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
