entropy =: +/@:(* ^.) (^.@] - %) +/

or

entropy =: -@(+/)@:(* ^.)@(% +/)

Henry Rich

On 3/21/2013 6:29 AM, Scott Locklin wrote:
Dan Bron wrote in the "Learning J language" thread: "If you post your explicit (that's what we call the 3 : 'x stuff y other stuff' style of code), I can take a whack at showing you how to translate it to tacit code (that's what we call the cartoon-characters-cursing style of code)."

Thank you for your kind offer of education for the n00b. I have the basic pieces working to my satisfaction this evening. It ties out nicely with the R version from the package "infotheo." What is more, it works *much* faster: I haven't done a proper benchmark yet, but it looks like a factor of 100 or more on big problems. This is extremely impressive to me, as 1) the R version is written in C++ (albeit imperfect C++), and 2) I barely know what I'm doing in J.

I think the histogram verb isn't quite right, but it's close enough for this to be useful. I'm going to attempt making this more tacit myself right after I post this (mmdow is pretty obvious), but looking at how a J ninja does this should be super helpful.

The functions:

NB. obtained from the phrase book
histogram=: <: @ (#/.~) @ (i.@#@[ , I.)

NB. adverb: (x Round) y rounds y to the nearest multiple of x
Round =: (%&) (<.@:(1r2+])&.:)

NB. discretization/histogram: monad picks ~sqrt(#y) bins, dyad takes x = number of bins
discretize=: 3 : 0
nbins=. 1 Round (<.#y)^0.5
nbins discretize (2.2-2.2) + y  NB. (2.2-2.2) coerces y to floating point
:
max=. >./y
min=. <./y
width=. (max-min) % x
binner=. min + (i.x) * width
binner histogram y
)

NB. entropy (in nats) of a histogram of counts
entropy=: 3 : 0
nout=. +/ y
en=. +/ _1 * y * ^. y
(en % nout) + ^. nout
)

NB. Miller-Madow correction: entropy + (m-1) % 2n
mmdow=: 3 : 0
en=. entropy y
m=. # y
nout=. +/ y
en + (m - 1) % nout * 2
)

usage: mmdow@discretize (random stuff)

In case anyone is curious about this: there are histogram approximations to information-theoretic quantities. "entropy" produces an approximation to Shannon information measured in nats (divide by ^.2 to make it bits). "mmdow" applies the Miller-Madow bias correction. These work best when the histogram is well populated (i.e., something like sqrt(N) bins for N points). What is it good for?
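For non-J readers, here is a rough Python sketch of the same estimators (the function names and bin-edge handling are mine and differ in detail from the phrase-book histogram); it also checks the identity behind the explicit entropy verb, ln n - (sum c*ln c)/n = -sum p*ln p:

```python
import math

def discretize(xs, nbins=None):
    """Equal-width histogram counts; nbins defaults to ~sqrt(len(xs)),
    as in the J discretize verb above."""
    if nbins is None:
        nbins = round(math.sqrt(len(xs)))
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in xs:
        # clamp the maximum value into the last bin
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts

def entropy(counts):
    """Plug-in Shannon entropy in nats: ln n - (sum c*ln c) / n."""
    n = sum(counts)
    return math.log(n) - sum(c * math.log(c) for c in counts if c > 0) / n

def mmdow(counts):
    """Miller-Madow bias correction: add (m - 1) / (2 n)."""
    return entropy(counts) + (len(counts) - 1) / (2 * sum(counts))
```

Dividing either result by ln 2 converts nats to bits.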
Information theory allows us to calculate all kinds of interesting quantities. I use "mutual information" as an estimator for nonlinear correlations. It is also a useful quantity for weighting input features to machine learning algorithms, or for feature selection/reduction. This is the immediate use I will put the code to; I have a nice little "instance learning" gizmo I will be putting up on github once all the parts work. The inherent speed of J (and a few design choices I made) will make it the fastest and one of the most powerful instance learning packages available in any interpreter I know of. Instance learning is not presently fashionable, but I have found it does very well on most problems, and beats the pants off more "modern" learners like SVM and random forests when the data is noisy and the number of relevant features is reasonable.

-Scott
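To illustrate the mutual-information use mentioned above (this sketch is mine, not code from the thread): with plug-in histogram entropies H in nats, I(X;Y) = H(X) + H(Y) - H(X,Y), estimated here from a 2-D equal-width histogram:

```python
import math

def plugin_entropy(counts):
    """Plug-in Shannon entropy in nats from histogram counts."""
    n = sum(counts)
    return math.log(n) - sum(c * math.log(c) for c in counts if c > 0) / n

def mutual_information(xs, ys, nbins):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in nats."""
    def index(v, lo, width):
        # clamp the maximum value into the last bin
        return min(int((v - lo) / width), nbins - 1)
    lox, loy = min(xs), min(ys)
    wx = (max(xs) - lox) / nbins
    wy = (max(ys) - loy) / nbins
    joint = {}                       # sparse 2-D histogram of (x-bin, y-bin)
    for x, y in zip(xs, ys):
        k = (index(x, lox, wx), index(y, loy, wy))
        joint[k] = joint.get(k, 0) + 1
    mx, my = [0] * nbins, [0] * nbins  # marginal counts from the joint
    for (i, j), c in joint.items():
        mx[i] += c
        my[j] += c
    return plugin_entropy(mx) + plugin_entropy(my) - plugin_entropy(list(joint.values()))
```

Identical inputs give I(X;X) = H(X); unrelated inputs give I near zero, up to the positive bias of the plug-in estimator that corrections like Miller-Madow aim to reduce.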
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
