I need to do some pretty simple statistics in a Clojure program and Incanter produces results that I think must be wrong (details below). So I don't think I can trust it.
Is there other code for statistical testing out there? Or maybe somebody could explain to me how to interpret the seemingly anomalous Incanter results? (I received no reply on the Incanter list.)

I only need a t-test at the moment, but this is a bit of a pain to code from scratch (because of the table that it uses).

I'm trying to use an unpaired, two-tailed t-test to tell whether the means of two sets of numbers differ significantly. (Whether or not this is the right test for my application -- e.g. whether the assumptions of normal distributions are valid -- is another matter. I just want to know if the tests are being calculated correctly.)

If I understand correctly, the t-test should produce a p-value which ranges from 0 to 1. If it's less than 0.05 we can say that the means differ. (Again, there would be more to say here about what's statistically meaningful, but that discussion isn't relevant to my question.)

Again, if I understand correctly, under no circumstances should the p-value ever be outside of the range from 0 to 1. It's a probability, and no value outside of that range makes any sense.

But Incanter sometimes returns p-values greater than 1.

Sometimes it seems to give reasonable results:

=> (use 'incanter.stats)
nil
=> (t-test [2 3 4 3 2 3] :y [3 4 5 6 5 4 3])
{:conf-int [-2.6129722457891322 -0.2917896589727722], :x-mean 2.8333333333333335, :t-stat -2.7883256115163184, :p-value 0.018335366451909547, :n1 6, :df 10.519255193727584, :n2 7, :y-var 1.2380952380952408, :x-var 0.5666666666666658, :y-mean 4.285714285714286}

But in other cases the :p-value is over 1.
Here's an example from Incanter's own documentation:

=> (t-test (range 1 11) :mu 0)
{:conf-int [3.33414941027723 7.66585058972277], :x-mean 5.5, :t-stat 5.744562646538029, :p-value 1.9997218039889517, :n1 10, :df 9, :n2 nil, :y-var nil, :x-var 9.166666666666666, :y-mean nil}

Here's an example that's closer to what can arise in my application, and again I just don't see how the calculation can be right if it's producing this kind of p-value:

=> (t-test '(40 5 2) :y '(1 5 1))
{:conf-int [-39.46068349230474 66.12735015897141], :x-mean 15.666666666666666, :t-stat 1.0866516498483223, :p-value 1.6115506955016772, :n1 3, :df 2.0477900396893336, :n2 3, :y-var 5.333333333333332, :x-var 446.33333333333337, :y-mean 2.3333333333333335}

Am I missing something that would rationalize these results?

If not, then does anyone have a pointer to more reliable statistics code in Clojure? Or pointers to using a Java library? I see that there are libraries out there -- e.g. http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/inference/TTest.html -- but Java interop is not my strong suit and I'm not sure how to call this from my Clojure code.

Any pointers would be appreciated.

Thanks,

 -Lee
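
For reference, here is the arithmetic behind the claim that a two-tailed p-value cannot exceed 1, together with one possible reading of the numbers above. This is only a sketch: the symbols F_ν (the CDF of Student's t distribution with ν degrees of freedom) and p̃ (the value actually printed) are introduced here for illustration, and the "missing absolute value" is an assumption that has not been checked against Incanter's source.

```latex
% Two-tailed p-value for a t statistic t with \nu degrees of freedom,
% where F_\nu is the CDF of Student's t distribution:
\[
  p \;=\; 2\,\bigl(1 - F_\nu(|t|)\bigr) \;=\; 2\,F_\nu(-|t|),
  \qquad 0 \le p \le 1 .
\]
% ASSUMPTION (not verified): the absolute value is dropped, so the
% quantity actually computed is 2 F_\nu(t). By symmetry of the
% t distribution, F_\nu(t) = 1 - F_\nu(-t), which gives:
\[
  \tilde p \;=\; 2\,F_\nu(t) \;=\;
  \begin{cases}
    p,     & t \le 0, \\
    2 - p, & t > 0 .
  \end{cases}
\]
```

The three outputs above at least fit this pattern: the one with a negative :t-stat has a :p-value below 1, and the two with a positive :t-stat have a :p-value between 1 and 2. If the assumption holds, the intended values would be about 2 - 1.99972 = 0.00028 for (t-test (range 1 11) :mu 0) and about 2 - 1.61155 = 0.388 for (t-test '(40 5 2) :y '(1 5 1)).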