Re: statistics library?

2011-10-10 Thread Lee Spector
On Oct 10, 2011, at 4:36 PM, Ben Evans wrote:
> There should be 1.2.4 (and a snapshot of 1.3.0) up on clojars now.
> 
> Could I ask you to give one of them a go, and mail your findings to
> the list? We have our regular Incanter Hack Day coming up next
> weekend, so if things are still b0rken for you, I can try to find a
> developer to look at the problem for you at the Hack day.
> 


Searching for incanter at clojars I find only one 1.2.4 item: 
incanter/incanter-latex 1.2.4 -- I don't think this is what I want... Is it? I 
just want the statistics functions, not anything having to do with latex.

I see the 1.3 snapshot, but when I try it by including [incanter 
"1.3.0-SNAPSHOT"] in my project.clj dependencies "lein deps" fails with:

---
Unable to resolve artifact: Missing:
--
1) incanter:incanter-latex:jar:1.3.0-SNAPSHOT
---

So I haven't yet been able to actually try the statistical tests.

 -Lee

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: statistics library?

2011-10-10 Thread Ben Evans
Hi Lee,

On Wed, Sep 28, 2011 at 12:43 AM, Lee Spector wrote:
> On Sep 27, 2011, at 5:44 PM, David Powell wrote:
>
>> I see that there was a recent fix made to Incanter:
>>
>> Fixed typo in :lower-tail? keyword.
>> This was causing the complement of the p-value to be returned.
>>
>> https://github.com/liebke/incanter/pull/39
>>
>> Have you tried the latest version in git?  Does this fix the problem?
>
> Hmm. I had asked about the version on the Incanter list too. I now see that I 
> was using a *newer* version than the newest one at 
> https://github.com/liebke/incanter.
>
> I grabbed what appeared to be the newest on clojars, which is [incanter 
> "1.2.3"], while the newest download on that github project page appears to be 
> 1.2.2 from April 20, 2010.
>
> It does sound like the comment that you quoted might indeed be about the bug 
> that I ran into, so maybe it's fixed in some version of Incanter somewhere...

There should be 1.2.4 (and a snapshot of 1.3.0) up on clojars now.

Could I ask you to give one of them a go, and mail your findings to
the list? We have our regular Incanter Hack Day coming up next
weekend, so if things are still b0rken for you, I can try to find a
developer to look at the problem for you at the Hack day.

Thanks,

Ben


Re: statistics library?

2011-09-28 Thread Daniel
Depending on the project (and I don't know if it's still supported in
1.3), you ought to be able to leverage Mathematica Player with
Clojuratica for more powerful operations.

On Sep 27, 6:43 pm, Lee Spector wrote:
> On Sep 27, 2011, at 5:44 PM, David Powell wrote:
>
> > I see that there was a recent fix made to Incanter:
>
> > Fixed typo in :lower-tail? keyword.
> > This was causing the complement of the p-value to be returned.
>
> > https://github.com/liebke/incanter/pull/39
>
> > Have you tried the latest version in git?  Does this fix the problem?
>
> Hmm. I had asked about the version on the Incanter list too. I now see that I 
> was using a *newer* version than the newest one at 
> https://github.com/liebke/incanter.
>
> I grabbed what appeared to be the newest on clojars, which is [incanter 
> "1.2.3"], while the newest download on that github project page appears to be 
> 1.2.2 from April 20, 2010.
>
> It does sound like the comment that you quoted might indeed be about the bug 
> that I ran into, so maybe it's fixed in some version of Incanter somewhere... 
> But for my current purposes I have more faith in 
> [org.apache.commons/commons-math "2.0"].
>
> Thanks,
>
>  -Lee


Re: statistics library?

2011-09-27 Thread Lee Spector

On Sep 27, 2011, at 5:44 PM, David Powell wrote:

> I see that there was a recent fix made to Incanter:
> 
> Fixed typo in :lower-tail? keyword.
> This was causing the complement of the p-value to be returned.
> 
> https://github.com/liebke/incanter/pull/39
> 
> 
> Have you tried the latest version in git?  Does this fix the problem?

Hmm. I had asked about the version on the Incanter list too. I now see that I 
was using a *newer* version than the newest one at 
https://github.com/liebke/incanter. 

I grabbed what appeared to be the newest on clojars, which is [incanter 
"1.2.3"], while the newest download on that github project page appears to be 
1.2.2 from April 20, 2010.

It does sound like the comment that you quoted might indeed be about the bug 
that I ran into, so maybe it's fixed in some version of Incanter somewhere... 
But for my current purposes I have more faith in 
[org.apache.commons/commons-math "2.0"].

Thanks,

 -Lee


Re: statistics library?

2011-09-27 Thread David Powell
> Again, if I understand correctly, under no circumstances should the p-value 
> ever be outside of the range from 0 to 1. It's a probability, and no value 
> outside of that range makes any sense. But Incanter sometimes returns 
> p-values greater than 1.

I see that there was a recent fix made to Incanter:

Fixed typo in :lower-tail? keyword.
This was causing the complement of the p-value to be returned.

https://github.com/liebke/incanter/pull/39
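
A mistyped option keyword can produce exactly this kind of result, because
Clojure's keyword-argument destructuring silently ignores keys it does not
know. The following is only a sketch of that mechanism, not Incanter's actual
source: tail-prob, two-sided-p-correct and two-sided-p-typo are hypothetical
names, and cdf-at-abs-t is a value derived from the 1.6115506955016772 p-value
reported in this thread.

```clojure
;; Hypothetical stand-in for a CDF helper that accepts a :lower-tail? option.
;; This is NOT Incanter's code; it only illustrates the failure mode.
(defn tail-prob
  [cdf-value & {:keys [lower-tail?] :or {lower-tail? true}}]
  (if lower-tail?
    cdf-value            ; P(T <= t)
    (- 1.0 cdf-value)))  ; P(T > t)

;; Assumed P(T <= |t|), taken as half of the p-value reported in the thread.
(def cdf-at-abs-t 0.8057753477508386)

;; Intended two-sided p-value: twice the upper-tail probability.
(defn two-sided-p-correct [c]
  (* 2 (tail-prob c :lower-tail? false)))

;; Same call with the option misspelled (:lower-tail instead of :lower-tail?).
;; The unknown key is ignored, the default (true) applies, and the lower tail
;; is doubled instead, giving 2 - p.
(defn two-sided-p-typo [c]
  (* 2 (tail-prob c :lower-tail false)))

(println (two-sided-p-correct cdf-at-abs-t)) ; close to 0.3884
(println (two-sided-p-typo cdf-at-abs-t))    ; close to 1.6116
```

Under that assumption the wrong value is always 2 minus the right one, which
would explain p-values between 1 and 2.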


Have you tried the latest version in git?  Does this fix the problem?

-- 
Dave


Re: statistics library?

2011-09-27 Thread labwork07
Yes, those errors in Incanter are unfortunate. I ran into another weird one,
which David Liebke attributed to the underlying Colt library.

user=> (sd (repeat 9 0.65))
NaN

The sd function calls the variance function, which calls a function in
Colt; the trouble is that Colt is returning a number very, very close to
zero, but just a bit under (i.e. it's negative):

user=> (variance (repeat 9 0.65))
-1.1102230246251565E-16

and the sqrt of a negative number is NaN.
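
Until that is fixed upstream, one possible workaround is to clamp the variance
at zero before taking the square root, or to compute the variance in two
passes so that it cannot go negative. This is just a sketch in plain Clojure;
clamped-sd and two-pass-variance are names made up here, not Incanter or Colt
functions.

```clojure
;; The slightly negative variance quoted above.
(def colt-variance -1.1102230246251565E-16)

;; Taking the square root directly reproduces the problem.
(println (Math/sqrt colt-variance)) ; NaN

;; Workaround 1: treat tiny negative variances as zero.
(defn clamped-sd [variance]
  (Math/sqrt (max 0.0 variance)))

(println (clamped-sd colt-variance)) ; 0.0

;; Workaround 2: a two-pass sample variance. It sums squared deviations,
;; so the result is never negative, whatever rounding does to the mean.
(defn two-pass-variance [xs]
  (let [n (count xs)
        m (/ (reduce + xs) (double n))]
    (/ (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))
       (dec n))))

(println (Math/sqrt (two-pass-variance (repeat 9 0.65))))
```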

On , Lee Spector wrote:
> I need to do some pretty simple statistics in a Clojure program and
> Incanter produces results that I think must be wrong (details below). So
> I don't think I can trust it.
>
> Is there other code for statistical testing out there? Or maybe somebody
> could explain to me how to interpret the seemingly anomalous Incanter
> results? (I received no reply on the Incanter list). I only need a t-test
> at the moment, but this is a bit of a pain to code from scratch (because
> of the table that it uses).
>
> I'm trying to use an un-paired, two-tailed t-test to tell whether the
> means of two sets of numbers differ significantly. (Whether or not this
> is the right test for my application -- e.g. whether the assumptions of
> normal distributions are valid -- is another matter. I just want to know
> if the tests are being calculated correctly.)
>
> If I understand correctly the t-test should produce a p-value which
> ranges from 0 to 1. If it's less than 0.05 we can say that the means
> differ. (Again, there would be more to say here about what's
> statistically meaningful, but that discussion isn't relevant to my
> question).
>
> Again, if I understand correctly, under no circumstances should the
> p-value ever be outside of the range from 0 to 1. It's a probability, and
> no value outside of that range makes any sense. But Incanter sometimes
> returns p-values greater than 1.
>
> Sometimes it seems to give reasonable results:
>
> => (use 'incanter.stats)
> nil
>
> => (t-test [2 3 4 3 2 3] :y [3 4 5 6 5 4 3])
> {:conf-int [-2.6129722457891322 -0.2917896589727722],
>  :x-mean 2.8335,
>  :t-stat -2.7883256115163184,
>  :p-value 0.018335366451909547,
>  :n1 6,
>  :df 10.519255193727584,
>  :n2 7,
>  :y-var 1.2380952380952408,
>  :x-var 0.5658,
>  :y-mean 4.285714285714286}
>
> But in other cases the :p-value is over 1. Here's an example from
> Incanter's own documentation:
>
> => (t-test (range 1 11) :mu 0)
> {:conf-int [3.33414941027723 7.66585058972277],
>  :x-mean 5.5,
>  :t-stat 5.744562646538029,
>  :p-value 1.9997218039889517,
>  :n1 10,
>  :df 9,
>  :n2 nil,
>  :y-var nil,
>  :x-var 9.166,
>  :y-mean nil}
>
> Here's an example that's closer to what can arise in my application, and
> again I just don't see how the calculation can be right if it's producing
> this kind of p-value:
>
> => (t-test '(40 5 2) :y '(1 5 1))
> {:conf-int [-39.46068349230474 66.12735015897141],
>  :x-mean 15.666,
>  :t-stat 1.0866516498483223,
>  :p-value 1.6115506955016772,
>  :n1 3,
>  :df 2.0477900396893336,
>  :n2 3,
>  :y-var 5.332,
>  :x-var 446.37,
>  :y-mean 2.3335}
>
> Am I missing something that would rationalize these results?
>
> If not, then does anyone have a pointer to more reliable statistics code
> in Clojure? Or pointers to using a Java library? I see that there are
> libraries out there -- e.g.
> http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/inference/TTest.html
> -- but Java interop is not my strong suit and I'm not sure how to call
> this from my Clojure code.
>
> Any pointers would be appreciated.
>
> Thanks,
>
> -Lee


Re: statistics library?

2011-09-27 Thread Lee Spector

On Sep 27, 2011, at 1:37 PM, Johann Hibschman wrote:

> Johann Hibschman writes:
> 
>> There may be an easier way to do this, but this worked for me:
>> 
>>   user=> (org.apache.commons.math.stat.inference.TestUtils/tTest
>>            (into-array Double/TYPE [40 5 2]) (into-array Double/TYPE [1 5 1]))
>>   0.3884493044983227
> 
> I should have used (double-array [40 5 2]) here, but for some reason I
> couldn't remember it until I hit send.

Hooray! This is beautiful.

I had to tinker a bit to find/download the library jars and get my runtime 
environment to find them, but then this did exactly what I wanted.
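
For anyone using Leiningen, the jar hunting can probably be skipped by
declaring the dependency in project.clj. This is a sketch only: the project
name stats-demo and the Clojure version are placeholders, and the commons-math
coordinates are the ones quoted elsewhere in this thread.

```clojure
;; project.clj (sketch; project name and Clojure version are assumptions)
(defproject stats-demo "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.2.1"]
                 ;; coordinates as quoted in this thread
                 [org.apache.commons/commons-math "2.0"]])
```

Running "lein deps" afterwards should fetch the jar and put it on the
project's classpath.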

Thanks so much for the confirmation and solution!

 -Lee


Re: statistics library?

2011-09-27 Thread Johann Hibschman
Johann Hibschman writes:

> There may be an easier way to do this, but this worked for me:
>
>   user=> (org.apache.commons.math.stat.inference.TestUtils/tTest
>            (into-array Double/TYPE [40 5 2]) (into-array Double/TYPE [1 5 1]))
>   0.3884493044983227

I should have used (double-array [40 5 2]) here, but for some reason I
couldn't remember it until I hit send.
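
With double-array the call can be wrapped in a small helper. This is a sketch
that assumes the commons-math 2.0 jar is on the classpath; t-test-p is a name
made up here, while TestUtils/tTest on two double arrays is the same call as
in my earlier message.

```clojure
;; Requires org.apache.commons/commons-math 2.0 on the classpath.
(import 'org.apache.commons.math.stat.inference.TestUtils)

(defn t-test-p
  "Two-sided p-value of an unpaired t-test on two collections of numbers,
  as returned by commons-math's TestUtils/tTest. Hypothetical helper."
  [xs ys]
  (TestUtils/tTest (double-array xs) (double-array ys)))

;; Same data as above; the earlier call returned 0.3884493044983227.
(println (t-test-p [40 5 2] [1 5 1]))
```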


Re: statistics library?

2011-09-27 Thread Johann Hibschman
Lee Spector writes:

> I need to do some pretty simple statistics in a Clojure program and
> Incanter produces results that I think must be wrong (details
> below). So I don't think I can trust it.

I agree, those all look weird to me.

> Is there other code for statistical testing out there?

I'd reach for commons-math, but I don't have much experience.

> If I understand correctly the t-test should produce a p-value which
> ranges from 0 to 1. If it's less than 0.05 we can say that the means
> differ. (Again, there would be more to say here about what's
> statistically meaningful, but that discussion isn't relevant to my
> question).

This is true.

> => (t-test (range 1 11) :mu 0)
> {:conf-int [3.33414941027723 7.66585058972277],
> :x-mean 5.5,
> :t-stat 5.744562646538029,
> :p-value 1.9997218039889517,
> :n1 10,
> :df 9,
> :n2 nil,
> :y-var nil,
> :x-var 9.166,
> :y-mean nil}

This looks wrong to me.  At least according to R, the p-value is
0.000278.  Interestingly, this is 2 - [incanter's p].

> => (t-test '(40 5 2) :y '(1 5 1))
> {:conf-int [-39.46068349230474 66.12735015897141],
>  :x-mean 15.666,
>  :t-stat 1.0866516498483223,
>  :p-value 1.6115506955016772,
>  :n1 3,
>  :df 2.0477900396893336,
>  :n2 3,
>  :y-var 5.332,
>  :x-var 446.37,
>  :y-mean 2.3335}

R gives 0.3884, which is again 2 - [incanter's p].  Fishy.

I would say that there's a bug in Incanter's distribution function, at
least when calculating values in the tails.
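
The "2 - p" pattern is easy to check from the numbers quoted in this message.
A quick sketch in plain Clojure; flipped is just a throwaway name:

```clojure
;; p-values reported by Incanter in the two failing examples above.
(def incanter-p-one-sample 1.9997218039889517)
(def incanter-p-two-sample 1.6115506955016772)

;; Undo the suspected flip.
(defn flipped [p] (- 2 p))

(println (flipped incanter-p-one-sample)) ; about 0.000278
(println (flipped incanter-p-two-sample)) ; about 0.3884
```

Both recovered values fall between 0 and 1 and agree with what R reports, to
the digits R prints.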

> If not, then does anyone have a pointer to more reliable statistics
> code in Clojure? Or pointers to using a Java library? I see that there
> are libraries out there --
> e.g. 
> http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/inference/TTest.html
> -- but Java interop is not my strong suit and I'm not sure how to call
> this from my Clojure code.

There may be an easier way to do this, but this worked for me:

  user=> (org.apache.commons.math.stat.inference.TestUtils/tTest
           (into-array Double/TYPE [40 5 2]) (into-array Double/TYPE [1 5 1]))
  0.3884493044983227

Hope that helps,
Johann


statistics library?

2011-09-27 Thread Lee Spector

I need to do some pretty simple statistics in a Clojure program and Incanter 
produces results that I think must be wrong (details below). So I don't think I 
can trust it.

Is there other code for statistical testing out there? Or maybe somebody could 
explain to me how to interpret the seemingly anomalous Incanter results? (I 
received no reply on the Incanter list). I only need a t-test at the moment, 
but this is a bit of a pain to code from scratch (because of the table that it 
uses).

I'm trying to use an un-paired, two-tailed t-test to tell whether the means of 
two sets of numbers differ significantly. (Whether or not this is the right 
test for my application -- e.g. whether the assumptions of normal distributions 
are valid -- is another matter. I just want to know if the tests are being 
calculated correctly.)

If I understand correctly the t-test should produce a p-value which ranges from 
0 to 1. If it's less than 0.05 we can say that the means differ. (Again, there 
would be more to say here about what's statistically meaningful, but that 
discussion isn't relevant to my question).

Again, if I understand correctly, under no circumstances should the p-value 
ever be outside of the range from 0 to 1. It's a probability, and no value 
outside of that range makes any sense. But Incanter sometimes returns p-values 
greater than 1.

Sometimes it seems to give reasonable results:

=> (use 'incanter.stats)
nil

=> (t-test [2 3 4 3 2 3] :y [3 4 5 6 5 4 3])
{:conf-int [-2.6129722457891322 -0.2917896589727722],
 :x-mean 2.8335,
 :t-stat -2.7883256115163184,
 :p-value 0.018335366451909547,
 :n1 6,
 :df 10.519255193727584,
 :n2 7,
 :y-var 1.2380952380952408,
 :x-var 0.5658,
 :y-mean 4.285714285714286}

But in other cases the :p-value is over 1. Here's an example from Incanter's 
own documentation:

=> (t-test (range 1 11) :mu 0)
{:conf-int [3.33414941027723 7.66585058972277],
:x-mean 5.5,
:t-stat 5.744562646538029,
:p-value 1.9997218039889517,
:n1 10,
:df 9,
:n2 nil,
:y-var nil,
:x-var 9.166,
:y-mean nil}

Here's an example that's closer to what can arise in my application, and again 
I just don't see how the calculation can be right if it's producing this kind 
of p-value:

=> (t-test '(40 5 2) :y '(1 5 1))
{:conf-int [-39.46068349230474 66.12735015897141],
 :x-mean 15.666,
 :t-stat 1.0866516498483223,
 :p-value 1.6115506955016772,
 :n1 3,
 :df 2.0477900396893336,
 :n2 3,
 :y-var 5.332,
 :x-var 446.37,
 :y-mean 2.3335}
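
As a sanity check, the :t-stat and :df values in the output above can be
reproduced by hand. The sketch below assumes the unequal-variance (Welch) form
of the test, which the non-integer :df suggests; mean, sample-variance and
welch are names invented for this check, not Incanter functions.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

;; Unbiased sample variance (divides by n - 1).
(defn sample-variance [xs]
  (let [m (mean xs)]
    (/ (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))
       (dec (count xs)))))

;; Welch t statistic and Welch-Satterthwaite degrees of freedom.
(defn welch [xs ys]
  (let [n1 (count xs)
        n2 (count ys)
        a  (/ (sample-variance xs) n1)
        b  (/ (sample-variance ys) n2)]
    {:t-stat (/ (- (mean xs) (mean ys)) (Math/sqrt (+ a b)))
     :df     (/ (* (+ a b) (+ a b))
                (+ (/ (* a a) (dec n1))
                   (/ (* b b) (dec n2))))}))

;; Roughly {:t-stat 1.0867, :df 2.0478}, in line with the output above.
(println (welch '(40 5 2) '(1 5 1)))
```

If the statistic and degrees of freedom come out the same, then only the
:p-value entry looks suspect.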

Am I missing something that would rationalize these results? 

If not, then does anyone have a pointer to more reliable statistics code in 
Clojure? Or pointers to using a Java library? I see that there are libraries 
out there -- e.g. 
http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/inference/TTest.html
 -- but Java interop is not my strong suit and I'm not sure how to call this 
from my Clojure code.

Any pointers would be appreciated.

Thanks,

 -Lee
