[Numpy-discussion] Alternative to R phyper

2012-04-30 Thread Bruno Santos
Hello everyone,

I have a bit of code where I am using rpy2 to import R phyper so I can
perform an hypergeometric test. Unfortunately our cluster does not have a
functional installation of rpy2 working. So I am wondering if I could
translate to scipy which would make the code completly independent of R.
The python code I have is as following:


def lphyper(self,q,m,n,k):

self.phyper(self,q,m,n,k)
Calculate p-value using R function phyper from rpy2 low-level
interface.
R Documentation
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
q: vector of quantiles representing the number of white balls
drawn without replacement from an urn which contains both
black and white balls.
m: the number of white balls in the urn.
n: the number of black balls in the urn.
k: the number of balls drawn from the urn.
log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X = x],
otherwise, P[X  x].

phyper_q = SexpVector([q,], rinterface.INTSXP)
phyper_m = SexpVector([m,], rinterface.INTSXP)
phyper_n = SexpVector([n,], rinterface.INTSXP)
phyper_k = SexpVector([k,], rinterface.INTSXP)
return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0]

I have looked to scipy.stats.hypergeom but it is giving me a different
result which is also negative.
 1-phyper(45, 92, 7518, 1329)
[1] 6.92113e-13

In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
Out[24]: -8.4343643180773142e-12


This was supposed to be an error with an older version of scipy but I am
using more recent versions of it which should not contain the error anymore:
In [26]: numpy.__version__
Out[26]: '1.5.1'
In [27]: scipy.__version__
Out[27]: '0.9.0'

thank you very much in advance for any help.

Best,
Bruno
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Alternative to R phyper

2012-04-30 Thread Robert Kern
On Mon, Apr 30, 2012 at 12:09, Bruno Santos bacmsan...@gmail.com wrote:
 Hello everyone,

 I have a bit of code where I am using rpy2 to import R phyper so I can
 perform an hypergeometric test. Unfortunately our cluster does not have a
 functional installation of rpy2 working. So I am wondering if I could
 translate to scipy which would make the code completly independent of R. The
 python code I have is as following:


 def lphyper(self,q,m,n,k):
         
         self.phyper(self,q,m,n,k)
         Calculate p-value using R function phyper from rpy2 low-level
         interface.
         R Documentation
         phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
         q: vector of quantiles representing the number of white balls
             drawn without replacement from an urn which contains both
             black and white balls.
         m: the number of white balls in the urn.
         n: the number of black balls in the urn.
         k: the number of balls drawn from the urn.
         log.p: logical; if TRUE, probabilities p are given as log(p).
         lower.tail: logical; if TRUE (default), probabilities are P[X = x],
             otherwise, P[X  x].
         
         phyper_q = SexpVector([q,], rinterface.INTSXP)
         phyper_m = SexpVector([m,], rinterface.INTSXP)
         phyper_n = SexpVector([n,], rinterface.INTSXP)
         phyper_k = SexpVector([k,], rinterface.INTSXP)
         return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0]

 I have looked to scipy.stats.hypergeom but it is giving me a different
 result which is also negative.
 1-phyper(45, 92, 7518, 1329)
 [1] 6.92113e-13

 In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
 Out[24]: -8.4343643180773142e-12

The CDF (CMF? whatever) for stats.hypergeom is not implemented
explicitly, so it falls back to the default implementation of just
summing up the PMF. You are falling victim to floating point error
accumulation such that the sum exceeds 1.0, so the survival function
1-CDF ends up negative. I don't think we have a better implementation
of the CDF anywhere in scipy, sorry.

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Alternative to R phyper

2012-04-30 Thread josef . pktd
On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 30, 2012 at 12:09, Bruno Santos bacmsan...@gmail.com wrote:
 Hello everyone,

 I have a bit of code where I am using rpy2 to import R phyper so I can
 perform an hypergeometric test. Unfortunately our cluster does not have a
 functional installation of rpy2 working. So I am wondering if I could
 translate to scipy which would make the code completly independent of R. The
 python code I have is as following:


 def lphyper(self,q,m,n,k):
         
         self.phyper(self,q,m,n,k)
         Calculate p-value using R function phyper from rpy2 low-level
         interface.
         R Documentation
         phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
         q: vector of quantiles representing the number of white balls
             drawn without replacement from an urn which contains both
             black and white balls.
         m: the number of white balls in the urn.
         n: the number of black balls in the urn.
         k: the number of balls drawn from the urn.
         log.p: logical; if TRUE, probabilities p are given as log(p).
         lower.tail: logical; if TRUE (default), probabilities are P[X = x],
             otherwise, P[X  x].
         
         phyper_q = SexpVector([q,], rinterface.INTSXP)
         phyper_m = SexpVector([m,], rinterface.INTSXP)
         phyper_n = SexpVector([n,], rinterface.INTSXP)
         phyper_k = SexpVector([k,], rinterface.INTSXP)
         return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0]

 I have looked to scipy.stats.hypergeom but it is giving me a different
 result which is also negative.
 1-phyper(45, 92, 7518, 1329)
 [1] 6.92113e-13

 In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
 Out[24]: -8.4343643180773142e-12

the corrected version with explicit sf was added in 0.10

 from scipy import stats
 stats.hypergeom.sf(45,(92+7518),92,1329)
6.9212632647221852e-13

 import scipy
 scipy.__version__
'0.10.0b2'

Josef


 The CDF (CMF? whatever) for stats.hypergeom is not implemented
 explicitly, so it falls back to the default implementation of just
 summing up the PMF. You are falling victim to floating point error
 accumulation such that the sum exceeds 1.0, so the survival function
 1-CDF ends up negative. I don't think we have a better implementation
 of the CDF anywhere in scipy, sorry.

 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion