[Numpy-discussion] Alternative to R phyper
Hello everyone, I have a bit of code where I am using rpy2 to import R phyper so I can perform an hypergeometric test. Unfortunately our cluster does not have a functional installation of rpy2 working. So I am wondering if I could translate to scipy which would make the code completly independent of R. The python code I have is as following: def lphyper(self,q,m,n,k): self.phyper(self,q,m,n,k) Calculate p-value using R function phyper from rpy2 low-level interface. R Documentation phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) q: vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls. m: the number of white balls in the urn. n: the number of black balls in the urn. k: the number of balls drawn from the urn. log.p: logical; if TRUE, probabilities p are given as log(p). lower.tail: logical; if TRUE (default), probabilities are P[X = x], otherwise, P[X x]. phyper_q = SexpVector([q,], rinterface.INTSXP) phyper_m = SexpVector([m,], rinterface.INTSXP) phyper_n = SexpVector([n,], rinterface.INTSXP) phyper_k = SexpVector([k,], rinterface.INTSXP) return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0] I have looked to scipy.stats.hypergeom but it is giving me a different result which is also negative. 1-phyper(45, 92, 7518, 1329) [1] 6.92113e-13 In [24]: stats.hypergeom.sf(45,(92+7518),92,1329) Out[24]: -8.4343643180773142e-12 This was supposed to be an error with an older version of scipy but I am using more recent versions of it which should not contain the error anymore: In [26]: numpy.__version__ Out[26]: '1.5.1' In [27]: scipy.__version__ Out[27]: '0.9.0' thank you very much in advance for any help. Best, Bruno ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Alternative to R phyper
On Mon, Apr 30, 2012 at 12:09, Bruno Santos bacmsan...@gmail.com wrote: Hello everyone, I have a bit of code where I am using rpy2 to import R phyper so I can perform an hypergeometric test. Unfortunately our cluster does not have a functional installation of rpy2 working. So I am wondering if I could translate to scipy which would make the code completly independent of R. The python code I have is as following: def lphyper(self,q,m,n,k): self.phyper(self,q,m,n,k) Calculate p-value using R function phyper from rpy2 low-level interface. R Documentation phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) q: vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls. m: the number of white balls in the urn. n: the number of black balls in the urn. k: the number of balls drawn from the urn. log.p: logical; if TRUE, probabilities p are given as log(p). lower.tail: logical; if TRUE (default), probabilities are P[X = x], otherwise, P[X x]. phyper_q = SexpVector([q,], rinterface.INTSXP) phyper_m = SexpVector([m,], rinterface.INTSXP) phyper_n = SexpVector([n,], rinterface.INTSXP) phyper_k = SexpVector([k,], rinterface.INTSXP) return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0] I have looked to scipy.stats.hypergeom but it is giving me a different result which is also negative. 1-phyper(45, 92, 7518, 1329) [1] 6.92113e-13 In [24]: stats.hypergeom.sf(45,(92+7518),92,1329) Out[24]: -8.4343643180773142e-12 The CDF (CMF? whatever) for stats.hypergeom is not implemented explicitly, so it falls back to the default implementation of just summing up the PMF. You are falling victim to floating point error accumulation such that the sum exceeds 1.0, so the survival function 1-CDF ends up negative. I don't think we have a better implementation of the CDF anywhere in scipy, sorry. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Alternative to R phyper
On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern robert.k...@gmail.com wrote: On Mon, Apr 30, 2012 at 12:09, Bruno Santos bacmsan...@gmail.com wrote: Hello everyone, I have a bit of code where I am using rpy2 to import R phyper so I can perform an hypergeometric test. Unfortunately our cluster does not have a functional installation of rpy2 working. So I am wondering if I could translate to scipy which would make the code completly independent of R. The python code I have is as following: def lphyper(self,q,m,n,k): self.phyper(self,q,m,n,k) Calculate p-value using R function phyper from rpy2 low-level interface. R Documentation phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) q: vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls. m: the number of white balls in the urn. n: the number of black balls in the urn. k: the number of balls drawn from the urn. log.p: logical; if TRUE, probabilities p are given as log(p). lower.tail: logical; if TRUE (default), probabilities are P[X = x], otherwise, P[X x]. phyper_q = SexpVector([q,], rinterface.INTSXP) phyper_m = SexpVector([m,], rinterface.INTSXP) phyper_n = SexpVector([n,], rinterface.INTSXP) phyper_k = SexpVector([k,], rinterface.INTSXP) return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0] I have looked to scipy.stats.hypergeom but it is giving me a different result which is also negative. 1-phyper(45, 92, 7518, 1329) [1] 6.92113e-13 In [24]: stats.hypergeom.sf(45,(92+7518),92,1329) Out[24]: -8.4343643180773142e-12 the corrected version with explicit sf was added in 0.10 from scipy import stats stats.hypergeom.sf(45,(92+7518),92,1329) 6.9212632647221852e-13 import scipy scipy.__version__ '0.10.0b2' Josef The CDF (CMF? whatever) for stats.hypergeom is not implemented explicitly, so it falls back to the default implementation of just summing up the PMF. You are falling victim to floating point error accumulation such that the sum exceeds 1.0, so the survival function 1-CDF ends up negative. I don't think we have a better implementation of the CDF anywhere in scipy, sorry. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion