On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern <robert.k...@gmail.com> wrote: > On Mon, Apr 30, 2012 at 12:09, Bruno Santos <bacmsan...@gmail.com> wrote: >> Hello everyone, >> >> I have a bit of code where I am using rpy2 to import R phyper so I can >> perform an hypergeometric test. Unfortunately our cluster does not have a >> functional installation of rpy2 working. So I am wondering if I could >> translate to scipy which would make the code completly independent of R. The >> python code I have is as following: >> >> >> def lphyper(self,q,m,n,k): >> """ >> self.phyper(self,q,m,n,k) >> Calculate p-value using R function phyper from rpy2 low-level >> interface. >> "R Documentation >> phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) >> q: vector of quantiles representing the number of white balls >> drawn without replacement from an urn which contains both >> black and white balls. >> m: the number of white balls in the urn. >> n: the number of black balls in the urn. >> k: the number of balls drawn from the urn. >> log.p: logical; if TRUE, probabilities p are given as log(p). >> lower.tail: logical; if TRUE (default), probabilities are P[X <= x], >> otherwise, P[X > x]. >> """ >> phyper_q = SexpVector([q,], rinterface.INTSXP) >> phyper_m = SexpVector([m,], rinterface.INTSXP) >> phyper_n = SexpVector([n,], rinterface.INTSXP) >> phyper_k = SexpVector([k,], rinterface.INTSXP) >> return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0] >> >> I have looked to scipy.stats.hypergeom but it is giving me a different >> result which is also negative. >>> 1-phyper(45, 92, 7518, 1329) >> [1] 6.92113e-13 >> >> In [24]: stats.hypergeom.sf(45,(92+7518),92,1329) >> Out[24]: -8.4343643180773142e-12
the corrected version with explicit sf was added in 0.10 >>> from scipy import stats >>> stats.hypergeom.sf(45,(92+7518),92,1329) 6.9212632647221852e-13 >>> import scipy >>> scipy.__version__ '0.10.0b2' Josef > > The CDF (CMF? whatever) for stats.hypergeom is not implemented > explicitly, so it falls back to the default implementation of just > summing up the PMF. You are falling victim to floating point error > accumulation such that the sum exceeds 1.0, so the survival function > 1-CDF ends up negative. I don't think we have a better implementation > of the CDF anywhere in scipy, sorry. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion