Re: Hypergeometric distribution

2006-01-05 Thread Robert Kern
Bengt Richter wrote:
 On 4 Jan 2006 12:46:47 -0800, Raven [EMAIL PROTECTED] wrote:

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)
 
 It seems you are hinting at some accuracy requirements that you haven't
 yet explained. I'm curious how you use the values, and how that affects your
 judgement of Stirling's approximation. In fact, perhaps the semantics of your
 value usage could even suggest an alternate algorithmic approach to your 
 actual end result.

Does it matter? Implementing Stirling's approximation is pointless when
scipy.special.gammaln() or scipy.special.gamma() does it for him.

-- 
Robert Kern
[EMAIL PROTECTED]

In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die.
  -- Richard Harter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-05 Thread Bengt Richter
On Thu, 05 Jan 2006 09:47:02 -0600, Robert Kern [EMAIL PROTECTED] wrote:

Bengt Richter wrote:
 On 4 Jan 2006 12:46:47 -0800, Raven [EMAIL PROTECTED] wrote:

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)
 
 It seems you are hinting at some accuracy requirements that you haven't
 yet explained. I'm curious how you use the values, and how that affects your
 judgement of Stirling's approximation. In fact, perhaps the semantics of your
 value usage could even suggest an alternate algorithmic approach to your 
 actual end result.

Does it matter? Implementing Stirling's approximation is pointless when
scipy.special.gammaln() or scipy.special.gamma() does it for him.

Who's talking about implementing Stirling's approximation? ;-) I'm trying to 
determine first
why the OP is thinking there's a problem with using it at all. With alternate 
algorithmic
approach I didn't mean an alternate way of calculating Stirling's 
approximation. I meant
to allude to the possibility that pulling a little further on the requirements 
thread might
even unravel some of the rationale for calculating the hypergeometric per se, 
depending on
how he's actually using it and why. Same old, same old: requirements, 
requirements ;-)

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-04 Thread Raven
Cameron Laird wrote:

 This thread confuses me.

 I've lost track of the real goal.  If it's an exact calculation of
 binomial coefficients--or even one of several other potential
 targets mentioned--I echo Steven D'Aprano, and ask, are you *sure*
 the suggestions already offered aren't adequate?

Hi Cameron, my real goal was to calculate the hypergeometric
distribution. The problem was that the  function for hypergeometric
calculation from scipy uses the scipy.comb function which by default
uses floats so for large numbers comb(n,r) returns inf. and hence the
hypergeometric returns nan.
The first suggestion, the one by Robert Kern,  resolved my problem:

Raven wrote:
Thanks to all of you guys, I could resolve my problem using the
logarithms as proposed by Robert.

Then the other guys gave alternative solutions so I tried them out. So
form me the suggestions offered are more than adequate :-)

Cameron Laird wrote:
Also, I think you
 might not realize how accurate Stirling's approximation (perhaps to
 second order) is in the range of interest.

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)

Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-04 Thread Paul Rubin
Raven [EMAIL PROTECTED] writes:
 The problem with Stirling's approximation is that I need to calculate
 the hypergeometric hence the factorial for numbers within a large range
 e.g. choose(14000,170) or choose(5,2)

Stirling's approximation to second order is fairly accurate even at
low values:

from math import log,exp,pi

def stirling(n):
# approx log(n!)
return n*(log(n)-1) + .5*(log(2.*pi*n)) + 1/(12.*n)


 for n in range(1,6): print n, exp(stirling(n))
... 
1 1.00227444918
2 2.00065204769
3 6.00059914247
4 24.0010238913
5 120.002637086
 

To third order it's even better:

from math import log,exp,pi

def stirling(n):
# approx log(n!)
return n*(log(n)-1) + .5*(log(2.*pi*n)) + 1/(12.*n) - 1/(360.*n*n*n)


 for n in range(1,6): print n, exp(stirling(n))
... 
1 0.999494216712
2 1.5749743
3 5.8182863
4 23.822028
5 119.70391
 

Reference: http://en.wikipedia.org/wiki/Stirling's_approximation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-04 Thread Bengt Richter
On 4 Jan 2006 12:46:47 -0800, Raven [EMAIL PROTECTED] wrote:

Cameron Laird wrote:

 This thread confuses me.

 I've lost track of the real goal.  If it's an exact calculation of
 binomial coefficients--or even one of several other potential
 targets mentioned--I echo Steven D'Aprano, and ask, are you *sure*
 the suggestions already offered aren't adequate?

Hi Cameron, my real goal was to calculate the hypergeometric
distribution. The problem was that the  function for hypergeometric
ISTM that can't have been your real goal -- unless you are e.g. preparing 
numeric
tables for publication. IOW, IWT you probably intend to USE the hypergeometric
distribution values in some useful way to go towards your real goal. ;-)

The requirements of this USE are still not apparent to me in your posts, though
that may be because I've missed something.

calculation from scipy uses the scipy.comb function which by default
uses floats so for large numbers comb(n,r) returns inf. and hence the
hypergeometric returns nan.
The first suggestion, the one by Robert Kern,  resolved my problem:

Raven wrote:
Thanks to all of you guys, I could resolve my problem using the
logarithms as proposed by Robert.

Then the other guys gave alternative solutions so I tried them out. So
form me the suggestions offered are more than adequate :-)

Cameron Laird wrote:
Also, I think you
 might not realize how accurate Stirling's approximation (perhaps to
 second order) is in the range of interest.

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)

It seems you are hinting at some accuracy requirements that you haven't
yet explained. I'm curious how you use the values, and how that affects your
judgement of Stirling's approximation. In fact, perhaps the semantics of your
value usage could even suggest an alternate algorithmic approach to your actual 
end result.

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-03 Thread Raven

Travis E. Oliphant wrote:

 Notice the keyword for the comb function (in scipy) lets you use it to
 compute exact values.   SciPy does not just automatically use the long
 integer because this will always slow you down.

 comb(N, k, exact=0)

 Combinations of N things taken k at a time.

 If exact==0, then floating point precision is used, otherwise
 exact long integer is computed.

 Notes:
- Array arguments accepted only for exact=0 case.
- If k  N, N  0, or k  0, then a 0 is returned.
 
 -Travis Oliphant

Great, thanks Travis.
Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-03 Thread Cameron Laird
In article [EMAIL PROTECTED],
Raven [EMAIL PROTECTED] wrote:
Well, what to say? I am very happy for all the solutions you guys have
posted :-)
For Paul:
I would prefer not to use Stirling's approximation


The problem with long integers is that to calculate the hypergeometric
I need to do float division and multiplication because integer division
returns 0. A solution could be to calculate log(Long_Factorial_Integer)
.
.
.
This thread confuses me.

I've lost track of the real goal.  If it's an exact calculation of
binomial coefficients--or even one of several other potential
targets mentioned--I echo Steven D'Aprano, and ask, are you *sure*
the suggestions already offered aren't adequate?  Also, I think you
might not realize how accurate Stirling's approximation (perhaps to
second order) is in the range of interest.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Raven
Well, what to say? I am very happy for all the solutions you guys have
posted :-)
For Paul:
I would prefer not to use Stirling's approximation


The problem with long integers is that to calculate the hypergeometric
I need to do float division and multiplication because integer division
returns 0. A solution could be to calculate log(Long_Factorial_Integer)
and finally calculate the hypergeometric with the logarithmic values.
I've done a test: iterated 1000 times two different functions for the
hypergeometric, the first one based on scipy.special.gammaln:

from scipy.special import gammaln

def lnchoose(n, m):
  nf = gammaln(n + 1)
  mf = gammaln(m + 1)
  nmmnf = gammaln(n - m + 1)
  return nf - (mf + nmmnf)

def hypergeometric_gamma(k, n1, n2, t):
  if t  n1 + n2:
t = n1 + n2
  if k  n1 or k  t:
return 0
  elif t  n2 and ((k + n2)  t):
return 0
  else:
c1 = lnchoose(n1,k)
c2 = lnchoose(n2, t - k)
c3 = lnchoose(n1 + n2 ,t)

return exp(c1 + c2 - c3)

and the second one based on the code by Steven and Scott:


import time
from math import log, exp

def bincoeff1(n, r):
  if r  n - r:
r = n - r
  x = 1
  for i in range(n, r, -1):
x *= i
  for i in range(n - r, 1, -1):
x /= i
  return x

def hypergeometric(k, n1, n2, t):
  if t  n1 + n2:
t = n1 + n2
  if k  n1 or k  t:
return 0
  elif t  n2 and ((k + n2)  t):
return 0
  else:
c1 = log(raw_bincoeff1(n1,k))
c2 = log(raw_bincoeff1(n2, t - k))
c3 = log(raw_bincoeff1(n1 + n2 ,t))

return exp(c1 + c2 - c3)

def main():
  t = time.time()
  for i in range(1000):
r = hypergeometric(6,6,30,6)
  print time.time() - t

  t = time.time()
  for i in range(1000):
r = hypergeometric_gamma(6,6,30,6)
  print time.time() - t


if __name__ == __main__:
  main()


and the result is:

0.0386447906494
0.192448139191

The first approach is faster so I think I will adopt it.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Scott David Daniels
Raven wrote:
 ...
 def main():
   t = time.time()
   for i in range(1000):
 r = hypergeometric(6,6,30,6)
   print time.time() - t
 
   t = time.time()
   for i in range(1000):
 r = hypergeometric_gamma(6,6,30,6)
   print time.time() - t
 
 and the result is:
 
 0.0386447906494
 0.192448139191
 
 The first approach is faster so I think I will adopt it.
 

You should really look into the timeit module -- you'll get nice
solid timings slightly easier to tweak.
Imagine something like:

 import timeit
 ...
 t0 = timeit.Timer(stmt='f(6, 6, 30, 6)',
setup='from __main__ import hypergeometric as f')
 t1 = timeit.Timer(stmt='f(6, 6, 30, 6)',
setup='from __main__ import hypergeometric_gamma as f')

 repetitions = 1   # Gross under-estimate of needed repetitions
 while t0.timeit(repetitions)  .25:   # .25 = minimum Secs per round
 repetitions *= 10
 print 'Going for %s repetitions' % repetitions
 print 'hypergeometric:', t0.repeat(3, repetitions)
 print 'hypergeometric_gamma:', t1.repeat(3, repetitions)

--Scott David Daniels
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Bengt Richter
On 2 Jan 2006 03:35:33 -0800, Raven [EMAIL PROTECTED] wrote:
[...]

The problem with long integers is that to calculate the hypergeometric
I need to do float division and multiplication because integer division
returns 0. A solution could be to calculate log(Long_Factorial_Integer)

ISTM you wouldn't get zero if you scaled by 10**significant_digits (however many
you require) before dividing. E.g., expected hits per trillion (or septillion 
or whatever)
expresses probability too. Perhaps that could work in your calculation?

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Raven
Scott David Daniels ha scritto:

 You should really look into the timeit module -- you'll get nice
 solid timings slightly easier to tweak.

This seems a very interesting module, I will give it a try as soon as
possible. Thanks Scott.
Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Raven

Bengt Richter wrote:

 ISTM you wouldn't get zero if you scaled by 10**significant_digits (however 
 many
 you require) before dividing. E.g., expected hits per trillion (or septillion 
 or whatever)
 expresses probability too. Perhaps that could work in your calculation?

 Regards,
 Bengt Richter

Sorry Bengt but I can't figure out how to do it, can you give me an
example please? Thanks in advance.
Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Raven

Bengt Richter wrote:


 ISTM you wouldn't get zero if you scaled by 10**significant_digits (however 
 many
 you require) before dividing. E.g., expected hits per trillion (or septillion 
 or whatever)
 expresses probability too. Perhaps that could work in your calculation?

 Regards,
 Bengt Richter

Sorry but I can't figure out how to do it, can you give me an example
please? 
Thnx

Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-02 Thread Travis E. Oliphant
Raven wrote:
 Thanks Steven for your very interesting post.
 
 This was a critical instance from my problem:
 
 
from scipy import comb

comb(14354,174)
 
 inf
 
 The scipy.stats.distributions.hypergeom function uses the scipy.comb
 function, so it returned nan since it tries to divide an infinite. I
 did not tried to write a self-made function using standard python as I
 supposed that the scipy functions reached python's limits but I was
 wrong, what a fool :-)

Notice the keyword for the comb function (in scipy) lets you use it to 
compute exact values.   SciPy does not just automatically use the long 
integer because this will always slow you down.

comb(N, k, exact=0)

Combinations of N things taken k at a time.

If exact==0, then floating point precision is used, otherwise
exact long integer is computed.

Notes:
   - Array arguments accepted only for exact=0 case.
   - If k  N, N  0, or k  0, then a 0 is returned.

-Travis Oliphant



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-01 Thread Steven D'Aprano
On Sat, 31 Dec 2005 16:24:02 -0800, Raven wrote:

 Thanks to all of you guys, I could resolve my problem using the
 logarithms as proposed by Robert.  I needed to calculate the factorial
 for genomic data, more specifically for the number of genes in the
 human genome i.e. about 30.000 and that is a big number :-)
 I didn't know gmpy
 Thanks a lot, really

Are you *sure* the existing functions didn't work? Did you try them?

 def log2(x):
... return math.log(x)/math.log(2)
...
 n = 0.0
 for i in range(1, 30):  # ten times bigger than you need
... n += log2(i)
...
 n
5025564.6087276665
 t = time.time(); x = 2L**(int(n) + 1); time.time() - t
0.26649093627929688

That's about one quarter of a second to calculate 300,000 factorial
(approximately), and it shows that the calculations are well within
Python's capabilities.

Of course, converting this 1.5 million-plus digit number to a string takes
a bit longer:

 t = time.time(); len(str(x)); time.time() - t
1512846
6939.3762848377228

A quarter of a second to calculate, and almost two hours to convert to a
string. Lesson one: calculations on longints are fast. Converting them to
strings is not.

As far as your original question goes, try something like this:

(formula from memory, may be wrong)

 def bincoeff(n,r):
... x = 1
... for i in range(r+1, n+1):
... x *= i
... for i in range(1, n-r+1):
... x /= i
... return x
...
 bincoeff(10, 0)
1
 bincoeff(10, 1)
10
 bincoeff(10, 2)
45
 bincoeff(10, 3)
120
 import time
 t = time.time(); L = bincoeff(3, 7000); time.time() - t
28.317800045013428

Less than thirty seconds to calculate a rather large binomial coefficient
exactly. How many digits?

 len(str(L))
7076

If you are calculating hundreds of hypergeometric probabilities, 30
seconds each could be quite painful, but it certainly shows that Python is
capable of doing it without resorting to logarithms which may lose some
significant digits. Although, in fairness, the log function doesn't seem
to lose much accuracy for arguments in the range you are dealing with.


How long does it take to calculate factorials?

 def timefact(n):
... # calculate n! and return the time taken in seconds
... t = time.time()
... L = 1
... for i in range(1, n+1):
... L *= i
... return time.time() - t
...
 timefact(3000)
0.054913997650146484
 timefact(3)  # equivalent to human genome
5.069951057434082
 timefact(30)  # ten times bigger
4255.2370519638062

Keep in mind, if you are calculating the hypergeometric probabilities
using raw factorials, you are doing way too much work.


-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-01 Thread Raven
Thanks Steven for your very interesting post.

This was a critical instance from my problem:

from scipy import comb
 comb(14354,174)
inf

The scipy.stats.distributions.hypergeom function uses the scipy.comb
function, so it returned nan since it tries to divide an infinite. I
did not tried to write a self-made function using standard python as I
supposed that the scipy functions reached python's limits but I was
wrong, what a fool :-)

If you are calculating hundreds of hypergeometric probabilities, 30
seconds each could be quite painful, but it certainly shows that Python is
capable of doing it without resorting to logarithms which may lose some
significant digits. Although, in fairness, the log function doesn't seem
to lose much accuracy for arguments in the range you are dealing with.

Yes I am calculating hundreds of hypergeometric probabilities so I need
fast calculations

Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-01 Thread Paul Rubin
Raven [EMAIL PROTECTED] writes:
 Yes I am calculating hundreds of hypergeometric probabilities so I need
 fast calculations

Can you use Stirling's approximation to get the logs of the factorials?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2006-01-01 Thread Steven D'Aprano
On Sun, 01 Jan 2006 14:24:39 -0800, Raven wrote:

 Thanks Steven for your very interesting post.
 
 This was a critical instance from my problem:
 
from scipy import comb
 comb(14354,174)
 inf

Curious. It wouldn't surprise me if scipy was using floats, because 'inf'
is usually a floating point value, not an integer.

Using my test code from yesterday, I got:

 bincoeff(14354,174)
11172777193562324917353367958024437473336018053487854593870
07090637489405604489192488346144684402362344409632515556732
33563523161308145825208276395238764441857829454464446478336
90173777095041891067637551783324071233625370619908633625448
31076677382448616246125346667737896891548166898009878730510
57476139515840542769956414204130692733629723305869285300247
645972456505830620188961902165086857407612722931651840L

Took about three seconds on my system.



 Yes I am calculating hundreds of hypergeometric probabilities so I
 need fast calculations

Another possibility, if you want exact integer maths rather than floating
point with logarithms, is to memoise the binomial coefficients. Something
like this:

# untested
def bincoeff(n,r, \
 cache={}):
try:
return cache((n,r))
except KeyError:
x = 1
for i in range(r+1, n+1):
x *= i
for i in range(1, n-r+1):
x /= i
cache((n,r)) = x
return x


-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2005-12-31 Thread Raven
Thanks to all of you guys, I could resolve my problem using the
logarithms as proposed by Robert.  I needed to calculate the factorial
for genomic data, more specifically for the number of genes in the
human genome i.e. about 30.000 and that is a big number :-)
I didn't know gmpy
Thanks a lot, really

Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Hypergeometric distribution

2005-12-26 Thread Raven
Hi to all, I need to calculate the hpergeometric distribution:


   choose(r, x) * choose(b, n-x)
p(x; r,b,n) =  -
   choose(r+b, n)

choose(r,x) is the binomial coefficient
I use the factorial to calculate the above formula but since I am using
large numbers, the result of choose(a,b) (ie: the binomial coefficient)
is too big even for large int. I've tried the scipy library, but this
library calculates
the hypergeometric using the factorials too, so the problem subsist. Is
there any other libray or an algorithm to calculate
the hypergeometric distribution? The statistical package R can handle
such calculations but I don't want to use python R binding since I want
a standalone app.
Thanks a lot
Ale

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2005-12-26 Thread Robert Kern
Raven wrote:
 Hi to all, I need to calculate the hpergeometric distribution:
 
 
choose(r, x) * choose(b, n-x)
 p(x; r,b,n) =  -
choose(r+b, n)
 
 choose(r,x) is the binomial coefficient
 I use the factorial to calculate the above formula but since I am using
 large numbers, the result of choose(a,b) (ie: the binomial coefficient)
 is too big even for large int. I've tried the scipy library, but this
 library calculates
 the hypergeometric using the factorials too, so the problem subsist. Is
 there any other libray or an algorithm to calculate
 the hypergeometric distribution?

Use logarithms.

Specifically,

from scipy import special

def logchoose(n, k):
lgn1 = special.gammaln(n+1)
lgk1 = special.gammaln(k+1)
lgnk1 = special.gammaln(n-k+1)
return lgn1 - (lgnk1 + lgk1)

def gauss_hypergeom(x, r, b, n):
return exp(logchoose(r, x) +
   logchoose(b, n-x) -
   logchoose(r+b, n))

Or you could use gmpy if you need exact rational arithmetic rather than floating
point.

-- 
Robert Kern
[EMAIL PROTECTED]

In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die.
  -- Richard Harter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2005-12-26 Thread Gerard Flanagan
Raven wrote:

 Hi to all, I need to calculate the hpergeometric distribution:


choose(r, x) * choose(b, n-x)
 p(x; r,b,n) =  -
choose(r+b, n)

 choose(r,x) is the binomial coefficient
 I use the factorial to calculate the above formula but since I am using
 large numbers, the result of choose(a,b) (ie: the binomial coefficient)
 is too big even for large int. I've tried the scipy library, but this
 library calculates
 the hypergeometric using the factorials too, so the problem subsist. Is
 there any other libray or an algorithm to calculate
 the hypergeometric distribution? The statistical package R can handle
 such calculations but I don't want to use python R binding since I want
 a standalone app.
 Thanks a lot
 Ale

Ale

I had this code lying about if it helps.  I don't know if it's even
correct but it's non-recursive!

def Binomial( n, k ):
ret = 0
if k == 0:
ret = 1
elif k  0:
a = range( n+1 )
a[0] = 1
for i in a[1:]:
a[i] = 1
for j in range(i-1,0,-1):
a[j] = a[j] + a[j-1]
ret = a[k]
return ret

Gerard

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2005-12-26 Thread Steven D'Aprano
On Mon, 26 Dec 2005 12:18:55 -0800, Raven wrote:

 Hi to all, I need to calculate the hpergeometric distribution:
 
 
choose(r, x) * choose(b, n-x)
 p(x; r,b,n) =  -
choose(r+b, n)
 
 choose(r,x) is the binomial coefficient
 I use the factorial to calculate the above formula but since I am using
 large numbers, the result of choose(a,b) (ie: the binomial coefficient)
 is too big even for large int.

Are you sure about that? Python long ints can be as big as you have enough
memory for. My Python can print 10L**1 to the console with a barely
detectable pause, and 10L**10 with about a ten second delay. (Most of
that delay is printing it, not calculating it.)

25206 is the first integer whose factorial exceeds 10L**10, so even if
you are calculating the binomial coefficient using the most naive
algorithm, calculating the factorials and dividing, you should easily be
able to generate it for a,b up to 20,000 unless you have a severe
shortage of memory.

 I've tried the scipy library, but this
 library calculates
 the hypergeometric using the factorials too, so the problem subsist.

What exactly is your problem? What values of hypergeometric(x; r,b,n) fail
for you?



-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hypergeometric distribution

2005-12-26 Thread Alex Martelli
Raven [EMAIL PROTECTED] wrote:

 Hi to all, I need to calculate the hpergeometric distribution:
 
 
choose(r, x) * choose(b, n-x)
 p(x; r,b,n) =  -
choose(r+b, n)
 
 choose(r,x) is the binomial coefficient
 I use the factorial to calculate the above formula but since I am using
 large numbers, the result of choose(a,b) (ie: the binomial coefficient)
 is too big even for large int. I've tried the scipy library, but this
 library calculates
 the hypergeometric using the factorials too, so the problem subsist. Is
 there any other libray or an algorithm to calculate
 the hypergeometric distribution? The statistical package R can handle
 such calculations but I don't want to use python R binding since I want
 a standalone app.

Have you tried with gmpy?


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list