Dear list,

I have N=1850923 draws of characters, each of which is A, T, C or G.
The probability of C is p=0.256903, so I would expect lambda=N*p=1850923*0.256903=475508 C's.
In fact, I got k=560073.
As I understand it, the probability of getting 560073 or more C's in 1850923 draws should be gsl_cdf_poisson_Q(k-1, lambda) = gsl_cdf_poisson_Q(560072, 475508).
I got slightly but not dramatically more than expected, so I would expect a p-value of, say, 0.2 or so.
However, I got 0.
I would expect the p-value to be in the same range as when getting 5 instead of the expected 4, so I computed a series:

gsl_cdf_poisson_Q(     5,      4) = 0.21487
gsl_cdf_poisson_Q(    56,     47) = 0.0859409
gsl_cdf_poisson_Q(   560,    475) = 6.65119e-05
gsl_cdf_poisson_Q(  5600,   4755) = 4.39096e-33
gsl_cdf_poisson_Q( 56007,  47550) = 1.46886e-311
gsl_cdf_poisson_Q(560072, 475507) = 0

The results starting from the second line are certainly not what I expect.
Do I have a fundamental misunderstanding of gsl_cdf_poisson_Q?
What would be the proper way to use it, or which function should I use instead?

Thanks for any answers.

P.S. The actual problem concerns longer sequences with much lower p-values, where the Poisson distribution is more appropriate. I just picked a simple example with a single letter and thus a high p-value.
