Dear list,
I have N=1850923 draws of characters, each of which can be A, T, C or G.
The probability of C is p=0.256903, so I would expect
lambda=N*p=1850923*0.256903=475508 C's.
In fact, I observed k=560073.
According to my understanding, the probability of getting 560073 or more
C's in 1850923 draws should be
gsl_cdf_poisson_Q(k-1, lambda) = gsl_cdf_poisson_Q(560072, 475508).
I get somewhat, but not dramatically, more than expected, so I would
expect a p-value of, say, 0.2 or so.
However, I got 0.
I would expect the p-value to be in the same range as when observing 5
where 4 are expected, so I ran a series:
gsl_cdf_poisson_Q( 5, 4) = 0.21487
gsl_cdf_poisson_Q( 56, 47) = 0.0859409
gsl_cdf_poisson_Q( 560, 475) = 6.65119e-05
gsl_cdf_poisson_Q( 5600, 4755) = 4.39096e-33
gsl_cdf_poisson_Q( 56007, 47550) = 1.46886e-311
gsl_cdf_poisson_Q(560072, 475507) = 0
The results from the second line onward are certainly not what I expected.
Do I perhaps have a fundamental misunderstanding of gsl_cdf_poisson_Q?
What would be the proper way to use it, or which function should I use
instead?
Thanks for any answers.
P.S. The actual problem concerns longer sequences with much lower
p-values, where the Poisson distribution is more appropriate. I just
picked a simple example with a single letter and thus a high p-value.