The general idea of the KS test (and others) can be applied to discrete data,
but the implementation in R assumes continuous data (it does not have the
adjustments needed to deal with ties).  The chi-square and other tests suffer
from the same problems in your case.  In all cases the null hypothesis is that
the data come from the stated distribution (Poisson in your case).  Failing to
reject the null hypothesis does not prove that the data come from that
distribution; it only shows that we cannot disprove it.  With large sample
sizes, your data could come from a true distribution that for all practical
purposes is equivalent to the Poisson, but that, due to slight rounding or
other errors, has slightly different probabilities for some values (a
difference that no one would reasonably care about), and these tests can
still show a significant difference.
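
To make the large-sample point concrete, here is a minimal sketch in R.  All
numbers are made up for illustration: it assumes a Poisson null with a known
lambda of 3 (so no parameter is estimated) and a "true" distribution that
differs only by moving 0.005 of probability from the value 2 to the value 3.
The same tiny discrepancy is tested at two sample sizes.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility

lambda <- 3                       # assumed known, so the null is fully specified
k      <- 0:8                     # cells 0..7, with the last cell meaning "8 or more"

# Poisson cell probabilities, with the upper tail lumped into the last cell
p_pois <- dpois(k, lambda)
p_pois[length(k)] <- 1 - ppois(max(k) - 1, lambda)

# A "true" distribution that is practically the same as the Poisson:
# move 0.005 of probability from the value 2 to the value 3
p_true <- p_pois
p_true[k == 2] <- p_true[k == 2] - 0.005
p_true[k == 3] <- p_true[k == 3] + 0.005

# Draw n values from p_true and test them against the exact Poisson cells
pval_for_n <- function(n) {
  x   <- sample(k, n, replace = TRUE, prob = p_true)
  obs <- tabulate(x + 1, nbins = length(k))
  chisq.test(obs, p = p_pois)$p.value
}

p_small <- pval_for_n(1e3)
p_large <- pval_for_n(1e6)

# Largest gap between the two CDFs: 0.005 by construction
max_cdf_gap <- max(abs(cumsum(p_true) - cumsum(p_pois)))
print(c(n_1e3 = p_small, n_1e6 = p_large, max_cdf_gap = max_cdf_gap))
```

The gap between the two distributions is the same in both runs; only the
sample size changes.  With a million observations the test has enough power
to flag a difference this small, which is why the p-value alone says little
about whether the Poisson is a good enough description.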

Usually it is better to just show that your data and the theoretical
distribution are close enough to each other rather than depending on a formal
test.  The plots and diagnostics in the vcd package are a good choice here.
You could also use the KS test statistic (ignoring the p-value and warnings)
as another measure, but plot the empirical and theoretical distributions to
see what the value means and how close they are.
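
As a rough sketch of that approach, the R code below computes a KS-type
distance by hand and plots the two cumulative distributions.  The data vector
x is simulated stand-in data (an assumption for this example; substitute your
own counts), and lambda is estimated by the sample mean.  The vcd call at the
end only runs if that package is installed.

```r
# Stand-in data for illustration only; replace x with your own counts
set.seed(42)
x <- rpois(200, lambda = 4)

lambda_hat <- mean(x)             # maximum-likelihood estimate of the Poisson mean
k <- 0:max(x)

F_emp  <- ecdf(x)(k)              # empirical CDF at each integer
F_theo <- ppois(k, lambda_hat)    # fitted Poisson CDF at the same points

# KS-type distance: largest vertical gap between the two step functions.
# Both only jump at integers, so checking the integers 0..max(x) is enough.
D <- max(abs(F_emp - F_theo))
print(D)

# Plot both CDFs so the size of D can be judged by eye
plot(k, F_emp, type = "s", xlab = "count", ylab = "cumulative probability",
     main = sprintf("Empirical vs fitted Poisson CDF (D = %.3f)", D))
lines(k, F_theo, type = "s", lty = 2, col = "red")
legend("bottomright", legend = c("empirical", "Poisson fit"),
       lty = c(1, 2), col = c("black", "red"))

# Optional: the vcd diagnostics, if the package is installed
if (requireNamespace("vcd", quietly = TRUE)) {
  gf <- vcd::goodfit(x, type = "poisson", method = "ML")
  plot(gf)                        # hanging rootogram of observed vs fitted counts
}
```

D is read here purely as a descriptive distance (the largest difference in
cumulative probability), not as a test statistic with a p-value attached.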

Another option is the vis.test function in the TeachingDemos package.  It
lets you plot data from the theoretical distribution alongside the actual
data, then see if you can visually tell the difference.
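
The idea behind that kind of visual test can be sketched in base R.  This is
not the vis.test interface itself: lineup_poisson() is a hypothetical helper
written only for this example, and x is simulated stand-in data.  The real
data are hidden among panels simulated from the fitted Poisson, and you try
to pick out the real one.

```r
# Base-R sketch of the "lineup" idea; this is NOT the vis.test() interface.
# lineup_poisson() is a hypothetical helper written for this example.
lineup_poisson <- function(x, n_panels = 9) {
  lambda_hat <- mean(x)
  real_pos   <- sample(n_panels, 1)          # where the real data will be hidden
  n_side     <- ceiling(sqrt(n_panels))

  # Simulate every panel from the fitted Poisson, then swap in the real data
  sims <- replicate(n_panels, rpois(length(x), lambda_hat), simplify = FALSE)
  sims[[real_pos]] <- x
  lim <- range(unlist(sims))                 # common x range for all panels

  old <- par(mfrow = c(n_side, n_side), mar = c(2, 2, 2, 1))
  on.exit(par(old))
  for (i in seq_len(n_panels)) {
    counts <- table(factor(sims[[i]], levels = lim[1]:lim[2]))
    barplot(counts, main = paste("Panel", i))
  }
  invisible(real_pos)
}

set.seed(7)
x   <- rpois(150, lambda = 4)     # stand-in data; replace with your own counts
pos <- lineup_poisson(x)
# Look at the plot first and make a guess, then reveal the answer:
# print(pos)
```

If you cannot reliably pick your own data out of the panels, the Poisson is
probably close enough for practical purposes.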

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of m.marcinmichal
> Sent: Thursday, April 28, 2011 3:54 PM
> To: r-help@r-project.org
> Subject: Re: [R] Kolmogorov-Smirnov test
> 
> Hi,
> thanks for response.
> 
> >> The Kolmogorov-Smirnov test is designed for distributions on continuous
> >> variable, not discrete like the poisson.  That is why you are getting
> >> some of your warnings.
> 
> I read in "Fitting distributions with R" by Vito Ricci, page 19, that:
> "... Kolmogorov-Smirnov test is used to decide if a sample comes from a
> population with a specific distribution. It can be applied both for
> discrete (count) data and continuous binned (even if some Authors do not
> agree on this point) and both for continuous variables", but on page 16 I
> read that "... while the Kolmogorov-Smirnov and Anderson-Darling tests are
> restricted to continuous distribution". I was a little confused, but tried
> this test on my discrete data anyway.
> 
> Generally, as a first step, I try to fit my data to a discrete or
> continuous distribution (task: find a distribution for the empirical data).
> Question: can I approximate my discrete data by a continuous distribution?
> I know that sometimes the Poisson distribution can be approximated by the
> normal distribution. But what happens if I use another distribution, like
> the log-normal or gamma?
> I ran three other tests, all chi-square tests, but they return three
> different results. Suppose that we have the same data, i.e. vectorSentence.
> Tests:
> 1. One
> param <- fitdistr(vectorSentence, "poisson")
> chisq.test(table(vectorSentence), p = dpois(1:9, lambda=param[[1]][1]),
> rescale.p = TRUE)
> 
> X-squared = 272.8958, df = 8, p-value < 2.2e-16
> 
> 2. Two
> library(vcd)
> gf <- goodfit(vectorSentence, type="poisson", method="MinChisq")
> summary(gf)
> 
>              X^2 df     P(> X^2)
> Pearson 404.3607  8 2.186332e-82
> 
> 3. Three
> fdistc <- fitdist(vectorSentence, "pois")
> g<-gofstat(fdistc, print.test = TRUE)
> 
> Chi-squared statistic:  535.344
> Degree of freedom of the Chi-squared distribution:  8
> Chi-squared p-value:  1.824112e-110
> 
> Question: which result is correct?
> 
> I know that I can reject the null hypothesis (the data do not come from a
> Poisson distribution). But which result is correct?
> 
> On the other hand, I am trying to solve another problem:
> 1. Suppose that we have reference data (dr) from some process (pr), which
> is saved in vectorSentence.
> 2. Suppose that we have two other data samples d1, d2 from two other
> processes p1, p2.
> 3. We know that all data are discrete.
> 
> Tasks:
> One: check whether data d1, d2 are equal to the reference data (dr). This
> is not a problem: I use the CDF, a histogram, other measures, etc., and the
> chi-square test. But can I use Kolmogorov-Smirnov to test the cumulative
> distribution function hypothesis, i.e. F(d1) = F(d), for my data?
> Two: find the distribution of dr, discrete or, if possible, continuous.
> 
> Best
> 
> Marcin M.
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-
> Smirnov-test-tp3479506p3482349.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
