Re: [R-sig-eco] testing for distribution

2009-05-13 Thread Peter Solymos
Dear Jacob,

Erika was right, you just have to perform a goodness-of-fit test. But it is
easier to inspect your residual deviance. It approximately follows a
Chi-squared distribution, so its value should be close to the residual degrees
of freedom if the fit is good. To get a P value for an object of class
"negbin" (inheriting from glm and lm), use (note, H0: the fit is good):

library(MASS)
mod <- glm.nb(...your model...)
1-pchisq(mod$deviance, mod$df.residual)

If you are using other functions (e.g. in package pscl), the structure
of the returned object might differ; in that case simply type in the
deviance and the residual degrees of freedom by hand.
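
A minimal sketch of the deviance check above, assuming simulated counts stand in
for the real data and an intercept-only model is appropriate (the names counts
and p_dev are made up for illustration):

```r
library(MASS)

set.seed(1)
# Hypothetical stand-in for the mussel counts: overdispersed, many zeros
counts <- rnbinom(200, size = 0.5, mu = 3)

# Intercept-only negative binomial model (one count variable, no covariates)
mod <- glm.nb(counts ~ 1)

# Upper-tail Chi-squared probability of the residual deviance
# H0: the fit is good, so a small p-value signals lack of fit
p_dev <- pchisq(mod$deviance, mod$df.residual, lower.tail = FALSE)
p_dev
```

pchisq(..., lower.tail = FALSE) gives the same number as 1 - pchisq(...), but
is more accurate when the p-value is very small.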

Cheers,

Péter

Péter Sólymos, PhD
Postdoctoral Fellow
Department of Mathematical and Statistical Sciences
University of Alberta
Edmonton, Alberta, T6G 2G1
Canada
email <- paste("solymos", "ualberta.ca", sep = "@")




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] testing for distribution

2009-05-13 Thread Erika Mudrak
Jacob-  You can use a Chi-squared goodness-of-fit test, chisq.test(), for
discrete distributions like the negative binomial, and a Kolmogorov-Smirnov
test, ks.test(), for continuous distributions.  Both produce a p-value that
tests the null hypothesis that your data come from the given distribution with
the stated parameters.  Use the parameter estimates from your fitdistr()
results. So if p>0.05 (or 0.1 or whatever), you cannot reject the hypothesis
that your data come from that distribution.

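For discrete distributions, try something like:
library(MASS)
fit <- fitdistr(ActualData, "negative binomial")
obs <- table(factor(ActualData, levels = 0:max(ActualData)))  # observed counts per value
p <- dnbinom(0:max(ActualData), size = fit$estimate["size"], mu = fit$estimate["mu"])
p[length(p)] <- 1 - sum(p[-length(p)])  # fold the upper tail into the last cell
chisq.test(x = obs, p = p)
# I think this is right, I haven't actually tried it...
# This is akin to quantitatively comparing your histograms...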
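
A variant of the Chi-squared check, sketched under some assumptions: the counts
are simulated stand-ins, the cut-off of 5 for pooling the sparse upper tail is
arbitrary, and the degrees of freedom are reduced by two for the estimated
parameters (an approximation, since the estimates come from the ungrouped data).
The names cutoff, X2 and p_gof are made up for illustration.

```r
library(MASS)

set.seed(1)
ActualData <- rnbinom(200, size = 0.5, mu = 3)   # hypothetical stand-in counts

fit  <- fitdistr(ActualData, "negative binomial")
size <- fit$estimate[["size"]]
mu   <- fit$estimate[["mu"]]

# Cells 0, 1, ..., 4 plus one pooled "5 or more" cell
cutoff   <- 5
observed <- as.vector(table(factor(pmin(ActualData, cutoff), levels = 0:cutoff)))
prob     <- c(dnbinom(0:(cutoff - 1), size = size, mu = mu),
              pnbinom(cutoff - 1, size = size, mu = mu, lower.tail = FALSE))
expected <- length(ActualData) * prob

# Pearson statistic; df = number of cells - 1 - 2 estimated parameters
X2    <- sum((observed - expected)^2 / expected)
df    <- length(observed) - 1 - 2
p_gof <- pchisq(X2, df, lower.tail = FALSE)
p_gof
```

Pooling keeps the expected count in each cell from getting too small, which is
what usually triggers the "approximation may be incorrect" warning.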


For continuous distributions (such as beta), the code would be this:
fit=fitdistr(...)
ks.test(ActualData, "pbeta", shape1=fit$estimate[1],shape2=fit$estimate[2])
# I've done this successfully
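
A self-contained sketch of the beta case, assuming simulated proportions in
place of real data. The starting values and the lower bound passed to
fitdistr() are illustrative choices, not part of the original recipe. Note that
the p-value tends to be conservative when the parameters are estimated from the
same data being tested.

```r
library(MASS)

set.seed(1)
ActualData <- rbeta(100, shape1 = 2, shape2 = 5)   # hypothetical values in (0, 1)

# fitdistr() needs starting values for the beta distribution;
# the lower bound keeps both shape parameters positive during optimisation
fit <- fitdistr(ActualData, "beta",
                start = list(shape1 = 1, shape2 = 1), lower = 0.01)

ks <- ks.test(ActualData, "pbeta",
              shape1 = fit$estimate[["shape1"]],
              shape2 = fit$estimate[["shape2"]])
ks$p.value
```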

You can use AIC to compare whether another distribution fits your data better
than the negative binomial does.  I think it's possible for your data to
"pass" the Chi-squared/Kolmogorov-Smirnov test for two different
distributions, but one will fit better than the other.
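
The AIC comparison could look roughly like this, assuming simulated counts and
taking the Poisson as the competing distribution purely for illustration:

```r
library(MASS)

set.seed(1)
ActualData <- rnbinom(200, size = 0.5, mu = 3)   # hypothetical overdispersed counts

fit_pois <- fitdistr(ActualData, "Poisson")
fit_nb   <- fitdistr(ActualData, "negative binomial")

# Lower AIC means the distribution is better supported by the data
aic <- c(poisson = AIC(fit_pois), negbin = AIC(fit_nb))
aic
```

This works with counts on a single variable because fitdistr() returns a
log-likelihood for each fitted distribution; no regression model is needed.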

Erika Mudrak


---
Erika Mudrak
Graduate Student
Department of Botany
University of Wisconsin-Madison
430 Lincoln Dr
Madison WI, 53706
608-265-2191
mud...@wisc.edu

- Original Message -
From: "Capelle, Jacob" 
Date: Tuesday, May 12, 2009 11:00 am
Subject: [R-sig-eco]  testing for distribution
To: r-sig-ecology@r-project.org


> Dear all,
>  
> I have a somewhat theoretical question that I hope will interest you
> and that you can help me with.
>
> In order to obtain ecological (survey) data, I am trying to predict
> the accuracy of a sampling tool for estimating mussel density. For
> this reason I took a lot of samples at a fixed location and counted
> the number of mussels in each sample. Because mussels are aggregated
> on the sediment, I had a lot of zero values. To estimate the sample
> size I used a negative binomial distribution and obtained the k value
> and the mu from fitdistr(x,"negative binomial") (MASS).
>
> The question I have is: how can I test whether this distribution
> accurately describes my (zero-inflated count) data?
>
> I am a bit familiar with the AIC, but since I only have counts on one
> variable I cannot perform a GLS.
> I created a vector with rnbinom() using the k and mu from fitdistr(),
> plotted a histogram and compared it with my data. This showed that it
> was roughly comparable, but I want to quantify this.
>  
> I have a biological background not a statistical one, so I realize I 
> can ask silly questions.
> But I hope someone can give me some hints. 
>  
> Kind regards,
>  
> Jacob Capelle
>  
> PhD student
> Wageningen Imares
> The Netherlands
> jacob.cape...@wur.nl
> 


Re: [R-sig-eco] testing for distribution

2009-05-13 Thread Manuel Spínola

Dear Jacob,

Maybe you could use cluster sampling or adaptive cluster sampling
(design-based estimation) to get a density estimate.

Best,

Manuel Spínola

--
Manuel Spínola, Ph.D.
Instituto Internacional en Conservación y Manejo de Vida Silvestre
Universidad Nacional
Apartado 1350-3000
Heredia
COSTA RICA
mspin...@una.ac.cr
mspinol...@gamil.com
Teléfono: (506) 2277-3598
Fax: (506) 2237-7036
