Re: [R] Nonnormal Residuals and GAMs

2013-11-08 Thread Robert Rigby
Hi Collin,

The GAMLSS package allows modelling of the response variable distribution
using either exponential family or non-exponential family distributions.
It also allows modelling of the scale parameter (and hence the dispersion
parameter for exponential family distributions) using explanatory variables.
This can be important for selecting mean model terms, and is particularly
important when interest lies in the variance and/or quantiles of the
response variable.
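
As a minimal sketch of modelling the scale parameter, assuming the gamlss
package is installed: the simulated data, the variable names, and the choice
of the normal family NO with pb() P-spline smoothers are illustrative
assumptions, not something taken from the original question.

```r
library(gamlss)

set.seed(1)
n <- 300
x <- runif(n)
# Simulated response whose mean AND standard deviation both depend on x
y <- rnorm(n, mean = sin(2 * pi * x), sd = exp(-1 + 1.5 * x))
d <- data.frame(x = x, y = y)

# One smooth term for the mean (mu) and a separate one for the scale (sigma)
fit <- gamlss(y ~ pb(x), sigma.formula = ~ pb(x), family = NO, data = d,
              control = gamlss.control(trace = FALSE))

# Fitted standard deviation at each observation
sigma_hat <- fitted(fit, "sigma")
```

Plotting sigma_hat against x shows how the fitted spread changes with the
covariate, which a constant-variance model cannot capture.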

Robert Rigby


On 06/11/13 21:46, Collin Lynch wrote:
 Greetings, My question is more algorithmic than practical.  What I am
 trying to determine is, are the GAM algorithms used in the mgcv package
 affected by nonnormally-distributed residuals?

 As I understand the theory of linear models the Gauss-Markov theorem
 guarantees that least-squares regression is optimal over all unbiased
 estimators iff the data meet the conditions linearity, homoscedasticity,
 independence, and normally-distributed residuals.  Absent the last
 requirement it is optimal but only over unbiased linear estimators.

 What I am trying to determine is whether or not it is necessary to check
 for normally-distributed errors in a GAM from mgcv.  I know that the
 unsmoothed terms, if any, will be fitted by ordinary least-squares but I
 am unsure whether the default Penalized Iteratively Reweighted Least
 Squares method used in the package is also based upon this assumption or
 falls under any analogue to the Gauss-Markov Theorem.

 Thank you in advance for any help.

   Sincerely,
   Collin Lynch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nonnormal Residuals and GAMs

2013-11-07 Thread Simon Wood

If you use GCV smoothness selection then, in the Gaussian case, the key
assumptions are constant variance and independence. As with linear
modelling, the normality assumption only comes in when you want to find
confidence intervals or p-values. (The Gauss-Markov theorem does not
require normality, by the way, but I don't know whether it has a penalized
analogue.)


With REML smoothness selection it's less clear (at least to me).

Beyond the Gaussian case the situation is much as it is with GLMs. The key
assumptions are independence and that the mean-variance relationship is
correct. The theory of quasi-likelihood tells you that you can make
valid inference based only on specifying the mean-variance relationship
for the response, rather than the whole distribution, with the price
being a small loss of efficiency. It follows that getting the
distribution exactly right is of secondary importance.
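
To illustrate the quasi-likelihood point, here is a hedged sketch with
simulated data (the data-generating process and all names are assumptions
made for illustration). The counts are not Poisson distributed, but their
variance is proportional to their mean, so specifying only that
relationship through quasipoisson should be adequate:

```r
library(mgcv)

set.seed(2)
n <- 500
x <- runif(n)
mu <- exp(1 + sin(2 * pi * x))

# Not Poisson: values are multiples of 3, with mean mu and variance 3 * mu
y <- 3 * rpois(n, mu / 3)

# Only the mean-variance relationship V(mu) = phi * mu is specified
fit <- gam(y ~ s(x), family = quasipoisson)

fit$sig2      # estimated dispersion phi (the simulation uses phi = 3)
summary(fit)  # standard errors are scaled by the estimated dispersion
```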

It's also quite easy to be misled by normal qq plots of the deviance
residuals when you have low count data. For example, section 4 of
  http://opus.bath.ac.uk/27091/1/qq_gam_resub.pdf
shows a real example where the usual qq plots look awful, suggesting
massive zero inflation, but if you compute the correct reference
quantiles for the qq plot you find that there is nothing wrong and no
evidence of zero inflation.
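
A small sketch of this comparison, assuming simulated low-count Poisson
data (the data and names are hypothetical). It uses qq.gam from mgcv, whose
rep argument requests reference quantiles simulated from the fitted model,
next to an ordinary normal QQ plot of the same deviance residuals:

```r
library(mgcv)

set.seed(3)
n <- 300
x <- runif(n)
# Low counts: most observations are 0 or 1
y <- rpois(n, exp(-1 + sin(2 * pi * x)))

fit <- gam(y ~ s(x), family = poisson, method = "REML")

op <- par(mfrow = c(1, 2))
# Left: normal reference quantiles, which can look alarming for low counts
qqnorm(residuals(fit, type = "deviance"), main = "Normal reference")
# Right: reference quantiles simulated from the fitted Poisson model
qq.gam(fit, rep = 100, type = "deviance")
par(op)
```

Since the data here really are Poisson, any departure from the line in the
left panel reflects the wrong reference distribution rather than a problem
with the model.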

best,
Simon

PS. In response to the follow-up discussion: the default link depends on
the family, rather than being a gam (or glm) default. E.g. the default is
log for the Poisson, but identity for the Gaussian.
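
One way to check this directly is to inspect the family objects, and the
default family argument of mgcv::gam, from the R prompt. This is a short
sketch; the object names are only illustrative:

```r
library(mgcv)

# Default link of each family object (families from the stats package)
links <- sapply(
  list(gaussian = gaussian(), poisson = poisson(), binomial = binomial()),
  function(fam) fam$link
)
links

# Default value of the family argument of mgcv::gam
default_family <- deparse(formals(gam)$family)
default_family
```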



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283


Re: [R] Nonnormal Residuals and GAMs

2013-11-06 Thread David Winsemius

The default functional link for mgcv::gam is log, so I doubt that your
theoretical understanding applies to GAMs in general. When Simon Wood wrote
his book on GAMs, his first chapter was on linear models and his second
chapter was on generalized linear models, at which point he had written over
100 pages; only then did he introduce GAMs. I think you need to follow the
same progression, and this forum is not the correct one for statistics
education. Perhaps pose your follow-up questions to CrossValidated.com.

-- 
David Winsemius
Alameda, CA, USA


Re: [R] Nonnormal Residuals and GAMs

2013-11-06 Thread Collin Lynch

David, thank you for your advice. Has the default changed for mgcv::gam?
Based upon the help pages for the version I have (1.7-27), I had thought
that the default family was gaussian() with link identity.

In any event I will look again at Simon Wood's book and consider
CrossValidated in the future.

Best,
Collin.


Re: [R] Nonnormal Residuals and GAMs

2013-11-06 Thread David Winsemius

I may have gotten this wrong by relying only on my memory. I'm not able to
tell from either the ?mgcv::gam or ?gam::gam help pages where I picked
up this notion.

-- 
David Winsemius
Alameda, CA, USA


Re: [R] Nonnormal Residuals and GAMs

2013-11-06 Thread COLLINL

Ok, thanks.
