Hi Bex

Did you mean the dredge function from the MuMIn package? I'm not sure what
you mean by "it does not take into account correlates or significance". The
point of dredge is that it belongs to a family of functions for multi-model
inference, which specifically avoids concepts of "significance". It must not
be used on its own. ANY stepwise procedure (forwards, backwards, or all
subsets as in dredge) will give BIASED parameter estimates if you just pick
the best model. The multi-model inference approach guards against this, and
MuMIn provides all the functions needed to do it properly. I'd recommend that
anyone thinking of doing stepwise variable selection learn more about this by
reading the book by Burnham and Anderson. They give an excellent description
of the pitfalls of dredging and stepwise selection, and explain why a
multimodel approach, which completely avoids concepts of significance, is the
most appropriate way to compare alternative models of environmental data
(i.e. data not from designed experiments) and make predictions.
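
To make the multi-model workflow concrete, here is a minimal sketch. It
assumes simulated data and a plain linear model in place of a GAM to keep it
short; the variable names (x1, x2, x3, y) are invented for illustration.
dredge(), model.avg() and coef() are used as provided by MuMIn, but treat
this as an outline rather than a recommended analysis.

```r
library(MuMIn)

# Hypothetical simulated data: x3 has no real effect on y
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 0.8 * dat$x1 + 0.3 * dat$x2 + rnorm(n)

# dredge() requires that missing values cause an error rather than
# being silently dropped, so every submodel uses the same rows
options(na.action = "na.fail")
global <- lm(y ~ x1 + x2 + x3, data = dat)

# Fit all subsets of the global model (ranked by AICc by default)
all_models <- dredge(global)

# Average over the whole model set instead of picking the top model
avg <- model.avg(all_models)

# "Full" averaged coefficients: a term counts as zero in models that omit it
coef(avg, full = TRUE)
```

The averaged coefficients are weighted by each model's Akaike weight, so no
single "best" model is singled out.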

Regards
Mike




-----Original Message-----
From: r-sig-ecology-boun...@r-project.org 
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Rebecca Ross
Sent: 28 September 2011 10:45
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] gam variable selection

Hi Marco,
Having recently worked with GAMs myself, I would suggest building your model
with a forward stepwise approach first: run an individual gam for each of your
variables and select the significant variable with the best (lowest) AIC as
your first variable, then iteratively try each of the other variables as the
second term, select the combination with the best AIC, and repeat until you
get no further AIC improvement.
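
The forward build described above could be sketched roughly as follows. This
is only an illustration on simulated data: the variable names (temp, depth,
slope, resp) are hypothetical, and the loop selects on AIC alone, without the
significance check mentioned above.

```r
library(mgcv)

# Hypothetical simulated data: slope has no real effect on resp
set.seed(7)
n <- 250
dat <- data.frame(temp = runif(n), depth = runif(n), slope = runif(n))
dat$resp <- sin(2 * pi * dat$temp) + 2 * dat$depth + rnorm(n, sd = 0.3)

candidates <- c("temp", "depth", "slope")
selected <- character(0)
best_aic <- Inf   # so the best single-variable model is always accepted first

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break

  # AIC of the current model plus each remaining variable, one at a time
  aics <- sapply(remaining, function(v) {
    rhs <- paste0("s(", c(selected, v), ", k = 4)", collapse = " + ")
    AIC(gam(as.formula(paste("resp ~", rhs)), data = dat, method = "ML"))
  })

  if (min(aics) >= best_aic) break   # no further AIC improvement: stop
  best_aic <- min(aics)
  selected <- c(selected, names(which.min(aics)))
}

selected
```

Other replies in this thread question whether any stepwise build gives
trustworthy estimates, so read this as a description of the procedure, not an
endorsement of it.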

I found it advisable to always first run each gam with smooth functions
applied to all variables (and with the basis dimension of each smooth
restricted to avoid overfitting the model, using the term k=4,
e.g. gam(x~s(y,k=4)+s(z,k=4), family=gaussian)), then check the plots for
each of your variables and rerun each model with linear terms where the plots
suggest a straight line.
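
As a minimal sketch of that check, assuming simulated data and hypothetical
variable names (temp, depth, dens): fit everything as a smooth with k = 4,
inspect the partial effect plots, then refit with a linear term where the
smooth looks like a straight line.

```r
library(mgcv)

# Hypothetical simulated data: temp acts non-linearly, depth linearly
set.seed(42)
n <- 300
dat <- data.frame(temp = runif(n), depth = runif(n))
dat$dens <- sin(2 * pi * dat$temp) + 0.5 * dat$depth + rnorm(n, sd = 0.3)

# Step 1: all terms as smooths, basis dimension restricted with k = 4
m_smooth <- gam(dens ~ s(temp, k = 4) + s(depth, k = 4),
                family = gaussian, data = dat, method = "ML")
summary(m_smooth)           # an edf close to 1 suggests a near-linear effect
plot(m_smooth, pages = 1)   # partial effect plots for each smooth

# Step 2: refit with depth as a linear (parametric) term
m_mixed <- gam(dens ~ s(temp, k = 4) + depth,
               family = gaussian, data = dat, method = "ML")

# Compare the two fits; ML is used here because the models differ in
# their parametric terms
AIC(m_smooth, m_mixed)
```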

Also remember to throw out significantly correlated variables once one of your 
correlates has been selected.

A backwards stepwise build could then be run to check the forwards build,
starting from a global model that excludes the discarded correlates.

Also worth knowing, but not worth relying on, is that there is a function 
called "dredge" which will run through your global model and list the potential 
model builds in order of best AIC. This is a variable selection algorithm but 
it does not take into account correlates or significance so it is best used 
only as advice and another check for a longhand build.

All the best,
Bex

Research Assistant 
University of Plymouth




-----Original Message-----
From: r-sig-ecology-boun...@r-project.org 
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of 
r-sig-ecology-requ...@r-project.org
Sent: 27 September 2011 11:00
To: r-sig-ecology@r-project.org
Subject: R-sig-ecology Digest, Vol 42, Issue 16

Today's Topics:

   1. gam variable selection (Marco Helbich)
   2. Re: gam variable selection (Gavin Simpson)


----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Sep 2011 08:54:52 +0200
From: Marco Helbich <marco.helb...@gmx.at>
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] gam variable selection
Message-ID: <4e81733c.8090...@gmx.at>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed

Dear list,

I am studying the influence of several environmental factors (numeric &
dummies) on species densities (= numeric) using the gam() function with a
gaussian family in the mgcv package. As stated in Wood (2006) there is no
variable selection algorithm.

Is it an appropriate (iterative) approach to drop the least significant
predictor (e.g. p > 0.05), refit the model, compare the GCV/AIC score and so
forth? Should I first focus on the smoothing functions or fixed effects? Or is
such a distinction not important at all?

Perhaps someone has more experience with GAMs and can give me a helping hand? 
Thanks in advance!

Best
Marco
--
Marco Helbich
Department of Geography
University of Heidelberg



------------------------------

Message: 2
Date: Tue, 27 Sep 2011 10:40:27 +0100
From: Gavin Simpson <gavin.simp...@ucl.ac.uk>
To: Marco Helbich <marco.helb...@gmx.at>
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] gam variable selection
Message-ID: <1317116427.2714.3.ca...@chrysothemis.geog.ucl.ac.uk>
Content-Type: text/plain; charset="UTF-8"

On Tue, 2011-09-27 at 08:54 +0200, Marco Helbich wrote:
> Dear list,
> 
> I am studying the influence of several environmental factors (numeric &
> dummies) on species densities (= numeric) using the gam()
> function with a gaussian family in the mgcv package. As stated in
> Wood (2006) there is no variable selection algorithm.
> 
> Is it an appropriate (iterative) approach to drop the least significant
> predictor (e.g. p > 0.05), refit the model, compare the GCV/AIC
> score and so forth? Should I first focus on the smoothing functions or
> fixed effects? Or is such a distinction not important at all?
> 
> Perhaps someone has more experience with GAMs and can give me a helping
> hand? Thanks in advance!

You could do that, but I would be sceptical of the results.

Marra and Wood (2011, Computational Statistics and Data Analysis 55;
2372-2387) compare various approaches for feature selection in GAMs.
IIRC, they concluded that an additional penalty term in the smoothness
selection procedure gave the best results. This can be activated in
mgcv::gam() by using the `select = TRUE` argument/setting.
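
A minimal sketch of what that looks like, assuming simulated data and
hypothetical variable names (x1, x2, x3, dens); only the `select = TRUE`
argument comes from the advice above, the rest is illustrative.

```r
library(mgcv)

# Hypothetical simulated data: x3 is pure noise
set.seed(123)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$dens <- sin(2 * pi * dat$x1) + 1.5 * dat$x2 + rnorm(n, sd = 0.3)

# select = TRUE adds an extra penalty to each smooth, so that a term can be
# shrunk towards zero effect as part of smoothness selection
m_sel <- gam(dens ~ s(x1) + s(x2) + s(x3),
             family = gaussian, data = dat,
             method = "REML", select = TRUE)

# Effective degrees of freedom per smooth; terms shrunk out of the model
# end up with an edf near zero
edf <- summary(m_sel)$s.table[, "edf"]
edf
```

Terms with an edf near zero have effectively been removed, without refitting
a sequence of models.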

HTH

G

> Best
> Marco

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



------------------------------

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


End of R-sig-ecology Digest, Vol 42, Issue 16
