On Wed, May 19, 2010 at 4:51 PM, Lucia Rueda <lucia.ru...@ba.ieo.es> wrote:

>
> Hi Joris,
>
> We're using mgcv.
>
> We have data on abundance of groupers on line transects that have the same
> length.

I only now realized groupers are actually fish :-). I should work on my
English skills...


> My coworker has selected a bunch of variables and he has calculated
> them in terms of total area in different sizes of buffers around the
> centroid of the transect. He has run gam models (quasipoisson, mgcv) for
> each explanatory variable at each size of buffer.


Here you lost me a bit. How should I picture those buffers? Are they, as
Simon said, some area? Would that mean you measure e.g. salinity along the
transect and average the values using a window of a specific size? Or am I
seeing it wrong?

> Then he has selected the
> significant variables. Some variables explain a higher percentage of
> deviance at different sizes of buffers. And now he wants to build a gam
> model trying the different explanatory variables but using the values that
> correspond to the size of the buffer where they explain a higher deviance,
> so one variable might have the values of a smaller scale whereas other
> might correspond to a higher buffer size (I don't know if I made myself
> clear). I am wondering if this is correct.
>

It does not seem correct to me. Model building in these frameworks,
especially when the goal is inference, should be driven by hypotheses, not by
whatever correlations show up in the data. With smooths in particular one
has to be very careful.

Another issue is the correlation between environmental variables. They often
covary along transects, meaning that you can have confounding and even
aliasing in your dataset. This has to be checked and taken into account
_before_ building the models. I have the impression that his approach does
not take care of this.
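
To make that check concrete, here is a minimal sketch of what I have in mind.
The variable names (depth, salinity) and the simulated data are purely my
assumptions, not your data; cor() looks at the raw covariates, and
concurvity() from mgcv does the analogous check for the smooth terms of a
fitted gam.

```r
library(mgcv)

set.seed(1)
n <- 200
# Hypothetical covariates: salinity is built to covary with depth
depth    <- runif(n, 5, 50)
salinity <- 35 + 0.05 * depth + rnorm(n, sd = 0.2)
count    <- rpois(n, exp(0.5 + 0.03 * depth))
d <- data.frame(count, depth, salinity)

# Pairwise correlation of the raw covariates
r <- cor(d$depth, d$salinity)

# Concurvity of each term given all the others; values close to 1 mean
# a smooth can be largely reproduced by the other terms in the model
m  <- gam(count ~ s(depth) + s(salinity), family = quasipoisson, data = d)
cc <- concurvity(m, full = TRUE)

print(r)
print(cc)
```

If the correlation or the concurvity is high, the separate significance of
the two terms cannot really be trusted.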

Next, I believe that data should be used in as raw a form as possible, so as
not to jeopardize the interpretation. If you use different buffer sizes, you
can no longer simply say that variables X and Y contribute significantly to
the explanation of the variation; you can only say that variables X and Y
contribute significantly at the scale at which each was measured.

It also depends on whether your goal is purely predictive, or whether you
want to do inference. If you want to conclude something about the
significance of the parameters, his approach seems invalid to me. How do you
explain that the significance of a variable depends on the scale of
measurement? One assumes a continuous relation -unless working with factors-
so the scale shouldn't make much of a difference anyway. If you can predict
the number of groupers from the number of bald men in Hong Kong, by all
means, do so. But I wouldn't formulate a scientific conclusion based on the
significance of that model, if you get my drift.

> Also I don't know if he should include an offset even though all the
> transects have the same length.
>
Do you mean an intercept? In that case I'd always include one, except in
very specific cases.
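
On the offset itself: as far as I can see, when all transects have the same
length, log(length) is a constant, so with a log link an offset can only
shift the intercept. A minimal sketch with simulated data (the 100 m length,
the covariate x and the simulated relation are my assumptions, not your
data):

```r
library(mgcv)

set.seed(2)
n <- 150
len   <- rep(100, n)                  # hypothetical: every transect is 100 m
x     <- runif(n)
count <- rpois(n, exp(1 + sin(2 * pi * x)))
d <- data.frame(count, x, len)

m_plain  <- gam(count ~ s(x), family = quasipoisson, data = d)
m_offset <- gam(count ~ s(x) + offset(log(len)),
                family = quasipoisson, data = d)

# Fitted counts should agree up to numerical tolerance
max_diff <- max(abs(fitted(m_plain) - fitted(m_offset)))

# The intercepts should differ by log(100)
shift <- unname(coef(m_plain)[1] - coef(m_offset)[1])

print(max_diff)
print(c(shift = shift, log_len = log(100)))
```

An offset only starts to matter once the transect lengths (or the sampling
effort) differ between observations.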

>
> I'm in charge of looking at the spatial correlation once he builds the
> model. I don't know much about it but I was thinking of doing a Moran test,
> correlogram and variogram and then if there's spatial autocorrelation doing
> gamm, sar or gee.
>
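
For the Moran test you mention, the statistic is easy to compute by hand on
the model residuals. Below is a rough sketch in base R with inverse-distance
weights and a permutation test. The function name morans_i, the coordinates
and the two simulated "residual" vectors are illustrative assumptions; in
practice you would plug in residuals(model) and the transect centroids.

```r
# Moran's I for values z at coordinates (east, north), using
# inverse-distance weights between all pairs of points
morans_i <- function(z, east, north) {
  dmat <- as.matrix(dist(cbind(east, north)))
  w <- 1 / dmat
  diag(w) <- 0                        # no self-weights
  zc <- z - mean(z)
  n <- length(z)
  (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

set.seed(3)
n <- 100
east  <- runif(n)
north <- runif(n)
res_trend <- east + north + rnorm(n, sd = 0.1)  # spatially structured
res_noise <- rnorm(n)                           # no spatial structure

i_trend <- morans_i(res_trend, east, north)
i_noise <- morans_i(res_noise, east, north)

# Permutation test: shuffle the values over the locations
perm    <- replicate(999, morans_i(sample(res_trend), east, north))
p_value <- (sum(perm >= i_trend) + 1) / (999 + 1)

print(c(i_trend = i_trend, i_noise = i_noise, p_value = p_value))
```

Under no spatial autocorrelation the statistic is expected to be close to
-1/(n - 1), so clearly positive values point to spatial structure left in the
residuals.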
gamm is a very powerful tool, but -if I understood Simon's book correctly-
you cannot trust the anova results on the gam component of the gamm object
when using link functions. Likelihood ratio tests can give some information,
but there is not yet a solid statistical framework for formal hypothesis
testing of those models.

I also wonder why you would first build a model without the correlation
structure, and then do the same with the correct variance-covariance
structure. Personally, I'd do it the other way around. Not that it will
change much about the predictions, but it definitely will change the
inference.

In any case, all of these are my personal opinions on a problem I do not
understand fully. They are just some general considerations; feel free to
think differently.

>
> Thanks,
>
> Lucia
> --
> View this message in context:
> http://r.789695.n4.nabble.com/offset-in-gam-and-spatial-scale-of-variables-tp2222483p2222976.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
