Re: [R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Chris Mcowen
Steve and all others who have made suggestions,

Thanks for this; I am busy reading the suggested papers and investigating
various packages.

The reason I wasn't fitting one big model is that I am interested in why some
points don't "conform" to the oceanographic model, which in theory should
explain ~99% of the variance. My approach has been: this is the best model
I can make and it should perform very well; it does not, so why not?

In essence I am trying to tease out which factors stop this model from
performing as it should in theory; the problem is that the factors are likely
to differ from one observation to the next.

I have fitted a big model (all oceanographic, socio-economic, etc. variables)
and I get a good fit (AIC-based selection, r-squared 82%), but this doesn't
indicate why the oceanographic model performed poorly, beyond saying that it
is due to socio-economic factors. Furthermore, I would like to be able to say
that socio-economic factors stop, say, Australia from reaching the catch it
theoretically could given the productivity of the surrounding oceans, whilst
political reasons stop Somalia from reaching the theoretical maximum catch.
This is where I am struggling.

Chris


Re: [R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Steve Brewer
Chris,

Another thing to keep in mind is that when you run the regression analysis
using residuals, as opposed to putting all predictors in the multiple
regression from the beginning (oceanographic data and productivity data),
you are in effect inflating the error df for the analysis of catch
residuals against productivity. In the multiple regression approach, one
df is removed from the error df for every predictor variable in the model.
When you run it as two separate analyses, as you propose, the df removed
from the error df in the first analysis (the one with oceanographic data)
are put back into the error df for the second analysis of catch
residuals vs productivity. This is usually not a big deal when the first
analysis contains only one or two predictors and lots of observations. But
when the reverse is true, you're more likely to get a significant
relationship between catch residuals and productivity even when none
really exists.
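The bookkeeping can be made concrete with a small sketch. It is written in Python, though the thread is about R, because the arithmetic is language-neutral. The 52 countries and six oceanographic predictors come from the original post. The five ecol/socio predictors are a hypothetical number chosen for illustration, and both fits are assumed to include an intercept. The last step assumes, purely for the arithmetic, that both analyses leave the same residual sum of squares. That holds exactly when the ecol/socio variables are themselves first residualized on the oceanographic ones.

```python
def error_df(n_obs, n_predictors):
    """Error (residual) degrees of freedom for a least-squares fit with an intercept."""
    return n_obs - 1 - n_predictors

n_countries = 52  # from the original post
n_ocean = 6       # oceanographic predictors ("x6" in the original post)
n_socio = 5       # hypothetical number of ecol/socio predictors

# One multiple regression: every predictor costs one error df.
df_single = error_df(n_countries, n_ocean + n_socio)
# Two-stage analysis: the second regression only "pays" for the ecol/socio predictors.
df_two_stage = error_df(n_countries, n_socio)

# Assuming both analyses leave the same residual sum of squares (RSS),
# RSS / df is smaller in the two-stage analysis, so standard errors shrink
# by sqrt(df_single / df_two_stage) and t statistics grow by its inverse.
se_factor = (df_single / df_two_stage) ** 0.5

print(df_single, df_two_stage, round(se_factor, 3))  # 40 46 0.933
```

The gap grows with the number of predictors dropped from the first stage relative to the number of observations. This is the situation in which the two-stage approach is most likely to give spurious significance.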

As others have suggested, why not put productivity and oceanographic data
together in a single multiple regression model?

Hope this helps.

Steve



J. Stephen Brewer
Professor
Department of Biology
PO Box 1848
University of Mississippi
University, Mississippi 38677-1848
Brewer web page - http://home.olemiss.edu/~jbrewer/
FAX - 662-915-5144
Phone - 662-915-1077





___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Dixon, Philip M [STAT]
1.   Regress a suite of ecological and socioeconomic variables against
the residuals from the oceanographic model to determine which factors cause
some countries to be above and some below, e.g. as trophic level increases,
the residuals become increasingly negative.

All my answers assume that all variables are measured at the same scale, so a 
multi-level model is not appropriate.

Absolutely no problem.  This is essentially equivalent to fitting a larger 
model with both oceanographic and ecological/socioeconomic variables.  If you 
go one step further, and first calculate residuals of the ecol/socio variables 
when regressed against oceanographic variables, the regression of the 
oceanographic residuals on the ecol/socio residuals gives exactly the same
ecol/socio coefficients as fitting the larger model.  The theory supporting
this leads to added variable
(or partial regression residual) plots.  
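This equivalence is often called the Frisch-Waugh-Lovell theorem. The sketch below illustrates it in Python on simulated data. The variable names x1 and x2 (oceanographic) and z (ecol/socio), the effect sizes, the seed and the 52 simulated "countries" are all hypothetical. Least squares is solved from the normal equations in plain Python rather than with R's lm().

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(cols, y):
    """Least-squares coefficients for predictor columns `cols` (include a column of 1s for an intercept)."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

def resid(cols, y):
    """Residuals of y after regressing it on `cols`."""
    beta = ols(cols, y)
    return [yi - sum(b * col[i] for b, col in zip(beta, cols)) for i, yi in enumerate(y)]

# Hypothetical data: z (ecol/socio) is deliberately correlated with x1 (oceanographic).
random.seed(1)
n = 52
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
z = [0.6 * a + random.gauss(0, 1) for a in x1]
y = [1 + 2 * a - b + 1.5 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, z)]

ocean = [ones, x1, x2]
beta_full = ols(ocean + [z], y)[-1]  # coefficient of z in the larger model
e_y = resid(ocean, y)                # catch residuals from the oceanographic model
e_z = resid(ocean, z)                # z residualized on the same oceanographic predictors
beta_fwl = ols([e_z], e_y)[0]        # regression of residuals on residuals

print(beta_full, beta_fwl)  # equal up to floating-point rounding
```

Plotting e_y against e_z gives the added-variable plot for z.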

2.   Ideally (and I am unsure how, or if, it is possible) work out for
each country which variable(s) cause the poor fit of that country to the
oceanographic model.

This may be a bit tricky.  The problem is potential confounding.  The
regression of two sets of residuals suggested above tells you that a particular
ecol/socio variable contributes to the larger model only to the extent that
the variable contains information not present in the set of
oceanographic variables.  If an ecol/socio variable is causally important but
confounded with some oceanographic variables, the approach will not detect the
role of that ecol/socio variable.  If two ecol/socio variables are confounded,
a regression model cannot distinguish their contributions.  You have to think
about your variables and their relationships to assess this.

Assuming confounding is not an issue, fit the larger model with both 
oceanographic and ecol/socio variables.  For each country i, calculate
x_{ij}\hat{\beta}_j for each ecol/socio variable j.  That is the contribution
of that variable to the model prediction for that country.  Compare those components to
their sum for the country.
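One way to sketch this per-country decomposition is shown below, in Python on simulated data. The variable names (sst, npp, trophic_level, governance), the country labels and the effect sizes are hypothetical. As an added design choice that is not part of the post, each predictor is centred first. A contribution then reads as the shift in predicted catch relative to a country with an average value of that variable. Without centring, x_{ij}\hat{\beta}_j also depends on where each variable's zero happens to lie.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(cols, y):
    """Least-squares coefficients for predictor columns `cols`."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

def centred(v):
    """Subtract the mean so contributions are relative to an average country."""
    m = sum(v) / len(v)
    return [a - m for a in v]

random.seed(2)
n = 52
countries = [f"country_{i:02d}" for i in range(1, n + 1)]  # hypothetical labels
ocean = {k: centred([random.gauss(0, 1) for _ in range(n)]) for k in ("sst", "npp")}
socio = {k: centred([random.gauss(0, 1) for _ in range(n)])
         for k in ("trophic_level", "governance")}
catch = [10 + ocean["sst"][i] + 2 * ocean["npp"][i]
         - 1.5 * socio["trophic_level"][i] + 0.8 * socio["governance"][i]
         + random.gauss(0, 1) for i in range(n)]

names = list(ocean) + list(socio)
cols = [[1.0] * n] + [ocean[k] for k in ocean] + [socio[k] for k in socio]
coef = dict(zip(["(Intercept)"] + names, ols(cols, catch)))

# x_ij * beta_j for each ecol/socio variable j and each country i
contrib = {c: {k: coef[k] * socio[k][i] for k in socio} for i, c in enumerate(countries)}

for c in countries[:3]:
    parts = contrib[c]
    largest = max(parts, key=lambda k: abs(parts[k]))
    print(c, {k: round(v, 2) for k, v in parts.items()},
          "sum:", round(sum(parts.values()), 2), "largest:", largest)
```

As noted above, this reading is only trustworthy if the ecol/socio variables are not confounded with each other or with the oceanographic ones.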
 
Philip Dixon



Re: [R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Ivailo
On Fri, Jun 22, 2012 at 1:17 PM, Bob O'Hara  wrote:
...

> The simple answer is no it isn't:
> 

That's a great paper -- thanks for pointing to it, Bob!

...

>> 2.       Ideally (and I am unsure how, or if, it is possible) work out for
>> each country which variable(s) cause the poor fit of that country to the
>> oceanographic model.
>
> Try using partial residuals
> (http://en.wikipedia.org/wiki/Partial_residual_plot).

My thoughts followed another path and I hadn't thought about that. I
can just add that partial residual plots (sometimes called component-plus-residual
plots) are implemented in the crPlot() function in the
car package that accompanies the quite readable book on regression "An
R Companion to Applied Regression, Second Edition" by Fox & Weisberg
(Sage, 2011).
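For readers outside R, here is a rough Python sketch of what a partial residual is: the ordinary residual plus the fitted term for one predictor. The data, the variable names and the effect sizes are hypothetical. The term is centred here for readability, which may differ in detail from how crPlot() builds its plots. The check at the end shows the defining property: the least-squares slope of the partial residuals on that predictor equals its coefficient in the full model. Each point's vertical distance from that line is the country's ordinary residual.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(cols, y):
    """Least-squares coefficients for predictor columns `cols`."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

random.seed(3)
n = 52
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]     # oceanographic (hypothetical)
x2 = [random.gauss(0, 1) for _ in range(n)]     # oceanographic (hypothetical)
z = [0.5 * a + random.gauss(0, 1) for a in x1]  # ecol/socio (hypothetical)
y = [1 + 2 * a - b - 1.2 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, z)]

cols = [ones, x1, x2, z]
beta = ols(cols, y)
fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
e = [yi - fi for yi, fi in zip(y, fitted)]

# Partial residuals for z: ordinary residual plus z's (centred) fitted term.
zbar = sum(z) / n
partial = [ei + beta[3] * (zi - zbar) for ei, zi in zip(e, z)]

slope = ols([ones, z], partial)[1]
print(beta[3], slope)  # equal up to floating-point rounding
```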

Cheers,
Ivailo
-- 
UBUNTU: a person is a person through other persons.



Re: [R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Bob O'Hara
On 06/21/2012 07:06 PM, Chris Mcowen wrote:
> Dear List,
>
> I am wondering if the methodological approach I am taking is correct and
> would be very grateful if people could comment and make suggestions, much
> appreciated.
The simple answer is no, it isn't:

>
> I have developed the best model (AIC model selection) using oceanographic
> data (e.g. SST, chlorophyll, NPP...x6) to predict reported fisheries catch
> for 52 countries.
>
> I then extract the residuals from the model: any country with a positive
> residual has a higher catch than would be predicted given the level of
> productivity in the country, and vice versa for negative residuals.
>
> What I want to do is:
>
> 1.   Regress a suite of ecological and socioeconomic variables against
> the residuals from the oceanographic model to determine which factors cause
> some countries to be above and some below, e.g. as trophic level increases,
> the residuals become increasingly negative.
You could simply put everything into one big model.
> 2.   Ideally (and I am unsure how, or if, it is possible) work out for
> each country which variable(s) cause the poor fit of that country to the
> oceanographic model.

Try using partial residuals 
(http://en.wikipedia.org/wiki/Partial_residual_plot).

Bob

-- 

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org




Re: [R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Ivailo
On Thu, Jun 21, 2012 at 8:06 PM, Chris Mcowen  wrote:
> Dear List,
>
> I am wondering if the methodological approach I am taking is correct and
> would be very grateful if people could comment and make suggestions, much
> appreciated.
>
> I have developed the best model (AIC model selection) using oceanographic
> data (e.g. SST, chlorophyll, NPP...x6) to predict reported fisheries catch
> for 52 countries.
>
> I then extract the residuals from the model: any country with a positive
> residual has a higher catch than would be predicted given the level of
> productivity in the country, and vice versa for negative residuals.

Dear Chris,

it is difficult to comment without more details about the model you're
trying to fit (i.e. how many samples, variables, etc. you have), but keep in
mind that the residuals indicate that some variation in the dependent
variable has not been "explained" by the predictors included in the model.
If you need to explain the unexplained variation (i.e. the residuals from
the model), why don't you include all the variables of interest from the
beginning?

> What I want to do is:
>
> 1.       Regress a suite of ecological and socioeconomic variables against
> the residuals from the oceanographic model to determine which factors cause
> some countries to be above and some below, e.g. as trophic level increases,
> the residuals become increasingly negative.
>
> 2.       Ideally (and I am unsure how, or if, it is possible) work out for
> each country which variable(s) cause the poor fit of that country to the
> oceanographic model.

My feeling is that you might want to fit a multilevel model to your
data, but as I am only just learning about them myself, I would recommend
checking some of the wonderful resources on this topic: either
"Data Analysis Using Regression and Multilevel/Hierarchical Models" by
Gelman & Hill (CUP, 2007) or "Mixed Effects Models and Extensions in
Ecology with R" by Zuur, Ieno, Walker, Saveliev & Smith (Springer,
2009).

> P.S. Below is the type of conclusion I am drawing:
>
> There are a number of reasons why some countries have higher / lower catch
> than you would expect.
>
> For example, if the target fishery is a high trophic level species then the
> link between primary productivity and catch will be weaker than if the
> species was a lower trophic level (transfer efficiency etc.), resulting
> in a negative residual.  Alternatively, it may be that the area is being
> overfished, e.g. the North Sea, meaning more fish are being caught in that
> region than it can sustain, resulting in a high positive residual (as
> predicted by the model).
>
> In reality it is likely a combination of this plus other factors; however,
> some factors will be relevant only to particular countries, e.g. Somalia has
> a really low catch compared to its productivity, likely due to piracy and
> poor reporting of statistics.

This type of inference could be made using a hierarchical model in which
you relate individual catches both to catch-level variables (as it appears
you have already done) and to country-level variables (which you still
want to do).

HTH,
Ivailo



[R-sig-eco] Using residuals as dependent variables

2012-06-21 Thread Chris Mcowen
Dear List,

I am wondering if the methodological approach I am taking is correct and
would be very grateful if people could comment and make suggestions, much
appreciated.

I have developed the best model (AIC model selection) using oceanographic
data (e.g. SST, chlorophyll, NPP...x6) to predict reported fisheries catch
for 52 countries.

I then extract the residuals from the model: any country with a positive
residual has a higher catch than would be predicted given the level of
productivity in the country, and vice versa for negative residuals.

What I want to do is:

1.   Regress a suite of ecological and socioeconomic variables against
the residuals from the oceanographic model to determine which factors cause
some countries to be above and some below, e.g. as trophic level increases,
the residuals become increasingly negative.

2.   Ideally (and I am unsure how, or if, it is possible) work out for
each country which variable(s) cause the poor fit of that country to the
oceanographic model.
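Item 1 as proposed can be sketched in Python on simulated data. The names x1, x2 and z, the effect sizes and the seed are hypothetical. The sketch compares two quantities. The first is the coefficient of an ecol/socio variable z in a single larger model. The second is the slope from regressing the oceanographic-model residuals on z itself. For least squares, the second equals the first multiplied by the share of z's variance that the oceanographic predictors leave unexplained. It is therefore shrunk toward zero whenever z is correlated with them.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(cols, y):
    """Least-squares coefficients for predictor columns `cols`."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

def resid(cols, y):
    """Residuals of y after regressing it on `cols`."""
    beta = ols(cols, y)
    return [yi - sum(b * col[i] for b, col in zip(beta, cols)) for i, yi in enumerate(y)]

random.seed(4)
n = 52
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]     # oceanographic (hypothetical)
x2 = [random.gauss(0, 1) for _ in range(n)]     # oceanographic (hypothetical)
z = [0.7 * a + random.gauss(0, 1) for a in x1]  # ecol/socio, correlated with x1
y = [1 + 2 * a - b + 1.5 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, z)]

ocean = [ones, x1, x2]
beta_full = ols(ocean + [z], y)[-1]          # z's coefficient in one big model
e_y = resid(ocean, y)                        # residuals of the oceanographic model
beta_naive = ols([ones, z], e_y)[1]          # item 1: residuals regressed on raw z

# Share of z's variation NOT explained by the oceanographic predictors.
e_z = resid(ocean, z)
zbar = sum(z) / n
shrink = sum(v * v for v in e_z) / sum((v - zbar) ** 2 for v in z)

print(beta_full, beta_naive, beta_full * shrink)  # last two agree
```

Residualizing z on the oceanographic predictors as well as y removes this shrinkage.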

 

Thanks in advance for any suggestions / possible methods.

Chris

P.S. Below is the type of conclusion I am drawing:

There are a number of reasons why some countries have higher / lower catch
than you would expect.

For example, if the target fishery is a high trophic level species then the
link between primary productivity and catch will be weaker than if the
species was a lower trophic level (transfer efficiency etc.), resulting
in a negative residual.  Alternatively, it may be that the area is being
overfished, e.g. the North Sea, meaning more fish are being caught in that
region than it can sustain, resulting in a high positive residual (as
predicted by the model).

In reality it is likely a combination of this plus other factors; however,
some factors will be relevant only to particular countries, e.g. Somalia has
a really low catch compared to its productivity, likely due to piracy and
poor reporting of statistics.
