Re: [R-sig-eco] AIC in R: back-transforming standardized model parameters (slopes)

2016-01-12 Thread Cade, Brian
Just to echo Bob O'Hara's comment and elaborate a bit more - Don't model
average the regression coefficients, especially if you are considering
models with and without interactions among the predictors. Follow the link
provided by Bob to Cade (2015.  Model averaging and muddled multimodel
inferences. Ecology 96:2370-2382) to see why model-averaged regression
coefficients as conventionally done following Burnham and Anderson provides
meaningless estimates of partial effects in the presence of
multicollinearity and addresses a concept that doesn't exist (model
uncertainty in regression coefficients) when there is no multicollinearity.
  What you perhaps really need to think about is why you want standardized
predictors. Some times they are useful and some times not.  Here you seem
to be going to a lot of trouble to standardize and then to get back to
unstandardized estimates (Drew and Phil have provided good advice about
recovering estimates in unstandardized scale) but without any indication of
how standardization is aiding your interpretations.  Note that in general,
when standardizing regression coefficients it would be more consistent with
the algebra of the regression coefficients to actually standardize them by
their partial standard deviations that adjusts for the linear relationship
(multicollinearity) among predictors (see the Cade 2015 paper to see why
this suggestion originally by Bring 1994 works).  This may be less relevant
to your simple one continuous predictor (time) model with an interaction
with two levels of a factor (air or ice), but it wasn't clear what other
models you might be considering.

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Tue, Jan 12, 2016 at 8:16 AM, Bob O'Hara  wrote:

> On 12/01/16 10:54, Matt Perkins wrote:
>
>> Hi Drew and R-help,
>>
>> Many thanks for your email, and your explanations and suggestions have
>> been very helpful in aiding me to understand this problem. Should you, or
>> anyone else on the helplist have time, perhaps I may elaborate on my
>> problem with a little more detail.
>>
>> I appreciate the pragmatic suggestion to use a non-standardised model to
>> extract ‘real’ slope values that I can report, which is great to hear as I
>> had thought to do this, but wasn’t certain of its legitimacy, so extra
>> support in this direction gives me confidence. However, as I am actually
>> running a number of analyses this approach only works when AIC identifies a
>> clear single best model; in some of these analyses AIC finds good evidence
>> for multiple models in the top model set, such that I use model averaging
>> to produce averaged parameter estimates. Obviously with model averaged
>> parameters, it is not possible to run an alternate non-standardised model.
>> So I need to be able to take a model averaged slope parameter estimate from
>> the averaged model summary and back-transform it to a real slope value.
>>
> Oh well, that makes things much easier - you shouldn't use model averaged
> parameters. They're close to meaningless. :-) (see here, for example:
> http://onlinelibrary.wiley.com/doi/10.1890/14-1639.1/full).
>
> In your case, I don't think you need to worry at all about model
> selection: you are actually interested in the interaction, so you need it
> in the model. Beyond that, just follow Phil Dixon's advice.
>
> Bob
>
>
> The provided equation is useful:
>> z.time = (time – mean(time))/2*sd(time)
>> However, I’m still a little uncertain how best I should employ it, or how
>> I might relate it to a model averaged slope estimate.
>>
>> Here is a brief worked example of my confusion, (data at the bottom of
>> the page):
>>
>> My analysis
>> I would like to test if treatment (kept in air or ice) affects nitrogen
>> (N) within shrimps over time. I have repeated measures per shrimp (
>> unique.id) that I use as a random factor to account for non-independence
>> within an individual.
>>
>> My global model is a linear mixed model:
>> model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)
>>
>> Standardisation:
>> So presumably, using the above equation I can standardise my “time”
>> variable to create a new standardised time variable for use in the analysis.
>> “time” is a continuous variable but with 5 different values (2, 30.5,
>> 58.5, 93, 120 hr) where the mean = 60.8 and SD  = 43.34 (time data below).
>> So for each value of time:
>> new standardized time value = (old time value – 60.8)/(2*43.34).
>>
>> This produces a new standardised time variable (also below).
>>
>> I’m not sure how air.ice (2 level factor air or ice) is being
>> standardised when I use the deafult R code with package "arm":
>> stdz.model1<-standardize(model1, standardize.y=FALSE).
>> I believe the default, which I use, leaves this binomial factor unscaled
>> (as a binomial is already scaled between 0 and 1?).
>>
>> Even if

Re: [R-sig-eco] AIC in R: back-transforming standardized model parameters (slopes)

2016-01-12 Thread Bob O'Hara

On 12/01/16 10:54, Matt Perkins wrote:

Hi Drew and R-help,

Many thanks for your email, and your explanations and suggestions have been 
very helpful in aiding me to understand this problem. Should you, or anyone 
else on the helplist have time, perhaps I may elaborate on my problem with a 
little more detail.

I appreciate the pragmatic suggestion to use a non-standardised model to 
extract ‘real’ slope values that I can report, which is great to hear as I had 
thought to do this, but wasn’t certain of its legitimacy, so extra support in 
this direction gives me confidence. However, as I am actually running a number 
of analyses this approach only works when AIC identifies a clear single best 
model; in some of these analyses AIC finds good evidence for multiple models in 
the top model set, such that I use model averaging to produce averaged 
parameter estimates. Obviously with model averaged parameters, it is not 
possible to run an alternate non-standardised model. So I need to be able to 
take a model averaged slope parameter estimate from the averaged model summary 
and back-transform it to a real slope value.
Oh well, that makes things much easier - you shouldn't use model 
averaged parameters. They're close to meaningless. :-) (see here, for 
example: http://onlinelibrary.wiley.com/doi/10.1890/14-1639.1/full).


In your case, I don't think you need to worry at all about model 
selection: you are actually interested in the interaction, so you need 
it in the model. Beyond that, just follow Phil Dixon's advice.


Bob



The provided equation is useful:
z.time = (time – mean(time))/2*sd(time)
However, I’m still a little uncertain how best I should employ it, or how I 
might relate it to a model averaged slope estimate.

Here is a brief worked example of my confusion, (data at the bottom of the 
page):

My analysis
I would like to test if treatment (kept in air or ice) affects nitrogen (N) 
within shrimps over time. I have repeated measures per shrimp (unique.id) that 
I use as a random factor to account for non-independence within an individual.

My global model is a linear mixed model:
model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)

Standardisation:
So presumably, using the above equation I can standardise my “time” variable to 
create a new standardised time variable for use in the analysis.
“time” is a continuous variable but with 5 different values (2, 30.5, 58.5, 93, 
120 hr) where the mean = 60.8 and SD  = 43.34 (time data below). So for each 
value of time:
new standardized time value = (old time value – 60.8)/(2*43.34).

This produces a new standardised time variable (also below).

I’m not sure how air.ice (2 level factor air or ice) is being standardised when I use the 
deafult R code with package "arm":
stdz.model1<-standardize(model1, standardize.y=FALSE).
I believe the default, which I use, leaves this binomial factor unscaled (as a 
binomial is already scaled between 0 and 1?).

Even if this manual standardisation for time is correct, then I'm still unsure 
how I would back-transform my slope value. Slopes are supposed to be the value 
of increase in y per unit of x, so I assume standardised slope estimates 
represent change in y per standardised 1 unit on x axis.


So with all this uncertainty in mind:

AIC model selection using standardised models identifies two top models; the 
interaction model and the main effects model. I model average across these two 
standardised models.
I present the model summaries for these at the bottom of the page, and that of 
the model averaged model.
Strangely, and which I don’t understand, unstandardised model1 and 
unstandardised model2 slopes differ, while both stdz.model 1 and stdz.model2 
have the same slope values. The model averaged summary (based on average of 
stdz model1 and stdz mode2) has a slope value the same as the stdz models.


For simplicity of understanding my problem, I focus on model 1, rather than the 
model-averaged model.

Here are the slopes for different versions of Model 1:
Model 1 unstandardised  (0.008156)
model 1 standardised   (0.50782)
model 1 with my manually standardised  time variable (0.3979)

If I take the slope value from model 1 unstandardised (0.008156) I can 
obviously accurately predict the value of dN for any given time value. This is 
not the case for the bigger values in the two standardised models.

Clearly, as the slope of the model using my manually standardised time variable 
is different from the standardised model, my use of the above equation is not 
correct. Or my use is correct, but I am missing something else in the 
standardisation process.

Secondly, I am still unsure how I would apply the above equation to 
back-transform the standardised slope value (0.50782) so that I may get a value 
equivalent to my unstandardised ‘real’ slope estimate (0.008156).

Any suggestions for how I can back-transform a slope estimate from my model 
summary, or properly apply the equation ab

[R-sig-eco] Back transforming standardized model parameters when there is an interaction

2016-01-12 Thread Dixon, Philip M [STAT]
Matt,

The interaction term is why the usual transformation doesn't work, at least 
initially.

The R parameterization of factor effects makes it easy to fall into the trap of 
thinking that the coefficient for a main effect "means" something when there is 
an interaction in the model.  Sometimes yes, sometimes no.

The best way to untangle model effects when there are factors in the model, 
especially when a factor changes the slope (because the model includes 
air.ice:time) is to work out the regression model for each level of the factor 
(i.e. for air then for ice).

Using the unstandardized version to get illustrate:
air.ice, assuming you used the default contrasts, see options("contrasts"), has 
the value of 0 for air and 1 for ice
so the regression line for air is E Y = 12.52 + 0.0081 time - 0.8019 * 0 - 
0.00442*0*time =  12.52 + 0.0081 time
and that for ice is is E Y = 12.52 + 0.0081 time - 0.8019 * 1 - 0.00442*1*time 
= 11.71 + 0.0037 time

Two of the coefficients in the full model correspond to coefficients for a 
factor-specific regression line ONLY because the factor variable has the value 
0 for the baseline level.

If you only standardize time (your model 3), the same calculation works because 
you haven't changed the air.ice variable.
You should be able to take the coefficients, apply the backtransformation 
suggested by Drew to each regression line separately and get the same 
coefficients when time is the X variable.

It looks like standardize also centers (or centers and standardizes) the 
air.ice variable.  The standardized value for air is no longer 0 and that for 
ice is no longer 1.  Probably something like -0.5 and 0.5 (if centered only 
with equal sample sizes).  I will assume -0.5 for air and 0.5 for ice to 
illustrate.  That means the factor-specific regression lines are now:
for air: E Y = 12.48 + 0.508 z.time - 1.07 * (-0.5) - 0.38 * (-0.5)*z.time
for ice: E Y = 12.48 + 0.508 z.time - 1.07 * (0.5) - 0.38 * (0.5)*z.time

After standardizing, none of the model coefficients can be interpreted as a 
coefficient for some baseline level.  You have to unpack the factor effects.  
Note: you get similar behavior if you use sum-to-zero contrasts.  And, if you 
use contr.SAS(), the last factor level is used as the baseline group.

Aside: the different intercept is no surprise because it is EY at a different 
timevalue (the average time, not time=0) and the different slope is no surprise 
because one unit change in standardized time is different than one unit change 
in raw time.

So, I suggest you look at how the variables are coded by standardize().  
model.matrix() will give you this information.  Use that to calculate the 
coefficients for each regression line.

Best wishes,
Philip


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] AIC in R: back-transforming standardized model parameters (slopes)

2016-01-12 Thread Matt Perkins

Hi Drew and R-help,

Many thanks for your email, and your explanations and suggestions have been 
very helpful in aiding me to understand this problem. Should you, or anyone 
else on the helplist have time, perhaps I may elaborate on my problem with a 
little more detail. 

I appreciate the pragmatic suggestion to use a non-standardised model to 
extract ‘real’ slope values that I can report, which is great to hear as I had 
thought to do this, but wasn’t certain of its legitimacy, so extra support in 
this direction gives me confidence. However, as I am actually running a number 
of analyses this approach only works when AIC identifies a clear single best 
model; in some of these analyses AIC finds good evidence for multiple models in 
the top model set, such that I use model averaging to produce averaged 
parameter estimates. Obviously with model averaged parameters, it is not 
possible to run an alternate non-standardised model. So I need to be able to 
take a model averaged slope parameter estimate from the averaged model summary 
and back-transform it to a real slope value.

The provided equation is useful:
z.time = (time – mean(time))/2*sd(time)
However, I’m still a little uncertain how best I should employ it, or how I 
might relate it to a model averaged slope estimate.

Here is a brief worked example of my confusion, (data at the bottom of the 
page):

My analysis
I would like to test if treatment (kept in air or ice) affects nitrogen (N) 
within shrimps over time. I have repeated measures per shrimp (unique.id) that 
I use as a random factor to account for non-independence within an individual.

My global model is a linear mixed model:
model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)

Standardisation:
So presumably, using the above equation I can standardise my “time” variable to 
create a new standardised time variable for use in the analysis.
“time” is a continuous variable but with 5 different values (2, 30.5, 58.5, 93, 
120 hr) where the mean = 60.8 and SD  = 43.34 (time data below). So for each 
value of time: 
new standardized time value = (old time value – 60.8)/(2*43.34). 

This produces a new standardised time variable (also below). 

I’m not sure how air.ice (2 level factor air or ice) is being standardised when 
I use the deafult R code with package "arm": 
stdz.model1<-standardize(model1, standardize.y=FALSE). 
I believe the default, which I use, leaves this binomial factor unscaled (as a 
binomial is already scaled between 0 and 1?). 

Even if this manual standardisation for time is correct, then I'm still unsure 
how I would back-transform my slope value. Slopes are supposed to be the value 
of increase in y per unit of x, so I assume standardised slope estimates 
represent change in y per standardised 1 unit on x axis. 


So with all this uncertainty in mind:

AIC model selection using standardised models identifies two top models; the 
interaction model and the main effects model. I model average across these two 
standardised models.
I present the model summaries for these at the bottom of the page, and that of 
the model averaged model. 
Strangely, and which I don’t understand, unstandardised model1 and 
unstandardised model2 slopes differ, while both stdz.model 1 and stdz.model2 
have the same slope values. The model averaged summary (based on average of 
stdz model1 and stdz mode2) has a slope value the same as the stdz models.


For simplicity of understanding my problem, I focus on model 1, rather than the 
model-averaged model.

Here are the slopes for different versions of Model 1:
Model 1 unstandardised  (0.008156)  
model 1 standardised   (0.50782) 
model 1 with my manually standardised  time variable (0.3979)

If I take the slope value from model 1 unstandardised (0.008156) I can 
obviously accurately predict the value of dN for any given time value. This is 
not the case for the bigger values in the two standardised models.

Clearly, as the slope of the model using my manually standardised time variable 
is different from the standardised model, my use of the above equation is not 
correct. Or my use is correct, but I am missing something else in the 
standardisation process.

Secondly, I am still unsure how I would apply the above equation to 
back-transform the standardised slope value (0.50782) so that I may get a value 
equivalent to my unstandardised ‘real’ slope estimate (0.008156).

Any suggestions for how I can back-transform a slope estimate from my model 
summary, or properly apply the equation above to standardise my model, would be 
most appreciated.

thanks,

Matt




### Models, data and summary tables below

model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)
model2<-lmer( N ~ time + air.ice + (1|unique.id), data=shrimp, REML=FALSE)
model3<-lmer( N ~ std.Time*air.ice + (1|unique.id), data=shrimp, REML=FALSE)
stdz.model1<-standardize(model1, standardize.y=FALSE)
stdz.model2<-standardize(model2, standardize.y