Re: [R] random effects in mixed model not that 'random'

2009-12-13 Thread Daniel Malter
Hi, you are unlikely to (or lucky if you) get a response to your question
from the list. This is a question that you should ask your local
statistician with knowledge in stats and, optimally, your area of inquiry.
The list is (mostly) concerned with solving R rather than statistical
problems.

Best of luck,
Daniel

-
cuncta stricte discussurus
-
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Thomas Mang
Sent: Friday, December 11, 2009 6:19 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] random effects in mixed model not that 'random'

Hi,

I have the following conceptual / interpretative question regarding 
random effects:

A mixed effects model was fit on biological data, with observations 
coming from different species. There is a clear overall effect of 
certain predictors (entering the model as fixed effect), but as 
different species react slightly differently, the predictor also enters 
the model as random effect and with species as grouping variable. The 
resulting model is very fine.
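
As a rough sketch of the kind of model described above (the notation is assumed here and does not come from the original post; only a random slope is shown, since the post does not say whether a random intercept was also fitted), with i indexing observations and j indexing species:

```latex
% y_{ij}: response, x_{ij}: predictor, b_j: species-specific deviation of the slope
y_{ij} = \beta_0 + (\beta_1 + b_j)\, x_{ij} + \varepsilon_{ij},
\qquad b_j \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here \beta_1 is the generic predictor impact (the fixed effect) and b_j is the per-species 'coefficient' discussed below.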

Now comes the tricky part, however: I can inspect not only the variance
parameter estimate for the random effect, but also the 'coefficients'
for each species. Suppose I do this and find that they make
biological sense, and maybe actually more sense than they should:
For each species vast biological knowledge is available, regarding 
traits etc. So I can link the random effect coefficients to that 
knowledge, see the deviation from the generic predictor impact (the 
fixed effect) and relate it to the traits of my species.
However I see the following problem with that approach: If I have no 
knowledge of the species traits, or the species names are anonymous to 
me, it makes sense to treat the species-specific deviations as 
realizations of a random variable (principle of exchangeability). Once I 
know however the species used in the study and have the biological 
knowledge at hand, it does not make so much sense any more; I can 
predict whether for that particular species the generic predictor impact 
will be amplified, or not. That is, I can predict whether the
draw from the assumed normal distribution of the random effects is more
likely to be > 0 or < 0 - which is of course completely contradictory and
nonsensical if I assume I have a random draw from a N(0, sigma) distribution.
Integrating the biological knowledge as fixed effect however might be 
tremendously difficult, as species traits can sometimes not readily be 
quantified in a numeric way.
I could defer the issue to the species traits and say that, when the species
evolved, their traits were drawn randomly from a population. This, however,
causes problems with ideas of evolution and phylogenetic relationships
among the species.

Maybe my question can be rephrased the following way:
Does it ever make sense to _interpret_ the coefficients of the random
effects for each group and link them to properties of the grouping
variable? The assumption of a realization of a random variable seems to
render that quite problematic. However, this means that the more
ignorant I am, and the less knowledge I have, the more the random
realization seems to become realistic - which is at odds with scientific 
investigations.
Suppose the mixed model is one of the famous social sciences studies 
analysing pupil results on tests at different schools, with schools 
acting as grouping variable for a random effect intercept. If I have no 
knowledge about the schools, the random effect assumption makes sense. 
If I however investigate the schools in detail (either a priori or a
posteriori), say the teaching quality of the teachers, the socio-economic
status of the school area etc., it will probably be possible to predict which
ones will have pupils performing above average, and which below average.
But then the factors leading me to these predictions should probably
enter the model as fixed effects, and maybe I don't need any school
random effect at all any more. But this means the school
deviation from the global mean is actually not the realization of a random
variable, but instead the result of something quite deterministic
which is usually just unknown, or can only be measured with extreme,
impractical effort. So the process might not be random at all; it is just
that, because so little is known about it, the results appear as if they
were randomly drawn (from a larger population distribution). Again,
is ignorance / lack of deeper knowledge the key to using random effects
- and the more knowledge I have, the less?
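
The school example can be illustrated with a small simulation. This is only a sketch under stated assumptions: the covariate `z` (think of a teaching-quality score), the values of `gamma` and `tau`, and all function names are hypothetical and not taken from the post. Each school deviation is generated as a deterministic part driven by the covariate plus a small unexplained remainder. The sketch then compares the spread of the deviations when the covariate is ignored with the spread left over once it is used.

```python
import random
import statistics


def simulate_school_effects(n_schools=200, gamma=2.0, tau=0.5, seed=1):
    """Simulate school deviations u_j = gamma * z_j + eta_j.

    z_j is a hypothetical school-level covariate and eta_j is whatever
    remains unexplained (standard deviation tau). All values are assumptions.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_schools)]
    u = [gamma * zj + rng.gauss(0.0, tau) for zj in z]
    return z, u


def residual_variance_after_covariate(z, u):
    """Sample variance of u that remains after a least-squares fit of u on z."""
    z_bar, u_bar = statistics.fmean(z), statistics.fmean(u)
    sxy = sum((zj - z_bar) * (uj - u_bar) for zj, uj in zip(z, u))
    sxx = sum((zj - z_bar) ** 2 for zj in z)
    slope = sxy / sxx
    intercept = u_bar - slope * z_bar
    residuals = [uj - (intercept + slope * zj) for zj, uj in zip(z, u)]
    return statistics.variance(residuals)


z, u = simulate_school_effects()
var_unconditional = statistics.variance(u)                  # covariate ignored
var_conditional = residual_variance_after_covariate(z, u)   # covariate used

print(f"variance of school deviations, covariate ignored: {var_unconditional:.2f}")
print(f"variance left after using the covariate:          {var_conditional:.2f}")
```

In this sketch the deviations look like draws from a single wide distribution as long as the covariate is unknown. Once it is used, most of the spread is explained and only a smaller remainder is left to be treated as random.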

many thanks,
Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__

Re: [R] random effects in mixed model not that 'random'

2009-12-13 Thread Thomas Mang

Hi,

Thanks for your response; yes, you are right that it's not fully on topic, but
I chose this list not only because I use R for all my stats and so
read it anyway, but also because many statisticians read it too.

Do you know another list where my question would be more appropriate?
For what it's worth, I haven't found a local statistician yet who can really
answer the question, but I'll continue searching ...


thanks,
Thomas


Re: [R] random effects in mixed model not that 'random'

2009-12-13 Thread Robert A LaBudde
I think what you are finding is that calling a grouping variable a
random effect is not the same thing as it actually being a random effect.

An effect is really only random when it is chosen randomly. Just
because you don't want to deal with it as a fixed effect (e.g., too
many levels) doesn't mean it qualifies as a random effect. This
sloppiness is common in mixed modeling.

In your example of student scores, you mentioned the schools were a
random effect, because they were a grouping variable. This is not
true. Schools have a strong fixed effect. They are also not chosen
randomly in your study.

How to resolve your problem? Two methods: 1) Stop modeling the
grouping variable as a random effect when it's not: model it as a
fixed effect; 2) Do the experiment right: a) List the schools in
their population. b) Choose the schools to be used by random sampling
from that population. Then you will find schools really is a random effect.

What you have discovered is called selection bias. It is common in
unrandomized studies.
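
Steps a) and b) above can be sketched as follows. This is a minimal illustration, not a sampling protocol: the population of 1000 schools, their simulated deviations, the sample size of 15 and all names are made up. The non-random alternative (picking the best-performing schools) is deliberately extreme so that the selection bias is easy to see.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 1000 schools, each with a true deviation
# from the overall mean score (simulated here, unknown in practice).
population = {f"school_{i:04d}": rng.gauss(0.0, 1.0) for i in range(1000)}

# a) List the schools in the population.
school_ids = sorted(population)

# b) Choose the schools to be used by random sampling from that list.
random_sample = rng.sample(school_ids, 15)

# Non-random alternative for comparison: the 15 schools with the
# highest deviations (an exaggerated form of selective inclusion).
selected_sample = sorted(school_ids, key=population.get, reverse=True)[:15]

mean_random = statistics.fmean(population[s] for s in random_sample)
mean_selected = statistics.fmean(population[s] for s in selected_sample)

print(f"mean deviation, random sample:   {mean_random:.2f}")
print(f"mean deviation, selected sample: {mean_selected:.2f}")
```

With random sampling, the sampled deviations are a fair draw from the population distribution. With the selective choice they are shifted systematically, which is the kind of bias a model assuming mean-zero random effects does not account for.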




Re: [R] random effects in mixed model not that 'random'

2009-12-13 Thread Thomas Mang

Hi,

Thanks for your input; see below

On 12/13/2009 4:41 PM, Robert A LaBudde wrote:

> I think what you are finding is that calling a grouping variable a
> random effect is not the same thing as it actually being a random effect.
>
> An effect is really only random when it is chosen randomly. Just because
> you don't want to deal with it as a fixed effect (e.g., too many levels)
> doesn't mean it qualifies as a random effect. This sloppiness is common
> in mixed modeling.


Well, to some degree the species were chosen randomly, so there isn't a
big selection bias in there. I would also argue they don't qualify as a fixed
effect (they might as a stand-alone fixed effect factor, but definitely
not as an interaction with other predictors - there is no reason to believe
the impact of predictors is totally independent across species).
Sample size isn't the problem; based on expert knowledge, I truly wouldn't
want to include them as a fixed effect.




> In your example of student scores, you mentioned the schools were a
> random effect, because they were a grouping variable. This is not true.
> Schools have a strong fixed effect. They are also not chosen randomly in
> your study.
>
> How to resolve your problem? Two methods: 1) Stop modeling the grouping
> variable as a random effect when it's not: model it as a fixed effect;
> 2) Do the experiment right: a) List the schools in their population. b)
> Choose the schools to be used by random sampling from that population.
> Then you will find schools really is a random effect.


1) does not seem to be the right solution.
2) is more interesting in terms of understanding:
Are you saying that it is just the random choice of what gets
included in the sample that makes it qualify as a random effect? I
thought it was the fact that it is the realization of a random variable (drawn
from a N(0, sigma) distribution). These are two different things.


Suppose I list all the schools in the population and randomly pick 15.
IIUC, you would argue that school now qualifies as a random effect. However,
once I have chosen my schools I could still inspect the estimated random
effects coefficients, investigate the schools a posteriori and try to
find out what discriminates those with students above average from those
below average. Odds are, if I had the resources to make a thorough
investigation, I would find something. In other words, because there
is something deterministic behind it, I would have said they are not
random realizations from a normal distribution (although, due to the
complexity of this deterministic process, they might practically appear
as random realizations). That was my understanding of the properties of
random effects so far, but it might be wrong and hence the problem.
If I picked a 16th school and then applied my knowledge
from the investigations, I could probably say whether it will be above or
below average. In my understanding of random effects, this is what
would disqualify it as a random effect, whereas according to you
it would qualify, as long as the school was chosen randomly. Is that correct?


Suppose I have chosen randomly: Does it make sense to investigate a
posteriori why the estimates for the random effects are the way they are
and gain insights into the system, or does it not make sense because they are
assumed to be completely random realizations of a random variable and can be
anything?


To some degree I think the issue can also be seen the following way:
Conditional on my extensive knowledge of the school properties, the
schools are probably not distributed iid. I could have this knowledge
enter as a fixed effect. But since this knowledge is usually not available,
the unconditional distribution might well make them iid N(0, sigma), and
hence make the schools qualify as a grouping variable for random effects
(where of course it is assumed that sampling was done randomly from
the population).
But what shall I do if I have a bit of the extensive knowledge available
- maybe too much to stick to the completely unconditional iid
assumption, but not enough for a sensible conditional distribution
that would allow the specification of a fixed effect?
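
The conditional versus unconditional view can be written down in a small worked form. This is a sketch only: the school-level covariate z_j, its coefficient \gamma and the variances \tau^2 and \sigma_z^2 are assumed symbols, not part of the discussion above.

```latex
% u_j: deviation of school j from the global mean
% z_j: hypothetical school-level covariate; eta_j: unexplained remainder
u_j = \gamma z_j + \eta_j,
\qquad \eta_j \overset{\text{iid}}{\sim} N(0, \tau^2)

% conditional on the covariate, the deviations are not identically distributed
u_j \mid z_j \sim N(\gamma z_j, \tau^2)

% if z_j \sim N(0, \sigma_z^2) in the population, independent of \eta_j,
% the unconditional distribution is again a single mean-zero normal
u_j \sim N(0, \gamma^2 \sigma_z^2 + \tau^2)
```

Under these assumptions the two views need not exclude each other: whatever part of the knowledge can be quantified enters through z_j as a fixed effect, and the remainder \eta_j can still be modelled as a random effect, with a smaller variance \tau^2.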


thanks
Thomas






