Re: [R] using a noisy variable in regression (not an R question)

2009-03-08 Thread John Maindonald
One can "just include it in the regression", but the potential problems
for interpretation are surely greater than those indicated.  Inclusion of
X1 = T1+E1 may cause X2 to appear significant when in fact it is having
no effect at all.  Or the true effect can be reversed in sign.  This
happens because X1 and X2 are correlated.  Maybe this is implicit in what
Jon is saying.

See Carroll, Ruppert and Stefanski:
Measurement Error in Nonlinear Models (2004, pp.52-55).  The SD of the
error E1 may need to be fairly large relative to SD(T1) for this to be an
issue.  My notes
at http://www.maths.anu.edu.au/%7Ejohnm/r-book/2edn/xtras/xtras.pdf
have brief comments, and code that can be used to illustrate the point.

I support Stephan Kolassa's suggestions re using simulation for
sensitivity analysis, though I think this can also be done analytically.
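
For the simplest linear case, the analytical route can be sketched as follows. This is only an illustration under assumptions that are not stated in the thread: y = b1*T1 + b2*X2 + noise, T1 and X2 both have unit variance and correlation rho, and the error E1 is independent of everything else. The function name `plim_coefs` and all numbers are hypothetical.

```python
def plim_coefs(b1, b2, rho, err_var):
    """Large-sample OLS coefficients of y on (X1, X2) when X1 = T1 + E1.

    Assumed setup (not from the thread): y = b1*T1 + b2*X2 + noise,
    Var(T1) = Var(X2) = 1, Corr(T1, X2) = rho, and E1 has variance
    err_var and is independent of T1, X2 and the noise in y.
    """
    det = 1.0 + err_var - rho ** 2   # determinant of Cov(X1, X2)
    lam = (1.0 - rho ** 2) / det     # part of T1 that X1 still captures
    gam = err_var * rho / det        # part of T1 that leaks onto X2
    return b1 * lam, b2 + b1 * gam


# X2 has no effect at all, yet it receives a nonzero coefficient.
print(plim_coefs(b1=1.0, b2=0.0, rho=0.8, err_var=1.0))
# X2 has a small positive effect, but its coefficient turns negative.
print(plim_coefs(b1=-1.0, b2=0.2, rho=0.8, err_var=1.0))
```

With err_var = 0 the function returns (b1, b2) unchanged, so the distortion is entirely due to the measurement error in X1 combined with the correlation between T1 and X2.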

John Maindonald             email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 08/03/2009, at 10:00 PM, r-help-requ...@r-project.org wrote:

> From: Jonathan Baron 
> Date: 8 March 2009 5:21:55 AM
> To: Juliet Hannah 
> Cc: r-help@r-project.org
> Subject: Re: [R] using a noisy variable in regression (not an R question)
>
>
> If you form categories, you add even more error, specifically, the
> variation in the distance between each number and the category
> boundary.
>
> What's wrong with just including it in the regression?
>
> Yes, the measure X1 will account for less variance than the underlying
> variable of real interest (T1, each individual's mean, perhaps), but
> X1 could still be useful in two ways.  One, it might be a significant
> predictor of the dependent variable Y despite the error.  Two, it
> might increase the sensitivity of the model to other predictors (X2,
> X3...) by accounting for what would otherwise be error.
>
> What you cannot conclude in this case (when you measure a predictor
> with error) is that the effect of (say) X2 is not accounted for by its
> correlation with T1.  Some people try to conclude this when X2 remains
> a significant predictor of Y when X1 is included in the model.  The
> trouble is that X1 is an error-prone measure of T1, so the full effect
> of T1 is not removed by inclusion of X1.
>
> Jon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Charles C. Berry

On Sat, 7 Mar 2009, Juliet Hannah wrote:

> Hi, This is not an R question, but I've seen opinions given on non R
> topics, so I wanted
> to give it a try. :)
>
> How would one treat a variable that was measured once, but is known to
> fluctuate a lot?
> For example, I want to include a hormone in my regression as an
> explanatory variable. However, this
> hormone varies in its levels throughout a day. Nevertheless, its levels differ
> substantially between individuals so that there is information there to use.
>
> One simple thing to try would be to form categories, but I assume
> there are better ways to handle this. Has anyone worked with such data, or could
> anyone suggest some keywords that may be helpful in searching for this

Try:

  correction for attenuation
  measurement error models
  errors-in-variables
  Wayne Fuller
  LA Stefanski and RJ Carroll
  William Cochran

HTH,

Chuck

> topic. Thanks
> for your input.
>
> Regards,
>
> Juliet




Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Paul Johnson
On Sat, Mar 7, 2009 at 11:49 AM, Juliet Hannah  wrote:
> Hi, This is not an R question, but I've seen opinions given on non R
> topics, so I wanted
> to give it a try. :)
>
> How would one treat a variable that was measured once, but is known to
> fluctuate a lot?
> For example, I want to include a hormone in my regression as an
> explanatory variable. However, this
> hormone varies in its levels throughout a day. Nevertheless, its levels differ
> substantially between individuals so that there is information there to use.
>
> One simple thing to try would be to form categories, but I assume
> there are better ways to handle this. Has anyone worked with such data, or could
> anyone suggest some keywords that may be helpful in searching for this
> topic. Thanks
> for your input.
>

From teaching econometrics, I remember that if the "truth" is
y=b0+b1x1+noise and you do not have a correct measure of x1, but
rather something else like ex1=x1+noise, then the regression estimate
of b1 is biased, generally attenuated.  As far as I understand it, the
technical solutions are not too encouraging.  You can try to get better
data or possibly to build an instrumental variables model, where you
could have other predictors of the "true" value of x1 in a first stage
model.  I don't recall that I was able to persuade myself that
approach really solves anything, but many people recommend it. I
suppose a key question is whether you can persuade your audience that
ex1= x1+noise and whether that noise is well behaved.
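
The attenuation described above can be seen in a small simulation. This is a sketch with made-up numbers, not anything from the original poster's data: it assumes x1 ~ N(0, 1), well-behaved independent measurement noise, and a true slope b1 = 2. The helper names `slope` and `attenuation_demo` are hypothetical.

```python
import random
from statistics import fmean


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def attenuation_demo(b1=2.0, noise_sd=1.0, n=20000, seed=1):
    """Return (slope using true x1, slope using ex1 = x1 + noise).

    Illustrative assumptions: x1 ~ N(0, 1), measurement noise
    ~ N(0, noise_sd^2), outcome noise ~ N(0, 1), all independent.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    ex1 = [v + rng.gauss(0, noise_sd) for v in x1]
    y = [1.0 + b1 * v + rng.gauss(0, 1) for v in x1]
    return slope(x1, y), slope(ex1, y)


true_fit, noisy_fit = attenuation_demo()
# Theory: the noisy slope shrinks toward b1 * Var(x1) / (Var(x1) + noise_sd^2).
print(round(true_fit, 2), round(noisy_fit, 2))
```

With equal variances for x1 and the measurement noise, the slope on ex1 should come out at roughly half the true b1, while the slope on the true x1 stays close to b1.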

As I was considering your problem, I was wondering if there might not
be a "mixed model" approach to this problem.  You hypothesize the
truth is y=b0+b1x1+noise, but you don't have x1.  So suppose you
reconsider the "truth" as a random parameter, as in y=b0+c1*ex1+noise.
ex1 is a fixed estimate of the hormone level for each observation.  c1
is a random, varying coefficient because the effect of the hormone
fluctuates in an unmeasurable way. Then you could try to estimate the
distribution of c1.

You have an interesting problem, I think.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Stephan Kolassa

Hi Juliet,

Juliet Hannah wrote:

> I should have emphasized, I do not intend to categorize -- mainly
> because of all the discussions I have seen on R-help arguing against
> this.

Sorry that we all jumped on this ;-)

> I just thought it would be problematic to include the variable by
> itself. Take other variables, such as a genotype or BMI. If we measure
> this variable the next day, it would be the same. However, a hormone's
> level would not be the same. I thought this error must be accounted
> for somehow.


You are quite correct that fluctuating hormone levels are a problem
(although, strictly speaking, measuring BMI and even genotyping will not
yield exactly the same results the next day; measurement error is always
present). And there may be methods dealing with this, but I don't know
of any.


If you have any idea about the variability of your hormone, you could 
always take your data, perturb the hormone levels and run the analysis 
again to get a feeling for the stability of your results. This is quite 
ad hoc, but if I were the reviewer, a perturbation analysis like this 
would greatly reassure me. However, I recently worked with hormones and 
had exactly your problem, and we couldn't find any published data on 
day-to-day variability, so this was not an option - we finally went 
ahead and simply plugged the measurements into R.
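
A perturbation analysis of this kind might look roughly like the sketch below. Everything in it is an assumption for illustration: the data are simulated, `within_sd` stands for the analyst's guess at day-to-day variability, and the helper names `slope` and `perturbation_slopes` are hypothetical. Note that jittering adds noise on top of whatever error is already in the measurements, so it shows how sensitive the fit is to further error rather than correcting for the error that is there.

```python
import random
from statistics import fmean


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def perturbation_slopes(hormone, y, within_sd, n_rep=200, seed=1):
    """Refit y ~ hormone after adding N(0, within_sd^2) noise to hormone.

    within_sd is an assumed input (a guess at day-to-day variability),
    not something estimated from the data.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        jittered = [h + rng.gauss(0, within_sd) for h in hormone]
        out.append(slope(jittered, y))
    return out


# Made-up data purely for illustration.
rng = random.Random(42)
hormone = [rng.gauss(10, 2) for _ in range(300)]
y = [0.5 * h + rng.gauss(0, 1) for h in hormone]

base = slope(hormone, y)
slopes = sorted(perturbation_slopes(hormone, y, within_sd=1.0))
print("original slope:", round(base, 3))
print("perturbed slopes, about 5% to 95%:", round(slopes[10], 3), round(slopes[189], 3))
```

Repeating this for a few plausible values of `within_sd` gives a feeling for how far the estimate could move if the hormone had been sampled on a different day.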


Good luck!
Stephan





Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Juliet Hannah
Thank you for your responses.

I should have emphasized, I do not intend to categorize -- mainly
because of all the discussions I have seen on R-help arguing against
this.

I just thought it would be problematic to include the variable by
itself. Take other variables, such as a genotype or BMI. If we measure
this variable the next day, it would be the same. However, a hormone's
level would not be the same. I thought this error must be accounted
for somehow.

Thanks again!

Regards,

Juliet



Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Stephan Kolassa

Hi Juliet,

Juliet Hannah wrote:

> One simple thing to try would be to form categories



Simple but problematic. Frank Harrell put together a wonderful page 
detailing all the issues with categorizing continuous data:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

So: keep your data continuous.
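
One of the costs of categorizing can be illustrated with a quick simulation. This is a sketch on made-up data, assuming a normally distributed predictor, a linear relationship with the outcome, and a median split into "high" and "low"; the helper `corr` is hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, median


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(7)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.6 * v + rng.gauss(0, 0.8) for v in x]    # Corr(x, y) is about 0.6

cut = median(x)
x_cat = [1.0 if v > cut else 0.0 for v in x]     # median split: high vs low

r_cont = corr(x, y)
r_cat = corr(x_cat, y)
print(round(r_cont, 3), round(r_cat, 3))
```

Under these assumptions the dichotomized predictor correlates noticeably less with the outcome than the continuous one does, which is one way of seeing the information that is thrown away.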

Apart from that, I would second John's recommendation to try to get 
samples at the same point in time (and, if it is cortisol, stay away 
from smokers etc.).


Best wishes
Stephan



Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Jonathan Baron
If you form categories, you add even more error, specifically, the
variation in the distance between each number and the category
boundary.

What's wrong with just including it in the regression?

Yes, the measure X1 will account for less variance than the underlying
variable of real interest (T1, each individual's mean, perhaps), but
X1 could still be useful in two ways.  One, it might be a significant
predictor of the dependent variable Y despite the error.  Two, it
might increase the sensitivity of the model to other predictors (X2,
X3...) by accounting for what would otherwise be error.

What you cannot conclude in this case (when you measure a predictor
with error) is that the effect of (say) X2 is not accounted for by its
correlation with T1.  Some people try to conclude this when X2 remains
a significant predictor of Y when X1 is included in the model.  The
trouble is that X1 is an error-prone measure of T1, so the full effect
of T1 is not removed by inclusion of X1.
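
The point can be checked with a small simulation. This is a sketch under assumed values, not an analysis of real data: Y depends on T1 only, X2 is correlated with T1 (rho = 0.7) but has no effect of its own, and X1 = T1 + error with error variance equal to Var(T1). The helpers `cov` and `ols_two` are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def ols_two(y, a, b):
    """Least-squares slopes of y on predictors a and b (with intercept)."""
    saa, sbb, sab = cov(a, a), cov(b, b), cov(a, b)
    say, sby = cov(a, y), cov(b, y)
    det = saa * sbb - sab ** 2
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det


rng = random.Random(3)
n, rho = 20000, 0.7
t1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rho * t + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for t in t1]
x1 = [t + rng.gauss(0, 1) for t in t1]          # error-prone measure of T1
y = [1.0 * t + rng.gauss(0, 1) for t in t1]     # X2 has no effect on Y

with_t1 = ols_two(y, t1, x2)
with_x1 = ols_two(y, x1, x2)
print("adjusting for T1:", [round(c, 2) for c in with_t1])
print("adjusting for X1:", [round(c, 2) for c in with_x1])
```

When the true T1 is in the model, the coefficient on X2 is close to zero; when only the noisy X1 is available, X2 picks up a clearly nonzero coefficient even though it has no effect of its own.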

Jon

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)



Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread John Sorkin
Juliet,
The answer is simple - add the measured value as an independent variable to the
regression. There is no need to convert continuous values to categorical
values. If there is a circadian rhythm to the hormone secretion (e.g. cortisol),
I would try to get values at the same time of day for all study participants.
Barring this, perhaps you could adjust both for the hormone concentration and
the time of day the sample was obtained.
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



[R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Juliet Hannah
Hi, This is not an R question, but I've seen opinions given on non R
topics, so I wanted
to give it a try. :)

How would one treat a variable that was measured once, but is known to
fluctuate a lot?
For example, I want to include a hormone in my regression as an
explanatory variable. However, this
hormone varies in its levels throughout a day. Nevertheless, its levels differ
substantially between individuals so that there is information there to use.

One simple thing to try would be to form categories, but I assume
there are better ways to handle this. Has anyone worked with such data, or could
anyone suggest some keywords that may be helpful in searching for this
topic. Thanks
for your input.

Regards,

Juliet
