Re: [R] using a noisy variable in regression (not an R question)
One can "just include it in the regression", but the potential problems for interpretation are surely greater than those indicated. Inclusion of X1 = T1 + E1 may cause X2 to appear significant when in fact it has no effect at all. Or the sign of the true effect can be reversed. This happens because X1 and X2 are correlated (through T1), so X2 picks up the part of the T1 effect that the error-prone X1 fails to capture. Maybe this is implicit in what Jon is saying. See Carroll, Ruppert and Stefanski: Measurement Error in Nonlinear Models (2004, pp.52-55). SD(E1) may need to be fairly large relative to SD(T1) for this to be an issue.

My notes at http://www.maths.anu.edu.au/%7Ejohnm/r-book/2edn/xtras/xtras.pdf have brief comments, and code that can be used to illustrate the point.

I support Stephan Kolassa's suggestions re using simulation for sensitivity analysis, though I think this can also be done analytically.

John Maindonald
email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473
fax   : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 08/03/2009, at 10:00 PM, r-help-requ...@r-project.org wrote:

> From: Jonathan Baron
> Date: 8 March 2009 5:21:55 AM
> To: Juliet Hannah
> Cc: r-help@r-project.org
> Subject: Re: [R] using a noisy variable in regression (not an R question)
>
> What's wrong with just including it in the regression?
>
> What you cannot conclude in this case (when you measure a predictor with error) is that the effect of (say) X2 is not accounted for by its correlation with T1. Some people try to conclude this when X2 remains a significant predictor of Y when X1 is included in the model. The trouble is that X1 is an error-prone measure of T1, so the full effect of T1 is not removed by inclusion of X1.
>
> Jon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
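A minimal sketch of the point about X2, on simulated data and using only the Python standard library. Everything here is an assumption made for illustration: Y depends on T1 alone, X2 is correlated with T1 (about 0.7) but has no effect on Y, and X1 = T1 + E1 with SD(E1) = SD(T1) = 1. The function names are hypothetical and this is not the code from the notes linked above.

```python
import random


def fit_two_predictors(y, x1, x2):
    """Least-squares slopes (b1, b2) for y ~ intercept + x1 + x2."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # solve the 2x2 normal equations by Cramer's rule
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=20000, error_sd=1.0, seed=1):
    """Return slopes using the true T1, then using the noisy X1 = T1 + E1."""
    rng = random.Random(seed)
    t1 = [rng.gauss(0, 1) for _ in range(n)]
    # ASSUMPTION: X2 is correlated with T1 but has NO effect on Y.
    x2 = [0.7 * t + 0.51 ** 0.5 * rng.gauss(0, 1) for t in t1]
    y = [t + rng.gauss(0, 1) for t in t1]
    x1 = [t + rng.gauss(0, error_sd) for t in t1]
    return fit_two_predictors(y, t1, x2), fit_two_predictors(y, x1, x2)


with_true_t1, with_noisy_x1 = simulate()
print("using T1: b1=%.2f b2=%.2f" % with_true_t1)
print("using X1: b1=%.2f b2=%.2f" % with_noisy_x1)
```

Under these assumptions the large-sample slopes with the noisy X1 are b1 = 0.51/1.51 (about 0.34) and b2 = 0.7/1.51 (about 0.46), compared with 1 and 0 when T1 itself is available. So X2 looks like a strong predictor although it does nothing. Lowering `error_sd` shrinks the spurious X2 slope, in line with the remark that SD(E1) needs to be fairly large relative to SD(T1).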
Re: [R] using a noisy variable in regression (not an R question)
On Sat, 7 Mar 2009, Juliet Hannah wrote:

> How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day. Nevertheless, its levels differ substantially between individuals so that there is information there to use.
>
> One simple thing to try would be to form categories, but I assume there are better ways to handle this. Has anyone worked with such data, or could anyone suggest some keywords that may be helpful in searching for this topic.

Try:

  correction for attenuation
  measurement error models
  errors-in-variables
  Wayne Fuller
  LA Stefanski and RJ Carroll
  William Cochran

HTH,

Chuck

Charles C. Berry
(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
Re: [R] using a noisy variable in regression (not an R question)
On Sat, Mar 7, 2009 at 11:49 AM, Juliet Hannah wrote:

> How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day. Nevertheless, its levels differ substantially between individuals so that there is information there to use.

From teaching econometrics, I remember that if the "truth" is y = b0 + b1*x1 + noise and you do not have a correct measure of x1, but rather something else like ex1 = x1 + noise, then the regression estimate of b1 is biased, generally attenuated (shrunk toward zero).

As far as I understand it, the technical solutions are not too encouraging. You can try to get better data, or possibly build an instrumental variables model, where you would have other predictors of the "true" value of x1 in a first-stage model. I don't recall that I was ever able to persuade myself that this approach really solves anything, but many people recommend it.

I suppose a key question is whether you can persuade your audience that ex1 = x1 + noise and that the noise is well behaved.

As I was considering your problem, I was wondering if there might not be a "mixed model" approach to it. You hypothesize the truth is y = b0 + b1*x1 + noise, but you don't have x1. So suppose you reconsider the "truth" as having a random parameter, as in y = b0 + c1*ex1 + noise. ex1 is a fixed estimate of the hormone level for each observation. c1 is a random, varying coefficient because the effect of the hormone fluctuates in an unmeasurable way. Then you could try to estimate the distribution of c1.

You have an interesting problem, I think.

pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
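A minimal sketch of the attenuation described above, on simulated data with the Python standard library only. All numbers are assumptions chosen for illustration: b0 = 1, b1 = 2, x1 and both noise terms have SD 1. The correction shown is the textbook "correction for attenuation", and it only works here because the error variance is assumed to be known, which is rarely true in practice.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def sample_variance(x):
    """Unbiased sample variance."""
    n = len(x)
    mx = sum(x) / n
    return sum((a - mx) ** 2 for a in x) / (n - 1)


rng = random.Random(2009)
n, b0, b1, error_sd = 20000, 1.0, 2.0, 1.0
x1 = [rng.gauss(0, 1) for _ in range(n)]           # true predictor (unobserved)
y = [b0 + b1 * x + rng.gauss(0, 1) for x in x1]    # the "truth"
ex1 = [x + rng.gauss(0, error_sd) for x in x1]     # what is actually measured

naive = slope(ex1, y)
# Reliability ratio: share of var(ex1) that is due to x1.
# ASSUMPTION: the measurement-error variance error_sd**2 is known.
var_ex1 = sample_variance(ex1)
reliability = (var_ex1 - error_sd ** 2) / var_ex1
corrected = naive / reliability

print("naive slope:     %.2f" % naive)
print("reliability:     %.2f" % reliability)
print("corrected slope: %.2f" % corrected)
```

With these settings the reliability ratio is about 0.5, so the naive slope comes out near 1 instead of 2, and dividing by the reliability brings it back to roughly 2. Whether the noise in a real hormone measurement is "well behaved" enough for this (independent of x1 and of the noise in y) is exactly the question raised in the post.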
Re: [R] using a noisy variable in regression (not an R question)
Hi Juliet,

Juliet Hannah wrote:
> I should have emphasized, I do not intend to categorize -- mainly because of all the discussions I have seen on R-help arguing against this.

Sorry that we all jumped on this ;-)

> I just thought it would be problematic to include the variable by itself. Take other variables, such as a genotype or BMI. If we measure this variable the next day, it would be the same. However, a hormone's level would not be the same. I thought this error must be accounted for somehow.

You are quite correct that fluctuating hormone levels are a problem (although, strictly speaking, measuring BMI and even genotyping will not yield exactly the same results the next day; measurement error is always present). And there may be methods dealing with this, but I don't know of any.

If you have any idea about the variability of your hormone, you could always take your data, perturb the hormone levels and run the analysis again to get a feeling for the stability of your results. This is quite ad hoc, but if I were the reviewer, a perturbation analysis like this would greatly reassure me.

However, I recently worked with hormones and had exactly your problem, and we couldn't find any published data on day-to-day variability, so this was not an option. We finally went ahead and simply plugged the measurements into R.

Good luck!
Stephan
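The perturbation analysis suggested above could be sketched roughly as follows, using the Python standard library only. The data are simulated stand-ins, and `assumed_sd` (a guess at the day-to-day SD of the hormone), the variable names and the function names are all hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def perturbation_slopes(x, y, assumed_sd, n_reps=200, seed=0):
    """Refit the regression n_reps times, each time adding fresh noise to x."""
    rng = random.Random(seed)
    return [
        slope([v + rng.gauss(0, assumed_sd) for v in x], y)
        for _ in range(n_reps)
    ]


# Simulated stand-in for the real data set (ASSUMPTION, for illustration only).
rng = random.Random(42)
hormone = [rng.gauss(10, 2) for _ in range(500)]
outcome = [0.5 * h + rng.gauss(0, 1) for h in hormone]

original = slope(hormone, outcome)
slopes = sorted(perturbation_slopes(hormone, outcome, assumed_sd=1.0))
print("original slope: %.3f" % original)
print("perturbed slopes, roughly 5th to 95th percentile: %.3f to %.3f"
      % (slopes[10], slopes[189]))
```

Note what this does and does not show. The added noise comes on top of whatever error the single measurement already contains, so the perturbed slopes will generally be pulled toward zero. The exercise indicates how quickly the conclusions degrade as the hormone gets noisier; it does not recover the effect of the underlying level.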
Re: [R] using a noisy variable in regression (not an R question)
Thank you for your responses. I should have emphasized, I do not intend to categorize -- mainly because of all the discussions I have seen on R-help arguing against this.

I just thought it would be problematic to include the variable by itself. Take other variables, such as a genotype or BMI. If we measure this variable the next day, it would be the same. However, a hormone's level would not be the same. I thought this error must be accounted for somehow.

Thanks again!

Regards,

Juliet

On Sat, Mar 7, 2009 at 1:21 PM, Jonathan Baron wrote:
> If you form categories, you add even more error, specifically, the variation in the distance between each number and the category boundary.
>
> What's wrong with just including it in the regression?
Re: [R] using a noisy variable in regression (not an R question)
Hi Juliet,

Juliet Hannah wrote:
> One simple thing to try would be to form categories

Simple but problematic. Frank Harrell put together a wonderful page detailing all the issues with categorizing continuous data:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

So: keep your data continuous.

Apart from that, I would second John's recommendation to try to get samples at the same point in time (and, if it is cortisol, stay away from smokers etc.).

Best wishes
Stephan
Re: [R] using a noisy variable in regression (not an R question)
If you form categories, you add even more error, specifically, the variation in the distance between each number and the category boundary.

What's wrong with just including it in the regression?

Yes, the measure X1 will account for less variance than the underlying variable of real interest (T1, each individual's mean, perhaps), but X1 could still be useful in two ways. One, it might be a significant predictor of the dependent variable Y despite the error. Two, it might increase the sensitivity of the model to other predictors (X2, X3...) by accounting for what would otherwise be error.

What you cannot conclude in this case (when you measure a predictor with error) is that the effect of (say) X2 is not accounted for by its correlation with T1. Some people try to conclude this when X2 remains a significant predictor of Y when X1 is included in the model. The trouble is that X1 is an error-prone measure of T1, so the full effect of T1 is not removed by inclusion of X1.

Jon

On 03/07/09 12:49, Juliet Hannah wrote:
> How would one treat a variable that was measured once, but is known to fluctuate a lot?
>
> One simple thing to try would be to form categories, but I assume there are better ways to handle this.

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
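A minimal sketch of the first two points above (X1 explains less variance than T1, and forming categories loses still more), on simulated data with the Python standard library only. The setup is an assumption for illustration: Y = T1 + noise, X1 = T1 + E1, all SDs equal to 1, and the "categories" are a simple split of X1 at its median.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(7)
n = 20000
t1 = [rng.gauss(0, 1) for _ in range(n)]        # underlying level (unobserved)
y = [t + rng.gauss(0, 1) for t in t1]           # outcome depends on T1
x1 = [t + rng.gauss(0, 1) for t in t1]          # single noisy measurement

# ASSUMPTION: "categories" means a two-group split at the median of X1.
cutoff = sorted(x1)[n // 2]
x1_split = [1.0 if v > cutoff else 0.0 for v in x1]

r2_true = r_squared(t1, y)
r2_noisy = r_squared(x1, y)
r2_split = r_squared(x1_split, y)
print("R^2 using T1:            %.3f" % r2_true)
print("R^2 using X1:            %.3f" % r2_noisy)
print("R^2 using X1 categories: %.3f" % r2_split)
```

In this setup the variance explained drops at each step: the noisy X1 explains less than T1 would, and the dichotomized X1 explains less again. The continuous X1 is still clearly informative, which is the argument for just including it as measured.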
Re: [R] using a noisy variable in regression (not an R question)
Juliet,

The answer is simple: add the measured value as an independent variable to the regression. There is no need to convert continuous values to categorical values. If there is a circadian rhythm to the hormone secretion (e.g. cortisol), I would try to get values at the same time of day for all study participants. Barring this, perhaps you could adjust both for the hormone concentration and the time of day the sample was obtained.

John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Juliet Hannah 3/7/2009 12:49 PM >>>
> How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day.
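A minimal sketch of adjusting for the time of day, on simulated data with the Python standard library only. The strong assumptions are mine, for illustration: every participant shares the same 24-hour cycle (a cosine with amplitude 2 peaking at 08:00), sampling times are spread over the day, and the outcome depends on each person's underlying level. The sketch removes a fitted cos/sin time-of-day pattern from the measured hormone and then regresses on what is left, which gives the same hormone coefficient as adding the cos/sin terms to the regression. All names are hypothetical.

```python
import math
import random


def simple_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize_on_time(values, hours):
    """Residuals of values after a least-squares fit on cos/sin of a 24 h cycle."""
    c = [math.cos(2 * math.pi * h / 24) for h in hours]
    s = [math.sin(2 * math.pi * h / 24) for h in hours]
    n = len(values)
    mv, mc, ms = sum(values) / n, sum(c) / n, sum(s) / n
    scc = sum((a - mc) ** 2 for a in c)
    sss = sum((b - ms) ** 2 for b in s)
    scs = sum((a - mc) * (b - ms) for a, b in zip(c, s))
    scv = sum((a - mc) * (v - mv) for a, v in zip(c, values))
    ssv = sum((b - ms) * (v - mv) for b, v in zip(s, values))
    det = scc * sss - scs ** 2
    bc = (sss * scv - scs * ssv) / det
    bs = (scc * ssv - scs * scv) / det
    return [v - mv - bc * (a - mc) - bs * (b - ms)
            for v, a, b in zip(values, c, s)]


rng = random.Random(3)
n = 5000
level = [rng.gauss(0, 1) for _ in range(n)]       # person's underlying level
hours = [rng.uniform(0, 24) for _ in range(n)]    # time the sample was taken
# ASSUMPTION: one shared circadian cycle, amplitude 2, peak at 08:00.
measured = [lv + 2.0 * math.cos(2 * math.pi * (h - 8) / 24)
            for lv, h in zip(level, hours)]
outcome = [lv + rng.gauss(0, 1) for lv in level]  # true slope on level is 1

naive = simple_slope(measured, outcome)
adjusted = simple_slope(residualize_on_time(measured, hours), outcome)
print("slope ignoring time of day:      %.2f" % naive)
print("slope adjusting for time of day: %.2f" % adjusted)
```

Here the unadjusted slope is badly attenuated (about a third of the true value), while the adjusted slope is close to 1. With real data the cycle will differ between people and there will be fluctuation that time of day does not explain, so the adjustment can only be expected to help, not to remove the problem.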
[R] using a noisy variable in regression (not an R question)
Hi,

This is not an R question, but I've seen opinions given on non R topics, so I wanted to give it a try. :)

How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day. Nevertheless, its levels differ substantially between individuals so that there is information there to use.

One simple thing to try would be to form categories, but I assume there are better ways to handle this. Has anyone worked with such data, or could anyone suggest some keywords that may be helpful in searching for this topic. Thanks for your input.

Regards,

Juliet