On 21 Feb 2003 10:30:08 -0800, [EMAIL PROTECTED] (DaveM) wrote:

> I am a novice at stats. I am working with a biological system. An
> insect infests a certain plant part, the scales. To determine the
> level of infestation I examine the WHOLE scale, inside and outside,
> dissecting it under a microscope. Some portion of the infestation
> occurs on the OUTSIDE of the scale. The outside can be quickly
> examined in the field w/ the naked eye. I have a series of 245
> observations; each includes the examination of both the WHOLE and
> OUTSIDE of 100 scales (5 scales from each of 20 plants, all equivalent
> aged on plant).
>
> I want to correlate the OUTSIDE (easily observed) to the WHOLE (time
> consuming). I want to develop a regression line (equation) where I
> could in the field quickly observe the OUTSIDE and then express the
> level of infestation as percent infested of the WHOLE.
>
> The WHOLE variable is what I would call the true infestation level.
> The OUTSIDE is some lesser part of that. Here are some questions?
>
> Q1. Is it correct to call the WHOLE the independent variable, and
> assign it as x on the regression graph? OUTSIDE would be the dependent
> variable?
I am not going to try to answer everything, but I do have a couple of
comments and suggestions.

1) Don't confuse things by predicting <whole> from <part>, or in this
case, WHOLE from OUTSIDE. Re-frame the data, for statistical purposes,
as <whole-minus-part> in order to get rid of the automatic part of the
correlation.
 - Is there still a correlation between OUTSIDE and the rest? You have
nothing to work from if there is not. (A rough sketch of this check is
at the end of the post.)

2) Don't do a prediction that is forced through the origin [unless you
are a professional statistician, or you are following a totally
explicit recommendation and model]. For one thing, it undermines
everything that you read about regression in any general source,
including p-values. Look in my stats-FAQ for some comments [also under
'negative R-squared', I think], and use groups.google on the
sci.stat.* newsgroups.

3) Is it going to be desirable to have predictions of "zero"? You
might want to do a separate consideration of what it means to see zero
on the outside -- where you MIGHT predict zero as total. As a separate
regression, then, you might predict for cases with non-zero on the
outside.

> In my stat package, with linear regression I can choose the option
> "fit constant" which I take is equivalent to "with constant" and
> "not forced thru origin". Toggling this option changes various
> statistical values.
>
>                        Thru origin    Fit constant   [...]
> Pearson correlation      0.9398          0.8702

 - those are not commensurable. Different bases for SS.

> Resid. Mean Square     147.636         82.9497
> S.D.                    12.4506         9.10767

 - these report on the deviations around the predicted values. Notice
that this FIT is much better with "constant" than with "origin". (The
second sketch at the end shows the same comparison.)

[snip, rest]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
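A rough sketch, in Python, of the check in point 1). The variable
names, the simulated numbers, and the use of numpy/scipy are
illustrative assumptions standing in for DaveM's actual data; the
point is simply to correlate OUTSIDE with WHOLE-minus-OUTSIDE rather
than with WHOLE.

import numpy as np
from scipy import stats

# Simulated stand-in for the 245 observations (infested scales per
# sample of 100); replace with the real OUTSIDE and WHOLE counts.
rng = np.random.default_rng(0)
outside = rng.integers(0, 40, size=245).astype(float)
whole = outside + rng.integers(0, 30, size=245)  # WHOLE contains OUTSIDE

rest = whole - outside                           # <whole-minus-part>

# The raw correlation is inflated because OUTSIDE is part of WHOLE ...
r_raw, _ = stats.pearsonr(outside, whole)
# ... so look at OUTSIDE against the part it does not already contain.
r_rest, p_rest = stats.pearsonr(outside, rest)

print(f"r(OUTSIDE, WHOLE)         = {r_raw:.3f}")
print(f"r(OUTSIDE, WHOLE-OUTSIDE) = {r_rest:.3f}  (p = {p_rest:.4f})")

If no worthwhile correlation is left after the subtraction, there is
nothing for the quick field observation to work from.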

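A second rough sketch, of the with-constant versus through-origin
comparison behind point 2) and the table above. statsmodels is an
assumption here; any regression routine that reports a residual mean
square will do. The two R-squared values sit on different bases for SS
(variation around zero versus around the mean of y), so compare the
residual mean squares instead.

import numpy as np
import statsmodels.api as sm

# Same simulated stand-in as the first sketch; use the real data here.
rng = np.random.default_rng(0)
outside = rng.integers(0, 40, size=245).astype(float)
whole = outside + rng.integers(0, 30, size=245)

fit_const = sm.OLS(whole, sm.add_constant(outside)).fit()  # with constant
fit_origin = sm.OLS(whole, outside).fit()                  # thru origin

print("fit constant : R^2 =", round(fit_const.rsquared, 4),
      " resid MS =", round(fit_const.mse_resid, 2))
print("thru origin  : R^2 =", round(fit_origin.rsquared, 4),
      " resid MS =", round(fit_origin.mse_resid, 2))

# Per point 3), the zero-OUTSIDE cases could be treated as a separate
# question, with the regression fitted only to the non-zero cases:
nonzero = outside > 0
fit_nonzero = sm.OLS(whole[nonzero],
                     sm.add_constant(outside[nonzero])).fit()
print("non-zero only: intercept/slope =", fit_nonzero.params.round(3))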