I am a novice at stats. I am working with a biological system. An
insect infests a certain plant part, the scales. To determine the
level of infestation I examine the WHOLE scale, inside and outside,
dissecting it under a microscope. Some portion of the infestation
occurs on the OUTSIDE of the scale. The outside can be quickly
examined in the field w/ the naked eye. I have a series of 245
observations; each includes the examination of both the WHOLE and
OUTSIDE of 100 scales (5 scales from each of 20 plants, all of
equivalent age on the plant).

I want to correlate the OUTSIDE (easily observed) to the WHOLE (time
consuming). I want to develop a regression line (equation) so that in
the field I could quickly observe the OUTSIDE and then express the
level of infestation as percent infested of the WHOLE.

The WHOLE variable is what I would call the true infestation level.
The OUTSIDE is some lesser part of that. Here are my questions:

Q1. Is it correct to call the WHOLE the independent variable, and
assign it as x on the regression graph? OUTSIDE would be the dependent
variable?

In my stat package, with linear regression I can choose the option
"fit constant", which I take to be equivalent to "with constant" and
"not forced thru origin". Toggling this option changes several of the
reported statistics.

                        Thru origin             Fit constant
Obs                     245                     245
Residuals               244                     243
Pearson correlation     0.9398                  0.8702
R sq.                   0.8832                  0.7573
Adj. R sq.              0.8828                  0.7563
Resid. Mean Square      147.636                 82.9497
S.D.                    12.4506                 9.10767
Stand. Error(OUTSIDE)   0.06011                 0.06812
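
As a rough illustration of what the "fit constant" toggle changes, the
sketch below fits a line both ways by ordinary least squares and
reports the residual degrees of freedom, residual mean square, and
R sq. It uses simulated numbers, not the 245 observations above, and
it follows the assignment in Q1 (WHOLE as x, OUTSIDE as y). It also
assumes that R sq. is computed against the uncentered total sum of
squares when the constant is dropped; that is a common convention, but
it has not been confirmed for this package. The name fit_line and the
simulation settings are made up for the example.

```python
import random

def fit_line(x, y, fit_constant=True):
    """Ordinary least-squares fit of y on x, with or without a constant."""
    n = len(x)
    if fit_constant:
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Assumed convention: centered total sum of squares with a constant.
        tss = sum((yi - mean_y) ** 2 for yi in y)
        n_params = 2
    else:
        slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        intercept = 0.0
        # Assumed convention: uncentered total sum of squares thru origin.
        tss = sum(yi * yi for yi in y)
        n_params = 1
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_df = n - n_params
    return {
        "intercept": intercept,
        "slope": slope,
        "resid_df": resid_df,
        "rss": rss,
        "resid_ms": rss / resid_df,
        "r_sq": 1.0 - rss / tss,
    }

# Simulated percentages for illustration only (NOT real field data).
random.seed(0)
whole = [random.uniform(0.0, 60.0) for _ in range(245)]
outside = [min(100.0, max(0.0, -3.0 + 0.6 * w + random.gauss(0.0, 4.0)))
           for w in whole]

thru_origin = fit_line(whole, outside, fit_constant=False)
with_constant = fit_line(whole, outside, fit_constant=True)

for label, fit in (("Thru origin", thru_origin), ("Fit constant", with_constant)):
    print(f"{label:13s} resid df={fit['resid_df']}  "
          f"resid MS={fit['resid_ms']:.3f}  R sq={fit['r_sq']:.4f}")
```

Under that assumed convention the two R sq. values are measured
against different baselines, so they are not directly comparable
between the two columns, whereas the residual mean square is on the
same scale in both.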


Q2. Should I force the line thru the origin? Is that biologically
appropriate? With this plant/insect system, if WHOLE is zero, then
OUTSIDE is zero. If WHOLE > zero, OUTSIDE can still be zero. In fact,
at the lower infestation levels WHOLE needs to reach values of >= 5%
before OUTSIDE infestation starts to show consistently. The fit of the
line looks better to me w/o going thru zero.

Q3. In deciding whether to go thru the origin, is there a value I
should look to minimize or maximize, such as the correlation or Resid.
Mean Square? Which is more important?

Q4. The end use of this is to go out to the field, observe the OUTSIDE
and use that to predict WHOLE via a regression equation or graph. I
make my field observations (say count 100 scales), find my OUTSIDE
value on the y-axis, run across to the regression line and drop down
to the x-axis WHOLE value. Any problem with that setup?
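
The graphical read-off described in Q4 amounts to solving the fitted
equation OUTSIDE = a + b * WHOLE for WHOLE. A minimal sketch of that
step follows; the coefficients a and b are hypothetical placeholders
(not estimates from the data above), the function name is made up, and
clamping the result to the 0-100% range is an added assumption.

```python
def whole_from_outside(outside_pct, intercept, slope):
    """Solve OUTSIDE = intercept + slope * WHOLE for WHOLE (in percent)."""
    if slope == 0:
        raise ValueError("slope must be nonzero to invert the line")
    whole_pct = (outside_pct - intercept) / slope
    # Assumption: report a valid percentage, so clamp to [0, 100].
    return min(max(whole_pct, 0.0), 100.0)

# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
INTERCEPT = -2.0   # a
SLOPE = 0.5        # b

# Field count: 10 of 100 scales show infestation on the OUTSIDE.
estimate = whole_from_outside(10.0, INTERCEPT, SLOPE)
print(f"Estimated WHOLE infestation: {estimate:.1f}%")
```

With these made-up coefficients, an OUTSIDE reading of zero maps to a
WHOLE value of 4%, which is the kind of behavior a line not forced
thru the origin would produce.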

Thanks,

DaveM