On 21 Feb 2003 10:30:08 -0800, [EMAIL PROTECTED] (DaveM) wrote:

> I am a novice at stats. I am working with a biological system. An
> insect infests a certain plant part, the scales. To determine the
> level of infestation I examine the WHOLE scale, inside and outside,
> dissecting it under a microscope. Some portion of the infestation
> occurs on the OUTSIDE of the scale. The outside can be quickly
> examined in the field w/ the naked eye. I have a series of 245
> observations; each includes the examination of both the WHOLE and
> OUTSIDE of 100 scales (5 scales from each of 20 plants, all equivalent
> aged on plant).
> 
> I want to correlate the OUTSIDE (easily observed) to the WHOLE (time
> consuming). I want to develop a regression line (equation) where I
> could in the field quickly observe the OUTSIDE and then express the
> level of infestation as percent infested of the WHOLE.
> 
> The WHOLE variable is what I would call the true infestation level.
> The OUTSIDE is some lesser part of that. Here are some questions?
> 
> Q1. Is it correct to call the WHOLE the independent variable, and
> assign it as x on the regression graph? OUTSIDE would be the dependent
> variable?

I am not going to try to answer everything, but I do have
a couple of comments and suggestions.

 1) Don't confuse things by predicting <whole> from <part>,
or in this case, WHOLE from OUTSIDE.  Since OUTSIDE is
contained in WHOLE, the two are correlated automatically.
Re-frame the data, for statistical purposes, as
<whole-minus-part>, in order to get rid of the automatic
part of the correlation.  Is there still a correlation
between OUTSIDE and the rest?  You have nothing to work
from if there is not.
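
To illustrate the "automatic" part, here is a minimal sketch in
Python.  The numbers are invented (only the count of 245 is taken
from the post), and OUTSIDE and the rest are deliberately generated
with no relation to each other.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n_obs = 245  # same count as in the post; the values are hypothetical

# OUTSIDE and the rest are drawn independently: no real relation.
outside = [random.uniform(0, 40) for _ in range(n_obs)]
rest = [random.uniform(0, 40) for _ in range(n_obs)]
whole = [o + r for o, r in zip(outside, rest)]

# Correlation of the part with the whole, and of the part with
# whole-minus-part.
r_part_whole = pearson(outside, whole)
r_part_rest = pearson(outside, [w - o for w, o in zip(whole, outside)])

print(f"r(OUTSIDE, WHOLE)           = {r_part_whole:.2f}")
print(f"r(OUTSIDE, WHOLE - OUTSIDE) = {r_part_rest:.2f}")
```

When the two parts have equal spread, the first correlation should
come out around 0.7 purely from the overlap, while the second
stays near zero.  Only the second says anything about whether
OUTSIDE carries information about the rest.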

 2) Don't do a prediction that is forced through the origin
[unless you are a professional statistician, or you are
following a totally explicit recommendation and model].
For one thing, it undermines everything that you read about
regression in any general source, including p-values.
Look in my stats-FAQ for some comments [also under
'negative R-squared', I think], and search the sci.stat.*
newsgroups on groups.google.
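
As a rough sketch of what goes wrong, the Python below fits both
kinds of line to made-up data whose true intercept is not zero.
The data, the intercept of 30 and the slope of 0.5 are all
assumptions for illustration, not anything from the post.

```python
import random

random.seed(2)
# Hypothetical data with a true non-zero intercept: y = 30 + 0.5*x + noise
x = [random.uniform(10, 60) for _ in range(245)]
y = [30 + 0.5 * xi + random.gauss(0, 5) for xi in x]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Ordinary least squares with a constant
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
const = my - slope * mx
res_const = [b - (const + slope * a) for a, b in zip(x, y)]

# Least squares forced through the origin
slope0 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
res_origin = [b - slope0 * a for a, b in zip(x, y)]

ss_total = sum((b - my) ** 2 for b in y)  # SS around the mean of y

def r_squared(residuals):
    """Usual R-squared: 1 - SS(residual) / SS(around the mean)."""
    return 1 - sum(e * e for e in residuals) / ss_total

mean_res_const = sum(res_const) / n
mean_res_origin = sum(res_origin) / n

print(f"with constant: mean residual {mean_res_const:.4f}, "
      f"R-squared {r_squared(res_const):.3f}")
print(f"thru origin:   mean residual {mean_res_origin:.4f}, "
      f"R-squared {r_squared(res_origin):.3f}")
```

With a constant, the residuals average to zero and R-squared
falls between 0 and 1.  Forced through the origin, the residuals
no longer average to zero, and the usual R-squared can go
negative, because the line can fit worse than the plain mean of y.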

 3) Is it going to be desirable to have predictions of "zero"?
You might want to consider separately what it means to see
zero on the outside -- where you MIGHT predict zero as the
total.  Then, as a separate regression, you might predict
for the cases with non-zero on the outside.
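
One way to set that up, sketched in Python under stated
assumptions: the data are invented, the response is the rest
(WHOLE minus OUTSIDE) as in point 1, and the helper
predict_whole() is a hypothetical name, not part of any package.

```python
import random

random.seed(3)
# Hypothetical observations (percent infested); not the poster's data.
# The first 30 have nothing visible on the OUTSIDE.
outside = [0.0] * 30 + [random.uniform(1, 40) for _ in range(215)]
rest = [max(0.0, random.gauss(2 + 0.4 * o, 3)) for o in outside]
whole = [o + r for o, r in zip(outside, rest)]

# Part 1: cases with zero on the outside are summarized on their own.
whole_when_zero = [w for o, w in zip(outside, whole) if o == 0]
mean_whole_when_zero = sum(whole_when_zero) / len(whole_when_zero)
share_truly_zero = (sum(w == 0 for w in whole_when_zero)
                    / len(whole_when_zero))

# Part 2: regression with a constant, non-zero cases only,
# predicting the rest (WHOLE minus OUTSIDE) from OUTSIDE.
xs = [o for o in outside if o > 0]
ys = [w - o for o, w in zip(outside, whole) if o > 0]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
         / sum((a - mx) ** 2 for a in xs))
const = my - slope * mx

def predict_whole(outside_pct):
    """Predicted WHOLE for a field reading of OUTSIDE (hypothetical helper)."""
    if outside_pct == 0:
        return mean_whole_when_zero
    return outside_pct + const + slope * outside_pct

print(f"zero outside: mean WHOLE {mean_whole_when_zero:.2f}, "
      f"share with WHOLE == 0: {share_truly_zero:.2f}")
print(f"outside = 20 -> predicted WHOLE {predict_whole(20):.2f}")
```

The share of zero-outside cases that are also zero overall tells
you whether "predict zero as total" is a sensible rule for them.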

> 
> In my stat package, with linear regression I can choose the option
> "fit constant" which I take is equivalent to "with constant" and "not
> forced thru origin". Toggling this option changes various statistical
> values.
> 
>                       Thru origin             Fit constant
[ ...]
> Pearson correlation   0.9398                  0.8702
  - Those are not commensurable.  The two fits use different
bases for the sums of squares (SS): around zero for "origin",
around the mean for "constant".

> Resid. Mean Square    147.636                 82.9497
> S.D.                  12.4506                 9.10767
 - These report on the deviations around the predicted values.
Notice that, by this measure, the FIT is much better with
"constant" than with "origin".
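
The same pattern (higher "correlation" but larger residuals
through the origin) is easy to reproduce.  The Python sketch
below uses invented data, and it ASSUMES that the package
computes the through-origin correlation from the SS around
zero; I do not know which package was used.

```python
import math
import random

random.seed(4)
# Hypothetical data, true intercept well away from zero
x = [random.uniform(10, 60) for _ in range(245)]
y = [30 + 0.5 * xi + random.gauss(0, 5) for xi in x]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Fit with constant
b1 = (sum((a - mx) * (c - my) for a, c in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx
sse_const = sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y))

# Fit through origin
b_origin = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
sse_origin = sum((c - b_origin * a) ** 2 for a, c in zip(x, y))

ss_corrected = sum((c - my) ** 2 for c in y)  # SS around the mean of y
ss_uncorrected = sum(c * c for c in y)        # SS around zero

# "Correlation" on each fit's own SS base (assumed package behavior)
r_const = math.sqrt(1 - sse_const / ss_corrected)
r_origin = math.sqrt(1 - sse_origin / ss_uncorrected)

# Residual mean square: SS(residual) / residual degrees of freedom
rms_const = sse_const / (n - 2)    # two fitted parameters
rms_origin = sse_origin / (n - 1)  # one fitted parameter

print(f"thru origin:  r = {r_origin:.4f}, resid. MS = {rms_origin:.2f}")
print(f"fit constant: r = {r_const:.4f}, resid. MS = {rms_const:.2f}")
```

The residual mean squares are on the same scale for both fits,
so they can be compared directly; the two correlations cannot.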


[snip, rest]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
