I am sure many of you have been asked a question like that posed today
by one of my students, and I would be interested in hearing how you respond
to it.  I've included the question along with the response I gave this
morning.  It looks a bit long to me now; I must have been having an attack
of mania <grin>
****************************************************************************
9. October 2000

        One of my graduate students just asked me, "I have been diligently
studying for the exam, but I realized that there are a lot of formulas and
sub-formulas that I am having trouble memorizing.  I probably can memorize
them, but I am not sure if that is what we need to do.  Should we memorize
all the formulas and sub-formulas, or should we expend most of our energy on
having a good understanding of the concepts that we have covered, or both?"

        Here is my reply:

        IMHO, one cannot have a good understanding of the concepts without
knowing some basic definitions.  As a simple example, I opine that you would
not have a good understanding of the concept of mean without knowing that it
is the "balance point" which makes the sum of deviations about it zero, and
that it is the quantity which minimizes the sum of squared deviations about
it (the least squares criterion).  Now, I can present that definition in
what you might call a pair of formulae, but it is, nevertheless, a
definition essential for understanding the concept.  On the other hand, if
you are going to compute a sample mean by hand, you will probably just add
up the scores and divide by the number of scores, a useful "computational
formula," but not a definition essential for understanding.
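
If it helps to see them written out, in symbols (LaTeX notation, for n
scores X_1 through X_n) those statements amount to:

        \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
        \sum_{i=1}^{n} (X_i - \bar{X}) = 0
        \bar{X} = \arg\min_{c} \sum_{i=1}^{n} (X_i - c)^2

The first line is the computational recipe (add up the scores and divide by
their number); the second and third are the balance-point and least squares
definitions.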

        Consider next the concept of variance (not just the more general
concept of dispersion).  To understand it, you need to know that it is
defined as the mean squared deviation of scores from their mean.  Yes, it is
just another sort of mean.  Again, I can present that definition in what you
might call a formula, but it is really just a definition essential for
understanding the concept.  On the other hand, I would not think it
essential that you know that you can get the corrected (for the mean) sum of
squared deviations (numerator of the ratio we call variance) by taking the
uncorrected sum of squares and subtracting the ratio of the square of the
summed scores to the number of scores -- but that is the formula you would
use if you were computing a variance by hand (but we have machines to do
such tasks now, tasks done by one's graduate students back in the dark ages
when I was a graduate student).
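
Written out the same way (with the divisor shown as n, so that the variance
really is the mean of the squared deviations; the usual unbiased sample
estimator divides by n - 1 instead):

        \text{variance} = \frac{SS}{n}, \quad
        SS = \sum_{i=1}^{n} (X_i - \bar{X})^2
           = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}

The first equality is the definition (just another sort of mean), and the
last is the hand-computation shortcut for the corrected sum of squares
described above.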

        Another example: after we cover correlation and regression, I would
expect you to know that the correlation coefficient is really just a mean --
the mean cross-product of standardized (z) scores, and it represents the
slope of the standardized least squares linear regression line for
predicting one variable from another.  While I could present that definition
of Pearson r in "formulas," those would not be the formulas you would use to
compute r, but rather are definitions that would help you understand r.
With that understanding, you would realize that r is the number of standard
deviations by which predicted Y increases per one standard deviation change
in known X.  Building on that understanding of r, you would then recognize
that the covariance is also just a mean, the mean cross-product of
deviations of X about its mean and deviations of Y about its mean,
structurally the same as the univariate concept of variance, but in two
dimensions rather than just one.  The same least squares criterion used to
define the mean is used to define the regression line -- it minimizes the
(error) sum of squared deviations (in the Y dimension) about it.  The
univariate mean is really just our least squares predicted value for a score
when the only information we have is that in the univariate distribution.
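
In the same notation (divisors again shown as n, so that each quantity is
literally a mean; the same relationships hold if you use n - 1 throughout,
as most software does):

        r = \frac{1}{n} \sum_{i=1}^{n} z_{X_i} z_{Y_i}
        \hat{z}_Y = r z_X
        \text{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
        \text{the regression line minimizes } \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

The first line is r as the mean cross-product of z scores, the second is r
as the slope of the standardized regression line, the third is the
covariance as the bivariate analogue of the variance, and the last is the
least squares criterion applied to the line.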

If our linear model is any good, it should account for some of the variance
in the predicted variable.  The sum of the squared deviations of the
predicted scores about the mean of Y is used to measure that portion of the
total variance, and represents the reduction in error due to adding the X
variable to the model used to predict Y.  Divide that regression sum of
squares by
the total sum of squares for the predicted variable and you obtain
r-squared, so now you have another way to interpret r -- squared, it is the
proportion of the total variance in one variable "accounted for" by our
model.
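
In symbols, with \hat{Y} denoting a predicted score and \bar{Y} the mean of
Y:

        r^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}
            = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}

The leftover portion, SS_{\text{error}} = \sum (Y_i - \hat{Y}_i)^2, is
exactly what the least squares criterion minimized.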

If you have read Edwin Abbott's "Flatland," you might recognize that the
same concept (a mean) which looked like a point in one-dimensional space now
looks like a line in two-dimensional space.  Then you would be ready to leap
into three-dimensional space and even beyond, into hyperspace, but you might
want to sit down and have a good beer first.  I promise that we shall travel
that space before the semester is out (as soon as we get started on multiple
regression).

        So, to recap, starting with what might seem like a useless task of
memorizing a couple of formulas for the arithmetic mean, we come to an
understanding of several useful extensions of that concept, ending up in
hyperspace with a good beer.  What more could you possibly expect from life
than having a good beer in hyperspace?
+++++++++++++++++++++++++++++++++++++++++
Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC  27858-4353
Voice:  252-328-4102     Fax:  252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm

