(Posted to JT and to edstat. -- DFB)

On Sun, 29 Feb 2004, Jacob Thomas wrote in part (edited):

> ... To give more background info I am teaching a group of general
> office staff various elements of Microsoft Office.  They are studying
> for a UK standard that is more or less equivalent to the Microsoft MOUS
> qualification. One of the criteria they have to meet is to be able to
> create Excel pie, column, mixed line/column and scatter charts.  In
> theory all they need to do is create the graphs from supplied data but
> I would like to at least explain in what situation you might use a
> scatter graph, the purpose of the trend line and how to interpret the
> trendline equation.

>  I have given up on my original target of 30 words and propose to add
> this to the handout:

Editorial comments:
 "[old / NEW]" indicates original text ("old") and replacement ("NEW").
You may have to exercise judgement w.r.t. capitalization.
 {Explanatory note}, in braces, reports side comments.

 "A scatter graph is a statistical diagram drawn to [compare / SHOW
THE RELATION BETWEEN] two [sets of data / VARIABLES].
 {Can't draw scatter unless the two variables are in the SAME set of
data:  each plotted point represents one single observation in the data
set, and its coordinates are the values of the two variables.}
 It can be used to look for connections or a correlation between the two
[sets of data / VARIABLES] e.g. temperature and sales of ice cream,
hours watching TV and Exam results etc.
 A scatter chart {You wrote "scatter graph" above;  is there any virtue
in consistency of nomenclature?} has two value axes instead of one value
axis and one category axis like [most chart types / OTHER TYPES OF
CHARTS WE HAVE ENCOUNTERED].  {Or simply "some other chart types".}
 [Data used as x values should always be in the first row or column.
Data for y values should always be placed in the row or column following
the x values. / {OMIT?  The need for this rule is not at all clear to
me, as all the software I've used (which does not include Excel) permits
the variables to be in any order in the data file, and one simply picks
out the pair desired for a scatterplot.  Further, in all statistical
packages I've encountered, variables are always in columns, not in rows
-- rows generally denote different cases or observations in the file.}]

 Example
 A supermarket wants to know the effect temperature had on the sales of
 ice cream.  The available data was as follows:
 Temp (oC)      -10  -5  0  5 10  15 20  25 30  35
 Sales (000)     2   3  4  5  8  12 25  40 50  75

{1. A little more detail is proper for an example.  Temperature of what?
Sales for a day, an hour, ...?  Hard to imagine that real data would
yield observed outdoor temperatures (if that's what these are supposed
to be) evenly spaced from -10 to 35.  OK, so it's a cooked example.
Make the cooking less obvious.  And of course you would supply, along
with this example, a scatterplot of it.  And a second scatterplot with
the trend line superposed on it.
 2. For a real handout, go to some trouble to make sure that the data
are aligned vertically.  Students need to have as much organization in
examples as you can provide:  helps to see what one is doing.
 3. If they are sufficiently naive, you may want to point out that while
the data need to be in columns (at least conceptually) for the program,
they are being displayed in rows for convenience of presentation.  And
you might consider whether there be a virtue in displaying them in
columns instead of rows.  (That would permit including, silently,
another variable such as calendar date, or month for which the
temperatures are putative averages, or whatever.)
 4. Indicate, explicitly, which variable is "x" and which is "y".  Your
rules *imply* that "Temp (oC)" is "x" and "Sales (000)" is "y", but
there is no good reason for not labelling them.}
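
 Points 2 to 4 can be illustrated with a short sketch.  This is only one
possible layout, written in Python rather than Excel;  the helper name
format_table and the column width are my own assumptions, not anything
the handout or Excel prescribes.  It prints the example data in two
vertically aligned columns, one row per observation, with the variables
labelled explicitly as x and y.

```python
# Example data from the handout; each position is one observation.
temp_c = [-10, -5, 0, 5, 10, 15, 20, 25, 30, 35]   # x: temperature
sales_000 = [2, 3, 4, 5, 8, 12, 25, 40, 50, 75]     # y: sales in thousands

COLUMN_WIDTH = 16  # assumed width, wide enough for the longer label


def format_table(x_values, y_values, x_label, y_label):
    """Return the data as right-aligned columns, one row per observation."""
    header = f"{x_label:>{COLUMN_WIDTH}}  {y_label:>{COLUMN_WIDTH}}"
    rows = [
        f"{x:>{COLUMN_WIDTH}}  {y:>{COLUMN_WIDTH}}"
        for x, y in zip(x_values, y_values)
    ]
    return "\n".join([header] + rows)


table = format_table(temp_c, sales_000, "x = Temp (oC)", "y = Sales (000)")
print(table)
```

 A third column (calendar date, say) could be added in the same way
without disturbing the other two.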

 The equation resulting from [this / THESE] data is y = 1.4909x + 3.763
 {Does Excel insist on labelling variables as "x" and "y"?  Other
packages permit one to name the variables, and would report
  Sales = 1.4909 Temp + 3.763
 If these folks are naive, it will be VERY good for them to be shown
proper and punctilious technique in supplying variable names.  This is
often omitted because it can be tedious.  Don't;  it's more important
than it may appear to the uninitiated.}
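
 For anyone wishing to check the quoted coefficients, here is a minimal
sketch in Python.  It assumes the trend line is an ordinary least-squares
fit of Sales on Temp (an assumption on my part about what Excel does,
though the numbers agree);  the function name fit_line is hypothetical.
It reports the result with the variables named rather than as "x" and
"y".

```python
# Example data from the handout.
temp_c = [-10, -5, 0, 5, 10, 15, 20, 25, 30, 35]   # x: temperature
sales_000 = [2, 3, 4, 5, 8, 12, 25, 40, 50, 75]     # y: sales in thousands


def fit_line(x_values, y_values):
    """Ordinary least-squares slope and intercept for y on x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(temp_c, sales_000)
print(f"Sales = {slope:.4f} Temp + {intercept:.4f}")
# prints: Sales = 1.4909 Temp + 3.7636
```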

 The equation is displayed in the form y = mx + c where m is the slope
of the line and c is the Y-axis [interception / INTERCEPT].
 {Does Excel speak about a "trend line" in these terms?  If not, there's
no value that I can see to your introducing "m" and "c" here.  (Even if
so, I'm skeptical.  Why do they need to know that?)  If you want the
equation displayed, then display it the first time you quote it, don't
bury it in the paragraph:  e.g.,
 "The equation resulting from these data is
         y = 1.4909x + 3.763
  in which the value 1.4909 is the slope of the trend line and 3.763 is
  the intercept of the line (the value predicted for y when x = 0)."}

 The equation for the above trend line is:

 y = 1.4909x + 3.7636
 {This sentence is now redundant.}

 Assuming you accept the trend line,
 {Omit this phrase.  The consequent is true whether one accepts or not.}

 This equation allows you to [calculate / ESTIMATE] {Some would prefer
"predict" to "estimate"} [any Y axis value / y,] i.e. sales of ice
cream [IN THOUSANDS OF DOLLARS,] {or so I presume} [FROM ANY VALUE OF X]
without looking at the graph.
 [For example what are ice cream sales liable to be / FOR EXAMPLE, WHAT
ICE CREAM SALES ARE PREDICTED] when the temperature is 20oC?

 Using the equation [Y = 1.4909X + 3.7636 / Y = 1.49 X + 3.76]:
 {In using the equation, round the values reported.  It will almost
never be the case that you know these values to a precision greater than
three (3) decimal digits.  I have silently made this modification in the
sequel.  Do you have a reason for using X and Y all of a sudden, instead
of x and y?}

 Y (i.e. ice cream sales) = (1.49*20) + 3.76
 Y (i.e. ice cream sales) = 29.80 + 3.76
 Y (i.e. ice cream sales) = 33.56  {*}  THOUSANDS OF DOLLARS
                          = $33,560. "

 {* I believe your value of  33,586  was incorrect.}
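
 The arithmetic can be checked with a few lines of Python.  This is a
sketch only;  predict_sales is a hypothetical helper name.  It evaluates
the trend-line equation at 20 oC twice, once with the rounded
coefficients and once with the four-decimal values as reported.

```python
def predict_sales(temp_c, slope, intercept):
    """Predicted sales (in thousands) at the given temperature."""
    return slope * temp_c + intercept


rounded = predict_sales(20, 1.49, 3.76)          # rounded coefficients
as_reported = predict_sales(20, 1.4909, 3.7636)  # coefficients as reported

print(f"{rounded:.2f}")      # prints: 33.56
print(f"{as_reported:.2f}")  # prints: 33.58
```

 Neither version gives 33,586;  the two differ from each other only in
the second decimal place, i.e. by about twenty in the unrounded figure.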

 ------------------------------------------------------------
 Donald F. Burrill                              [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110      (603) 626-0816
