Re: Correlation problem

2002-01-07 Thread Stephen Clark


janne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
 I have a correlation formula I can't get to work. And we must use this
 formula on the test. Let me give you an example: Let's say X and Y are:

  x    y
  1    68
  2    91
  3   102
  3   107
  4   105
  4   114
  5   115
  6   127
  --  ---
  28  829

 X-bar = 3.5 and Y-bar = 103.625

 Now to my problem. Look at the formula in this URL:
 http://www.jannesgallery.com/corr.html.
 How do I do the first (X-X(with a line above))? I have tried to take
 _
X-X
 1-3.5=2.5
 2-3.5=-1.5
 3-3.5=-0.5
 3-3.5=-0.5
 4-3.5=0.5
 4-3.5=0.5
 5-3.5=1.5
 6-3.5=2.5
 
 0



 As you see the answer is zero. What do I do wrong? and the same with
 Y-Y(with a line above). It turns out to be zero. Please help me to tell
 how I should do.

 Janne

The sum is:

(1-3.5)*(68-103.625) + (2-3.5)*(91-103.625) + ... + (6-3.5)*(127-103.625)

which, in general, will not be zero.
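The whole computation can be laid out in a few lines. Here is a quick sketch in Python (an editor's illustration, not part of the original thread) using Janne's data:

```python
# Janne's data from the thread
xs = [1, 2, 3, 3, 4, 4, 5, 6]
ys = [68, 91, 102, 107, 105, 114, 115, 127]

n = len(xs)
x_bar = sum(xs) / n            # 3.5
y_bar = sum(ys) / n            # 103.625

# Sum of cross-products: multiply the paired deviations BEFORE summing
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / (sxx * syy) ** 0.5   # Pearson correlation, about 0.94 here
print(round(sxy, 3), round(r, 3))
```

The sum of cross-products (188.5) is nonzero even though each set of deviations separately sums to zero.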







=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
  http://jse.stat.ncsu.edu/
=



Re: Which one fit better??

2002-01-07 Thread Chia C Chong


Glen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
 Chia C Chong [EMAIL PROTECTED] wrote in message
 news:a0n001$b7v$[EMAIL PROTECTED]...
  I plotted a histogram density of my data and its smoothed version
  using the normal kernel function. I tried to plot the estimated PDFs
  (Laplacian & Generalised Gaussian), estimated using the maximum
  likelihood method, on top as well. Graphically, it seems that the
  Laplacian fits the histogram density better while the Generalised
  Gaussian fits the smoothed version (i.e. the kernel density version).

 Imagine that you began with a sample from a Laplacian (double
 exponential) distribution. What will happen to the central peak after
 you smooth it with a KDE?

The peak did not change significantly... maybe shifted to the left a
bit... not too much!!

CCC


 Glen







Re: Correlation problem

2002-01-07 Thread Art Kendall


Another way to phrase that is:
for each case find x = X - XBAR and y = Y - YBAR,
then multiply x*y (this is called a cross-product),
then find the sum of the cross-products.

Stephen Clark wrote:

 janne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
  I have a correlation formula I can't get to work. And we must use this
  formula on the test. [snip data and deviations]
  As you see, the answer is zero. What am I doing wrong? The same happens
  with (Y - Y-bar): it also turns out to be zero. Please tell me how I
  should do it.

  Janne
 
 The sum is:

 (1-3.5)*(68-103.625) + (2-3.5)*(91-103.625) + ... + (6-3.5)*(127-103.625)

 which, in general, will not be zero.







Excel Limitations and Mac Excel 2001 Bug?

2002-01-07 Thread Humberto Barreto

Hi,
[Please excuse multiple postings, but I need to get feedback from several
email communities.]
The discussion of Excel's limitations on edstat-l (archives are available
at http://jse.stat.ncsu.edu/) has
been interesting and informative. I agree substantially with David
Heiser that Excel can be used for statistical analysis, but the user must
exercise judgment and be knowledgeable about the software. I can
also see Cryer, McCullough, et al.'s point that the level of knowledge
required is way too high and that many unsophisticated users, relying on
defaults, will get miserable results. No question the product can
be better.
The discussion then turned to variable declaration and David Firth posted
a nice review of data types. His email signature says:
David Firth
Still Thinking Different: Apple Powerbook 3400 & Newton 2100
As a former Excel Mac user (now on the Dark Side), I recently had the
opportunity to debug a Mac user's add-in (Physics). Here's what I
found.
I believe there is a problem with the declaration of variables in Mac
Excel 2001. The largest number that the machine should be able to
represent with 64-bit double-precision floating point (that's the Double
declaration in the code below) is supposed to be
1.79769313486232E308. In fact, in Mac Excel 2001, 1.797 * 10^38
works but 1.797 * 10^39 does not! (Of course, forget about 10^308; this
was the problem with the add-in.)
So Mr. Firth, and other Mac Excel users, please run the macro below on
Excel 2001 on a Mac to see if you get the same behavior and let me know
what happens. You'll have to add a module, copy the code from
below, and then run it. If you're like me, you'll get an overflow
error on the line that reads, myMaxBug = 1.797 * 10 ^ 39. Mac Excel
2001 cannot represent that number.
Sub Excel2001BugTest()
    Dim myMaxOK As Double
    myMaxOK = 1.797 * 10 ^ 38   ' fine in every version tested
    Dim myMaxBug As Double
    myMaxBug = 1.797 * 10 ^ 39  ' overflow error in Mac Excel 2001
End Sub
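For comparison outside Excel, the IEEE 754 double-precision limit the macro probes can be checked with a few lines of Python (an editor's sketch, not part of the original posting):

```python
import sys

# IEEE 754 double precision (64-bit): largest finite value
print(sys.float_info.max)        # 1.7976931348623157e+308

ok = 1.797e308                   # just under the limit: finite
too_big = float('1.797e309')     # past the limit: overflows to inf
print(ok, too_big)
```

Any environment with a true 64-bit Double should accept values up to about 1.7977E308, so a failure at 1.797E39 points at a bug in that particular Excel build, not at the floating-point format itself.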
A variable declared as a Single has a highest value of 3.402823E38, yet
Mac Excel 2001 handles myMaxOK = 9 * 10 ^ 38 just fine. So it's not that
it doesn't support a Double; it appears that, somehow, the Double has been
coded for 38 instead of 308!!!  How can that happen?
Note, Excel98 and all Win versions that I have tested work just
fine. Only Mac Excel 2001 gives the problem. I was running it
on a G4 with OS 9.1.
I have an RNG that uses the Currency data type (essentially a scaled
64-bit integer, used here for large integer computations) and it works
just fine on Mac Excel 2001. Only Double (and Variant) don't work.
Please let me know what you find or if you have any explanations for this
odd behavior. 
Thanks!

Humberto Barreto
x6315



Re: Correlation problem

2002-01-07 Thread Elliot Cramer

In sci.stat.consult janne [EMAIL PROTECTED] wrote:
: I have a correlation formula I can't get to work. And we must use this
: formula on the test. Let me give you an example: Let's say X and Y

If you don't know what x-bar (x with a line above) means, you need to
STUDY your text.  Your instructor should also be available for consultation.







Re: Correlation problem

2002-01-07 Thread Dennis Roberts

sum of deviations around a mean always = 0


X - X-bar
1-3.5 = -2.5
2-3.5 = -1.5
3-3.5 = -0.5
3-3.5 = -0.5
4-3.5 =  0.5
4-3.5 =  0.5
5-3.5 =  1.5
6-3.5 =  2.5
--------
sum = 0



As you see, the answer is zero. What am I doing wrong? The same happens with
Y - Y-bar: it also turns out to be zero. Please tell me how I should do it.

Janne






_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Standardizing evaluation scores

2002-01-07 Thread Dennis Roberts

sorry for late reply

ranking is the LEAST useful thing you can do ... so, i would never START
with simple ranks
any sort of an absolute kind of scale ... imperfect as it is ... would
generally be better ...

one can always convert more detailed scale values INTO ranks at the end if
necessary BUT you cannot go the reverse route

say we have 10 people measured on variable X ... and we end up with no ties
... so, we get ranks of 1 to 10 ... but these values give NO idea
whatsoever as to the differences amongst the 10

if i had a 3 person senior high school class with cumulative gpas of 4.00,
3.97, and 2.38 ... the ranks would be 1, 2, and 3 ... but clearly, there is
a huge difference between either of the top 2 and the bottom ... ranks
give no clue to this at all

so, my message is ... DON'T START WITH RANKS
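The three-GPA example above can be made concrete with a short sketch (an editor's illustration, not from the thread):

```python
# Three seniors' cumulative GPAs, as in the example above
gpas = [4.00, 3.97, 2.38]

# Rank 1 = highest GPA
order = sorted(range(len(gpas)), key=lambda i: -gpas[i])
ranks = [order.index(i) + 1 for i in range(len(gpas))]

# Gaps between adjacent students on the original scale
gaps = [round(gpas[i] - gpas[i + 1], 2) for i in range(len(gpas) - 1)]

print(ranks)   # [1, 2, 3]
print(gaps)    # [0.03, 1.59] -- ranks alone cannot recover this contrast
```

The ranks are equally spaced by construction, so the 0.03 gap and the 1.59 gap collapse to the same one-rank difference: exactly the information loss the message describes.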

At 02:11 AM 12/19/01 +, Doug Federman wrote:
I have a dilemma which I haven't found a good solution for.  I work with
students who rotate with different preceptors on a monthly basis.  A
student will have at least 12 evaluations over a year's time.  A
preceptor usually will evaluate several students over the same year.
Unfortunately, the preceptors rarely agree on the grades.  One preceptor
is biased towards the middle of the 1-9 likert scale and another may be
biased towards the upper end.  Rarely, does a given preceptor use the 1-9
range completely.  I suspect that a 6 from an easy grader is equivalent
to a 3 from a tough grader.

I have considered using ranks to give a better evaluation for a given
student, but I have a serious constraint.  At the end of each year, I
must submit to another body their evaluation on the original 1-9 scale,
which is lost when using ranks.

Any suggestions?

--
It has often been remarked that an educated man has probably forgotten
most of the facts he acquired in school and university. Education is what
survives when what has been learned has been forgotten.
- B.F. Skinner New Scientist, 31 May 1964, p. 484





_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Correlation problem

2002-01-07 Thread Timothy E. Vaughan


janne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

 How do I do the first part, (X - X-bar)? I have tried to take
 X - X-bar
 *snip*
 0

 As you see the answer is zero. What do I do wrong?

You are calculating

SUM[(x - x_bar)] * SUM[(y - y_bar)]

instead of what you are asked for, which is

SUM[(x - x_bar) * (y - y_bar)].

In other words, multiply each x deviation by the corresponding y deviation
BEFORE you perform the sum.







Re: Looking for some datasets

2002-01-07 Thread Jill Binker

I have a page of links to data:

http://www.keypress.com/fathom/Data_Sets.html

Perhaps you can find something there.

At 11:01 PM -0700 1/4/02, Michael Joner wrote:
The new semester has started and one of my first assignments has been to
find some datasets that I'd be interested in evaluating during some of my
classes.

I spent some time searching the Internet for some interesting data.  The
data available on StatLib is not exactly what I'd prefer to study
(although I guess if I can't find anything else I can always fall back on
StatLib), mainly because the data is not very recent to begin with.  I
found some information but it seems that most of the information I've
found has been in PDF format or some other document-based format that
would be very difficult to read into SAS or S-PLUS.  I can clean data if
needed but would rather not have to go to the trouble of parsing a Word
document (I also found some of them).

The data would be for a Modern Regression Methods class.  I need some
datasets with categorical variables and some datasets with continuous
variables (I guess some datasets with a mix of categorical and continuous
wouldn't be bad, either).

Could someone here steer me in the direction of some good datasets?  Some
of my interests include technology, sports, and vital statistics.

Mike Joner



Jill Binker
Fathom Dynamic Statistics Software
KCP Technologies, an affiliate of
Key Curriculum Press
1150 65th St
Emeryville, CA  94608
1-800-995-MATH (6284)
[EMAIL PROTECTED]
http://www.keypress.com
http://www.keycollege.com
__





Re: Looking for some datasets

2002-01-07 Thread Dennis Roberts

some minitab files and other things are here

http://roberts.ed.psu.edu/users/droberts/datasets.htm



_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Which one fit better??

2002-01-07 Thread Glen Barnett


Chia C Chong [EMAIL PROTECTED] wrote in message
news:a1bpk5$62b$[EMAIL PROTECTED]...

 Glen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
  Chia C Chong [EMAIL PROTECTED] wrote in message
  news:a0n001$b7v$[EMAIL PROTECTED]...
   I plotted a histogram density of my data and its smoothed version
   using the normal kernel function. I tried to plot the estimated PDFs
   (Laplacian & Generalised Gaussian), estimated using the maximum
   likelihood method, on top as well. Graphically, it seems that the
   Laplacian fits the histogram density better while the Generalised
   Gaussian fits the smoothed version (i.e. the kernel density version).

  Imagine that you began with a sample from a Laplacian (double
  exponential) distribution. What will happen to the central peak after
  you smooth it with a KDE?

 The peak did not change significantly... maybe shifted to the left a
 bit... not too much!!

No, I was not talking about your data, since you don't necessarily have
Laplacian data - that's what you're trying to decide!

Imagine you have data actually from a Laplacian distribution.
(It has a sharp peak in the middle, and exponential tails.)

Now you smooth it (KDE via gaussian kernel).

What happens to the peak?  (assume a typical window width)

[Answer? It gets smoothed, so it no longer looks like a sharp peak.]

That's where your impression of a gaussian-looking KDE is probably coming from.

Note that the tails of a normal and a laplace are different, so if those are
the two choices, that may help.
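The thought experiment above is easy to reproduce. A small pure-Python sketch (an editor's illustration, not from the thread) draws Laplace samples and evaluates a Gaussian-kernel KDE at the peak:

```python
import math
import random

random.seed(1)

# Draw samples from a standard Laplace distribution (location 0, scale 1)
# using the inverse-CDF method
def laplace_sample():
    u = random.random() - 0.5          # uniform on [-0.5, 0.5)
    x = math.log(1 - 2 * abs(u))       # <= 0
    return -x if u > 0 else x

xs = [laplace_sample() for _ in range(2000)]

# Gaussian-kernel density estimate at point t with bandwidth h
def kde(t, data, h):
    c = 1 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in data)

true_peak = 0.5                  # Laplace(0,1) density at 0 is 1/(2*scale)
est_peak = kde(0.0, xs, h=0.4)   # a typical-ish window width

# Smoothing rounds off the sharp corner: the KDE underestimates the peak
print(round(true_peak, 3), round(est_peak, 3))
```

With the sharp corner smoothed away, a KDE of genuinely Laplacian data can look quite Gaussian near the centre, which is the point above; the tails are a more reliable place to compare the two candidates.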

Glen








clusters within a sample

2002-01-07 Thread Yvonne Unrau


I am working with a large administrative data set (N=1,086) for a
foster care agency. In short, I am comparing client outcomes across two
branches (each delivering a different service model). For the analyses, I
am using logistic regression (SPSS), where my dependent variables include
a variety of outcomes measuring program success vs. failure. My test
variable is the program (two groups), plus I have several other
demographic and service-related variables.

My problem is that I have two types of clusters of children in my data set:

- siblings from the same biological family (may or may not be placed in
  the same foster home)
- foster children placed in one foster home (may or may not be siblings)

I am looking for ways to test the amount of error associated with the
above clusters using SPSS. My strategy to date has been to SELECT the
restricted sample, run the LR analysis, then eyeball the results. What
are my other options?
Many thanks.
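One distribution-free way to gauge how much such clustering matters is a cluster bootstrap: resample whole foster homes with replacement and look at the spread of the re-estimated effect. The sketch below is an editor's illustration, in Python rather than SPSS, with made-up data standing in for the real variables and a simple difference in success rates in place of the logistic-regression coefficient:

```python
import random

random.seed(42)

# Made-up records: (foster_home_id, program, success) -- hypothetical
# stand-ins for the real home IDs, branch indicator, and outcome
records = []
for home in range(60):
    program = home % 2                      # each home served by one branch
    for _ in range(random.randint(1, 5)):   # 1-5 children per home
        p = 0.55 if program else 0.45
        records.append((home, program, int(random.random() < p)))

# Group children by foster home (the cluster)
clusters = {}
for home, program, y in records:
    clusters.setdefault(home, []).append((program, y))

def effect(rows):
    """Difference in success rates: program 1 minus program 0."""
    by_prog = {0: [], 1: []}
    for program, y in rows:
        by_prog[program].append(y)
    return (sum(by_prog[1]) / len(by_prog[1])
            - sum(by_prog[0]) / len(by_prog[0]))

# Cluster bootstrap: resample whole homes with replacement, re-estimate
ids = list(clusters)
boot = []
for _ in range(500):
    sample = [row for home in random.choices(ids, k=len(ids))
              for row in clusters[home]]
    boot.append(effect(sample))

mean_b = sum(boot) / len(boot)
se = (sum((b - mean_b) ** 2 for b in boot) / (len(boot) - 1)) ** 0.5
naive = effect([(p, y) for _, p, y in records])
print(round(naive, 3), round(se, 3))
```

If the cluster-bootstrap spread is much larger than the standard error from the analysis that ignores clustering, the clustering matters. Within SPSS itself, complex-samples logistic regression or a multilevel model with foster home as a random effect would be more direct routes, if available.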

Yvonne A. Unrau, PhD
Associate Professor
School of Social Work
Illinois State
University
Campus Box 4650
Normal, Illinois 61790-4650
Direct Office Phone: (309) 438-8579
School Office Phone: (309) 438-3631
School Fax: (309) 438-5880
e-mail: [EMAIL PROTECTED]




Re: Excel 2000 - the same errors in stat. computations and graphics

2002-01-07 Thread Jay Warner


Jon Cryer wrote:

 David: I have certainly never said nor implied that Excel cannot produce
 reasonably good graphics. My concern is that it makes it so easy to
 produce poor graphics. The defaults are absurd and should never be used.
 It seems to me that defaults should produce at least something useful.
 The default graphs are certainly not good business graphs if the intent
 is to produce a good visual display of quantitative information! Isn't
 that what graphs are for?
The purpose of fancy graphics is to cover up the paucity of information
contained therein. For that reason alone, Excel's cornucopia of choices
fits the bill very nicely.
(Slap your own face, Jay, for being such a cynic.)
Cheers,
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?



Excel vs Quattro Pro

2002-01-07 Thread Edward Dreyer

Does anyone know if Quattro Pro suffers the same statistical problems as Excel?

Cheers.  ECD
___

Edward C. Dreyer
Political Science
The University of Tulsa









Thank you!

2002-01-07 Thread janne

Thank you for helping me with my problem!

Janne






Thank you all!!!

2002-01-07 Thread janne

Thank you everybody who helped me with my correlation problem Stephen,
Art, Timothy, Patrick. It was very sweet of you.

Janne






Re: Excel vs Quattro Pro

2002-01-07 Thread Dennis Roberts

i don't know the answer to this but ... i have a general question with 
regards to using spreadsheets for stat analysis

why? ... why do we not help our students and encourage our students to use 
tools designed for a task ... rather than substituting something that may 
just barely get us by?

we don't ask stat packages to do what spreadsheets were designed to do ... 
why the reverse?

just because packages like excel are popular and readily available ... does 
not therefore mean that we should be recommending it (or them) to people 
for statistical analysis

it's like telling people that notepad will be sufficient to do all your 
word processing needs ...

At 04:56 PM 1/7/02 -0600, Edward Dreyer wrote:
Does anyone know if Quattro Pro suffers the same statistical problems as 
Excel?

Cheers.  ECD
___

Edward C. Dreyer
Political Science
The University of Tulsa







_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Excel vs Quattro Pro

2002-01-07 Thread Kenmlin

i don't know the answer to this but ... i have a general question with 
regards to using spreadsheets for stat analysis

Many students are computer illiterate and it might be easier to teach them how
to use the spreadsheet than a formal programming language.  







Re: Excel vs Quattro Pro

2002-01-07 Thread dennis roberts

most stat packages have nothing to do with programming anything ... you
either use simple commands to do things you want done (like in minitab ...
MTB > correlation 'height' 'weight') or select procedures from menus and
dialog boxes

At 12:27 AM 1/8/02 +, Kenmlin wrote:
 i don't know the answer to this but ... i have a general question with
 regards to using spreadsheets for stat analysis

Many students are computer illiterate and it might be easier to teach them how
to use the spreadsheet than a formal programming language.










Re: Excel vs Quattro Pro

2002-01-07 Thread Vadim and Oxana Marmer

there are a lot of packages that are half-way between spreadsheets and
formal programming languages: SAS, SPSS, Stata. any of them is better than
a spreadsheet.


On 8 Jan 2002, Kenmlin wrote:

 i don't know the answer to this but ... i have a general question with
 regards to using spreadsheets for stat analysis

 Many students are computer illiterate and it might be easier to teach them how
 to use the spreadsheet than a formal programming language.









Re: Excel vs Quattro Pro

2002-01-07 Thread Art Kendall


Spreadsheets are fine for minor business/commercial data analysis.  They are not
designed to be statistical packages.  A package like SPSS is designed for a wide
variety of statistical applications across many disciplines.  It shares many
features of a spreadsheet in its user interface.  It is a package, not a
programming language.  A person who is going to use statistics does not
have to become a programmer.
(Although exposure to a programming language or two will be a help to
statisticians.)


Kenmlin wrote:

 i don't know the answer to this but ... i have a general question with
 regards to using spreadsheets for stat analysis

 Many students are computer illiterate and it might be easier to teach them how
 to use the spreadsheet than a formal programming language.



