Re: Schmid-Leiman

2001-10-10 Thread Joe Ward

Julie --

I worked with Jack Schmid and John Leiman at the
Air Force Personnel and Training Research Center at Lackland
AFB.

I communicate with Jack Schmid occasionally but I'm not
sure where John Leiman is located now.

Perhaps Jack can point you to someone who can help.

-- Joe

- Original Message -
From: "Penley, Julie" <[EMAIL PROTECTED]>
To: "edstat (E-mail) (E-mail)" <[EMAIL PROTECTED]>
Sent: Wednesday, October 10, 2001 9:42 AM
Subject: Schmid-Leiman


> Could someone please tell me how to perform a Schmid-Leiman transformation
> in SPSS?  Thanks very much.
> Julie
>
>
> Julie A. Penley, M.A.
> Evaluation Coordinator
> Partnership in Teacher Preparation
> The University of Texas at El Paso
> El Paso, TX 79968
> phone: (915) 747-5642
>
>
>
> =
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>   http://jse.stat.ncsu.edu/
> =
>





Re: Analysis of covariance

2001-09-25 Thread Joe Ward

Paolo --

Here comes my usual response to messages similar to yours:

Following the use of Regression/Linear Models:

1. State your research question in "NATURAL LANGUAGE" not
in terms of a "canned statistical name" that may or may not
be relevant to your question.

2. Create an ASSUMED MODEL that allows you to translate your
"NATURAL LANGUAGE" questions into RESTRICTIONS on your ASSUMED MODEL.

3. Impose the restrictions on your ASSUMED MODEL to obtain your RESTRICTED
MODEL and then you have the essentials to test your
hypotheses.

If this procedure is IDENTICAL to someone's COVARIANCE ANALYSIS then
you might want to call yours a COVARIANCE ANALYSIS.
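
A minimal sketch of the three steps above, applied to the question quoted
below. It assumes the natural-language question is "among patients with the
same baseline score, is the expected change the same in both groups?", an
ASSUMED MODEL change = b0 + b1*group + b2*baseline + error, and the
RESTRICTION b1 = 0. The data are synthetic and the helper names (rss,
f_statistic) are made up for illustration; numpy is the only dependency.

```python
import numpy as np

def rss(X, y):
    """Return (residual sum of squares, coefficients) for least squares of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta

def f_statistic(X_full, X_restricted, y):
    """F test comparing the ASSUMED (full) model with the RESTRICTED model."""
    rss_full, _ = rss(X_full, y)
    rss_restricted, _ = rss(X_restricted, y)
    df1 = X_full.shape[1] - X_restricted.shape[1]  # number of restrictions
    df2 = len(y) - X_full.shape[1]                 # error degrees of freedom
    return ((rss_restricted - rss_full) / df1) / (rss_full / df2), df1, df2

# Synthetic data (an assumption, not data from the thread): two groups of 20.
rng = np.random.default_rng(0)
n = 40
group = np.repeat([0.0, 1.0], n // 2)
baseline = rng.normal(50, 10, n)
endpoint = 10 + 0.8 * baseline - 3.0 * group + rng.normal(0, 5, n)
change = endpoint - baseline

ones = np.ones(n)
X_full = np.column_stack([ones, group, baseline])   # assumed model
X_restricted = np.column_stack([ones, baseline])    # restriction b1 = 0 imposed

F, df1, df2 = f_statistic(X_full, X_restricted, change)
_, beta_change = rss(X_full, change)
_, beta_endpoint = rss(X_full, endpoint)

print("F =", round(F, 3), "on", df1, "and", df2, "df")
print("group effect, change as response:  ", beta_change[1])
print("group effect, endpoint as response:", beta_endpoint[1])
```

Under these assumptions the group coefficient is the same whether the
response is the change or the endpoint score; only the baseline coefficient
shifts by 1. In that sense, putting baseline in the model for the change does
not adjust for it twice.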

-- Joe


*** Joe H. Ward,  Jr.
*** 167 East Arrowhead Dr.
*** San Antonio, TX 78228-2402
*** Phone: 210-433-6575
*** Fax:   210-433-2828
*** Email: [EMAIL PROTECTED]
*** http://www.northside.isd.tenet.edu/healthww/biostatistics/wardindex.html
*** ---
*** Health Careers High School
*** 4646 Hamilton-Wolfe
*** San Antonio, TX 78229
*

- Original Message -
From: "Morelli Paolo" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, September 25, 2001 5:26 AM
Subject: Analysis of covariance


> HI all,
> I have to analyse some clinical data. In particular the analysis is a
> comparison between two groups of the mean change from baseline to endpoint
> of a score. The statistician who planned the analysis used the ANCOVA on the
> mean change, using as covariate the baseline values of the scores.
> Do you think this analysis is correct?
> I think that in this way we are correcting twice. I think that the right
> analysis is an ANOVA on the mean change.
> Please let me know your opinion
> thanks
> Paolo
>
>
>
>



Re: adjusted r-square

2001-08-22 Thread Joe Ward

If the least-squares regression algorithm does not
"REQUIRE THE NUMBER OF OBSERVATIONS TO EXCEED
THE NUMBER OF PREDICTORS, THEN THE REGRESSION
ALGORITHM COULD BE USED TO SOLVE A SYSTEM OF
SIMULTANEOUS EQUATIONS THAT WOULD HAVE
NO ERRORS."
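
A small sketch of the point above, using made-up numbers: with as many
observations as regression terms (constant included), a least-squares routine
that accepts the data returns the exact solution of the simultaneous
equations, the residuals are zero, and the error degrees of freedom are zero,
so adjusted R-square has a zero denominator. The data and the adjusted_r2
helper are hypothetical; the formula assumes k counts the constant term.

```python
import numpy as np

# Hypothetical data: 3 observations, 3 terms (constant + 2 predictors).
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [1.0, 4.0, 2.0]])
y = np.array([5.0, 7.0, 6.0])

beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
beta_solve = np.linalg.solve(X, y)                  # simultaneous equations
residuals = y - X @ beta_lstsq

n, k = X.shape
error_df = n - k  # zero here

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k terms including the constant; None if error df <= 0."""
    if n - k <= 0:
        return None  # denominator (n - k) is zero: the quantity is undefined
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

print("same coefficients:", np.allclose(beta_lstsq, beta_solve))
print("largest residual: ", float(np.max(np.abs(residuals))))
print("error df:         ", error_df)
print("adjusted R^2:     ", adjusted_r2(1.0, n, k))
```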

Another "interesting" characteristic of Excel Regression is that it
"requires the number of observations to exceed the number of predictors".

Fortunately, Colin Bell is working with the Excel folks at Microsoft to
improve the numerous "interesting" characteristics of  Statistics in Excel.

-- Joe



*** Joe H. Ward,  Jr.
*** 167 East Arrowhead Dr.
*** San Antonio, TX 78228-2402
*** Phone: 210-433-6575
*** Fax:   210-433-2828
*** Email: [EMAIL PROTECTED]
*** http://www.ijoa.org/resumes/ward.html
*** ---
*** Health Careers High School
*** 4646 Hamilton-Wolfe
*** San Antonio, TX 78229
*




- Original Message -
From: "Graeme Byrne" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, August 22, 2001 4:42 AM
Subject: Re: adjusted r-square


> In short, you don't. If the number of terms in the model equals the number
> of observations you have much bigger problems than not being able to compute
> adjusted R^2. It should always be the case that the number of observations
> exceeds the number of terms in the model; otherwise you cannot calculate any
> of the standard regression diagnostics (F-stats, t-stats etc). My advice is:
> get more data or remove terms from the model. If neither of these is an
> option you are stuck.
>
>
> "Atul" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > I have a doubt regarding adjusted r-square
> >
> > How do we calculate the adjusted r-square when the error degrees of
> > freedom are zero ?
> > (or in other words, number of samples is equal  to the number of
> > regression terms including the constant)
> >
> > Such a situation leads to a zero in the denominator in the expression
> > for calculating adjusted r-square.
> >
> > Your help is highly appreciated.
> >
> > Thanks
> > Atul
>
>
>
>



Re: Experimental Design Text Advice

2001-01-18 Thread Joe Ward

DENNIS ROBERTS WRITES -

- Original Message -
From: "dennis roberts" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 18, 2001 1:31 PM
Subject: Re: Experimental Design Text Advice


> At 10:49 AM 1/18/01 -0600, Ken K. wrote:
> >I find BH&H to be quite good, but a little hard to read and getting a little
> >dated. I much prefer "Design and Analysis of Experiments" by Douglas C.
> >Montgomery, John Wiley & Sons, ISBN 0-471-52000-4
> >
> >I really like the simple style Montgomery uses in all his books
>
> not disagreeing with the above but, one of the big problems in selecting a
> book on experimental design is ... that appropriate designs DEPEND upon the
> problem(s) being investigated ...
>
> in addition, i have sensed that most "design" books are not really about
> designing experiments but, how to analyze data FROM particular designs ...
> and there IS a large difference
>
> while it may not be too difficult to talk about 1 and 2 and 3 or more
> factor designs ... with blocking variables or not ... with repeated
> measures or not (etc.) ... whether any of these are appropriate again
> depends on the question(s) being asked

===  Joe Ward Comments ==
Dennis --

You said it well!!!

I might add that a good approach is to develop
capabilities to create models and impose restrictions
to answer the RESEARCH QUESTIONS OF INTEREST TO THE RESEARCHER.

It is difficult to teach students to create models if the
teachers have not developed their own capability to create models
appropriate to the research questions of interest.

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax:   210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax:   210-617-5423







Re: Re: topic?

2001-01-02 Thread Joe Ward

Happy New Year --

Perhaps Laurie Snell will make a good start through the future
CHANCE issues.

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax:   210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax:   210-617-5423




- Original Message - 
From: "Bokhorst, Frank" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, January 02, 2001 4:41 AM
Subject: Re: topic?


> Bob Hayden asked:
>  
> > Anybody have anything to say about statistical education???
> 
> I would like to turn the question round, and ask if it might be
> possible to summarize relevant material from the recent discussion 
> on the forum about the US election saga into a form suitable for 
> teaching purposes? 
> 
> In particular, to sift through the EDSTAT archive and edit a 
> resource text.
> 
> There was much off-topic discussion, but there was also a huge 
> volume of generally polite and reasonable talk with many good 
> points illustrating key issues relevant to education.  The topic 
> itself was extremely pertinent and interesting to a wide audience.  
> For example, someone recently asked for examples of the misuse of 
> statistics - surely many examples could be found in the US election 
> saga?   What we need is a good summary. 
> 
> As another example, I note that Herman Rubin frequently argues
> the need for proper understanding of statistics:  Could he, or 
> anybody else on the EDSTAT forum, perhaps help educators
> by compiling some examples that arose in the recent discussion?
> What kind of understanding of statistics might be required of 
> lawyers, politicians, voters, media editors?
> 
> Maybe someone could list key points that came out of these EDSTAT 
> discussions?
> 
> 
> Frank Bokhorst
> http://www.uct.ac.za/depts/psychology/bok
>   _O
> tel: 021 650-3708   -\<,
> fax: 021 689-7572   One car less  (.)/(.)
> Psychology Dept.,   The owner of this bicycle
> University of   takes responsibility for 
> Cape Town,  the shape of his drawing 
> Rondebosch 7701,only if you use a fixed
> South Africa.   size font such as Courier.
> 
> 



Re: Statistical penalties for sequential analyses

2000-12-08 Thread Joe Ward

Rich -
You might want to consider doing some Resampling (Cross-Validation, Bootstrap)
as you continue through your analyses.
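
One way to act on this suggestion, sketched under assumptions: a percentile
bootstrap interval for a single correlation, with synthetic data standing in
for two of the age-sensitive variables. The function name and its defaults
are hypothetical.

```python
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Pearson correlation."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample animals with replacement
        stats[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic stand-in for two measurements on 300 mice (an assumption).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.3 * x + rng.normal(size=300)

r = float(np.corrcoef(x, y)[0, 1])
lo, hi = bootstrap_corr_ci(x, y)
print("sample r:", round(r, 3), " 95% bootstrap interval:", (round(lo, 3), round(hi, 3)))
```

Resampling whole animals (rows) keeps each mouse's measurements together,
which is the choice made in this sketch.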

-- Joe


Joe Ward Health Careers High School
167 East Arrowhead Dr _ 4646 Hamilton Wolfe
San Antonio, TX 78228-2402  San Antonio, TX 78229
Phone: 210-433-6575__  Phone:  210-617-5400
Fax: 210-433-2828   Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***
- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 08, 2000 3:30 PM
Subject: Statistical penalties for sequential analyses


> Need some advice.  We are doing a series of tests looking for correlations
> among age-sensitive variables in a population of mice. We will have about
> 600 mice in all, and it will take 3 years to test them all, at about 200
> mice tested each year.
>
> We are considering three strategies:
>
> A) Wait 3 years until all the data are in; then do the analyses.
>
> B) Analyze the data on the first 300 mice, and publish anything that looks
> exciting and meets conventional significance criteria. When the second set
> of mice is finished, we can use these second 300 animals as a replicate
> sample to (try to) confirm the significant findings we reported on the first
> set.  And we can also pool all 600 mice to obtain higher statistical power
> than we had for the initial analysis with N = 300.
>
> Of course this represents testing some hypotheses twice, and thus increases
> the Type I error rate. I suspect that there are theoretically justified
> methods for adjusting significance criteria to "adjust" for taking two looks
> at the data, but I don't know how to do this.  Anyone have a recipe, or a
> reference to get me started?
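
A rough simulation of the two-look problem described above, under
assumptions not stated in the original message: a single two-sided z test,
an interim look at half the data, a final look at the pooled data, and a true
null hypothesis. The constant 2.178 is the two-look Pocock boundary for an
overall 5% level as given in standard group-sequential tables; the function
name is hypothetical.

```python
import numpy as np

def two_look_type1(crit, n_sims=200_000, seed=0):
    """Estimate the chance of rejecting at either look when the null is true."""
    rng = np.random.default_rng(seed)
    z_half1 = rng.standard_normal(n_sims)  # z statistic from the first half
    z_half2 = rng.standard_normal(n_sims)  # z statistic from the second half
    z_interim = z_half1
    z_final = (z_half1 + z_half2) / np.sqrt(2.0)  # pooled z, equal-sized halves
    reject = (np.abs(z_interim) > crit) | (np.abs(z_final) > crit)
    return float(reject.mean())

naive = two_look_type1(1.96)    # conventional 5% criterion used at both looks
pocock = two_look_type1(2.178)  # same stricter criterion at both looks

print("unadjusted two-look error rate:", round(naive, 3))
print("Pocock-adjusted error rate:    ", round(pocock, 3))
```

With these settings the unadjusted rate comes out near 0.08 rather than
0.05, while the Pocock constant brings it back to roughly 0.05. An
O'Brien-Fleming boundary is a common alternative that spends less of the
error rate at the first look.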
>
> Thanks.
>
> Rich Miller
> University of Michigan
>
> Reply to: [EMAIL PROTECTED]
>



Re: [ap-stat] Textbook for "regular" statistics vs. AP Statistics

2000-12-01 Thread Joe Ward


- Original Message -
From: "Carole Black" <[EMAIL PROTECTED]>
To: "AP Statistics" <[EMAIL PROTECTED]>
Sent: Wednesday, November 29, 2000 12:58 PM
Subject: [ap-stat] Textbook for "regular" statistics vs. AP Statistics


> I have taught a "regular" statistics class at my high school for the
> last 3 years using Elementary Statistics by Mario Triola. (This was
> the book I inherited.)  This is textbook adoption year for Georgia and
> I have the privilege of picking out Statistics books for both the
> "regular" stat class as well as a new AP class that will be offered
> for the first time next year.  (I will be teaching both classes).  My
> first question is, should I go with 2 different textbooks or the same
> textbook?
>
> My second question is much the same as many others posted on this
> site, which book? I am seriously considering the Yates, Moore and
> McCabe "The Practice of Statistics" for the AP class.  I am
> considering either Moore's "Basic Practice of Statistics" or the
> "Elementary Statistics" book published by McGraw Hill for the regular
> statistics class.
>
> Any comments would be greatly appreciated.
> Carole Black
>
> ---

=  Joe Ward Comments  ==

Hi, Carole --

Your opportunity of having an AP-Statistics class and a "regular" Statistics
class can allow you the freedom of using the "regular" class to give students
the capability to use the combined power of Regression/Linear Models and
Computers to investigate some interesting and practical research questions.
You might recruit some of your science students to give them useful techniques
to support their research projects.  You can give your students the power to
create models to answer their research questions.

It is certainly reasonable that you must give your AP-Statistics students the
objectives that tend to match the corresponding college course.  For the
"regular" Statistics course you can make the course both interesting and
practical without the constraints of AP-Statistics.

There probably are many AP teachers who can accomplish the AP-Statistics
objectives AND have extra time to give their students some more powerful
capabilities.

Try to make your "regular" statistics course available for ALL students.
Frequently, the "regular" course is designed for the less talented.  You CAN
make the regular course the more popular since your students might be able to
do some powerful research.  Students who are involved with Science Fairs, Jr.
Academy of Science and the ASA Project/Poster competitions should be your
target population for the "regular" course.

Be sure to have access to books that contain ideas of how to use
Regression/Linear models to create models to answer the students' research
questions of interest.

-- Joe



Joe Ward Health Careers High School
167 East Arrowhead Dr _ 4646 Hamilton Wolfe
San Antonio, TX 78228-2402  San Antonio, TX 78229
Phone: 210-433-6575__  Phone:  210-617-5400
Fax: 210-433-2828   Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***










[ap-stat] RE: election proposal

2000-11-14 Thread Joe Ward

Does anyone know WHY so many states DON'T DO IT THIS WAY?
Perhaps the Political Science/History folks can comment.

-- Joe

****
Joe Ward.Health Careers High School
167 East Arrowhead Dr4646 Hamilton Wolfe
San Antonio, TX 78228-2402...San Antonio, TX 78229
Phone: 210-433-6575...Phone:  210-617-5400
Fax: 210-433-2828Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***


- Original Message -
From: "Lee Creighton" <[EMAIL PROTECTED]>
To: "AP Statistics" <[EMAIL PROTECTED]>
Sent: Monday, November 13, 2000 8:11 AM
Subject: [ap-stat] RE: election proposal


> People are listening! This is exactly how Nebraska and Maine vote, as we
> speak.
>
> It was decided after the disastrous 1824 election that the states would
> have the power to manage how they pick electors, and *not* the federal
> government.
>
> > -Original Message-
> > From: Jon Graetz [mailto:[EMAIL PROTECTED]]
> > Sent: Sunday, November 12, 2000 11:30 PM
> > To: AP Statistics
> > Subject: [ap-stat] RE: election proposal
> >
> >
> > I like it!  Now, to get anyone else to listen...
> >
> > Jon Graetz
> > The Miami Valley School
> > 5151 Denise Drive
> > Dayton, OH  45429
> > (937)434-
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> >
> > -Original Message-
> > From: Reba Taylor [mailto:[EMAIL PROTECTED]]
> > Sent: Sunday, November 12, 2000 11:00 PM
> > To: AP Statistics
> > Subject: [ap-stat] election proposal
> >
> >
> > I've been toying with this idea:
> >
> > Each state has the same number of electors as their congressional
> > delegation:  e.g.  in VA, we have 11 congressional districts + 2
> > senators = 13 electors.
> >
> > Let's keep the electors, but have the ones representing the
> > congressional districts vote the way their district votes.  Then the 2
> > at-large electors will vote the way the state as a whole votes.
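
The allocation rule just described can be sketched in a few lines. The
district results below are invented purely for illustration, and the function
names are hypothetical.

```python
from collections import Counter

def district_method(district_winners, statewide_winner):
    """One elector per congressional district, plus 2 at-large electors
    for the candidate who carries the state as a whole."""
    electors = Counter(district_winners)
    electors[statewide_winner] += 2
    return dict(electors)

def winner_take_all(district_winners, statewide_winner):
    """All electors (districts + 2) go to the statewide winner."""
    return {statewide_winner: len(district_winners) + 2}

# Invented outcome for a state with 11 districts: A carries 6, B carries 5,
# and B carries the statewide vote.
districts = ["A"] * 6 + ["B"] * 5
split = district_method(districts, statewide_winner="B")
unit = winner_take_all(districts, statewide_winner="B")

print("district method:", split)
print("winner-take-all:", unit)
```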
> >
> > I think this is more equitable than winner-take-all.  I also think it
> > would be a more representative sample of the popular vote, while still
> > giving the smaller states as much clout as the larger ones.
> >
> > Reba Taylor
> >
> >
> > *
> > *   Reba Taylor [EMAIL PROTECTED] *
> > * *
> > *   Home: School: *
> > * Blacksburg High School *
> > *   2418 Ridge Road 520 Patrick Henry Drive *
> > *   Blacksburg, VA 24060 Blacksburg, VA 24060 *
> > *   540-953-2421 540-951-5706 *
> > * *
> > *  AP Computer Science, AP Statistics, Math *
> > * *
> > *  Black holes are where God divided by zero. *
> > * *
> > * "Can't never could, till it tried!"  -- S.C. Taylor
> > *
> > * *
> > *
>
> ---
> You are currently subscribed to ap-stat as: [EMAIL PROTECTED]
> To unsubscribe send a blank email to
> [EMAIL PROTECTED]
> Frequently Asked Questions(FAQ) Site is at
> http://www.ncssm.edu/statsteachers
> AP Statistics Archives are at http://forum.swarthmore.edu/epigone/apstat-l
>
>






Re: Help needed ... :-(

2000-11-13 Thread Joe Ward

Well said, Bob --

-- Joe


Joe Ward.Health Careers High School
167 East Arrowhead Dr4646 Hamilton Wolfe
San Antonio, TX 78228-2402...San Antonio, TX 78229
Phone: 210-433-6575...Phone:  210-617-5400
Fax: 210-433-2828Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***
- Original Message -
From: "Bob Hayden" <[EMAIL PROTECTED]>
To: "EdStat-L" <[EMAIL PROTECTED]>
Sent: Monday, November 13, 2000 9:46 PM
Subject: Re: Help needed ...



> - Forwarded message from David Heiser -
>
>
> - Original Message -
> From: Dennis <[EMAIL PROTECTED]>
>
> > Hello Newsgroup, I'm searching for real good books on stats. I'm a
> > student of psychology and we've been taught very much stats. But I
> > read all the time your postings and wonder why I've never heard
> > about that what I read.
> ...
> > Hopefully and with much regards
> > yours Dennis
> >
> ---
>
> What you need is a good class in written English
> DAH
>
> - End of forwarded message from David Heiser -
>
> From the email address, it appears that Dennis lives in a European
> country where English is not the predominant language.  The written
> English here far surpasses my written French, German or Latin, to
> mention only languages I have studied.  I note that, unlike most
> Americans, Dennis uses the word "hopefully" correctly.  Of course, if
> Americans were as good with other people's languages as Europeans are,
> Dennis could have sent us a native-language posting, and then
> criticized us when we tried to respond in that language.
>
> I think this list can benefit greatly from being an INTERNATIONAL
> list.  Let's make folks from other countries feel welcome.
>
>
>   _
>  | | Robert W. Hayden
>  | |  Work: Department of Mathematics
> /  | Plymouth State College MSC#29
>|   | Plymouth, New Hampshire 03264  USA
>| * | fax (603) 535-2943
>   /|   Home: 82 River Street (use this in the summer)
>  | ) Ashland, NH 03217
>  L_/ (603) 968-9914 (use this year-round)
> Map of New[EMAIL PROTECTED] (works year-round)
> Hampshire http://mathpc04.plymouth.edu (works year-round)
>
> The State of New Hampshire takes no responsibility for what this map
> looks like if you are not using a fixed-width font such as Courier.
>
> "Opportunity is missed by most people because it is dressed in
> overalls and looks like work." --Thomas Edison
>
>
>



Re: [ap-stat] RE: election proposal

2000-11-13 Thread Joe Ward

Does anyone know WHY so many states DON'T DO IT THIS WAY?
Perhaps the Political Science/History folks can comment.

-- Joe

****
Joe Ward.Health Careers High School
167 East Arrowhead Dr4646 Hamilton Wolfe
San Antonio, TX 78228-2402...San Antonio, TX 78229
Phone: 210-433-6575...Phone:  210-617-5400
Fax: 210-433-2828Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***





[ap-stat] Re: revote and Accuracy and Design of Voting Forms

2000-11-10 Thread Joe Ward


  Bob Hayden wrote to the AP list:   ==
> > - Original Message -
> > From: "Bob Hayden" <[EMAIL PROTECTED]>
> > To: "AP Statistics" <[EMAIL PROTECTED]>
> > Sent: Friday, November 10, 2000 10:01 AM
> > Subject: [ap-stat] revote
> >
> >
> > > After considering all the issues raised on the lists regarding the
> > > election, I think the best solution would be a revote in every state
> > > of the union -- but with NEW CANDIDATES!-)
> > > --
> > >  | | Robert W. Hayden
> > >  | |  Work: Department of Mathematics
> > > /  | Plymouth State College MSC#29
> > >|   | Plymouth, New Hampshire 03264  USA
> > >| * | fax (603) 535-2943
> > >   /|   Home: 82 River Street (use this in the summer)
> > >  | ) Ashland, NH 03217
> > >  L_/ (603) 968-9914 (use this year-round)
> > > Map of New[EMAIL PROTECTED] (works year-round)
> > > Hampshire http://mathpc04.plymouth.edu (works year-round)
> > >
> > > The State of New Hampshire takes no responsibility for what this map
> > > looks like if you are not using a fixed-width font such as Courier.
> > >
> > > "Opportunity is missed by most people because it is dressed in
> > > overalls and looks like work." --Thomas Edison

===  Joe Ward replied to Bob Hayden ===

> > Hey, Bob --
> >
> > THAT really brought some hearty chuckles to
> > Bettie and me.
> >
> > -- Joe
> >

> > Joe Ward.Health Careers High School
> > 167 East Arrowhead Dr4646 Hamilton Wolfe
> > San Antonio, TX 78228-2402...San Antonio, TX 78229
> > Phone: 210-433-6575...Phone:  210-617-5400
> > Fax: 210-433-2828....Fax: 210-617-5423
> > Email: [EMAIL PROTECTED]
> > http://www.ijoa.org/joeward/wardindex.html
> >
***

== Then Bob Hayden wrote:   =
- Original Message -
From: "Bob Hayden" <[EMAIL PROTECTED]>
To: "Joe Ward" <[EMAIL PROTECTED]>
Sent: Friday, November 10, 2000 10:41 AM
Subject: Re: [ap-stat] revote
>
> Their post-election bickering did not endear them to me.  I think they
> should both go home, return to their jobs, and SHUT UP.
>

  Joe Ward Comments about Accuracy of Voting Responses ==
  Is there research on the Design of Voting Forms?  =
Hi, Bob --

In ANY election, the format for obtaining voting responses should be designed
to minimize the chances for inaccurate responses.  It is surprising that
the "format-approval folks" in Palm Beach did not redesign the form.  It looks
like the form was designed for convenience of the computer folks or
the print shop or others--but not for the accuracy of responses.

No matter who is the winner in any election, there probably are some local
voting systems that need "fine tuning".

In San Antonio, we have gone through numerous varieties of
voting formats.  Some seem better than others. I'm not sure how
the final forms are "approved".

In this recent election we used felt-tip markers!!!  The ink soaked through
to the back side of the paper but when my wife mentioned it, the "judges"
said that it had been checked and "did not interfere with the markings on
the other side".
But do we know what happens if there is a SMEAR of the wet ink?  Does THAT
BALLOT COUNT, or is it rejected?  If I were running for election in our
county and the voting was close, then I certainly would ask for a "hand"
recount to find out how many votes were rejected by the scan machine because
of "smear" or because the wet ink soaked through the paper (probably cheap
paper) and was "sensed" on the back.

Perhaps there should be a research project designed by a TASK FORCE of some
ASA members to evaluate the many different forms to find out which form(s)
MINIMIZE INACCURACY OF RESPONSE.  It is likely that such research has been
done since it is such an important activity.  The studies should consider age,
education, language and other variables.

-- Joe


Joe Ward.Health Careers High School
167 East Arrowhead Dr4646 Hamilton Wolfe
San Antonio, TX 78228-2402...San Antonio, TX 78229
Phone: 210-433-6575...Phone:  210-617-5400

Re: [ap-stat] revote and Accuracy and Design of Voting Forms

2000-11-10 Thread Joe Ward


  Bob Hayden wrote to the AP list:   ==
> > - Original Message -
> > From: "Bob Hayden" <[EMAIL PROTECTED]>
> > To: "AP Statistics" <[EMAIL PROTECTED]>
> > Sent: Friday, November 10, 2000 10:01 AM
> > Subject: [ap-stat] revote
> >
> >
> > > After considering all the issues raised on the lists regarding the
> > > election, I think the best solution would be a revote in every state
> > > of the union -- but with NEW CANDIDATES!-)
> > > --
> > >  | | Robert W. Hayden
> > >  | |  Work: Department of Mathematics
> > > /  | Plymouth State College MSC#29
> > >|   | Plymouth, New Hampshire 03264  USA
> > >| * | fax (603) 535-2943
> > >   /|   Home: 82 River Street (use this in the summer)
> > >  | ) Ashland, NH 03217
> > >  L_/ (603) 968-9914 (use this year-round)
> > > Map of New[EMAIL PROTECTED] (works year-round)
> > > Hampshire http://mathpc04.plymouth.edu (works year-round)
> > >
> > > The State of New Hampshire takes no responsibility for what this map
> > > looks like if you are not using a fixed-width font such as Courier.
> > >
> > > "Opportunity is missed by most people because it is dressed in
> > > overalls and looks like work." --Thomas Edison

===  Joe Ward replied to Bob Hayden ===

> > Hey, Bob --
> >
> > THAT really brought some hearty chuckles to
> > Bettie and I.
> >
> > -- Joe
> >


=== Then Bob Hayden wrote: ===
- Original Message -
From: "Bob Hayden" <[EMAIL PROTECTED]>
To: "Joe Ward" <[EMAIL PROTECTED]>
Sent: Friday, November 10, 2000 10:41 AM
Subject: Re: [ap-stat] revote
>
> Their post-election bickering did not endear them to me.  I think they
> should both go home, return to their jobs, and SHUT UP.
>

=== Joe Ward Comments about Accuracy of Voting Responses ===
=== Is there research on the Design of Voting Forms? ===
Hi, Bob --

In ANY election, the format for obtaining voting responses should be
designed to minimize the chances for inaccurate responses.  It is
surprising that the "format-approval folks" in Palm Beach did not redesign
the form.  It looks like the form was designed for convenience of the
computer folks or the print shop or others--but not for the accuracy of
responses.

No matter who is the winner in any election, there probably are some local
voting systems that need "fine tuning".

In San Antonio, we have gone through numerous varieties of
voting formats.  Some seem better than others. I'm not sure how
the final forms are "approved".

In this recent election we used felt-tip markers!!!  The ink soaked through
to the back side of the paper but when my wife mentioned it, the "judges"
said that it had been checked and "did not interfere with the markings on
the other side".

But do we know what happens if there is a SMEAR of the wet ink?  Does THAT
BALLOT COUNT, or is it rejected?  If I were running for election in our
county and the voting was close, then I certainly would ask for a "hand"
recount to find out how many votes were rejected by the scan machine
because of "smear" or because the wet ink soaked through the paper
(probably cheap paper) and was "sensed" on the back.

Perhaps there should be a research project designed by a TASK FORCE of some
ASA members to evaluate the many different forms to find out which form(s)
MINIMIZE INACCURACY OF RESPONSE.  It is likely that such research has been
done since it is such an important activity.  The studies should consider
age, education, language and other variables.

-- Joe









Re: 2 factor ANOVA with empty cells

2000-11-01 Thread Joe Ward

Right you are, Elliot.

However, when one finds "no-interaction" among all of those cells
that are present, then one can feel "better" about estimating
the "missing" cell values.  Of course, there could be a surprising
explosion!! The more interaction that is detected, the more dangerous it
can be.

When there is little or no interaction it is possible to design the study
to save money and time.  There is no need to fill in all the cells all the
time -- particularly when the cost is great.

The real experimental design "experts" can get lots of information from
a small study that might have missing cells "strategically located".
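
As a rough illustration of estimating a "missing" cell when no interaction
is assumed, here is a minimal Python sketch.  The table values and the
function name estimate_missing_cell are hypothetical.  The closed form used
is the classical missing-plot formula for a single empty cell in an
unreplicated two-way layout, which gives the same value as a least-squares
fit of the additive (rows + columns) model to the observed cells.  It is a
sketch under the no-interaction assumption, not a general procedure.

```python
def estimate_missing_cell(table, row, col):
    """Estimate the single empty cell (marked None) of an unreplicated
    two-way table, assuming an additive model (no interaction)."""
    a = len(table)       # number of rows
    b = len(table[0])    # number of columns
    observed = [v for r in table for v in r if v is not None]
    if len(observed) != a * b - 1 or table[row][col] is not None:
        raise ValueError("this sketch handles exactly one empty cell")
    row_total = sum(v for v in table[row] if v is not None)
    col_total = sum(r[col] for r in table if r[col] is not None)
    grand_total = sum(observed)
    return (a * row_total + b * col_total - grand_total) / ((a - 1) * (b - 1))


# Hypothetical 3 x 4 table built as 10 + row effect + column effect,
# with the cell in row 1, column 2 left empty (its additive value is 15).
table = [
    [10.0, 11.0, 13.0, 14.0],
    [12.0, 13.0, None, 16.0],
    [15.0, 16.0, 18.0, 19.0],
]
estimate = estimate_missing_cell(table, 1, 2)
print("estimated missing cell:", estimate)
```

If interaction is present, the additive estimate can be badly off, which is
the "explosion" warned about above.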

- Joe



- Original Message -
From: "Elliot Cramer" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 01, 2000 8:43 PM
Subject: Re: 2 factor ANOVA with empty cells


> Jeff E. Houlahan <[EMAIL PROTECTED]> wrote:
> : Is it ever appropriate to do a 2-factor unreplicated ANOVA with
> : empty cells if you aren't sure there is no interaction between the
> ^
> you can test the part of the interaction that is testable, but of course
> you can never know about the rest.
>
>
>
>







[ap-stat] Independent-Dependent Variable Discussion--Inverse Estimation

2000-10-20 Thread Joe Ward

Hi Dan and all --

I had intended to comment about the independent-dependent variable
discussion earlier but I got side-tracked.  Since Dan reminded us with his
comment:

"> This problem statement also brings back the independent-dependent
> variable discussion.  In the real context, the activity level of the
> crickets depends upon the temperature, so temperature is the independent
> variable and number of chirps the dependent variable.  However, if you
> want to predict the temperature using the number of chirps, you must
> consider the number of chirps as the "independent" variable and
> temperature as the "dependent" variable."


I have inserted some comments below:

===  Joe Ward writes ==
In the ancient past (1950s), for calibration studies --

Let
Y be a reading from a measuring instrument, SUBJECT TO "ERRORS OF
MEASUREMENT".
and
X be a KNOWN STANDARD, ASSUMED TO BE "WITHOUT ERROR" (FIXED).

Then the least-squares regression model used to PREDICT THE "STANDARD" (X)
from the measurement Y WAS computed as:

Y = b0  + b1*X + Error

Then, to estimate (predict) the STANDARD (X) from the measurement (Y), the
past procedure was to solve for X in the above equation
(leaving off the Error):

Y = b0  + b1*X

or

X = (Y-b0)/b1

is used to PREDICT X from Y.
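
A minimal Python sketch of this calibration procedure follows.  The
standards and readings are hypothetical numbers, and the function names
fit_line and inverse_estimate are made up for illustration.  The fit
regresses the error-prone reading Y on the fixed standard X, then solves
the fitted line for X.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def inverse_estimate(y_new, b0, b1):
    """Solve y = b0 + b1*x for x (the error term is left off)."""
    return (y_new - b0) / b1


# Hypothetical calibration run: known standards X, instrument readings Y.
standards = [0.0, 10.0, 20.0, 30.0, 40.0]
readings = [1.1, 20.9, 41.2, 60.8, 81.0]

b0, b1 = fit_line(standards, readings)
x_hat = inverse_estimate(50.0, b0, b1)   # new reading of 50.0
print("b0 =", b0, "b1 =", b1, "estimated standard =", x_hat)
```

The alternative would be to regress X on Y directly; the sketch only shows
the "solve for X" route described above and does not compare the two.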

Dan,  you probably are better acquainted with the most recent approach from
the Bureau of Standards since I have not kept up with any changes in the
Standards calibration policy.

Furthermore, in the distant past, it is interesting to note that
simultaneous regression equations were solved to estimate unknown amounts of
chemical compositions in a solution.

An interesting study by Fisher, Hans, R.G. Hansen, and H.W. Norton (1955),
"Quantitative determination of glucose and galactose", Anal. Chem. 27,
857-859, is discussed in E.J. Williams' book Regression Analysis, Wiley,
1959, page 163.  Williams refers to this topic as INVERSE ESTIMATION.

Even though the goal is to ESTIMATE (PREDICT) the values of X,  the
dependent variables (Y's) are the MEASURES SUBJECT TO ERROR.  After the
least-squares solutions are computed then the simultaneous regression
equations are solved, INVERSELY, for unknown X values from
measured(observed) values of Y (which are subject to ERRORS).

It would be interesting to know if this approach is still used.  Is the
INVERSE method BETTER? Have there been recent studies comparing the REGULAR
approach with the INVERSE approach?

Comments from experienced "experts" in this area are welcome.

-- Joe



===  End of Joe Ward's message  =


- Original Message -
From: "Teague, Dan" <[EMAIL PROTECTED]>
To: "AP Statistics" <[EMAIL PROTECTED]>
Sent: Friday, October 20, 2000 10:42 AM
Subject: [ap-stat] RE: effect on LSRL


> Rebecca,
>
> If your student chose values of the independent variable that were very
> large (250-450) and found the y-values that correspond to these x-values
> using y = 56.212 + 0.1356x, then he could increase the slope.  For these
> data, the point (249, 55) is below that portion of the regression line on
> the left.  The regression line would be pulled towards the point, just as
> you said, but in this situation, it would cause the slope to increase.
>
> The student's argument is flawed to the extent that these values of the
> independent variable do not match the summary statistics (xbar = 167 and
> s = 31).  We expect to find the number of chirps between 70 and 290 and the
> temperature roughly between 50 and 100.  For these values of x, the slope
> will be pulled down by the addition of this point.
>
> This problem statement also brings back the independent-dependent variable
> discussion.  In the real context, the activity level of the crickets
> depends upon the temperature, so temperature is the independent variable
> and number of chirps the dependent variable.  However, if you want to
> predict the temperature using the number of chirps, you must consider the
> number of chirps as the "independent" variable and temperature as the
> "dependent" variable.
>
>
> Daniel J. Teague
> NC School of Science and Mathematics
> 1219 Broad Street
> Durham, NC  27705
> [EMAIL PROTECTED]
>
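
The effect Dan describes can be checked with a small Python sketch.  The
x-values below are hypothetical (the original data are not in the message);
the y-values are placed exactly on the quoted line y = 56.212 + 0.1356x so
that the only change in slope comes from adding the point (249, 55).

```python
def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    return sxy / sxx


def on_line(x):
    """The regression line quoted in the message."""
    return 56.212 + 0.1356 * x


# Hypothetical x-values centred on xbar = 167 ...
typical = [(x, on_line(x)) for x in (105, 136, 167, 198, 229)]
# ... and hypothetical x-values in the 250-450 range the student used.
far_right = [(x, on_line(x)) for x in (250, 300, 350, 400, 450)]
new_point = (249, 55)   # lies well below the line

slope_typical = slope(typical + [new_point])
slope_far = slope(far_right + [new_point])
print("slope with typical x-values:", slope_typical)
print("slope with x-values 250-450:", slope_far)
```

A point below the line pulls the slope down when it sits to the right of
the mean of x and pulls it up when it sits to the left, which is why the
two cases move in opposite directions.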


Re: How to Pool Slopes

2000-10-08 Thread Joe Ward

Hi, Stan --
I've inserted a reply at the end of your message. Let me know
how things turn out.
-- Joe


- Original Message -
From: "Stanley110" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, October 08, 2000 1:59 PM
Subject: Q: How to Pool Slopes


> Assume I have three sets of x,y data. I fit each by least-squares to a
> straight line. I determine that the three fitted lines are homogeneous and
> indistinguishable at a certain significance level. I want to express the
> slope (of the three) as a single point estimate and as a confidence
> interval. What is the formula for doing this?
>
> Please reply to this newsgroup and to the writer at <[EMAIL PROTECTED]>.
>
> Thank you for your help.
>
> stan alekman
>
>


==  JOE WARD REPLIES  ===
Hi, Stan --

Your Title says
(1) "How to Pool Slopes" and you indicate later that
(2) "I determine that the three fitted lines are homogeneous and
indistinguishable".
   For (1) it sounds like you will want THREE DIFFERENT INTERCEPTS, but
for case (2) it sounds like you may want only ONE INTERCEPT.

This is a good example of the use of the no-intercept regression option
("NOINT" in SAS, or "Y-intercept = zero" in other packages).  The reason
that this appears to be a difficult problem is that most statistics packages
add an intercept term by DEFAULT.
The approach used below for your THREE GROUP DATA is shown for TWO groups
of data in the Prentice-Hall published book (1973) -- "Introduction to
Linear Models" by Ward and Jennings, Chapter 8, page 143.

I don't know which Regression Software you are using, but you should be sure
to FORCE THE REGRESSION THROUGH THE ORIGIN (no automatic intercept term); the
indicator variables D1, D2 and D3 defined below carry the intercepts.

First, it is important to put ALL THREE SETS OF DATA in the same model.

Let Y = dependent variable (containing ALL THREE SETS OF DATA)
D1 = 1 if the corresponding element of Y is from DATA SET #1; 0 otherwise
D2 = 1 if the corresponding element of Y is from DATA SET #2; 0 otherwise
D3 = 1 if the corresponding element of Y is from DATA SET #3; 0 otherwise
X1 = Value of x if the corresponding element of Y is from DATA SET #1;
     0 otherwise
X2 = Value of x if the corresponding element of Y is from DATA SET #2;
     0 otherwise
X3 = Value of x if the corresponding element of Y is from DATA SET #3;
     0 otherwise
X = Value of x for ALL corresponding elements of Y.
U = 1 for every element.

Then your ASSUMED MODEL is shown below (this should give you the same
regression coefficients that you already have computed -- a check that your
new model is correct):

Y = a1*D1 + b1*X1 + a2*D2 + b2*X2 + a3*D3 + b3*X3 + E1 (Model #1)
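
A minimal Python sketch of building these columns follows.  The three data
sets are hypothetical, and fit_line is an illustrative helper name.  Instead
of calling a regression package, the sketch plugs the three separate
straight-line fits into Model #1 and confirms that the residuals are
orthogonal to every column, which is the condition a no-intercept
least-squares fit of Model #1 must satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one data set."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical data: data set number -> (x values, y values)
groups = {
    1: ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    2: ([1, 2, 3, 4], [3.0, 5.1, 6.9, 9.2]),
    3: ([2, 3, 4, 5], [5.2, 7.1, 8.8, 11.1]),
}

# Stack all three data sets and build the columns D1..D3 and X1..X3.
Y = []
D = {g: [] for g in groups}
X = {g: [] for g in groups}
for g, (xs, ys) in groups.items():
    for x, y in zip(xs, ys):
        Y.append(y)
        for h in groups:
            D[h].append(1.0 if h == g else 0.0)
            X[h].append(float(x) if h == g else 0.0)

# Coefficients (a_g, b_g) from the three separate straight-line fits.
coef = {g: fit_line(xs, ys) for g, (xs, ys) in groups.items()}

# Residuals of Model #1 when those coefficients are plugged in.
resid = [
    Y[i] - sum(coef[g][0] * D[g][i] + coef[g][1] * X[g][i] for g in groups)
    for i in range(len(Y))
]

# Least squares requires the residuals to be orthogonal to every column.
checks = [sum(r * c for r, c in zip(resid, col))
          for col in list(D.values()) + list(X.values())]
print("largest |residual . column| =", max(abs(v) for v in checks))
```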

After you have computed this ASSUMED MODEL you may want to TEST THE
HYPOTHESIS that you imply in CASE (1) above, that the
THREE SLOPES ARE EQUAL, i.e., b1=b2=b3=bc (THE COMMON SLOPE)

Then substituting these restrictions into Model #1 produces the RESTRICTED
MODEL FOR CASE (1):

Y = a1*D1 + bc*X1 + a2*D2 + bc*X2 + a3*D3 + bc*X3 + E2 (Model #2)

Factoring (or collecting terms) produces:

Y = a1*D1 + a2*D2 + a3*D3 + bc*X + E2 (Model #2)
(Note that the values of a1, a2, and a3 in Model #2 are NOT numerically
equal to the values in Model #1)

From Model #2, bc is the least-squares SINGLE POINT estimate of the COMMON
SLOPE.

Your favorite Regression procedure should give what you need to compute a
confidence interval (such as the standard error of bc).
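
As a sketch of what the regression output for Model #2 amounts to, the
Python below computes bc, its standard error and a 95% confidence interval
for hypothetical data.  It relies on the standard closed form for a model
with separate intercepts and one common slope: bc is the pooled within-group
sum of cross-products divided by the pooled within-group sum of squares of
x.  The data, helper name and the tabled t value are assumptions made for
the example.

```python
import math

# Hypothetical data: data set number -> (x values, y values)
groups = {
    1: ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    2: ([1, 2, 3, 4], [3.0, 5.1, 6.9, 9.2]),
    3: ([2, 3, 4, 5], [5.2, 7.1, 8.8, 11.1]),
}


def centered_sums(xs, ys):
    """Return xbar, ybar, Sxx and Sxy for one data set."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return xbar, ybar, sxx, sxy


stats = {g: centered_sums(xs, ys) for g, (xs, ys) in groups.items()}
sxx_total = sum(s[2] for s in stats.values())

# Common slope and the three intercepts of Model #2.
bc = sum(s[3] for s in stats.values()) / sxx_total
intercepts = {g: s[1] - bc * s[0] for g, s in stats.items()}

# Residual sum of squares and degrees of freedom of Model #2.
sse = sum((y - (intercepts[g] + bc * x)) ** 2
          for g, (xs, ys) in groups.items() for x, y in zip(xs, ys))
n_total = sum(len(xs) for xs, _ in groups.values())
df = n_total - (len(groups) + 1)   # 12 observations - 4 coefficients = 8

se_bc = math.sqrt(sse / df / sxx_total)

T_CRIT = 2.306   # two-sided 95% Student-t value for 8 df, from a t table
ci = (bc - T_CRIT * se_bc, bc + T_CRIT * se_bc)
print("bc =", bc, "se =", se_bc, "95% CI =", ci)
```

With other sample sizes the degrees of freedom change, so the t value must
be looked up again.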

Now for CASE (2) above you may want to test that:
THREE SLOPES ARE EQUAL, i.e., b1=b2= b3=bc ( THE COMMON SLOPE)
 and
THREE INTERCEPTS ARE EQUAL, i.e., a1=a2=a3=ac (THE COMMON INTERCEPT)

In which case, the RESTRICTED MODEL becomes:
Y = ac*D1 + bc*X1 + ac*D2 + bc*X2 + ac*D3 + bc*X3 + E3 (Model #3)

Factoring (or collecting terms) produces:

Y = ac*U + bc*X + E3 (Model #3)
(Note that the value of bc in Model #3 is NOT numerically equal to the value
in Model #2)
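
The hypotheses in CASE (1) and CASE (2) can be tested by comparing error
sums of squares of the restricted models with Model #1.  The Python sketch
below does this for hypothetical data using the usual F ratio for nested
linear models; the variable names are illustrative, and the resulting F
values would still have to be compared with an F table at the chosen
significance level.

```python
# Hypothetical data: data set number -> (x values, y values)
groups = {
    1: ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    2: ([1, 2, 3, 4], [3.0, 5.1, 6.9, 9.2]),
    3: ([2, 3, 4, 5], [5.2, 7.1, 8.8, 11.1]),
}


def sums(xs, ys):
    """Return Sxx, Sxy and Syy (sums about the means) for one data set."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxx, sxy, syy


per_group = [sums(xs, ys) for xs, ys in groups.values()]
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
n_total = len(all_y)

# Model #1: three intercepts, three slopes (6 coefficients).
sse1 = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in per_group)
df1 = n_total - 6

# Model #2: three intercepts, one common slope (4 coefficients).
sxx_w = sum(s[0] for s in per_group)
sxy_w = sum(s[1] for s in per_group)
syy_w = sum(s[2] for s in per_group)
sse2 = syy_w - sxy_w ** 2 / sxx_w

# Model #3: one intercept, one slope (2 coefficients).
sxx_t, sxy_t, syy_t = sums(all_x, all_y)
sse3 = syy_t - sxy_t ** 2 / sxx_t

# F ratios: (increase in SSE / number of restrictions) / (SSE1 / df1).
f_equal_slopes = ((sse2 - sse1) / 2) / (sse1 / df1)   # CASE (1), 2 and 6 df
f_same_line = ((sse3 - sse1) / 4) / (sse1 / df1)      # CASE (2), 4 and 6 df
print("F for equal slopes:", f_equal_slopes)
print("F for one common line:", f_same_line)
```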

And, as before, your favorite Regression procedure should give what you need
to
compute a confidence interval (such as the standard error of bc). Let me
k

Re: How many Olympic Medals should Great Britain have won?

2000-10-03 Thread Joe Ward



Hi, Paige --

Good comments about "There are so many different factors..."

"To say that half the observations should have positive errors and half
should have negative errors is to confuse median with mean."

I used the word ABOUT intentionally to distinguish from EXACTLY.

--Joe

- Original Message -
From: "Paige Miller" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 03, 2000 10:19 AM
Subject: Re: How many Olympic Medals should Great Britain have won?

> > Hi, Graham --
> >
> > It's been a long time since I've heard any discussion about
> > UNDERACHIEVERS and OVERACHIEVERS.  I've never been able to understand
> > the discussions.
> >
> > NO MATTER WHAT VALUE THE CORRELATION (SLOPE OF THE REGRESSION LINE) HAS we
> > know that the ALGEBRAIC SUM OF THE ERRORS IS ZERO.  Now that says that
> > the SUM OF THE ABSOLUTE VALUES OF THE POSITIVE ERRORS IS EQUAL TO THE
> > SUM OF THE ABSOLUTE VALUES OF THE NEGATIVE ERRORS.  THEN WE WOULD EXPECT
> > TO OBSERVE ABOUT ONE-HALF OF THE OBSERVATIONS TO HAVE POSITIVE ERRORS AND
> > ONE-HALF TO HAVE NEGATIVE VALUES.
> >
> > THEREFORE, FOR ALL CORRELATIONS (ZERO INCLUDED) WE SHOULD EXPECT TO
> > CONCLUDE THAT ABOUT ONE-HALF OF ALL CASES
> > WOULD BE CALLED "OVER-ACHIEVERS" AND ABOUT ONE-HALF WOULD BE CALLED
> > "UNDER-ACHIEVERS".  DOES THAT DESIGNATION HAVE ANY OPERATIONALLY USEFUL
> > MEANING?

Paige writes

> There are so many different factors that go into the amount of medals
> won that it seems silly to perform a regression based upon population
> and GDP to use as predictors. Organization of Olympic Committees,
> training facility quality, programs for youths, weather, etc. all can
> affect the number of medals won, and then there is the factor of
> injuries, which to me seems like it cannot be modelled except as
> random noise.
>
> To say that half the observations should have positive errors and half
> should have negative errors is to confuse median with mean.
>
> --
> Paige Miller
> Eastman Kodak Company
> [EMAIL PROTECTED]
>
> "It's nothing until I call it!" -- Bill Klem, NL Umpire
> "Those black-eyed peas tasted all right to me" -- Dixie Chicks



Re: How many Olympic Medals should Great Britain have won?

2000-10-02 Thread Joe Ward



Hi, Graham --

It's been a long time since I've heard any discussion about
UNDERACHIEVERS and OVERACHIEVERS.  I've never been able to understand
the discussions.

NO MATTER WHAT VALUE THE CORRELATION (SLOPE OF THE REGRESSION LINE) HAS we
know that the ALGEBRAIC SUM OF THE ERRORS IS ZERO.  Now that says that
the SUM OF THE ABSOLUTE VALUES OF THE POSITIVE ERRORS IS EQUAL TO THE
SUM OF THE ABSOLUTE VALUES OF THE NEGATIVE ERRORS.  THEN WE WOULD EXPECT
TO OBSERVE ABOUT ONE-HALF OF THE OBSERVATIONS TO HAVE POSITIVE ERRORS AND
ONE-HALF TO HAVE NEGATIVE VALUES.

THEREFORE, FOR ALL CORRELATIONS (ZERO INCLUDED) WE SHOULD EXPECT TO
CONCLUDE THAT ABOUT ONE-HALF OF ALL CASES
WOULD BE CALLED "OVER-ACHIEVERS" AND ABOUT ONE-HALF WOULD BE CALLED
"UNDER-ACHIEVERS".  DOES THAT DESIGNATION HAVE ANY OPERATIONALLY USEFUL
MEANING?
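
A small Python sketch with hypothetical data shows both parts of this
argument.  For a least-squares line with an intercept the errors always sum
to zero, so the positive and negative errors balance in total size; the
COUNT of positive and negative errors is only "about" half and half, and a
single large value can shift it.  The data and names are made up for
illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical data: nine points on y = x plus one very large value.
xs = list(range(1, 11))
ys = [float(x) for x in range(1, 10)] + [40.0]

a, b = fit_line(xs, ys)
errors = [y - (a + b * x) for x, y in zip(xs, ys)]

n_positive = sum(1 for e in errors if e > 0)
n_negative = sum(1 for e in errors if e < 0)
sum_positive = sum(e for e in errors if e > 0)
sum_negative = sum(-e for e in errors if e < 0)

print("sum of errors:", sum(errors))
print("positive errors:", n_positive, "total size", sum_positive)
print("negative errors:", n_negative, "total size", sum_negative)
```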
 
--Joe

  - Original Message -
  From: Dr Graham D Smith
  To: Edstat
  Sent: Monday, October 02, 2000 11:40 AM
  Subject: How many Olympic Medals should Great Britain have won?

  How many Olympic Medals should Great Britain have won?
  British Olympians won a grand total of 28 medals at the Sydney 2000 Games, 
  our best medal haul for 80 years. Many commentators have suggested that the 
  big improvement in British fortunes compared to the Atlanta 1996 Games is due 
  to the use of Lottery funding to help our top sportsmen and sportswomen. But 
  how many medals should Britain expect to win? Did we fulfil our potential or 
  fall short of it?
  One important determinant of a country's Olympic success is the size of its 
  population. USA, China and Russia head the Sydney 2000 medal table; they also 
  have large populations. However, population size does not fully account for 
  the number of medals won. Both India and China have much larger populations 
  than USA but won fewer medals. Another important predictor of a nation's 
  Olympic performance is economic prosperity. Richer nations often outperform 
  poorer nations of the same size. Gross domestic product (GDP) is an economic 
  index that reflects both economic success and population size.
  A scatterplot of the number of medals won and GDP of the 80 medal winning 
  countries at the 2000 Olympics shows a positive correlation; r = 0.595, 
  p < 0.01 (see attached). GDP accounts for 35.4% of the variance of 
  medals won. A regression analysis was performed on the data to estimate the 
  number of medals Team GB should expect. Given that the UK GDP is equivalent to 
  US$ 1.29 trillion the expected number of medals is 15. It seems that our 
  Olympians did far better than we could have expected. Well done team GB!
  And well done too to Team USA, their expected medal count is 26.5. However, 
  the top overachiever was Russia (followed by USA and Australia). The top 
  underachiever was India.
   
  Dr Graham D. Smith
  Psychology Division
  Park Campus
  University College Northampton
  Boughton Green Rd.
  Northampton
  NN2 7AL
  Tel: +44 (0) 1604 735500 Ext 2393
  E-mail: [EMAIL PROTECTED]


Re: What is today's Hogg & Craig?

2000-09-23 Thread Joe Ward

Hi, Gary, Jerry et al --
Here is a message from Bob Hogg.

-- Joe
- Original Message -
From: "Robert V. Hogg" <[EMAIL PROTECTED]>
To: "Joe Ward" <[EMAIL PROTECTED]>
Sent: Friday, September 22, 2000 9:19 AM
Subject: Re: Fw: What is today's Hogg & Craig?


> joe,  HOGG AND TANIS is used more for undergrads.  CASELLA AND BERGER for
> first year grad students in stat.  HOGG AND CRAIG for good seniors and
> first year grad students in other areas [like actuarial sci].   bob
>
>
>
> At 11:24 PM 9/21/00 -0500, Joe Ward wrote:
> >Bob --
> >
> >Any suggestions for Jerry?
> >
> >-- Joe
>
> >- Original Message -
> >From: "Jerry Dallal" <[EMAIL PROTECTED]>
> >To: <[EMAIL PROTECTED]>
> >Sent: Thursday, September 21, 2000 9:32 PM
> >Subject: What is today's Hogg & Craig?
> >
> >
> >> Back in the "old days", the standard text for an undergraduate math stat
> >> course was Hogg & Craig.  I had some fondness for Lindgren.  I haven't
> >> taught this course in nearly 20 years.  Which texts occupy their position
> >> today?
> >>
> >> Thanks.
> >>

- Original Message -
From: "Gary McClelland" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, September 22, 2000 11:49 AM
Subject: Re: What is today's Hogg & Craig?


> in article [EMAIL PROTECTED], Jerry Dallal at [EMAIL PROTECTED]
> wrote on 9/21/00 8:32 PM:
>
> > Back in the "old days", the standard text for an undergraduate math stat
> > course was Hogg & Craig.  I had some fondness for Lindgren.  I haven't
> > taught this course in nearly 20 years.  Which texts occupy their position
> > today?
> >
> > Thanks.
>
> According to amazon.com, the 1994 5th edition is still in print.
> I keep my much earlier edition closely guarded.  But I too would be
> interested in hearing what the kids learn with today.
>
> gary
> --
> [EMAIL PROTECTED]
>
>
>
>









Re: cluster

2000-09-22 Thread Joe Ward

Hi, Thomas --

If you have a SAS Manual the McQuitty method is described briefly in the
CLUSTER Chapter.

Also,  I think the original article is:

McQuitty, L.L. (1966) "Similarity Analysis by Reciprocal Pairs for Discrete
and Continuous Data"
Educational and Psychological Measurement, 26, 825-831.

Look at:

Anderberg, M.R. (1973) "Cluster Analysis for Applications"  New York,
Academic Press.
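
For the formula itself: as the method is described in the SAS CLUSTER
documentation, when clusters K and L are merged into a new cluster M, the
distance from M to any other cluster J is the simple average
D(J,M) = (D(J,K) + D(J,L)) / 2, regardless of cluster sizes.  The Python
below is a minimal sketch of that update rule; the function name mcquitty
and the distance matrix are hypothetical.

```python
from itertools import combinations


def mcquitty(dist):
    """Agglomerative clustering with the McQuitty update rule.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns a list of merges (cluster_a, cluster_b, distance); new
    clusters are numbered n, n+1, ... in the order they are formed.
    """
    n = len(dist)
    active = list(range(n))
    d = {frozenset((i, j)): float(dist[i][j])
         for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(active) > 1:
        # Merge the closest pair of currently active clusters.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))]
        others = [k for k in active if k not in (a, b)]
        for k in others:
            # McQuitty update: simple average of the two old distances.
            d[frozenset((next_id, k))] = (
                d[frozenset((a, k))] + d[frozenset((b, k))]) / 2.0
        active = others + [next_id]
        merges.append((a, b, height))
        next_id += 1
    return merges


# Hypothetical distances among four items (0, 1, 2, 3).
distances = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
merges = mcquitty(distances)
for a, b, height in merges:
    print("merge", a, "and", b, "at distance", height)
```

Here items 0 and 1 join first, then items 2 and 3, and the two resulting
clusters join last at (5.5 + 9.5) / 2 = 7.5.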

--- Joe

- Original Message -
From: "Thomas Pesl" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, September 22, 2000 4:19 AM
Subject: cluster


> Does anyone know the formula of the McQuitty clustering method?
>
> Thanks,
> Thomas
>
>
>
>
>







Statistics for Visually Impaired

2000-09-18 Thread Joe Ward

Those of you who are teaching statistics to visually impaired (blind)
students may find some helpful ideas from Bob Bottenberg's comments to Jay
Thomas, included at the end of this message.

Bob received his Ph.D. from Stanford after he was blinded in WWII.  He
developed a strong statistics background from courses with Z.W. Birnbaum,
Al Bowker, Meyer Girshick and George Polya and an unusual memory for
everything he has HEARD.

I have had the pleasure to work with Bob for many years and he can be an
inspiration to anyone with whom he associates - blind or with full vision.
Now that he is retired from his work as a civilian researcher for the U.S.
Air Force, Bob is getting into the internet action.
Bob would be happy to share any of his procedures for hearing and reading
about stat concepts at  [EMAIL PROTECTED]

Bob and I wrote a 140 page document on "Applied Multiple Linear Regression"
in 1963 in order to bring the combined power of Regression/Linear models and
Computers to the researchers with whom we worked.  The reference is
Bottenberg, R.A. and Ward, J.H. "Applied Multiple Linear Regression",
PRL-TDR-63-6, AD-413-128 -- originally available from the Clearinghouse for
Federal Scientific and Technical Information, Dept. of Commerce, Wash. D.C.
A few of the "old-timers" who are lurking on the internet occasionally
mention having a copy.  The approach was expanded in 1973 in the
Prentice-Hall-published book "Introduction to Linear Models" by Ward and
Jennings.

--
JAY THOMAS WRITES:

From: "Thomas, Jay" <[EMAIL PROTECTED]>
To: "'Earl Jennings'" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, September 05, 2000 11:27 AM
Subject: RE: Visually impaired students

> Dr. Jennings, et al,
>
> Thanks very much for getting my inquiry to Dr. Bottenberg, and of course
> to Dr. Bottenberg for his detailed reply. Several people have given
> suggestions, none as extensive as these were. I hope to compile the
> suggestions after the chaos of the first week or two of school and send
> them out.
>
> Incidentally, I was reading a history of statistics over the summer (I
> lead an exciting life) and learned that one of the early important figures
> in the field was Nicholas Saunderson, who held the Lucasian Chair at
> Cambridge after Newton and was blind from the age of 12 months. Oddly, one
> of his major mathematical contributions was in the field of optics.
> Again, thanks for your advice.
>
> Jay Thomas
---
JAY THOMAS'  MESSAGE RECEIVED BY PAUL KELLEY

-
Delivered-To: [EMAIL PROTECTED]
Date: Tue, 29 Aug 2000 14:15:37 -0700
Reply-To: "Thomas, Jay" <[EMAIL PROTECTED]>
Sender: APA Division 5 Members <[EMAIL PROTECTED]>
From: "Thomas, Jay" <[EMAIL PROTECTED]>
Subject: [APA] visually impaired statistics students
To: [EMAIL PROTECTED]

I have a couple of visually impaired students in my upcoming basic
statistics course this fall. I normally stress visualization and drawing
sketches to understand statistics, but expect that tactic won't work with
these students. Has anyone found effective ways of presenting statistical
concepts to blind students?

Jay Thomas
-
BOB BOTTENBERG REPLIES TO JAY THOMAS

 Hi Jay,

Joe Ward passed on to me a note you sent about techniques for teaching
statistics to visually impaired students.  I've been totally blind since
1945, and took some undergraduate statistics courses in the psychology
department at the U. of Missouri in the late 40s.  Then at Stanford in
1952-1953, I enrolled in five or six courses in probability and
mathematical statistics.  This background is offered by way of apology for
not having many suggestions for teaching in a contemporary environment.
Graphs, charts and figures have always been troublesome, and, as I recall
(45 years back), I absorbed that material in a quite tedious way.  A
reader, outside of a classroom setting, would describe a graph by saying
the names of the axes, horizontal, vertical.  Then indicating in a very
general way the path of the line from left to right, having first provided
a word or two about the units on each axis -- lower and upper.  Then, the
really slow part -- pick a point on the line and read the approximate
coordinates.  Do that for a few points, and the mental picture of the graph
would begin to emerge.  Of course, the pace of classroom activity makes it
impractical to do anything like that there.  Charts were handled in a
similar manner -- the reader reads the column headers, then the row
headers, then reads a row at a time, or a column at a time.  Of course, the
real

Re: How can I analyze split-design by SPSS v9.0?

2000-09-06 Thread Joe Ward

Anuvat --

Here comes my "standard" comment!

1.  State your research question(s) in "natural language".
2.  Create a model that enables you to answer the "natural language"
questions that YOU WANT.
3.  Impose restrictions on YOUR MODEL that answer YOUR questions of
interest.
4.  Use the computer to get YOUR DESIRED RESULTS.

Then AFTER YOU HAVE VERIFIED THAT THERE EXISTS A "PACKAGED" ALGORITHM THAT
ANSWERS YOUR QUESTIONS OF INTEREST, THEN USE THE "PACKAGED" ALGORITHM.

Since many "interesting" research questions involve creating models for
unique problems, it can be more efficient to create your OWN MODELS rather
than searching for "packaged" algorithms that MAY fit YOUR research
questions of interest.  IMHO  it seems best to take time to develop
"model-creation" skills so that you can have the POWER that is available.

If you have time to take a look at the URL below, Slides 7 and 8 of the
PowerPoint presentation on "Using Calculators and Computers in Statistics" -
Laura Niland & Joe Ward, CAMT98 45th Annual Conference, San Antonio, July
23, 1998 - give a pictorial view of "Forcing" vs. "Creating" Models.

Good luck--

Joe


Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***
- Original Message -
From: "Anuvat Jangchud" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 06, 2000 10:32 PM
Subject: How can I analyze split-design by SPSS v9.0?


> I would like to use SPSS v.9.0 for SPLIT Design analysis.  Could you help
> me out?

Re: Regression books

2000-08-05 Thread Joe Ward

Copies of INTRODUCTION TO LINEAR MODELS by Ward and Jennings are
available by contacting:

Dr. Jimmy Mitchell
The Institute for Job and Occupational Analysis (IJOA)
10010 San Pedro, Suite 440, San Antonio, Texas 78216
(210) 349-8525   Fax: (210) 349-0168
[EMAIL PROTECTED]



Bottenberg, R.A., & Ward, J.H., Jr. (1963, March). Applied multiple linear
regression. PRL-TDR-63-6, AD-413 128. Lackland AFB, TX: 6570th Personnel
Research Laboratory, Aerospace Medical Division.

This might be available from:

National Technical Information Service
Technology Administration
U.S. Department of Commerce
Springfield, VA 22161
703-605-6000
Web:  www.ntis.gov


- Original Message -
From: "Christopher Tong" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, August 05, 2000 5:33 PM
Subject: Regression books


> I posted my request for recommended regression books a couple
> weeks ago, and I appreciate everyone who has replied,
> both on the newsgroup and privately.
> For those interested, here is a summary of the recommendations.
>
> The most popularly recommended books are Draper & Smith
> and Cohen & Cohen.  Honorable mention goes to Montgomery & Peck,
> Acton's out-of-print "Analysis of Straight Line Data", and the Sage Press
> monographs.
>
> The other books that were mentioned were
> Bottenberg & Ward (*)
> Daniel & Wood
> Darlington
> Edwards (*)
> Hamilton
> Judd & McClelland (*)
> Neter, et al.
> Pedhazur
> Rawlings
> Ward & Jennings (*)
>
> Nonlinear regression books that were recommended were
> Bard (*)
> Bates & Watts
> Seber & Wild
>
> Econometrics books with good coverage of regression were
> Greene
> Gujarati
> Pindyck & Rubinfeld
>
> (*) = out of print, according to amazon.com

Re: Math Education of Mathematics Teachers

2000-08-01 Thread Joe Ward

Dick --

I'm staying 'til Friday to attend THAT SESSION.

The discussions should be of interest to secondary teachers in the
Indianapolis area.  It would be great if arrangements could be made for
teachers to attend THAT session without needing to register for the JSM.

I think it is Session 281, Thursday, Aug. 17 10:30 a.m. - 12:30.

--  Joe

****
****
Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***


- Original Message -
From: "Richard L. Scheaffer" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, August 01, 2000 1:22 PM
Subject: Math Education of Mathematics Teachers


> I would like to call your attention to a session at the Joint Statistics
> Meetings that those of you interested in statistics education might have
> overlooked.  Session 279, The Importance of Statistics in the Education of
> Future Teachers, reports on a project of the Conference Board of the
> Mathematical Sciences, funded by NSF and DoEd, that will attempt to get
> departments of mathematical sciences more involved in the education of
> future teachers.  Teachers coming out of colleges of education are ill
> equipped to teach in the modern math curriculum - a curriculum that
> includes much statistics.  This project makes a series of recommendations
> on how to solve this problem.  Among the recommendations are strong
> statements about the importance of statistics.
>
> The panel consists of Alan Tucker, mathematician and lead writer of the
> CBMS report, Judy Sowder, math educator responsible for the middle school
> section of the report, Gail Burrill, former president of NCTM and now head
> of the Math Sciences Education Board at the NAS, and Jerry Moreno, a
> well-known statistics educator.
>
> Unfortunately, this session is in the last time slot of the meeting, 10:30
> Thursday morning.  So, I hope some of you will have the time and interest
> to stop by.  It should be a lively discussion of a very important topic.
>
> Hope to see you there!
>
> Dick Scheaffer
>
>
>
> ps  A draft of the report is on the web.
>
> CBMS Math Education of Teachers Project Draft Report on the Web
> 
>
>
> --
> Richard L. Scheaffer   [EMAIL PROTECTED]
> Department of Statistics phone 352-392-1941 (#224)
> Box 118545 fax 352-392-5175
> University of Florida
> Gainesville, FL 32611
>
> 907 NW 21 Terrace 352-378-1996
> Gainesville, FL  32603

Re: regression books?

2000-07-22 Thread Joe Ward

If you are near a university library you may want to take a look
at INTRODUCTION TO LINEAR MODELS by Ward and Jennings.

The Purdue library might have a copy.

Also, the Fountain-Ward JSE article shown at the URL below is related to
your interest.

http://www.ijoa.org/joeward/wardindex.html

http://www.amstat.org/publications/jse/v4n3/ward.html


-- Joe



Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***

- Original Message -
From: "Christopher Tong" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, July 22, 2000 2:12 PM
Subject: regression books?


>
> Does anyone have recommendations for introductory
> books on regression analysis?  I posted this question on
> sci.stat.math and got only one reply so far.
>
> I am currently using Neter, Kutner, Nachtsheim, and
> Wasserman, which I find unwieldy and not very concise.
> I have my eye on Montgomery & Peck, but am wondering what anyone
> else would recommend.  My one reply so far suggested Cohen & Cohen.

Re: bump hunting in nonlinear regression

2000-07-18 Thread Joe Ward

Daniela--

Does "nonlinear" refer to a LINEAR MODEL of the form:

Y = a1*X1 + a2*X2 + a3*X3 + a4*X4  +... + ap*Xp + E

where

X1 = U - a predictor of all 1s.
X2 = X - any numerical predictor
X3 = X^2 -  the "squares" of the elements in X
X4 = X^3 - the "cubes" of the elements in X
etc.?

If this is the situation you can do wonderful things with
a general polynomial form.  You can use an nth degree
polynomial and impose restrictions that allow you much
flexibility about the shape of your curve.  For example,
you might choose to start with a 6th degree form and
then impose restrictions FOR THE RANGE OF INTEREST
ON THE X VARIABLE that allow you to use part of the
function that has ONE wiggle (hump), TWO wiggles (humps),
etc.  You can FORCE  any "undesired" wiggles (humps) to occur
OUTSIDE your RANGE OF INTEREST.
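A minimal Python sketch of the polynomial form described above, offered as an illustration and not as the procedure Joe had in mind. It fits an ordinary (unrestricted) polynomial by least squares and counts how often the fitted slope changes sign inside a range of interest; it does not impose any restrictions on the coefficients. The function names, the choice of a cubic, and the grid size are assumptions made for the example.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares weights for predictors U, X, X^2, ..., X^degree."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def slope(coefs, x):
    """First derivative of the fitted polynomial at x."""
    return sum(k * a * x ** (k - 1) for k, a in enumerate(coefs) if k > 0)


def count_turning_points(coefs, lo, hi, steps=1000):
    """Count sign changes of the slope on a grid over [lo, hi]."""
    signs = []
    for i in range(steps + 1):
        s = slope(coefs, lo + (hi - lo) * i / steps)
        if s != 0:
            signs.append(s > 0)
    return sum(1 for p, q in zip(signs, signs[1:]) if p != q)


if __name__ == "__main__":
    xs = [-2 + 0.1 * i for i in range(41)]
    ys = [x ** 3 - 3 * x for x in xs]          # made-up curve with two humps
    coefs = fit_polynomial(xs, ys, 3)
    print("turning points in [-2, 2]:", count_turning_points(coefs, -2, 2))
```

A count of zero over the range of interest is consistent with a monotone fitted curve; with noisy or sparse data the count says nothing about whether a bump is statistically real.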

-- Joe
********

Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***



- Original Message -
From: "Daniela Ichim" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, July 18, 2000 12:17 PM
Subject: bump hunting in nonlinear regression


>
>
> In a nonlinear (univariate) regression problem, specifically
> a calibration problem in thermometrics, I have the problem of
> testing whether a curve expressing a relationship between
> Electrical Resistance and Temperature is monotone
> versus the possibility of it having bumps inverting the monotonicity.
>
> The problem of checking the existence of bumps becomes difficult,
> especially in the regions of sparse data.
>
> I would like directions to the existing related statistical literature.
> Thanks.

Re: bivariate normality and correlation

2000-07-10 Thread Joe Ward

Hi, Znarf --

Every so often I find an occasion to include (SEE THE END OF THIS MESSAGE)
an earlier message from Mike  Palij related to the results of a study by
Jack Schmid about the RESTRICTION OF RANGE EFFECT ON THE
CORRELATION COEFFICIENT.

After many years of being around folks who were concerned about
RESTRICTION OF RANGE it became obvious to me that the correlation
coefficient should be used with EXTREME CAUTION.

-- Joe


Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
***


- Original Message -
From: "Znarf Akfak" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 10, 2000 2:41 AM
Subject: bivariate normality and correlation


> I'm considering reporting Pearson's correlation coefficient with a
> confidence interval for several bivariate associations.  As bivariate
> normality is assumed under the computation of the confidence interval,
> I have two questions.
>
> 1.  What is a good way to examine the assumption of bivariate normality
> for a given data set?
>
> 2.  To what extent are such confidence intervals robust to departures
> from bivariate normality?
>
> References to publications would be much appreciated as I don't have
> access to CIS, as would other suggestions and comments.
>
> Cheers,
>
> --
> Znarf

==  INSERT BY JOE WARD OF MESSAGE FROM MIKE PALIJ  ==


-- Forwarded message --
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation
(his posting is appended below), as well as jogging my own
memory for a previous posting I had made.  A while back I
had posted the Anscombe dataset (in the context of an SPSS
program) which also clearly shows the benefit of plotting
the data:  the four situations produce almost identical
Pearson r values but only one actually shows the classic
scatterplot, the others show a nonlinear pattern and the
influence that a single point has on the calculation of r.
What does the value of r tell us here?  Aren't the basic
statistical concepts to be learned in this situation far
more important and most clearly seen through a coordination
of the graphical and numerical information?

-Mike Palij/Psychology Dept/New York University

Joe H Ward <[EMAIL PROTECTED]> writes:
 To Mike et al --

 There have been several messages related to the Simple Correlation
 Coefficient.  IMHO, when out in the "real world" involving practical
 decision-making the correlation coefficient has very limited value and
 sometimes dangerous consequences.  The correlation coefficient may be
 an important topic for the history of statistics to learn the problems
 associated with its use.

 Attached below is an item that I submitted a long time ago, and it may be
 of interest to those following the discussion of "r".

 -- Joe
 ***
 * Joe Ward                      Health Careers High School  *
 * 167 East Arrowhead Dr.        4646 Hamilton Wolfe         *
 * San Antonio, TX 78228-2402    San Antonio, TX 78229       *
 * Phone: 210-433-6575           Phone: 210-617-5400         *
 * [EMAIL PROTECTED]
 ***

 NON-RANDOM SAMPLING AND REGRESSION

-- PROVIDED (MANY YEARS AGO) BY
  JACK SCHMID, UNIV. OF NORTHERN COLORADO, GREELEY, COLORADO

 y from (MU=0, SIGMA = 1.25)
 x from (MU=0, SIGMA = 1.00)
 RHOxy = .60

 Sample 10,000 cases at each level of progressive TRUNCATION ON x.

 Regression equation:  y = bx + a
 %Remaining  mean y  mean x  sigma y  sigma x  r=BETA     b      a    Syx

 100%          .01     .02     1.25     1.00     .60    .75   -.01   1.00
  90%
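
The restriction-of-range effect can be explored with a small simulation. The Python sketch below is only an illustration under stated assumptions and is not Jack Schmid's original computation: it draws one bivariate normal sample with the parameters given above, and it assumes that "truncation on x" means dropping the cases with the lowest x values. It also draws one sample and truncates it, where the original sampled 10,000 cases at each level. The function names are made up for the example.

```python
import math
import random


def draw_pairs(n, rho=0.60, sigma_x=1.00, sigma_y=1.25, seed=2000):
    """Draw n (x, y) pairs from a bivariate normal with both means 0."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        zx = rng.gauss(0.0, 1.0)
        zy = rho * zx + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((sigma_x * zx, sigma_y * zy))
    return pairs


def keep_top_on_x(pairs, fraction):
    """Assumed truncation rule: keep the cases with the largest x."""
    ordered = sorted(pairs, key=lambda p: p[0])
    n_keep = round(fraction * len(ordered))
    return ordered[len(ordered) - n_keep:]


def summarize(pairs):
    """Correlation, slope, intercept and residual SD for y = bx + a."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "b": b,
        "a": mean_y - b * mean_x,
        "syx": math.sqrt((syy - b * sxy) / n),
    }


if __name__ == "__main__":
    sample = draw_pairs(10_000)
    for pct in range(100, 0, -10):
        s = summarize(keep_top_on_x(sample, pct / 100))
        print(f"{pct:3d}%  r={s['r']:.2f}  b={s['b']:.2f}  "
              f"a={s['a']:.2f}  Syx={s['syx']:.2f}")
```

Under these assumptions r should shrink as more of the x range is cut away, while b and Syx should stay close to their full-sample values, which is the usual argument for reporting the regression equation and not r alone.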

Re: Novice questions about regression analysis.

2000-06-28 Thread Joe Ward

Good comment, Paige--

"> A well-designed experiment will yield regression estimates with more
> desirable properties than a poorly-designed experiment will.
> Specifically, the parameter estimates may have smaller variance in a
> well-design experiment, and the parameters will be less correlated (or
> uncorrelated) with each other. The predicted values of the responses
> likewise will have smaller variance in a well-designed experiment."

However, it is safest to be sure that the "packaged" analyses do what
the researcher wants.  Do many "packaged COVARIANCE algorithms" still
assume NO INTERACTION?  Does SAS (or other stat packages) warn us
when there is a "missing cell" in an ANOVA-LIKE GLM computation?

-- Joe
******
Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
**





- Original Message -
From: "Paige Miller" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 28, 2000 11:08 AM
Subject: Re: Novice questions about regression analysis.


> Wen-Feng Hsiao wrote:
> >
> > Dear listers,
> >
> > I am stuck with the experiment design of my dissertation. My experiment
> > would like to investigate the influences of different factors of stimuli
> > on the subject's response (each factor is a continuous variable), and
> > further build a regression model for these relations. My questions are:
> >
> > 1. It seems that no experiment-design issues related to Regression
> > Analysis are discussed in the usual statistics textbook. Why? Does it
> > mean one needn't consider the experiment design if he uses Regression
> > Analysis to analyze his data?
>
> A well-designed experiment will yield regression estimates with more
> desirable properties than a poorly-designed experiment will.
> Specifically, the parameter estimates may have smaller variance in a
> well-designed experiment, and the parameters will be less correlated (or
> uncorrelated) with each other. The predicted values of the responses
> likewise will have smaller variance in a well-designed experiment.
>
> > 2. Due to the measure of the dependent variable is the participants'
> > subjective responses, to remove unrelated subject-specific variables, I
> > am considering to employ a within-subject design. But there seems no
> > statistical packages ready for dealing with within-subject design of
> > Regression Analysis?
>
> SAS and JMP will perform these analyses, although the manual may not
> specifically call them 'within-subject' analyses. Other packages
> probably will handle them as well, but I cannot advise you of specifics.
>
> > Suppose a design in which each of the n subjects gives rise to a Y
> > observation under each of c different conditions, then a total of N = nc Y
> > observations could be obtained. How can I use Regression Analysis to
> > analyze these observations?
>
> The model will predict the response Y as a function of the subject and
> each of the design variables, plus any desired interactions between
> design variables, interactions between subject and design variables, and
> polynomial terms (if desired) involving design variables.
>
>
> --
> Paige Miller
> Eastman Kodak Company
> [EMAIL PROTECTED]
>
> "It's nothing until I call it!" -- Bill Klem, NL Umpire
> "Those black-eyed peas tasted all right to me" -- Dixie Chicks
>
>
>
===
> This list is open to everyone.  Occasionally, less thoughtful
> people send inappropriate messages.  Please DO NOT COMPLAIN TO
> THE POSTMASTER about these messages because the postmaster has no
> way of controlling them, and excessive complaints will result in
> termination of the list.
>
> For information about this list, including information about the
> problem of inappropriate messages and information about how to
> unsubscribe, please see the web page at
> http://jse.stat.ncsu.edu/
>
===
>







Re: Novice questions about regression analysis.

2000-06-28 Thread Joe Ward

Wen-Feng--

Briefly --

1. While planning your experimental design, state your research questions in
"natural language"-- before you start collecting data.

2. Create a Prediction/Regression/Linear Model that allows you to translate
your "natural language" research questions into terms of your Model -- your
ASSUMED MODEL.  You may need to cycle through this process several times to
get an appropriate model.

3. Impose Restrictions implied by your research questions on your ASSUMED
MODEL to obtain your RESTRICTED MODEL.

4. Compare your ASSUMED and RESTRICTED MODELS.

You can do much  PLANNING BEFORE YOU BEGIN YOUR COLLECTION AND ANALYSES.

If some high school students can do it, I feel confident that you can do it,
too.  But be careful!  If your committee members can't do it, then you may
not "pass".

-- Joe
******
Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html


- Original Message -
From: "Wen-Feng Hsiao" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 28, 2000 10:10 AM
Subject: Novice questions about regression analysis.


> Dear listers,
>
> I am stuck with the experiment design of my dissertation. My experiment
> would like to investigate the influences of different factors of stimuli
> on the subject's response (each factor is a continuous variable), and
> further build a regression model for these relations. My questions are:
>
> 1. It seems that no experiment-design issues related to Regression
> Analysis are discussed in the usual statistics textbook. Why? Does it
> mean one needn't consider the experiment design if he uses Regression
> Analysis to analyze his data?
>
> 2. Due to the measure of the dependent variable is the participants'
> subjective responses, to remove unrelated subject-specific variables, I
> am considering to employ a within-subject design. But there seems no
> statistical packages ready for dealing with within-subject design of
> Regression Analysis?
>
> Suppose a design in which each of the n subjects gives rise to a Y
> observation under each of c different conditions, then a total of N = nc Y
> observations could be obtained. How can I use Regression Analysis to
> analyze these observations?
>
> Thanks for your help.
>
> Wen-Feng

Re: Stupid question on relationship of r and t

2000-06-24 Thread Joe Ward

Jason --

t^2 = r^2*(n-2) / (1-r^2)

is a special case of the more general case of using R^2 to compute
the F statistic in a Prediction/Regression/Linear Models approach to
research studies.


Letting

R^2(Assumed)    = R^2 for the ASSUMED MODEL
R^2(Restricted) = R^2 for the RESTRICTED MODEL
NA  = number of linearly independent predictor vectors (i.e., the number
      of parameters) in the ASSUMED MODEL
NR  = number of linearly independent predictor vectors (i.e., the number
      of parameters) in the RESTRICTED MODEL
N   = total number of observations (cases)
df1 = NA - NR = numerator degrees of freedom
df2 = N - NA  = denominator degrees of freedom

F(df1,df2) = [(R^2(Assumed) - R^2(Restricted)) / df1]
             / [(1 - R^2(Assumed)) / df2]

Now consider your special case when:

The ASSUMED MODEL CONTAINS ONLY TWO PREDICTORS:

Y = b0*U + b1*X + Ea

and the Hypothesis is "b1 = 0".  Then the RESTRICTED MODEL is:

Y = b0*U + Er

In this special case,

R^2(Restricted) = 0

and then

F(df1,df2) = [R^2(Assumed) / df1] / [(1 - R^2(Assumed)) / df2]

and you can easily solve for R^2 if desired.

R^2(Assumed) = F*df1 / (df2 + F*df1)


and in your special case of only ONE predictor (in addition to U),
sometimes called "simple regression",

df1 = 2 - 1 = 1
and
df2 = N - 2

so that

R^2(Assumed) = r^2 = F / (N - 2 + F)


but since

t^2(df2) = F(1,df2)

then we have

r^2 = t^2 / (N - 2 + t^2)

which is what you obtain from Bob's
suggestion --

> > t = r * sqrt(n-2) / sqrt(1-r^2)
> >
> > I want to be able to calculate r from t.  I tried algebraically
> > manipulating the formula, but never quite got it to where I could do
> > this.  Any advice?
> >
> Try squaring both sides and re-arranging.
> (Joe Ward's comment: "GOOD SUGGESTION BY BOB")
>
> Bob
>
> --
> Bob O'Hara
> Metapopulation Research Group
> Division of Population Biology
> Department of Ecology and Systematics
> PO Box 17 (Arkadiankatu 7)
> FIN-00014 University of Helsinki
> Finland
>
> tel: +358 9 191 7382  fax: +358 9 191 7301
> email: [EMAIL PROTECTED]
> To induce catatonia, visit:
> http://www.helsinki.fi/science/metapop/



- Original Message -
From: "Anon." <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, June 24, 2000 7:20 AM
Subject: Re: Stupid question on relationship of r and t


> "Jason Osborne, Ph.D." wrote:
> >
> > I am working on a power analysis project- we are reviewing old journal
> > articles to calculate observed effect sizes and power.  Some of these
> > articles, for example reporting t-test results, only give means and
> > t-test, no standard deviation.  thus, no effect size calculation is
> > possible.  I was hoping to estimate an effect size by converting a t to
> > an r.  I seem to remember a formula that relates the two, but am having
> > a dickens of a time tracking one down.  The one I did track down, for
> > calculating t from r, is not that helpful:
> >
> > t = r * sqrt(n-2) / sqrt(1-r^2)
> >
> > I want to be able to calculate r from t.  I tried algebraically
> > manipulating the formula, but never quite got it to where I could do
> > this.  Any advice?
> >
> Try squaring both sides and re-arranging.
>
> Bob
>
> --
> Bob O'Hara
> Metapopulation Research Group
> Division of Population Biology
> Department of Ecology and Systematics
> PO Box 17 (Arkadiankatu 7)
> FIN-00014 University of Helsinki
> Finland
>
> tel: +358 9 191 7382  fax: +358 9 191 7301
> email: [EMAIL PROTECTED]
> To induce catatonia, visit:
> http://www.helsinki.fi/science/metapop/
>
> I have yet to see any problem, however complicated, which, when you
> looked at it in the right way, did not become still more complicated.  -
> Poul Anderson

Re: differences between groups/treatments ?

2000-06-19 Thread Joe Ward

Great comments, Don --

You are right on target again.  Yep, the way to investigate this type of
question is via PREDICTION/REGRESSION/LINEAR MODELS.

By coincidence, I am working with some local high school students
this summer, preparing them to "attack" their science fair projects.

The example we are doing, at this very moment, involves predicting
Final Performance of Students (Dependent or Response Attribute) from
knowledge of Their Teacher's Name and the Students' Prior Performance.
NOTICE THAT THIS IS A FIRST SHOT AT A "NATURAL LANGUAGE" STATEMENT OF
THE QUESTION OF INTEREST.  A more frequent approach is to talk about the
TYPE OF ANALYSIS before stating the research questions in "NATURAL
LANGUAGE".

I will elaborate on this in detail later since I'm in the process of
preparing an Email Activity for the students that asks them to investigate
the INTERACTION between TEACHER and PRIOR PERFORMANCE.

More later.

-- Joe

******
Joe Ward                      Health Careers High School
167 East Arrowhead Dr.        4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html


- Original Message -
From: "Donald Burrill" <[EMAIL PROTECTED]>
To: "Donal" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, June 19, 2000 4:30 PM
Subject: Re: differences between groups/treatments ?


> On Mon, 19 Jun 2000, Donal wrote:
>
> > I'm currently analysing data resulting from a study of children's
> > reading ability.
>
> I shall resist the temptation to quibble over your inability to observe
> reading ability (as distinct from some indeterminate lower bound on that
> ability) ...
>
> As you describe the study, you have an unspecified number of children
> divided into four groups in a two-way design of Treatments (2 levels)
> by Prior Performance (2 levels).  This would naturally lend itself to
> a two-way analysis of variance, or equivalently (pace Joe Ward) to a
> multiple regression analysis with three predictors:  Treatment,
> Performance, and Treatment*Performance.  If there are indeed effects
> attributable to Treatment and Performance, this analysis will be more
> sensitive to them than the two separate t-tests you propose.  And if
> there is an interaction between Treatment and Performance, as there may
> well be, the sensitivity to possible effects increases.
>
> Whether this is the best analysis available is another question entirely.
>
> 1.  If there are children of different sexes, you may be able to
> consider a three-way design, although I suspect it would be unbalanced,
> which (I also suspect!) may induce serious difficulties for you.
>
> 2.  Your Performance information you have chosen to dichotomize,
> although it is presumably (quasi-)continuous to start with.  You might
> find out something useful by treating it as a continuous predictor
> rather than as a dichotomy:  in effect carrying out an analysis of
> covariance with pre-treatment reading score as the covariate, whether you
> used an "Analysis of Covariance" program or a "Multiple Regression"
> program or a "General Linear Model" (GLM) program to do the arithmetic.
>
> 3.  In addition to sex, there may be other lurking variables in your data
> that could be used as predictors.  Whether it is sensible to consider
> including them in a hypothetical model depends partly on how many
> children you have all together, and partly on the distribution of any
> such candidate variable among _these_ children.
>
> > The study involves two treatments and each child's reading ability was
> > measured before and after the application of one of the treatments.
> > Thus, each child received one or the other (but not both) of two
> > possible treatments.  The children are divided into two groups:
>
> Well, that's not quite true.  You chose to categorize them into two
> groups, but they could equally well have been divided into three, or
> four, or six (depending on the number of children available and one's
> degree of interest in fine-tuning the "Weak/Strong" dimension).
> And if you have both boys and girls, you have two sexes as well, and
> it would not be surprising if they differed in their responses to the
> two treatments.  And how about the ages of the children?
>
> > Weak readers: those whose pre-treatment reading score was less than
> > the mean pre-treatment reading score
> > Strong r

Re: Beginner requests for help on ANOVA and T-tests (n SYSTAT97 --CAUTION)

2000-06-16 Thread Joe Ward

Thanks, Richard --

Yes, there IS A PROBLEM!

I called the IJOA folks about the situation.

http://www.ijoa.org/joeward/wardindex.html


IJOA is in the process of changing computer systems, so their URL will
be down until sometime next week.  Of course, those system changes are
unpredictable.

Thanks for your interest.

By the way,  we checked the EXCEL2000 Regression program yesterday and
it still has the WRONG TOTAL and REGRESSION SUM OF SQUARES WHEN
THE "CONSTANT IS ZERO" IS CHECKED.  Of course, that makes the F statistic
wrong, too.

Fortunately, the RESIDUAL SUM OF SQUARES IS CORRECT.

I'm sure that previous users must have mentioned this to  the EXCEL
REGRESSION folks.
Perhaps no one cares.

-- Joe
********
*  Joe Ward
*  167 East Arrowhead Dr.
*  San Antonio, TX 78228-2402
*  Phone: 210-433-6575
*  Fax:  210-433-2828
*  Email: [EMAIL PROTECTED]
*  http://www.ijoa.org/joeward/wardindex.html
*  
*  Health Careers High School
*  4646 Hamilton Wolfe
*  San Antonio, TX 78229
*  Phone: 210-617-5400
*  Fax:   210-617-5423
**



-Original Message-
From: R.C. <[EMAIL PROTECTED]>
To: Joe Ward <[EMAIL PROTECTED]>
Date: Friday, June 16, 2000 10:08 AM
Subject: Re: Beginner requests for help on ANOVA and T-tests (n
SYSTAT97 --CAUTION)


>IS THERE A PROBLEM WITH THE LINK PROVIDED HERE?  OR IS
>IT ME?
>
>THANKS,
>RICHARD

======
>--- Joe Ward <[EMAIL PROTECTED]> wrote:
>> Edmond--
>>
>> You may want to use the REGRESSION program in Excel
>> (WITH CAUTION).
>>   That way you can create your own models to do what
>> YOU WANT TO DO.
>> You might want to contact a statistician to help you
>> use REGRESSION
>> models.  You don't need to use some of the
>> Pre-Computer algorithms if
>> you know how to create your models to answer YOUR
>> QUESTIONS.
>>
>> The URL below has a few articles related to this
>> message:
>>  http://www.ijoa.org/joeward/wardindex.html
>>
>> If the "packaged" algorithms answer the questions of
>> interest,
>> then you can use them.
>>
>> I am using Excel 97 with three high school students
>> this summer.
>> 2 Sophomores and 1 Senior in preparation for their
>> Science Fair
>> Research Projects.  I usually use SYSTAT. However,
>> these students
>> already have Excel, so we are "testing" the use of
>> REGRESSION in Excel.
>>
>> Incidentally, when you use REGRESSION models that
>> need to:
>>
>> HAVE NO Y-INTERCEPT (THAT IS, PASS THROUGH ZERO),
>>
>> THE REGRESSION SUM OF SQUARES IS NOT CORRECT.
>>
>> So be careful when you use REGRESSION in Excel 97.
>>
>> The Excel97  Error is due to the fact that the
>> REGRESSION SUM OF SQUARES
>> IS CALCULATED FROM THE "TOTAL SUM OF SQUARES" MINUS
>> THE "RESIDUAL
>> SUM OF SQUARES".   THE "TOTAL SUM OF SQUARES" IS NOT
>> CORRECT
>> WHEN YOU INDICATE THAT YOU WANT THE FITTED LINE
>> TO PASS THROUGH THE ORIGIN (NO INTERCEPT TERM).
>>
>>  THE EXCEL PROGRAM USES THE "ADJUSTED SUM OF
>> SQUARES"
>> (REMOVING the REGRESSION SUM OF SQUARES ACCOUNTED
>> FOR BY THE
>> UNIT VECTOR (the "MEAN")).  The REAL TOTAL SUM OF
>> SQUARES IN THIS
>> CASE SHOULD BE THE UNADJUSTED (RAW) SUM OF SQUARES
>> OF THE DEPENDENT VARIABLE.
>>
>> Apparently the programmer of the REGRESSION
>> procedure did not know how to
>> compute the REAL TOTAL SUM OF SQUARES.
>>
>> As some of the users and creators of Statistical
>> Software Packages
>> frequently mention:
>>
>> "Using the statistical routines in Excel can be
>> risky."
>>
>>  Of course, ALL statistical packages should be used
>> with caution.
>>
>> We have not had time to check on Excel2000 to
>> find out if it still has the
>> same problem.
>>
>> Keep in touch.
>>
>> --  JHW
>> 
>> *  Joe Ward
>> *  167 East Arrowhead Dr.
>> *  San Antonio, TX 78228-2402
>> *  Phone: 210-433-6575
>> *  Fax:  210-433-2828
>> *  Email: [EMAIL PROTECTED]
>> *  http://www.ijoa.org/joeward/wardindex.html
>> *  
>> *  Health Careers High School
>> *  4646 Hamilton Wolfe
>> *  San Antonio, TX 78229
>> *  Phone: 210-617-5400
>> *  Fax:   210-617-5423
>> **
>>
>> -Original Message-

Re: Beginner requests for help on ANOVA and T-tests (n SYSTAT97 --CAUTION)

2000-06-15 Thread Joe Ward

Edmond--

You may want to use the REGRESSION program in Excel (WITH CAUTION).
  That way you can create your own models to do what YOU WANT TO DO.
You might want to contact a statistician to help you use REGRESSION
models.  You don't need to use some of the Pre-Computer algorithms if
you know how to create your models to answer YOUR QUESTIONS.

The URL below has a few articles related to this message:
 http://www.ijoa.org/joeward/wardindex.html

If the "packaged" algorithms answer the questions of interest,
then you can use them.
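
On the question in the original post (quoted below) of whether ANOVA can stand in for a t-test with two groups: for exactly two groups, the equal-variance two-sample t-test and the single-factor ANOVA test the same hypothesis, and the F statistic equals the square of the t statistic. The sketch below is only an illustration; the scores are made up and the function names are hypothetical, not anything from Excel.

```python
from statistics import mean


def pooled_t(g1, g2):
    """Two-sample t statistic, assuming equal variances in the two groups."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    ss1 = sum((v - m1) ** 2 for v in g1)
    ss2 = sum((v - m2) ** 2 for v in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


def one_way_f(groups):
    """F statistic for a single-factor (one-way) ANOVA."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up scores for two independent groups (illustration only).
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [16.0, 18.0, 15.0, 19.0, 17.0, 20.0]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print("t squared:", t * t)
print("F        :", f)
```

The two printed values agree up to rounding. The paired t-test and the unequal-variance t-test answer different questions and do not share this identity with the single-factor ANOVA.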

I am using Excel 97 with three high school students this summer.
2 Sophomores and 1 Senior in preparation for their Science Fair
Research Projects.  I usually use SYSTAT. However, these students
already have Excel, so we are "testing" the use of
REGRESSION in Excel.

Incidentally, when you use REGRESSION models that need to:

HAVE NO Y-INTERCEPT (THAT IS, PASS THROUGH ZERO),

THE REGRESSION SUM OF SQUARES IS NOT CORRECT.

So be careful when you use REGRESSION in Excel 97.

The Excel97 Error is due to the fact that the REGRESSION SUM OF SQUARES
IS CALCULATED FROM THE "TOTAL SUM OF SQUARES" MINUS THE "RESIDUAL
SUM OF SQUARES".   THE "TOTAL SUM OF SQUARES" IS NOT CORRECT
WHEN YOU INDICATE THAT YOU WANT THE FITTED LINE TO PASS THROUGH
THE ORIGIN (NO INTERCEPT TERM).

 THE EXCEL PROGRAM USES THE "ADJUSTED SUM OF SQUARES"
(REMOVING the REGRESSION SUM OF SQUARES ACCOUNTED FOR BY THE
UNIT VECTOR (the "MEAN")).  The REAL TOTAL SUM OF SQUARES IN THIS
CASE SHOULD BE THE UNADJUSTED (RAW) SUM OF SQUARES OF THE DEPENDENT VARIABLE.
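
The sum-of-squares bookkeeping described above can be checked by hand. The following is a minimal sketch with made-up data and one predictor fitted through the origin; it does not reproduce Excel's code, it only contrasts the two choices of total sum of squares. Variable names are illustrative.

```python
# Made-up data (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(y)

# Least-squares slope for the no-intercept model Y = b*X + E.
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
fitted = [b * xi for xi in x]

ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Raw (unadjusted) total: appropriate when there is no intercept term.
ss_total_raw = sum(yi * yi for yi in y)

# Mean-adjusted total: appropriate only when the model contains the unit vector.
y_bar = sum(y) / n
ss_total_adjusted = sum((yi - y_bar) ** 2 for yi in y)

ss_regression = ss_total_raw - ss_residual             # consistent with the model
ss_regression_wrong = ss_total_adjusted - ss_residual  # the choice described above

# F statistic for the one-parameter, no-intercept model.
f_stat = (ss_regression / 1) / (ss_residual / (n - 1))

print("residual SS          :", ss_residual)
print("regression SS (raw)  :", ss_regression)
print("regression SS (adj.) :", ss_regression_wrong)
print("F (raw total)        :", f_stat)
```

With the raw total, the regression sum of squares equals the sum of the squared fitted values, as it should for a least-squares fit without an intercept. The residual sum of squares is the same under either choice.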

Apparently the programmer of the REGRESSION  procedure did not know how to
compute the REAL TOTAL SUM OF SQUARES.

As some of the users and creators of Statistical Software Packages
frequently mention:

"Using the statistical routines in Excel can be risky."

 Of course, ALL statistical packages should be used with caution.

We have not had time to check on Excel2000 to find out if it still has the
same problem.

Keep in touch.

--  JHW

*  Joe Ward
*  167 East Arrowhead Dr.
*  San Antonio, TX 78228-2402
*  Phone: 210-433-6575
*  Fax:  210-433-2828
*  Email: [EMAIL PROTECTED]
*  http://www.ijoa.org/joeward/wardindex.html
*  
*  Health Careers High School
*  4646 Hamilton Wolfe
*  San Antonio, TX 78229
*  Phone: 210-617-5400
*  Fax:   210-617-5423
**

-Original Message-
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date: Thursday, June 15, 2000 9:37 AM
Subject: Beginner requests for help on ANOVA and T-tests


>Hello, I am a 16 year old student and a beginner to statistics.
>I'm lost.
>Currently I only have Microsoft Excel 97. And I would like to know the
>differences between the following ANOVA tests (in Excel):
>
>ANOVA Single Factor
>ANOVA Two-Factors with replication
>ANOVA Two-Factors without replication
>
>What do all these mean? Where and when should they be applied? And can
>anyone please use simple english terms to explain? I am only a beginner.
>What is one-way or two-way ANOVA?
>
>How about for T-Test?
>T-Test: Paired two samples for means
>T-Test: Two-sample assuming equal variances
>T-Test: Two-sample assuming unequal variances
>
>Also, can I use ANOVA instead of T-test when testing null hypothesis?
>Between 2 groups?
>
>Thanks for your help,
>Edmund
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.
>
>
>===
>This list is open to everyone.  Occasionally, less thoughtful
>people send inappropriate messages.  Please DO NOT COMPLAIN TO
>THE POSTMASTER about these messages because the postmaster has no
>way of controlling them, and excessive complaints will result in
>termination of the list.
>
>For information about this list, including information about the
>problem of inappropriate messages and information about how to
>unsubscribe, please see the web page at
>http://jse.stat.ncsu.edu/
>===
>








Re: MANOVA

2000-06-14 Thread Joe Ward

If the 'ZERO' or 'DOT' means that you have some missing cells then
that is a good time to "CREATE YOUR OWN MODEL".

-- Joe
********
*  Joe Ward
*  167 East Arrowhead Dr.
*  San Antonio, TX 78228-2402
*  Phone: 210-433-6575
*  Fax:  210-433-2828
*  Email: [EMAIL PROTECTED]
*  http://www.ijoa.org/joeward/wardindex.html
*  
*  Health Careers High School
*  4646 Hamilton Wolfe
*  San Antonio, TX 78229
*  Phone: 210-617-5400
*  Fax:   210-617-5423
**
-Original Message-
From: HAideren <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date: Wednesday, June 14, 2000 8:12 PM
Subject: MANOVA


>Hi,
>
>I have run a MANOVA and in the 'Parameter Estimates' section of the results,
>some of the cells are filled with a zero or a dot (.). Is there a way to
>overcome this problem? If not, should I run a different multivariate test,
>and what would be the appropriate substitute test?
>
>Cheers.
>
>
>
>
>
>







Re: Inequalities constrains on the coefficients

2000-06-08 Thread Joe Ward

I asked  Lee Wilkinson how this is done in SYSTAT.
Here is his reply.

-- Joe


*  Joe Ward
*  167 East Arrowhead Dr.
*  San Antonio, TX 78228-2402
*  Phone: 210-433-6575
*  Fax:  210-433-2828
*  Email: [EMAIL PROTECTED]
*  http://www.ijoa.org/joeward/wardindex
*  
*  Health Careers High School
*  4646 Hamilton Wolfe
*  San Antonio, TX 78229
*  Phone: 210-617-5400
*  Fax:   210-617-5423
**
-Original Message-
From: Wilkinson, Leland <[EMAIL PROTECTED]>
To: 'Joe Ward' <[EMAIL PROTECTED]>
Date: Thursday, June 08, 2000 9:34 AM
Subject: RE: Inequalities constrains on the coefficients


>The SYSTAT procedure NONLIN does the same with the LOSS option and FUNPAR.
>Could you perhaps post this to Ed-Stat in the same thread?
>Thanks,
>Lee
>
>-----Original Message-
>From: Joe Ward [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, June 06, 2000 11:56 AM
>To: Wilkinson, Leland (SYSTAT
>Subject: Fw: Inequalities constrains on the coefficients
>
>
>Lee --
>
>Is this available in any version of SYSTAT?
>What about SYSTAT8-Student Version?
>
>-- Joe
>
=
-Original Message-
From: Jonathan Fry <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date: Tuesday, June 06, 2000 11:05 AM
Subject: Re: Inequalities constrains on the coefficients


>Arie Beresteanu wrote:
>>
>> Hi,
>>
>>  Estimation of linear (multivariate) regression with equality constraints
>> on the coefficients is a well known problem (at least for me). What
>> about if the constraints are inequalities? More specifically:
>>
>> Y=Xb+e
>> s.t.
>> Qb<=q
>>
>> where Q is a matrix and q is a vector. (for example Y=b0+b1*X1+b2*X2+e
>> s.t. b1+2*b2>=0 )
>>
>> How do I solve that? How do I test the constraint? Is there something on
>> MatLab/STATA/SAS for that?
>>
>> Thank you,
>> Arie.
>
>The SPSS procedure CNLR (constrained non-linear regression) handles this
>kind of problem directly, using a quadratic programming solver.
>
>Jonathan Fry
>SPSS Inc.
>
>
>
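
For the single-constraint example in the question above (Y = b0 + b1*X1 + b2*X2 + e subject to b1 + 2*b2 >= 0), a quadratic programming solver is not strictly needed. A minimal sketch, assuming only one linear inequality: fit the unrestricted model; if the estimates already satisfy the inequality, keep them; otherwise the constrained least-squares solution lies on the boundary b1 + 2*b2 = 0, so substitute b1 = -2*b2 and fit the restricted model Y = b0 + b2*(X2 - 2*X1) + e. The data and helper names below are made up for illustration, and this is not how CNLR or NONLIN work internally. With several simultaneous inequalities this shortcut no longer applies and a solver of the kind mentioned above is the safer route.

```python
def least_squares(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(p)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(p)
    ]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(p):
            if r != k:
                factor = a[r][k] / a[k][k]
                a[r] = [ar - factor * ak for ar, ak in zip(a[r], a[k])]
    return [a[i][p] / a[i][i] for i in range(p)]


def fit_with_constraint(x1, x2, y):
    """Least squares for Y = b0 + b1*X1 + b2*X2 subject to b1 + 2*b2 >= 0."""
    ones = [1.0] * len(y)
    b0, b1, b2 = least_squares([ones, x1, x2], y)
    if b1 + 2 * b2 >= 0:
        return [b0, b1, b2]          # inequality not binding
    # Inequality binding: impose b1 = -2*b2 and fit the restricted model.
    z = [v2 - 2 * v1 for v1, v2 in zip(x1, x2)]
    b0, b2 = least_squares([ones, z], y)
    return [b0, -2 * b2, b2]


# Made-up data generated from b0 = 1, b1 = -3, b2 = 1 (violates the constraint).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 - 3.0 * u + 1.0 * v for u, v in zip(x1, x2)]

ones = [1.0] * len(y)
unconstrained = least_squares([ones, x1, x2], y)
constrained = fit_with_constraint(x1, x2, y)
print("unconstrained:", unconstrained)
print("constrained  :", constrained)
```

This is the same "impose the restriction on the assumed model" idea used for equality restrictions, applied only when the inequality turns out to be binding.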







Re: Inequalities constrains on the coefficients

2000-06-08 Thread Joe Ward



Lee Wilkinson indicates how this is done in SYSTAT.

--- Joe

-Original Message-
From: Wilkinson, Leland <[EMAIL PROTECTED]>
To: 'Joe Ward' <[EMAIL PROTECTED]>
Date: Thursday, June 08, 2000 9:34 AM
Subject: RE: Inequalities constrains on the coefficients


>The SYSTAT procedure NONLIN does the same with the LOSS option and FUNPAR.
>Could you perhaps post this to Ed-Stat in the same thread?
>Thanks,
>Lee
>
>-----Original Message-
>From: Joe Ward [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, June 06, 2000 11:56 AM
>To: Wilkinson, Leland (SYSTAT
>Subject: Fw: Inequalities constrains on the coefficients
>
>
>Lee --
>
>Is this available in any version of SYSTAT?
>What about SYSTAT8-Student Version?
>
>-- Joe
>

-Original Message-
From: Jonathan Fry <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date: Tuesday, June 06, 2000 11:05 AM
Subject: Re: Inequalities constrains on the coefficients


>Arie Beresteanu wrote:
>>
>> Hi,
>>
>>  Estimation of linear (multivariate) regression with equality constraints
>> on the coefficients is a well known problem (at least for me). What
>> about if the constraints are inequalities? More specifically:
>>
>> Y=Xb+e
>> s.t.
>> Qb<=q
>>
>> where Q is a matrix and q is a vector. (for example Y=b0+b1*X1+b2*X2+e
>> s.t. b1+2*b2>=0 )
>>
>> How do I solve that? How do I test the constraint? Is there something on
>> MatLab/STATA/SAS for that?
>>
>> Thank you,
>> Arie.
>
>The SPSS procedure CNLR (constrained non-linear regression) handles this
>kind of problem directly, using a quadratic programming solver.
>
>Jonathan Fry
>SPSS Inc.
>
>
>







Re: Regression and Correlation (Was Correlation)

2000-05-20 Thread Joe Ward

Hi Brett, Herman et al --

Occasionally it seems appropriate to send some results that help
reinforce the idea that the correlation coefficient can be of limited value
in some situations.

The table shown below illustrates what happens when the range of
X is restricted.

-- Joe
 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *


- Original Message - 
From: Magill, Brett <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, May 19, 2000 12:46 PM
Subject: Regression and Correlation (Was Correlation)


| I am no statistician, so let me make sure I am understanding what you are
| saying.  Your point is that you may have an identical regression equation
| despite the fact that the correlation may vary depending on the amount of
| variation in X.  If this is your point, I agree and recognize this--r is a
| measure of the fit about the regression line.
| 
| Nonetheless, regression and correlation are the same in the bivariate case
| with the exception of scale.  In a bivariate regression, the standardized
| Beta coefficient is equal to the Pearson r.  As with any standardization, it
| removes the scale of the variation and the result is that the slope
| describes the relationship or B = r.
| 
| Brett
| 

BEGIN HERMAN'S MESSAGE
| -Original Message-
| From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
| Sent: Friday, May 19, 2000 11:43 AM
| To: [EMAIL PROTECTED]
| Subject: Re: Correlation
| 
| 
| Magill, Brett <[EMAIL PROTECTED]> wrote:
| >Mike,
| 
| >In the bivariate case, regression and correlation are identical.
| 
| This is false.  Correlation is the measure of the
| proportion of the variance of one variable explained by a
| linear function of the other in a joint distribution, while
| linear regression is the linear relation itself.  One can
| have non-linear versions as well.
| 
| If in fact E(Y|X) = aX + b, this will also be the case no
| matter how selection is made on X, whereas the correlation
| can vary greatly.
| 
---
END OF HERMAN'S MESSAGE
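
The point made above (the regression line stays put under selection on X while the correlation changes) can be seen in a small constructed example. This is a sketch with made-up data: at every X value there are two observations, one a fixed amount above the line Y = 0.5*X + 10 and one the same amount below it, so the least-squares line is the same for any subset of X values. The function names and the numbers are illustrative only.

```python
def slope_and_r(x, y):
    """Least-squares slope of Y on X and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def make_data(x_values, a=0.5, b=10.0, spread=3.0):
    """Two observations per X value, symmetric about the line a*X + b."""
    x, y = [], []
    for xv in x_values:
        for e in (-spread, spread):
            x.append(float(xv))
            y.append(a * xv + b + e)
    return x, y


full_x, full_y = make_data(range(1, 21))            # X from 1 to 20
restricted_x, restricted_y = make_data(range(9, 13))  # X from 9 to 12 only

slope_full, r_full = slope_and_r(full_x, full_y)
slope_restricted, r_restricted = slope_and_r(restricted_x, restricted_y)

print("full range      : slope", slope_full, " r", r_full)
print("restricted range: slope", slope_restricted, " r", r_restricted)
```

The slope is the same in both runs, while r drops sharply when the range of X is restricted.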
-----
Beginning of insert by Joe Ward
-

-- Forwarded message --
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation
(his posting is appended below), as well as jogging my own
memory for a previous posting I had made.  A while back I
had posted the Anscombe dataset (in the context of an SPSS
program) which also clearly shows the benefit of plotting
the data:  the four situations produce almost identical
Pearson r values but only one actually shows the classic
scatterplot, the others show a nonlinear pattern and the
influence that a single point has on the calculation of r.
What does the value of r tell us here?  Aren't the basic 
statistical concepts to be learned in this situation far 
more important and most clearly seen through a coordination
of the graphical and numerical information?

-Mike Palij/Psychology Dept/New York University

Joe H Ward <[EMAIL PROTECTED]> writes:
 To Mike et al --
 
 There have been several messages related to the Simple Correlation
 Coefficient.  IMHO, when out in the "real world" involving practical
 decision-making the correlation coefficient has very limited value and
 sometimes dangerous consequences.  The correlation coefficient may be
 an important topic for the history of statistics to learn the problems
 associated with its use.
 
 Attached below is an item that I submitted a long time ago, and it may be 
 of interest to those following the discussion of "r".
 
 -- Joe
 ***
 * Joe WardHealth Careers High School  *
 * 167 East Arrowhead Dr.4646 Hamilton Wolfe   *
 * San Antonio, TX 78228-2402  San Antonio, TX 78229   *
 * Phone: 210-433-6575   Phone: 210-617-5400 *
 * [EMAIL PROTECTED] Fax  : 210-617-5423 *
 

Re: R sq vs r sq

2000-05-05 Thread Joe Ward

Good message, Jon --
:-)
-- Joe
- Original Message - 
From: Jon Cryer <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, May 05, 2000 11:13 AM
Subject: Re: R sq vs r sq


| But the important issue, statistically, is that the model is
| linear in the _parameters_ (not the predictor variables). When this is the
| case
| the equations from which the least squares estimates of the parameters
| are obtained are linear equations (the so-called normal equations).
| This is true even when fitting a quadratic (or higher order) equation.
| 
| Statisticians always talk about linear models in this way. Statistically
| speaking, response = quadratic curve in x + random error  is a _linear_ model.
| Statisticians use the term nonlinear model for more complex models that are
| not linear in the parameters.
| 
| Jon Cryer
| 
| At 07:46 PM 5/5/00 +0200, you wrote:
| >Joe,
| >
| >Well by linear *I* meant what we mean in algebra 2 class y = mx + b,
| >but I do not object to calling y = a0 + a1 x1 + a2 x2 + a3 x3 + ... linear.
| >I certainly DO object to your definition of linear, although I suppose
| >it *is* used by some people, I find it very confusing.
| >
| >Cheers,
| >Bill Larson
| >Geneva, Switzerland
| >
| >- Original Message -
| >From: Joe Ward <[EMAIL PROTECTED]>
| >To: William J. Larson <[EMAIL PROTECTED]>; Paul Velleman
| ><[EMAIL PROTECTED]>
| >Cc: <[EMAIL PROTECTED]>
| >Sent: 2000 May 05 9:07 PM
| >Subject: Re: R sq vs r sq
| >
| >
| >
| >Hi Paul, William et al.--
| >
| >This may be ANOTHER GOOD TIME TO COMMENT ON
| >THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO).
| >
| >I suggest that when we use the terms LINEAR and NONLINEAR that we
| >tell the reader what the SENDER means by those terms.
| >
| >When I write:
| >
| >Y = b1*X1 + b2*X2 + ... + bp*Xp + E
| >
| >where bi (i = 1,2,...p) are least-squares regression coefficients, I
| >will refer to this as a LINEAR MODEL.
| >
| >The Xs can be any numbers that I choose-- log(z), ln(z), z^3,  cos(z), 1/z,
| >binary (1 or 0), ...
| >
| >If a person writes the form:
| >
| >Y = a0 + a1*X + a2*X^2 + a3*X^3 + E
| >
| >then they might say that this is a NONLINEAR model.
| >
| >As long as the reader knows exactly what the model is-- then we are
| >communicating.
| >
| >In these days of fancy 3D graphic displays, it is interesting to picture the
| >function:
| >
| >Y = a0 + a1*X + a2*X^2
| >
| >in the 2D space of Y and X -- which appears as a CURVE.
| >
| >and then picture the function in the 3D space of Y, X and X^2 or
| >re-designating X^2 as Z
| >
| >Y = a0 + a1*X + a2*Z
| >
| >We notice that the 3D function lies in a PLANE -- reminding us that
| >we have a "LINEAR MODEL".
| >
| >If we hurriedly say to someone that "this function is NONLINEAR in the 2D
| >space  of Y and X, but
| >LINEAR in the 3D space of Y,X and Z", then we might even cause more
| >frustration. :-(
| >
| >"COMMUNICATION" IS A PROBLEM EVERYWHERE!
| >
| >DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"?
| >:-)
| >
| >--- Joe
| >
| >* Joe Ward  Health Careers High School *
| >* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
| >* San Antonio, TX 78228-2402San Antonio, TX 78229  *
| >* Phone: 210-433-6575   Phone: 210-617-5400*
| >* Fax: 210-433-2828 Fax: 210-617-5423  *
| >* [EMAIL PROTECTED]*
| >* http://www.ijoa.org/joeward/wardindex.html   *
| >
| >
| >
| >
| >
| >- Original Message -
| >From: Paul Velleman <[EMAIL PROTECTED]>
| >To: William J. Larson <[EMAIL PROTECTED]>
| >Cc: <[EMAIL PROTECTED]>
| >Sent: Friday, May 05, 2000 6:43 AM
| >Subject: Re: R sq vs r sq
| >
| >
| >| At 11:18 AM +0200 05/05/2000, William J. Larson wrote:
| >| >
| >| >It appears that R sq is some sort of generalization of r sq
| >| >for nonlinear cases. True?
| >| >
| >| Not really. Common convention is to capitalize the R for multiple
| >| correlation. The R sqr reported in regressions allows for the
| >| generalization of simple regression to a multiple regression (2 or
| >| more predictors). In both cases R sqr is the squared correlation
| >| between y and y-hat. Y-hat represents the best (in the least squares
| >| sense) fit to y among all linear combinations of the x's. All of
| >| these are

Re: R sq vs r sq

2000-05-05 Thread Joe Ward

Bill --

You are so right!!  The term NONLINEAR is very confusing.

As I indicated in the earlier message, most folks in the statistics world use the term
LINEAR MODEL for a model of the form:

Y = b1*X1 + b2*X2 + ... + bp*Xp + E

and some folks will write UNFORTUNATELY --

Y = b0 + b1*X1 + b2 * X2 + ... + bp*Xp + E

that leads to more confusion!!

The main point is that the functions are LINEAR IN THE UNKNOWN COEFFICIENTS.

This is why we sometimes take the logs of the function so that the new function is
LINEAR IN THE UNKNOWN COEFFICIENTS -- AND THE SOLUTIONS ARE EASIER.
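
As a small illustration of taking logs: the function Y = A*exp(k*X) is not linear in A and k, but ln(Y) = ln(A) + k*X is linear in the unknowns ln(A) and k, so ordinary least squares applies. The sketch below uses made-up, noise-free data; the names are illustrative. One caution that is certain: fitting on the log scale minimizes squared errors in ln(Y), not in Y, so with noisy data the answers differ from a direct nonlinear fit.

```python
import math

# Made-up, noise-free data from Y = 2 * exp(0.5 * X) (illustration only).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Transform so the model is linear in the unknown coefficients.
log_y = [math.log(yi) for yi in y]

n = len(x)
mean_x = sum(x) / n
mean_log_y = sum(log_y) / n

# Ordinary least squares for ln(Y) = ln(A) + k*X.
k = sum((xi - mean_x) * (li - mean_log_y) for xi, li in zip(x, log_y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
log_a = mean_log_y - k * mean_x
a = math.exp(log_a)

print("estimated A:", a)
print("estimated k:", k)
```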

A  "REAL" NONLINEAR MODEL NEEDS SOME SPECIAL ALGORITHMS FOR SOLUTION.

---
Someday -- long after I'm out of this world -- the AP-Statistics objectives
WILL ALLOW OUR STUDENTS TO HAVE --

"The power they deserve to use REGRESSION/LINEAR MODELS and COMPUTERS/CALCULATORS
to their fullest".

Perhaps the secondary teachers can speed up improvements through the NCTM
"Principles and Standards for School Mathematics". 

Perhaps there should be an Applied Research Statistics course that has few 
restrictions on the content -- focusing on those topics that help students do what
they NEED to accomplish practical results -- leading to more enthusiasm for
statistics and data analysis.

Change is slow!!

:-)

-- Joe
******** 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *




- Original Message - 
From: William J. Larson <[EMAIL PROTECTED]>
To: Joe Ward <[EMAIL PROTECTED]>; Paul Velleman <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, May 05, 2000 10:46 AM
Subject: Re: R sq vs r sq


| Joe,
| 
| Well by linear *I* meant what we mean in algebra 2 class y = mx + b,
| but I do not object to calling y = a0 + a1 x1 + a2 x2 + a3 x3 + ... linear.
| I certainly DO object to your definition of linear, although I suppose
| it *is* used by some people, I find it very confusing.
| 
| Cheers,
| Bill Larson
| Geneva, Switzerland
| 
| - Original Message -
| From: Joe Ward <[EMAIL PROTECTED]>
| To: William J. Larson <[EMAIL PROTECTED]>; Paul Velleman
| <[EMAIL PROTECTED]>
| Cc: <[EMAIL PROTECTED]>
| Sent: 2000 May 05 9:07 PM
| Subject: Re: R sq vs r sq
| 
| 
| 
| Hi Paul, William et al.--
| 
| This may be ANOTHER GOOD TIME TO COMMENT ON
| THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO).
| 
| I suggest that when we use the terms LINEAR and NONLINEAR that we
| tell the reader what the SENDER means by those terms.
| 
| When I write:
| 
| Y = b1*X1 + b2*X2 + ... + bp*Xp + E
| 
| where bi (i = 1,2,...p) are least-squares regression coefficients, I
| will refer to this as a LINEAR MODEL.
| 
| The Xs can be any numbers that I choose-- log(z), ln(z), z^3,  cos(z), 1/z,
| binary (1 or 0), ...
| 
| If a person writes the form:
| 
| Y = a0 + a1*X + a2*X^2 + a3*X^3 + E
| 
| then they might say that this is a NONLINEAR model.
| 
| As long as the reader knows exactly what the model is-- then we are
| communicating.
| 
| In these days of fancy 3D graphic displays, it is interesting to picture the
| function:
| 
| Y = a0 + a1*X + a2*X^2
| 
| in the 2D space of Y and X -- which appears as a CURVE.
| 
| and then picture the function in the 3D space of Y, X and X^2 or
| re-designating X^2 as Z
| 
| Y = a0 + a1*X + a2*Z
| 
| We notice that the 3D function lies in a PLANE -- reminding us that
| we have a "LINEAR MODEL".
| 
| If we hurriedly say to someone that "this function is NONLINEAR in the 2D
| space  of Y and X, but
| LINEAR in the 3D space of Y,X and Z", then we might even cause more
| frustration. :-(
| 
| "COMMUNICATION" IS A PROBLEM EVERYWHERE!
| 
| DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"?
| :-)
| 
| --- Joe
| 
| * Joe Ward  Health Careers High School *
| * 167 East Arrowhead Dr 4646 Hamilton Wolfe*
| * San Antonio, TX 78228-2402San Antonio, TX 78229  *
| * Phone: 210-433-6575   Phone: 210-617-5400*
| * Fax: 210-433-2828 Fax: 210-617-5423  *
| * [EMAIL PROTECTED]*
| * http://www.ijoa.org/joeward/wardindex.html   *
| *

Re: R sq vs r sq

2000-05-05 Thread Joe Ward

Hi Paul, William et al.--

This may be ANOTHER GOOD TIME TO COMMENT ON 
THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO).

I suggest that when we use the terms LINEAR and NONLINEAR that we
tell the reader what the SENDER means by those terms.

When I write:

Y = b1*X1 + b2*X2 + ... + bp*Xp + E

where bi (i = 1,2,...p) are least-squares regression coefficients, I
will refer to this as a LINEAR MODEL.

The Xs can be any numbers that I choose-- log(z), ln(z), z^3,  cos(z), 1/z,
binary (1 or 0), ...

If a person writes the form:

Y = a0 + a1*X + a2*X^2 + a3*X^3 + E

then they might say that this is a NONLINEAR model.

As long as the reader knows exactly what the model is-- then we are communicating.

In these days of fancy 3D graphic displays, it is interesting to picture the function:

Y = a0 + a1*X + a2*X^2 

in the 2D space of Y and X -- which appears as a CURVE.

and then picture the function in the 3D space of Y, X and X^2 or
re-designating X^2 as Z 

Y = a0 + a1*X + a2*Z

We notice that the 3D function lies in a PLANE -- reminding us that
we have a "LINEAR MODEL".
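
The same point can be made numerically: once X^2 is renamed Z, the quadratic is fitted by solving a set of linear equations (the normal equations), exactly as for any other linear model. A minimal sketch with made-up, noise-free data; the helper name is hypothetical.

```python
def least_squares(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(p)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(p)
    ]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(p):
            if r != k:
                factor = a[r][k] / a[k][k]
                a[r] = [ar - factor * ak for ar, ak in zip(a[r], a[k])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Made-up data lying exactly on Y = 1 + 2*X + 3*X^2 (illustration only).
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi + 3.0 * xi ** 2 for xi in x]

# Predictor vectors: the unit vector, X, and Z = X^2.
ones = [1.0] * len(x)
z = [xi ** 2 for xi in x]

a0, a1, a2 = least_squares([ones, x, z], y)
print("a0, a1, a2 =", a0, a1, a2)
```

Nothing in the solver knows that Z was computed from X; it only sees three predictor vectors, which is why the model counts as linear in the coefficients.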

If we hurriedly say to someone that "this function is NONLINEAR in the 2D space of Y and X,
but LINEAR in the 3D space of Y, X and Z", then we might even cause more frustration. :-(

"COMMUNICATION" IS A PROBLEM EVERYWHERE!

DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"?
:-)

--- Joe
**** 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *





- Original Message - 
From: Paul Velleman <[EMAIL PROTECTED]>
To: William J. Larson <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, May 05, 2000 6:43 AM
Subject: Re: R sq vs r sq


| At 11:18 AM +0200 05/05/2000, William J. Larson wrote:
| >
| >It appears that R sq is some sort of generalization of r sq
| >for nonlinear cases. True?
| >
| Not really. Common convention is to capitalize the R for multiple
| correlation. The R sqr reported in regressions allows for the 
| generalization of simple regression to a multiple regression (2 or 
| more predictors). In both cases R sqr is the squared correlation 
| between y and y-hat. Y-hat represents the best (in the least squares 
| sense) fit to y among all linear combinations of the x's. All of 
| these are statistics for linear models. It is dangerous to apply them 
| to nonlinear models.
| 
| -- Paul
| -- 
| Paul F. Velleman
| Cornell University  Data Description, Inc.
| 358 Ives Hall  Box 4555
| Ithaca, NY 14853   Ithaca, NY 14852-4555
| (607) 255-4411  (607) 257-1000
| (607) 255-8484 fax(607) 257-4146 fax
| ===
| The Advanced Placement Statistics List
| To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
| unsubscribe apstat-l 
| Discussion archives are at
| http://forum.swarthmore.edu/epigone/apstat-l
| Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
| ===
| 



===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: no correlation assumption among X's in MLR

2000-05-04 Thread Joe Ward

And in addition to:

1. A Correlation Matrix
and 
2. A Covariance Matrix
another person may simply use
3. An X'X matrix of inner products of the "raw" vectors.

Numerical accuracy is always an important consideration.
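
A small sketch of how the three matrices relate, assuming NumPy and two invented predictors with large means (the data and seed are hypothetical). It also prints condition numbers as one rough indicator of why numerical accuracy deserves attention when working from raw inner products.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, hypothetical data
n = 200
x1 = 1000.0 + rng.normal(size=n)        # predictor with a large mean
x2 = 50.0 + 5.0 * rng.normal(size=n)
X = np.column_stack([x1, x2])

xtx = X.T @ X                           # 3. inner products of the "raw" vectors
cov = np.cov(X, rowvar=False)           # 2. covariance matrix
corr = np.corrcoef(X, rowvar=False)     # 1. correlation matrix

# The covariance matrix can be rebuilt from X'X and the column means,
# but the subtraction cancels most of the leading digits.
means = X.mean(axis=0)
cov_from_xtx = (xtx - n * np.outer(means, means)) / (n - 1)

cond_xtx = np.linalg.cond(xtx)
cond_cov = np.linalg.cond(cov)
cond_corr = np.linalg.cond(corr)
print("condition numbers:", cond_xtx, cond_cov, cond_corr)
```

All three matrices carry the information needed for a least-squares fit with an intercept, but the raw X'X matrix is usually the worst conditioned of the three.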

-- Joe

- Original Message - 
From: David A. Heiser <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; Warren Sarle <[EMAIL PROTECTED]>
Sent: Thursday, May 04, 2000 7:29 PM
Subject: Re: no correlation assumption among X's in MLR


| 
| - Original Message -
| From: Warren Sarle <[EMAIL PROTECTED]>
| To: <[EMAIL PROTECTED]>
| Sent: Thursday, May 04, 2000 12:23 PM
| Subject: Re: no correlation assumption among X's in MLR
| 
| 
| > Of course Herman is right (as usual)! Where are people getting this
| > ridiculous idea that correlation and collinearity are the same thing?
| 
| ..
| Statistics is one field that has almost no agreed on usage of terms.
| Everybody is independent.
| 
| One of my books, "Applied Linear Regression Models", by Neter, Wasserman
| and Kutner (1989), says "When the independent variables are correlated among
| themselves, intercorrelation or multicollinearity among them is said to
| exist. (Sometimes the latter term is reserved for those instances when the
| correlation among independent variables is very high.)..." The authors use
| multicollinearity to refer to the correlation between X variables. (Who is
| right?)
| 
| From a numerical analysis viewpoint the basic matrix in OLS is the
| normalized X matrix which is called the correlation matrix. If
| standardization is not applied, the matrix is the covariance matrix.
| 
| It is clear then that there is a numerical difference between the covariance
| and correlation matrices.
| 
| ...
| >
| > Assuming you're using an intercept, a pair of variables is
| > collinear if and only if their correlation is 1.0 or -1.0.
| > Three or more variables are collinear if and only if there
| > is at least one of the variables that has a multiple
| > correlation of 1.0 with the other variables.
| 
| ...
| This may be your interpretation, but it is not universal.
| ...
| >
| > If the independent variables in a multiple linear regression are
| > collinear, there are infinitely many sets of least-squares
| > regression coefficients that produce the same predictions, MSE,
| > R-squared, etc.
| ..
| This is only true when the correlation matrix has off diagonals with
| 1.0. If it is slightly
| different because of numerical representations in the computer, there will
| be a finite set of apparent identical solutions.
| ...
| 
|   Although least squares does not produce unique
| > estimates, if you have prior information, you may be able to get
| > meaningful and useful Bayesian estimates. Regardless of whether you
| > have prior information, you can get useful predictions for new
| > cases lying in the same subspace as the original sample. Without
| > prior information, you cannot get useful extrapolations outside of
| > that subspace.  Statisticians who are not data miners sometimes
| > forget the distinction between estimation and prediction. :-)
| 
| 
| For many years the method of ridge analysis (non-Bayesian) has been
| extensively used in industry to get valid and workable extrapolations (i.e.
| predictions) beyond the range of the data used. The technique of varying
| lambda to reduce the variance inflation factor is a very good way to obtain
| useful and valid predictions. (All non-Bayesian).
| 
| > Collinearity generally will NOT cause different machines or
| > different 

STATISTICS AT ISEF2000- International Science & Engineering Fair -- Detroit May 7-13 --Summer Workshop in San Antonio

2000-05-04 Thread Joe Ward

Topic #1 --The directory of finalists for ISEF2000 is now available at:

http://www.sciserv.org/isef/finaldir.pdf

There are finalists from all U.S. states and over 40 nations.

I did a brief search for MICHIGAN and a few schools represented are:

Renaissance HS
Saginaw Arts & Science Academy
Western High School
Redford HS

It is easy to find finalists near your location.

If you know any finalists, teachers, parents or others who might
be interested, I will present the annual Shop Talk titled:

"Combining the Power of Statistics and Computers to Enhance Science Fair Projects"

at 9:00-10:00 a.m. on Monday, May 8, 2000 in Cobo Hall Room O2-41.

The purpose of this session is to provide guidance to Science Fair students, teachers
and others to help them acquire statistics advice and suggest kinds of questions they
might ask their statistical advisors.  As you can guess, I will encourage the participants to
get assistance from those who can teach them to create the models needed
to answer their, possibly unique, questions of interest.

You can tell your friends that there will be some valuable drawing prizes for
those who get there early and stay 'til the end.

This year, none of the students whom I advised on their data analysis made it to
ISEF2000.
---sigh :-(

==

Topic #2 --

We have decided to open our Summer Workshop, emphasizing the Power of Statistics and
Computers in Science Research, to a select few folks who may want to attend FROM OUTSIDE
THE SAN ANTONIO REGION.  The application form with detailed information can be seen at the
web site shown below.  This may be of interest to those who work with
student research projects and those AP-Statistics teachers who have some extra school
time AFTER THE MAY AP-EXAM
to introduce their students to some additional data-analysis ideas.
http://www.ijoa.org/joeward/wardindex.html

The dates are May 29 - June 9. If a participant can
stay for only the first week, that's OK.  Those who may be interested can call me to
discuss details.

--- Joe



Re: hyp testing -Reply

2000-04-17 Thread Joe Ward

Hi, Robert and all --

Yes, there occasionally were discussions in our Air Force research
about whether we were working with the POPULATION or a SAMPLE.

As Dennis comments:
| 
| > the flaw here is that ... she has population data i presume ... or about as
| > close as one can come to it ... within the institution ... via the budget
| > or comptroller's office ... THE salary data are known ... so, whatever
| > differences are found ... DEMS are it!
| >

One of my Professors used to use the Invertebrate Paleontologists as his
example of a POPULATION.  I think at that time there were fewer than 20
people who were Invertebrate Paleontologists.

-- Joe

- Original Message - 
From: Robert Dawson <[EMAIL PROTECTED]>
To: dennis roberts <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 17, 2000 9:54 AM
Subject: Re: hyp testing -Reply


| 
| - Original Message -
| From: dennis roberts
| > At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
| >
| > > There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
| > >text "Seeing Through Statistics", in which she does much the same. She
| > >presents a case study based on some of her own work in which she looked at
| > >the question of gender discrimination in pay at her own university, and
| > >fails to reject the null hypothesis [no systemic difference in pay between
| > >male and female faculty]. She heads the example "Important, but not
| > >significant, differences in salaries"; comments (_perhaps_ technically
| > >correctly but misleadingly) that "a statistically naive reader could
| > >conclude that there is no problem" and in closing states:
| 
| and Dennis Roberts replied:
| 
| > the flaw here is that ... she has population data i presume ... or about as
| > close as one can come to it ... within the institution ... via the budget
| > or comptroller's office ... THE salary data are known ... so, whatever
| > differences are found ... DEMS are it!
| >
| > the notion of statistical significance in this case seems IRRELEVANT ...
| > the real issue is ... given that there are a variety of factors that might
| > account for such differences (numbers in ranks, time in ranks, etc. etc.)
| >  is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...
| 
| 
| If one can totally explain all contributing factors, so that a model
| with significantly fewer parameters than there are faculty fits everybody to
| within a practically significant margin of error, then yes, either the model
| continues to work with gender removed or it doesn't.
| 
| If, on the other hand, there are unknown sources of variation (a
| reasonable assumption in any situation involving people), or more sources of
| variation than there are data (another good bet if one thought hard enough),
| one cannot automatically go from the observation
| 
| (*)  "The average pay of female faculty members here is less than that of
| male faculty members"
| 
| to the apparently desired conclusion
| 
| (**)  "There is a gender-based _pattern_ of discrimination in faculty
| salaries"
| 
| without considering the study as a pseudo-experiment, and analyzing it as
| such.  One would be trying to decide: is the difference between mean male
| and female faculty salaries greater than one would expect if one took N1
| males and N2 females and assigned factors such as experience, rank,
| skill/luck at negotiating a first contract, demand for specialties,  merit
| pay actually deserved [as opposed to given on a gender basis], etc. at
| random?
| 
| This is what Utts and her coauthors were, it seems, trying to do.
| However, when the tests were not significant at the chosen level they seem
| to have fallen back on inferring (**) directly from (*).
| 
| -Robert Dawson
| 
| 
| 

Re: linear model or interactive model?

2000-04-16 Thread Joe Ward

Good message, Alan --
As you indicate, the model is LINEAR in the coefficients b0, b1, b2, b3,
and in the 4-D space of y, x1, x2, x3 (i.e., x3 = x1*x2) the function lies in a PLANE.
But in the 3-D space of y, x1, x2 the surface is TWISTED (not in
a PLANE).
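
A short sketch of both views, assuming NumPy and invented values for b0..b3: the coefficients are recovered by a linear fit once x3 = x1*x2 is treated as its own column, while the slope of y along x1 changes with x2, which is the "twist" in the 3-D picture.

```python
import numpy as np

# Hypothetical coefficients b0, b1, b2, b3, for illustration only.
b = np.array([1.0, 2.0, -1.0, 0.5])

grid = np.linspace(-2.0, 2.0, 5)
x1, x2 = [a.ravel() for a in np.meshgrid(grid, grid)]
x3 = x1 * x2                               # the "new" variable

design = np.column_stack([np.ones_like(x1), x1, x2, x3])
y = design @ b                             # exact responses, no error term

# Linear in the coefficients: an ordinary least-squares fit recovers them.
est, *_ = np.linalg.lstsq(design, y, rcond=None)

def slope_in_x1(x2_value):
    """Slope of y along x1 at a fixed x2; it varies with x2 when b3 != 0."""
    return est[1] + est[3] * x2_value

print(est)
print(slope_in_x1(-2.0), slope_in_x1(2.0))
```

With b3 = 0 the two printed slopes would be equal and the surface in the space of y, x1, x2 would be an untwisted plane.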

-- Joe

- Original Message - 
From: Alan McLean <[EMAIL PROTECTED]>
To: Wen-Feng Hsiao <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, April 16, 2000 4:01 PM
Subject: Re: linear model or interactive model?


| The model
| 
|  y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2
| 
| is a nonlinear model, just as in engineering. However, it is 'linear in the
| coefficients'. In statistics this is useful, because in estimating the model from a
| data set, one can define a 'new' variable x3 = x1*x2 and apply, for example, a
| linear regression algorithm.
| 
| But in interpreting the results you have to remember that the model is nonlinear!
| 
| Regards,
| Alan
| 
| 
| 
| 
| 
| Wen-Feng Hsiao wrote:
| 
| > Dear Hartig,
| >
| > Thanks for your reply. I am sorry for my poor knowledge in statistics.
| > But I wonder why the definition of 'linearity' of statistics is different
| > from that of engineering mathematics, which defines 'linear' as:
| >
| >  Each unknown xj appears to the first power only, and that there are no
| > cross product terms xi*xj with i!=j.
| >
| > Wen-Feng
| >
| > In article <[EMAIL PROTECTED]>,
| > [EMAIL PROTECTED] says...
| > > Generally, you can include an interaction (or moderator) term in a linear
| > > model, like
| > > y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2,
| > > and the model still is linear. If you decide not to include x1 and x2, like
| > > y = b0 + b1 * x1*x2,
| > > you still have a linear model.
| >
| 
| --
| Alan McLean ([EMAIL PROTECTED])
| Department of Econometrics and Business Statistics
| Monash University, Caulfield Campus, Melbourne
| Tel:  +61 03 9903 2102Fax: +61 03 9903 2007
| 



Re: linear model or interactive model?

2000-04-15 Thread Joe Ward

Wen-Feng-

The term LINEAR is a difficult term.

As I mentioned to you in an earlier message (included for
reference at the end of this message),
a LINEAR STATISTICAL MODEL is "LINEAR" in the unknown
coefficients, a1, a2,... ap in the model:

Y = a1*X1 + a2*X2 + ... + ap*Xp + E

The X predictors can be ANY NUMBERS THAT WE LIKE.

If we write --

Y = a1*U + a2*X + a3*X^2 + E

where 
U = 1
X  = a continuous predictor
X^2   = X*X 
E = error or residual

we might say that the function is NON-LINEAR in the two-dimensional, Y-X plane,
but it is LINEAR in the three dimensional space of Y-X-X^2.  With 3-D displays that we
can rotate as we would like, it is enlightening to observe that the CURVE seen in the
two-dimensional space lies in a PLANE in the three-dimensional space of Y-X-X^2.
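
For readers without a rotatable 3-D display, the same observation can be checked numerically. This is only a sketch, assuming NumPy, invented coefficient values, and no error term E; the helper name affine_dimension is hypothetical.

```python
import numpy as np

# Hypothetical coefficients for Y = a1*U + a2*X + a3*X^2 (no error term here).
a1, a2, a3 = 1.0, 2.0, 3.0

X = np.arange(-3.0, 4.0)          # integer-valued X keeps the arithmetic exact
Y = a1 + a2 * X + a3 * X ** 2

def affine_dimension(points):
    """Dimension of the smallest flat (line, plane, ...) containing the points."""
    centered = points - points.mean(axis=0)
    return int(np.linalg.matrix_rank(centered, tol=1e-8))

dim_2d = affine_dimension(np.column_stack([X, Y]))          # Y-X plane
dim_3d = affine_dimension(np.column_stack([X, X ** 2, Y]))  # Y-X-X^2 space
print(dim_2d, dim_3d)
```

A straight line in the Y-X plane would give a dimension of 1, so a value of 2 there means the points bend. A generic cloud in three dimensions would give 3, so a value of 2 in the Y-X-X^2 space means the points sit in a plane.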

-- Joe  

- Original Message - 
From: Wen-Feng Hsiao <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, April 15, 2000 5:14 AM
Subject: Re: linear model or interactive model?


| Dear Hartig,
| 
| Thanks for your reply. I am sorry for my poor knowledge in statistics.
| But I wonder why the definition of 'linearity' of statistics is different 
| from that of engineering mathematics, which defines 'linear' as:
| 
|  Each unknown xj appears to the first power only, and that there are no 
| cross product terms xi*xj with i!=j.
| 
| Wen-Feng
| 
| In article <[EMAIL PROTECTED]>, 
| [EMAIL PROTECTED] says...
| > Generally, you can include an interaction (or moderator) term in a linear
| > model, like
| > y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2,
| > and the model still is linear. If you decide not to include x1 and x2, like
| > y = b0 + b1 * x1*x2,
| > you still have a linear model.
| 

====
- Original Message - 
From: Joe Ward <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; Wen-Feng Hsiao <[EMAIL PROTECTED]>
Sent: Thursday, April 13, 2000 10:30 AM
Subject: Re: linear model or interactive model?

- Original Message - 
From: Wen-Feng Hsiao <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 13, 2000 3:06 AM
Subject: linear model or interactive model?

| Dear all,
| 
| Suppose I have an aggregation model which is in the following form:
|   Y = X11 * X12 + X21 * X22.
| 
| This model could be thought of as an aggregation of two pieces of knowledge, namely 
| X1. and X2.. Each piece of knowledge contains two pieces of information 
| (attributes). For example, X1 contains X11 and X12. Now if X.1 is the 
| height, and X.2 is the weight of a person. Then, the aggregation of any 
| two persons, say, Student1(height=170cm, weight=60kg), 
| Student2(height=180cm, weight=68kg) can be represented by
| 
| Y = 170*60+180*68=22440.
| 
| My question: a model as the above form is linear or interactive? I doubt 
| it is not a linear model. Since it is not in this form: Y= c1 X1 + c2 X2, 
| where c1 and c2 are constant. I doubt it is not a pure interactive form, 
| since X.1 and X.2 are dependent. Sorry for this stupid question.
| 
| Wen-Feng
| 
==== Joe Ward writes ====

Wen-Feng---

Your model --

Y = X11 * X12 + X21 * X22.

does not have any unknowns.

Did you mean to write:

Y =  c1*(X11 * X12) + c2*(X21 * X22)?

All models of the form:

Y = c1*X1 + c2*X2 + ... + cp*Xp + E

are LINEAR MODELS.

It does not matter what NUMBERS are included in the Xs.

Y = c1*X1 + c2*X2 + c3*(X1*X2) + c4*(X1^2) + c5*(lnX1) + E

is LINEAR in the unknown coefficients c1, c2, ...

The most useful Xs are the BINARY( 1 or 0) predictors.


--- Joe
 



Re: linear model or interactive model?

2000-04-13 Thread Joe Ward


- Original Message - 
From: Wen-Feng Hsiao <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 13, 2000 3:06 AM
Subject: linear model or interactive model?

| Dear all,
| 
| Suppose I have an aggregation model which is in the following form:
|   Y = X11 * X12 + X21 * X22.
| 
| This model could be thought of as an aggregation of two pieces of knowledge, namely 
| X1. and X2.. Each piece of knowledge contains two pieces of information 
| (attributes). For example, X1 contains X11 and X12. Now if X.1 is the 
| height, and X.2 is the weight of a person. Then, the aggregation of any 
| two persons, say, Student1(height=170cm, weight=60kg), 
| Student2(height=180cm, weight=68kg) can be represented by
| 
| Y = 170*60+180*68=22440.
| 
| My question: a model as the above form is linear or interactive? I doubt 
| it is not a linear model. Since it is not in this form: Y= c1 X1 + c2 X2, 
| where c1 and c2 are constant. I doubt it is not a pure interactive form, 
| since X.1 and X.2 are dependent. Sorry for this stupid question.
| 
| Wen-Feng
| 
==== Joe Ward writes ====

Wen-Feng---

Your model --

Y = X11 * X12 + X21 * X22.

does not have any unknowns.

Did you mean to write:

Y =  c1*(X11 * X12) + c2*(X21 * X22)?

All models of the form:

Y = c1*X1 + c2*X2 + ... + cp*Xp + E

are LINEAR MODELS.

It does not matter what NUMBERS are included in the Xs.

Y = c1*X1 + c2*X2 + c3*(X1*X2) + c4*(X1^2) + c5*(lnX1) + E

is LINEAR in the unknown coefficients c1, c2, ...

The most useful Xs are the BINARY( 1 or 0) predictors.
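
One way to see why BINARY predictors are so handy, sketched with NumPy on invented data for three groups of unequal size: with one 1-or-0 predictor per group and no separate intercept, the least-squares coefficients are simply the group means.

```python
import numpy as np

# Hypothetical responses for three groups of unequal size.
y = np.array([4.0, 6.0, 5.0, 10.0, 12.0, 7.0, 8.0, 9.0, 8.0])
group = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# One binary (1 or 0) predictor per group; no separate intercept column.
X = (group[:, None] == np.arange(3)).astype(float)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
group_means = np.array([y[group == g].mean() for g in range(3)])
print(coef)          # least-squares coefficients
print(group_means)   # ordinary group means, for comparison
```

Restrictions such as "group 0 and group 1 have the same mean" then become restrictions on these coefficients, which is the starting point for testing hypotheses with linear models.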


--- Joe



Re: hyp testing

2000-04-07 Thread Joe Ward

Hi, Dennis--

Yes, a "LOT of years!" ago (the 1950's), when I first started into the real applied world,
our main job was to PREDICT, PREDICT, PREDICT outcomes.  We had
some real cost figures to evaluate our predictions.  Before the term Bootstrap
arrived on the scene, we were Cross-Validating like mad.  We would divide those
punched cards into "random?" groups and shuffle them over and over again and "re-group".
Then we would apply the predictions developed from one data set to the others to see how well
we were doing.
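
A present-day sketch of that shuffle-and-regroup routine, assuming NumPy and simulated data in place of punched cards. The function name, the straight-line model and the data are all hypothetical; the point is only the pattern of fitting on one group and scoring the predictions on the others.

```python
import numpy as np

def cross_validate(X, y, n_groups, rng):
    """Shuffle the cases into groups, fit on each group in turn, and
    return the mean squared error of its predictions for the other groups."""
    order = rng.permutation(len(y))
    groups = np.array_split(order, n_groups)
    squared_errors = []
    for k, fit_idx in enumerate(groups):
        coef, *_ = np.linalg.lstsq(X[fit_idx], y[fit_idx], rcond=None)
        others = np.concatenate([g for j, g in enumerate(groups) if j != k])
        squared_errors.append((y[others] - X[others] @ coef) ** 2)
    return float(np.mean(np.concatenate(squared_errors)))

rng = np.random.default_rng(1)             # fixed seed, simulated data
n = 120
x = rng.uniform(0.0, 10.0, n)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), x])

# "Shuffle them over and over again and re-group": repeat with new splits.
results = [cross_validate(X, y, n_groups=4, rng=rng) for _ in range(5)]
print(results)
```

Each entry in results is an estimate of prediction error on cases that were not used to develop the prediction equation.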

Hypothesis testing -- in the classical sense -- was not involved.

I still believe that TWO important ideas in life are:

- PREDICTION
and 
- OPTIMIZATION (choosing among alternative PREDICTIONS to MAXIMIZE or MINIMIZE one
or more OBJECTIVE FUNCTIONS).

If "Hypothesis testing" helps improve PREDICTION and OPTIMIZATION then that's great.

One of the difficulties in academia may be due to the lack of practical, decision-making
opportunities.

What PRACTICAL ACTIONS do we take as a result of analyzing a two-way table with
a Chi-Square "test" if we find a "statistically significant" outcome?  I imagine we will get
some suggestions from our readers!
:-)
-- Joe

- Original Message - 
From: dennis roberts <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, April 07, 2000 6:41 AM
Subject: hyp testing


| let's say that today ... we as the statistical community decided, by 
| democratic vote, that the concept of 'hypothesis testing' ... which has 
| essentially dominated statistical work for as long as i can remember 
| (which,  er um ... is a LOT of years!) ... is relegated to the 'we USED 
| to do this stuff' category
| 
| just THINK about this 
| 
| what would the vast majority of folks who either do inferential work and/or 
| teach it ... DO
| what analyses would they be doing? what would they be teaching?
| 



Reference for "regression discontinuity"

2000-03-22 Thread Joe Ward



Hi, Carl ---
 
If you still have your copy of
Introduction to Linear Models (Ward & Jennings)
you will find many examples in Chapters 10 and 11.

An interesting example is on page 217,

11.9 Discontinuity Between Two Second-Degree Polynomials.

With facility to create linear models appropriate to the
research questions of interest, many seemingly-unique problems
can be handled easily, e.g. Cubic Splines.
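
As a rough illustration of the kind of model involved (not the book's worked example, whose details are not reproduced here), the sketch below uses NumPy with an invented break point x0 and invented curves. A binary predictor marks which side of the break each case is on, so each side gets its own second-degree polynomial within a single linear model.

```python
import numpy as np

x0 = 5.0                                   # hypothetical break point
x = np.linspace(0.0, 10.0, 41)
right = (x >= x0).astype(float)            # binary predictor: 1 at or beyond x0
left = 1.0 - right

# Hypothetical true curves on each side of the break (no error term).
y = left * (1.0 + 0.5 * x + 0.1 * x ** 2) + right * (4.0 + 0.2 * x + 0.05 * x ** 2)

# One linear model, six coefficients: a separate quadratic for each side.
design = np.column_stack([left, left * x, left * x ** 2,
                          right, right * x, right * x ** 2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def quadratic(c, t):
    """Evaluate c[0] + c[1]*t + c[2]*t^2."""
    return c[0] + c[1] * t + c[2] * t ** 2

# Size of the discontinuity: right-hand curve minus left-hand curve at x0.
jump = quadratic(coef[3:], x0) - quadratic(coef[:3], x0)
print(jump)
```

A hypothesis of "no discontinuity" would be expressed as the restriction that the two quadratics agree at x0, giving a restricted model to compare against this one.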
 
-- Joe
 
 
- Original Message - 
From: Carl J Huberty <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, March 22, 2000 8:30 AM

| Will someone give me a (readable) reference for "regression
| discontinuity"?  Thanks in advance.
|
| Carl Huberty



Re: Matrix multiplication

2000-03-18 Thread Joe Ward



David --

Great message!!

One of the most "revealing" numerical analysis problems arises when there is
interest in "POWERING" a transition matrix in a Markov model.

PRE-MULTIPLYING to "POWER" the matrix compared to
POST-MULTIPLYING can give quite different results.

This is due to the different order of accumulation of the sums of products of
numbers between 0 and 1.

Numerical analysts can have lots of challenging problems.
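
A small experiment along these lines, assuming NumPy, single precision, and a made-up 3-state transition matrix. The two loops are mathematically identical, so any gap between them comes only from the order in which the products are accumulated; how large the gap is depends on the matrix, the power and the machine, so no particular value is claimed here.

```python
import numpy as np

def power_pre(P, n):
    """P**n by repeated PRE-multiplication: R <- P @ R."""
    R = P.copy()
    for _ in range(n - 1):
        R = P @ R
    return R

def power_post(P, n):
    """P**n by repeated POST-multiplication: R <- R @ P."""
    R = P.copy()
    for _ in range(n - 1):
        R = R @ P
    return R

# Hypothetical 3-state transition matrix; each row sums to 1.
P64 = np.array([[0.90, 0.07, 0.03],
                [0.20, 0.70, 0.10],
                [0.05, 0.15, 0.80]])
P32 = P64.astype(np.float32)               # single precision copy

n = 50
pre = power_pre(P32, n)
post = power_post(P32, n)
ref = np.linalg.matrix_power(P64, n)       # double precision reference

max_gap = float(np.max(np.abs(pre.astype(np.float64) - post.astype(np.float64))))
print("largest pre/post difference:", max_gap)
print("largest pre error vs reference:", float(np.max(np.abs(pre - ref))))
print("largest post error vs reference:", float(np.max(np.abs(post - ref))))
```

Raising n, or using a matrix with entries very close to 0 or 1, is a simple way to explore how quickly the two orders of accumulation drift apart.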
 
-- Joe
 
- Original Message - 
From: David A. Heiser <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; Anthony Pleticos <[EMAIL PROTECTED]>
Sent: Friday, March 17, 2000 2:27 PM
Subject: Re: Matrix multiplication

|
| - Original Message -
| From: Anthony Pleticos <[EMAIL PROTECTED]>
| To: <[EMAIL PROTECTED]>
| Sent: Wednesday, March 15, 2000 4:24 PM
| Subject: Matrix multiplication
|
|
| > I don't know if I hit the correct site but would be grateful for an answer -
| > it is a fundamental one. We all know that linear regression can be
| > accomplished by matrix multiplication and that there are packages which will
| > do it for you. I am teaching myself C++ and for the purposes of the
| > exercise I would like to know how to create a matrix or obtain ready made
| > code (ie "numerical recipe" )class so I could declare in a program:
| >
| > #include
| > #include
| > #include   /* if there is such a file */
|
|
| The basic problem is that there are enormous differences between real
| world matrices. There is no one method for numerical matrix reductions. For
| example note the very large number of Fortran subroutines that focus on
| peculiar aspects (banded, complex, sparse, near singular, positive definite,
| not positive definite, triangular, rank deficient, etc., etc). Note the large
| number of free Fortran subroutines devoted to matrices in "NETLIB". There
| are other free Fortran libraries available from the web.
|
| Matrix multiplication is not numerically straightforward given a finite
| computer environment. One can get very misleading results doing the standard
| multiply and add method using standard single precision.
|
| I would suggest you get familiar with numerical analysis methods. I
| personally prefer the works of G. W. Stewart as a source.
|
| DAHeiser



Re: Looking for text on resampling...

2000-03-17 Thread Joe Ward



Scott --
Peter Bruce should be able to give us the latest "word".
-- Joe
 
- Original Message - 

From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, February 06, 2000 12:08 PM
Subject: Looking for text on resampling...

|   Our small college library has a collection of basic biostats texts but
| nothing that specifically covers the area of resampling. I am currently
| looking over a 1991 text by Bryan Manly (Randomization and Monte Carlo
| Methods in Biology) - the first two chapters seem quite accessible (to
| someone unfamiliar with the field!)
|
|   Could anyone suggest other texts that might cover bootstrapping and
| jackknife techniques - I would favour texts that have a biology bent and
| are written so non-specialists can follow...
|
|   Many thanks!
|
| Scott
|
| ===
|   This list is open to everyone. Occasionally, people lacking respect
|   for other members of the list send messages that are inappropriate
|   or unrelated to the list's discussion topics. Please just delete the
|   offensive email.
|
|   For information concerning the list, please see the following web page:
|   http://jse.stat.ncsu.edu/
| ===



Re: Why do we use and teach z?

2000-03-17 Thread Joe Ward



Josh, Bill, et al --
 
I can't resist!!
 
Yes, those who have invested much of their life in acquiring certain
knowledge tend to want future generations to have those "exciting"
historical experiences.  It is rather unfortunate that we have a hard
time making changes to give future generations some of the power they
deserve.

I experienced some difficulty in the 1950s with those folks who had
become "masters" of the various analysis of variance algorithms that
were developed before computers became available.  My first major job
in the 1950s was to "get us off of Friden, Marchant and Monroe desk
calculators onto the IBM 602A followed by IBM 607, then IBM 650 etc."
The biggest difficulty was to get researchers to take advantage of the
computer power that allowed them the freedom to create their own models
to answer their questions of interest.

It was very difficult for persons with Ph.D. degrees to give up that
for which they had invested so much time to learn.  It was a little
"traumatic" in the 1950s when a Ph.D. was told that "you don't need to
have equal or proportional Ns in a two-way ANOVA".  And it was really
interesting to see the reaction when they were told that "you don't
need a response in every cell".  As a matter of fact, the managers of
our Air Force research organization assembled a panel of experts to
come in to find out what Bob Bottenberg and I were up to when we were
promoting the use of a more general approach to creating models to
answer research questions of interest.

It is indeed amazing that, 40 years later, many first-course statistics
students are told that "IT IS BEYOND THE SCOPE OF THIS TEXT TO DEAL
WITH SITUATIONS IN WHICH SAMPLE SIZES ARE UNEQUAL IN THE CELLS OF
TWO-WAY ANOVA".  It is little wonder that these students can do very
little data analysis in support of practical research.
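
The point about unequal Ns and empty cells can be illustrated with a
small cell-means sketch.  Everything in it is an assumption made for
illustration: the data are invented, and contrast_f is a hypothetical
helper, not part of any package.  It treats each observed cell as having
its own mean (the assumed model) and computes the F statistic for one
linear restriction on those means.

```python
# Cell-means sketch for a 2x2 layout with unequal cell sizes (invented data).
cells = {
    ("A1", "B1"): [1.0, 2.0, 3.0],
    ("A1", "B2"): [4.0, 6.0],
    ("A2", "B1"): [3.0, 5.0, 4.0, 4.0],
    ("A2", "B2"): [9.0],
}


def contrast_f(cells, coef):
    """F statistic (1 numerator df) for the restriction sum(coef[c]*mean[c]) = 0."""
    means = {c: sum(v) / len(v) for c, v in cells.items()}
    # Assumed (full) model: one mean per observed cell.
    sse_full = sum((y - means[c]) ** 2 for c, v in cells.items() for y in v)
    df_full = sum(len(v) for v in cells.values()) - len(cells)
    estimate = sum(coef[c] * means[c] for c in coef)
    # Increase in the error sum of squares caused by imposing the restriction.
    ss_restriction = estimate ** 2 / sum(coef[c] ** 2 / len(cells[c]) for c in coef)
    return ss_restriction / (sse_full / df_full), df_full


# "No interaction": mu11 - mu12 - mu21 + mu22 = 0, with unequal Ns.
interaction = {("A1", "B1"): 1, ("A1", "B2"): -1, ("A2", "B1"): -1, ("A2", "B2"): 1}
f_inter, df_inter = contrast_f(cells, interaction)
print(f"interaction: F(1, {df_inter}) = {f_inter:.3f}")

# Missing cell: drop (A2, B2) and ask a question about observed cells only,
# here mu11 = mu12 (the effect of B within A1).
observed = {c: v for c, v in cells.items() if c != ("A2", "B2")}
f_b_in_a1, df_b = contrast_f(observed, {("A1", "B1"): 1, ("A1", "B2"): -1})
print(f"B within A1: F(1, {df_b}) = {f_b_in_a1:.3f}")
```

Nothing in the computation requires equal or proportional cell sizes; a
missing cell only limits which questions can be asked.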
 
A few of you have heard this "sermon" before!!

By the way, those of you who have six weeks of school after the exam
might want to give your students some power to use
Prediction/Regression/Linear Models and Computers.  They might be able
to do some useful data analysis and appreciate your efforts!!

Well, that's enough from a "NON-INFLUENTIAL OUTLIER".
 
-- Joe
 
 

- Original Message -
From: Joshua Tabor
To: William J. Larson ; AP Stats. list
Sent: Friday, March 17, 2000 9:11 AM
Subject: RE: Why do we use and teach z?

Reply to:   RE: Why do we use and teach z?

I agree with you completely. The only explanation I received for why it
is still in most books is that it is a nice stepping stone to a full
fledged t-test (of course, it is very likely I am misinformed). Anyway,
this year I have decided to teach inference for proportions first (as
the stepping stone) and then go straight into t-tests, eliminating
z-tests for means. It helps make the course more realistic, and it
saves me precious time (we start the second week of September and have
6 weeks of school after the AP!).

I am curious to hear what the college folks (and textbook authors) have
to say

josh

Josh Tabor
Wilson HS
Hacienda Heights, CA
[EMAIL PROTECTED]

William J. Larson wrote:
>Why do we use and teach z?
>
>As I continually tell my students, normally (no pun intended) we do
>not know sigma, so we should use t not z. Indeed can we ever know
>sigma? If not why do we even bother to mention z? Is it historical
>reasons? Or because in the real world lots of people ignore the above
>fact & use z anyway, so we are conscientiously preparing our students
>for the real world? Or (more likely) am I missing something?
>
>Dr. William J. Larson
>[EMAIL PROTECTED]
>Institut Monte Rosa
>Montreux, Switzerland
>
>===
>The Advanced Placement Statistics List
>To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
>unsubscribe apstat-l
>Discussion archives are at
>http://forum.swarthmore.edu/epigone/apstat-l
>Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
>==

Re: When *must* use weighted LS?

2000-03-15 Thread Joe Ward



John--
 
If you are interested in PREDICTION then the way YOU use your
information is up to YOU.  By Cross-validation, Resampling etc. you can
determine which prediction method seems to be "best" for your
situation.
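
One way to act on this, sketched under stated assumptions: the data and
standard deviations below are invented, wls_line and loo_mse are
hypothetical helper names, and plain squared prediction error on the
left-out point is only one possible yardstick.  The sketch compares
ordinary and weighted least squares by leave-one-out cross-validation.

```python
def wls_line(x, y, w):
    """Weighted least-squares intercept and slope (equal weights give OLS)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def loo_mse(x, y, w):
    """Leave-one-out mean squared prediction error for the weighted fit."""
    total = 0.0
    for i in range(len(x)):
        keep = [j for j in range(len(x)) if j != i]
        b0, b1 = wls_line([x[j] for j in keep], [y[j] for j in keep],
                          [w[j] for j in keep])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total / len(x)


# Invented observations with an invented sample standard deviation at each x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5, 11.4, 14.9, 15.2]
sd = [0.2, 0.2, 0.3, 0.3, 0.8, 0.8, 1.5, 1.5]

mse_ols = loo_mse(x, y, [1.0] * len(x))
mse_wls = loo_mse(x, y, [1.0 / s ** 2 for s in sd])
print(f"leave-one-out MSE, unweighted: {mse_ols:.4f}")
print(f"leave-one-out MSE, weighted:   {mse_wls:.4f}")
```

Whichever method predicts the left-out observations better on data like
yours is the one to prefer for prediction.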
 
-- Joe
 
 
 
 
 
- Original Message -
From: John Hendrickx <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, March 15, 2000 1:22 AM
Subject: Re: When *must* use weighted LS?

| In article <8am7d1$hqj$[EMAIL PROTECTED]>,
| [EMAIL PROTECTED] says...
| >
| > I think I made the formulation too wordy in previous
| > post.
| >
| > Let me try this simple question:
| >
| > When one wishes to do a (multi)linear regression on a set of
| > observed data, and one is in the (unusual) position of possessing
| > a set of sample standard deviations (of varying degrees of f.)
| > at each value of the "explanatory" variable, how does one
| > determine whether one ought or ought not to solve the weighted
| > least squares problem using those sample standard deviations?
| >
| > What is the usual decision test for "heterscedasticity" *before* one
| > solves the regression system?  What do people do in practise?
| >
| Most social scientists don't worry very much about the assumptions of OLS
| regression, noting that OLS estimates are fairly robust and can give
| unbiased estimates even if those assumptions aren't fulfilled. Exceptions
| are multilevel models and time series data, data for which the assumption
| of uncorrelated error terms is violated. But these require special
| programs, not weighted least squares.
|
| There is also some debate on using weights for stratified sampling and/or
| to correct for sampling bias. Weighting leads to correct estimates but
| incorrect standard errors. One solution is to include the design
| variables in the model instead of weighting. Stata and Wesvar are two
| programs that can take weighting into account when calculating standard
| errors of estimates. But a quite common approach is to use weights for
| descriptive statistics, but not in multivariate models.
|
| Weights can also be used for certain dependent variables that will
| violate the assumption of homoscedasticity, e.g. a dichotomous
| dependent. I recently did a weighted least squares analysis for a
| co-worker to replicate an analysis in another paper. The weight was
| groupn*pct*(1-pct), where groupn was the number of cases per group and
| pct was the proportion with a positive response within each group. But
| this basically amounts to a poor approximation of a logit model. Programs
| like GLIM that use iteratively reweighted least squares use pct*(1-pct)
| as the weight when estimating the model, but now pct is the predicted
| probability from the previous iteration.
|
| As for a test for heteroscedasticity, Stata has a "hettest", which
| performs a Cook-Weisberg test and produces a chi-square statistic. They
| wrote a book in 1982, "Residuals and influence in regression". I've never
| used it though.
|
| Hope this helps,
| John Hendrickx
|
|
| ===
| This list is open to everyone.  Occasionally, less thoughtful
| people send inappropriate messages.  Please DO NOT COMPLAIN TO
| THE POSTMASTER about these messages because the postmaster has no
| way of controlling them, and excessive complaints will result in
| termination of the list.
|
| For information about this list, including information about the
| problem of inappropriate messages and information about how to
| unsubscribe, please see the web page at
| http://jse.stat.ncsu.edu/
| ===



Re: Cluster and outliers

2000-03-12 Thread Joe Ward



Nicolas --
 
Most of the statistical software systems have Clustering Algorithms
with a variety of objective functions.  It is certainly reasonable to
use several approaches to help identify "OUTLIERS" or "INFLUENTIAL"
observations.  The identification AND definition of an "OUTLIER" or
"INFLUENTIAL" observation should be the responsibility of the
researcher who KNOWS the context of the analysis.

Also, regression models can be used to provide information to help the
researcher identify "OUTLIERS" or "INFLUENTIAL" observations.

One approach is to LEAVE EACH OBSERVATION OUT OF THE ANALYSIS and
"test the hypothesis that the observed Y value for each of the
'left-out' observations is equal to the PREDICTED value from the other
N-1 observations."  The output of these N hypotheses can be helpful.
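
A minimal sketch of the leave-one-out idea for a one-predictor model,
assuming invented data and a hypothetical helper named leave_one_out_t.
Each t statistic compares the left-out Y with its prediction from the
other N-1 points and has N-3 degrees of freedom in this simple case.

```python
import math


def leave_one_out_t(x, y):
    """t statistic, for each i, that y[i] equals its prediction from the rest."""
    stats = []
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        m = len(xs)
        xbar, ybar = sum(xs) / m, sum(ys) / m
        sxx = sum((v - xbar) ** 2 for v in xs)
        slope = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sxx
        intercept = ybar - slope * xbar
        sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(xs, ys))
        s2 = sse / (m - 2)  # residual variance from the N-1 retained points
        # Standard error of the prediction error for the left-out point.
        se = math.sqrt(s2 * (1 + 1 / m + (x[i] - xbar) ** 2 / sxx))
        stats.append((y[i] - (intercept + slope * x[i])) / se)
    return stats


# Invented data: roughly Y = 1 + 2*X, with the fifth point pushed upward.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, 7.0, 9.1, 16.0, 13.0, 14.9, 17.1]

t_stats = leave_one_out_t(x, y)
suspect = max(range(len(x)), key=lambda i: abs(t_stats[i]))
for i, t in enumerate(t_stats):
    print(f"obs {i}: t = {t:8.2f}")
print("largest |t| at observation", suspect)
```

Whether the flagged observation really is an "OUTLIER" remains a
judgment for the researcher who knows the context.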
 
The Classification Society of North America has a web site at
http://www.pitt.edu/~csna/ that might be helpful in your search about
Clustering.  A good contact is the Secretary/Treasurer of CSNA:

Stanley L. Sclove
Department of Information and Decision Sciences M/C 294
College of Business Administration
University of Illinois at Chicago
601 S. Morgan Street
Chicago, IL 60607-7124
www.uic.edu/~slsclove

Have fun!!
 
--Joe
 
 
 
 
 
 
- Original Message -
From: Nicolas MEYER <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, March 12, 2000 4:52 AM
Subject: Cluster and outliers

| Hi everybody !!
|
| I'm desperately looking for books or papers on possible links between
| cluster analysis and outliers, cluster being of course used to detect
| outlier(s).
| Does anybody know anything about this ?
| Thanks !!
|
| Nicolas MEYER
| Interne en Santé Publique
| CHU Strasbourg-FRANCE
|
|



Re: Repeated measures

2000-03-09 Thread Joe Ward



Hi, Kasper --

The CORRECT model is the one that allows YOU to answer YOUR OWN
questions of interest.  If the "packaged" PROCs have been verified to
do what YOU want, then that's good.  It is sometimes difficult to know
what question a "packaged" PROC is attempting to answer.

Be careful -- especially if there may be "missing cells".
 
:-)
--Joe
 
 
 
 
- Original Message -
From: Kasper Hornbæk <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 09, 2000 1:41 AM
Subject: Q: Repeated measures

| Hi everybody.
| I have a question concerning repeated measures analysis. I am not sure
| whether a linear model with a factor that varies as repeated measures are
| taken (e.g., order or session) is identical to a repeated measures analysis.
| I'll detail the question below.
|
| I have a within-subject study in which subjects used three methods to solve
| six different tasks. The experiment is run in three sessions, each
| consisting of two tasks. Three of the tasks are very different from the
| other three tasks.
|
| For analysing this experiment, I plan to use a model like
| Y[ijkl] := u + subject[i] + task[j] + session[k] + method[l] + e[ijkl],
| possibly adding interactions between task, method and session. Is this a
| repeated measures analysis or equivalent to a repeated measures analysis?
|
| If not, how should I analyse these data using SAS's repeated measures
| option?
|
| Kind regards,
|    Kasper Hornbæk/
|    kash(at)diku.dk
|
|



Fw: other uses for Minitab

2000-03-06 Thread Joe Ward



Hi, Tim --

It's good to hear that some folks think it is useful to fit a
least-squares line through the origin.  Of course it is even better to
be able to "force" a least-squares model to have a wide range of
properties (restrictions).

Without any connection to statistics, students should be given the
opportunity to use their algebra "savvy" to impose restrictions on math
models.

For example, given a model of the form:

    Y = a0 + a1*X + a2*X^2 + E

it might be of interest to "restrict" the model to:

-- Pass through the origin
or
-- Pass through X=1 and Y = 2
or
-- Slope = 0 at X=5  (For the calculus crowd)
or
Many others!

---

Using Algebra, Geometry and Trig. the "least-squares story" can be
presented to students WITHOUT CALCULUS.  Minimizing "distance" from a
point to a line, or plane, or hyper-plane seems to be more appealing
than taking partial derivatives.  Connecting "perpendicularity" to
"orthogonality" seems to work well.
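
The three restrictions listed above can each be imposed by plain
algebraic substitution before fitting.  The sketch below is an
illustration only: the data are invented and lstsq is a hypothetical
pure-Python helper that solves the normal equations for a handful of
predictors.

```python
def lstsq(columns, y):
    """Least-squares weights for y ~ sum_j w[j] * columns[j] (normal equations)."""
    p = len(columns)
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [ar - factor * ac for ar, ac in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Invented data.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.1, 4.8, 7.9, 9.7, 10.4, 9.6]
u = [1.0] * len(x)              # the unit vector U
x2 = [xi * xi for xi in x]

# Unrestricted model: Y = a0*U + a1*X + a2*X^2 + E
a0, a1, a2 = lstsq([u, x, x2], y)

# Pass through the origin: a0 = 0, so Y = a1*X + a2*X^2 + E
o1, o2 = lstsq([x, x2], y)

# Pass through X=1, Y=2: a0 = 2 - a1 - a2, so Y - 2 = a1*(X-1) + a2*(X^2-1) + E
p1, p2 = lstsq([[xi - 1 for xi in x], [xi * xi - 1 for xi in x]],
               [yi - 2 for yi in y])
p0 = 2 - p1 - p2

# Slope = 0 at X=5: a1 + 10*a2 = 0, so Y = a0*U + a2*(X^2 - 10*X) + E
s0, s2 = lstsq([u, [xi * xi - 10 * xi for xi in x]], y)
s1 = -10 * s2

print("unrestricted:        ", a0, a1, a2)
print("through origin:      ", 0.0, o1, o2)
print("through (1, 2):      ", p0, p1, p2)
print("zero slope at X = 5: ", s0, s1, s2)
```

Each restricted model is just another least-squares problem in fewer
unknowns, which is why no calculus is needed to set it up.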
-- Joe
- Original Message -
From: Tim Erickson <[EMAIL PROTECTED]>
To: Joe Ward <[EMAIL PROTECTED]>
Sent: Sunday, March 05, 2000 3:28 PM
Subject: Re: other uses for Minitab

| on 00.03.03 10:51 PM, Joe Ward at [EMAIL PROTECTED] wrote:
|
| > A Bob, you remembered.
| >
| > I've been "bugging" the calculator makers for many years about including
| > the least-squares model of the form:
| >
| > LinReg(bx), Letting the function pass through the origin.
|
|
| just a note -- Fathom has a "lock Intercept at Zero" command for its least
| squares regression, which amounts to the same thing.
|
| I think it's also an interesting exercise for a (calculus?) student to
| derive a formula for "b" given an arbitrary set of data and the constraint
| that b must minimize the sum of squares of the residuals.  At least it was
| interesting to me!
|
| Tim
|
|

Earl Jennings            Phone: (512) 345-0628
6917 Thorncliffe Dr.     e-mail address:
Austin, TX 78731-2955    [EMAIL PROTECTED]
 



Re: Howto interpret interactions in an ANOVA

2000-02-29 Thread Joe Ward



Hi all --
 
Again -- I'm jumping on the band wagon in support of these messages
that advocate -- what I call -- a PREDICTION/REGRESSION/LINEAR MODELS
approach.

I was attracted to Lee Wilkinson and SYSTAT many years ago when Lee
had a sign at one of his SYSTAT BOOTHS that said:

"Ask me about Cell Means Analysis" (May not be Lee's exact words)

I was so excited to see a software package that required the user to
insert the word CONSTANT in the regression model when the user wanted
it -- NOT AS THE DEFAULT.  When using SAS at Clemson in 1985-86, I had
to tell students that they must use the NOINT OPTION until I explained
why.  A most misunderstood and troublesome idea is the predictor, U, a
vector of 1's.  If students would -- in the beginning -- insert THEIR
OWN U, when needed, then they might have a better understanding of the
"efficiency" of having the CONSTANT or INTERCEPT as the DEFAULT.  This
lack of understanding about the CONSTANT or INTERCEPT is revealed by
the many Email messages we see related to "What is RSQ WHEN there is NO
CONSTANT or INTERCEPT".
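
The RSQ question comes down to which restricted model the fit is being
compared with.  The sketch below uses invented data and invented
variable names; it fits a model with no constant and computes RSQ
against two different baselines, one without U and one with U as the
only predictor.

```python
# Invented data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]

# Model with NO constant: Y = b*X + E
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

ybar = sum(y) / len(y)
ss_about_zero = sum(yi ** 2 for yi in y)            # baseline model: Y = E
ss_about_mean = sum((yi - ybar) ** 2 for yi in y)   # baseline model: Y = a*U + E

rsq_vs_zero = 1 - sse / ss_about_zero
rsq_vs_mean = 1 - sse / ss_about_mean

print(f"b = {b:.4f}")
print(f"RSQ against the 'no predictors' baseline: {rsq_vs_zero:.4f}")
print(f"RSQ against the 'U only' baseline:        {rsq_vs_mean:.4f}")
```

The two numbers answer different questions, so a reported RSQ for a
no-constant model is only interpretable once the baseline is stated.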
 
It is interesting that the more "modern" versions of SYSTAT require the
user to REMOVE THE CONSTANT when appropriate.

It would be really great if the statistics education folks would
advocate the introduction of PREDICTION/REGRESSION/LINEAR MODELS early
so that the students would have something useful in their experience
and perhaps continue their study of statistics.  I'm afraid that many
FIRST STATISTICS COURSES have little "selling/marketing" effect on
students.

The "Cell-Means Approach" is easy to introduce to high school students,
since these students have experiences with AVERAGES, MEANS, GPAs.  And
the "Missing Cells Problem?" is really not a problem until the students
are told that some folks don't know what to do about "Missing Cells".

Enough "preaching to the choir"!!
 
--Joe
 
 
 
- Original Message -
From: Gregory C. Mayer <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, February 29, 2000 6:46 AM
Subject: Re: Howto interpret interactions in an ANOVA

| R.R. Sokal & F.J. Rohlf in Biometry (1995, Freeman) emphasize the unity of
| anova, ancova and regression (and in their shorter Introduction to
| Biostatistics, anova and regression).  They introduce them in turn,
| however; I agree that a text that began with glm and then took up anova,
| ancova and regression as instances of the general approach would be
| preferable.  This is especially so when using Systat, as the model
| statements closely parallel the models, allowing more complex
| models to be grasped and implemented immediately, instead of being treated
| as some new technique.
|
| Gregory C. Mayer
| [EMAIL PROTECTED]
|
|
|
|
| On Mon, 28 Feb 2000, Bob Madden wrote:
|
| > I agree.  In fact, I have sought in vain for an introductory level
| > statistics text that does not treat ANOVA and regression as two totally
| > separate, disconnected techniques.
| > With disconcerting monotony, they all monkey each other in this respect.
| > I think students would be better served by being shown early on that
| > regression, ANOVA, and for that matter, ANCOVA, are all special cases of
| > the glm.
| >
| > --Bob Madden
| >
| > James Friedrich wrote:
| >
| > > Let me add to the speculation regarding why interaction effects are often
| > > omitted from multiple regression.  I think the reality is that people are
| > > generally trained in one "mode" or the other (ANOVA or Regression) without
| > > a sense of their connectedness (a point already alluded to in previous
| > > posts).  In an in-press national survey of undergraduate statistical
| > > instruction for psychology majors, I found that ANOVA dominates, with
| > > little attention to regression (except "simple").  The specialties of
| > > those teaching the stats / methods courses tends to be in laboratory -
| > > experimental areas where ANOVAs are the norm.  The bottom line is that I
| > > don't think budding psychologists, at least,

Re: Linear Regression with known intercept (Long Message)

2000-02-14 Thread Joe Ward




Mark writes -

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, February 12, 2000 4:51 PM
Subject: Linear Regression with known intercept

| Hi,
|
| If I want to find the least squares estimator of the slope of a simple
| linear regression model where my intercept is known, will this
| estimator be the same as if I did not know my intercept (=Sxy/Sxx)?
| How about the variance and the confidence interval of my estimator?
| Will they be bigger or smaller than the estimator for the case where
| both my intercept and slope are unknown?
|
| Thank you for your help.
|
| Mark
|
|
| Sent via Deja.com http://www.deja.com/

---

Hi, Mark --

Glad you sent this Email.  It is a nice and simple example of the use
of Prediction/Regression/Linear Models -- which should be one of the
important objectives of a FIRST NON-CALCULUS-BASED STATISTICS COURSE.

Consider, first, the Simple Regression Model (Model 1):

    Y = a1*U + a2*X + E1

where

Y  = a vector containing observations on a dependent or response
     variable.
U  = a predictor (vector) containing all 1's. (THE MOST NEGLECTED AND
     NON-UNDERSTOOD PREDICTOR OF ALL)
X  = another predictor with any elements -- could be BINARY (0,1).
E1 = the Error or Residual vector.
a1 = least-squares regression coefficient of U
     (this is frequently referred to as the "Y-intercept").
a2 = least-squares regression coefficient of X
     (this is frequently referred to as the "Slope").

A powerful capability to give students who are comfortable with
Algebra is to be able to IMPOSE ANY DESIRED LINEAR RESTRICTIONS ON A
LINEAR MODEL OF THE FORM:

    Y = a1*X1 + a2*X2 + ... + ap*Xp + E

This capability is useful in many applications BESIDES STATISTICS.

Now, to your neat example:

"If I want to find the least squares estimator of the slope of a simple
linear regression model where my intercept is known, ...  "

You wish to impose the restriction that --

    a1 = k (a known value)

Imposing that restriction on Model 1 above gives:

    Y = k*U + a2*X + E2

The only unknown regression coefficient is a2, which I will rename:

    Let b2 = a2

to remind us that the numerical value of the coefficient of X in
Model 1 is most likely different from the value in Model 2.

Then,

    Y = k*U + b2*X + E2

Since k*U is known, the least-squares value for b2 is obtained from:

    Y - k*U = b2*X + E2

or, letting Y - k*U be designated by a single symbol, W, we have
Model 2:

    W = b2*X + E2

and the least-squares value of b2 for Model 2 (and for any
ONE-PREDICTOR model) is:

    b2 = (W'X)/(X'X) = Sum(wi*xi)/Sum(xi*xi)

b2 is the "slope" of the line which is "forced by the restriction"
a1 = k.

Most software now allows one to find the value of b2 through an option
that requires that the vector U be omitted as a predictor.  If you
have good software available, the software will produce the standard
errors of a1 and a2 by solving Model 1 and the standard error of b2 by
solving Model 2.

---

Now, if it is "interesting" to TEST AN HYPOTHESIS THAT --

    a1 = k

then a statistics student may want to compute:

        (SSQE2 - SSQE1)/(2-1)
    F = ---------------------
            (SSQE1)/(n-2)

        (SSQE2 - SSQE1)/1
    F = ---------------------
            (SSQE1)/(n-2)

and since F(1,df2) = t^2(df2)

    t(df2) = sqrt(F(1,df2))

This IS a "t-test".

And, perhaps, from this value of "t" another statistics student might
want to compute the Standard Error of a1, and then compute a
Confidence Interval.

The astute student can compute the Standard Error from:

    t = Statistic/Standard Error

but since the numerical values of t and the "Statistic" are known we
have:

    Standard Error = Statistic/t

In this particular case the Statistic is the difference between a1 and
its hypothesized value k, so

    Standard Error = |a1 - k|/t

which is a1/t when k = 0.

This procedure allows for easy computation of the "Standard Error" of
any of the 'weights' (intercept or slope) in a regression model and,
in the more general case, any linear combination of the weights in a
multiple linear regression model.
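
The steps above can be followed numerically.  This is a sketch under
stated assumptions: the data and the known intercept k are invented,
and fit_model_1 and fit_model_2 are hypothetical helper names for the
two models described in this message.

```python
import math


def fit_model_1(x, y):
    """Model 1: Y = a1*U + a2*X + E1 (intercept and slope both estimated)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a1 = ybar - a2 * xbar
    ssqe1 = sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y))
    return a1, a2, ssqe1


def fit_model_2(x, y, k):
    """Model 2: W = Y - k*U = b2*X + E2 (intercept restricted to the known k)."""
    w = [yi - k for yi in y]
    b2 = sum(wi * xi for wi, xi in zip(w, x)) / sum(xi * xi for xi in x)
    ssqe2 = sum((wi - b2 * xi) ** 2 for wi, xi in zip(w, x))
    return b2, ssqe2


# Invented data and an invented known intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
k = 1.0
n = len(x)

a1, a2, ssqe1 = fit_model_1(x, y)
b2, ssqe2 = fit_model_2(x, y, k)

f = ((ssqe2 - ssqe1) / 1) / (ssqe1 / (n - 2))   # F(1, n-2) for a1 = k
t = math.sqrt(f)
se_a1 = abs(a1 - k) / t                          # Standard Error = Statistic/t

print(f"Model 1: a1 = {a1:.4f}, a2 = {a2:.4f}")
print(f"Model 2: b2 = {b2:.4f}")
print(f"F = {f:.4f}, t = {t:.4f}, SE(a1) = {se_a1:.4f}")
```

The Standard Error obtained this way should agree with the textbook
formula for the standard error of the intercept; the division fails only
in the degenerate case where a1 happens to equal k exactly.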
   
Sorry for the length of this message, but I couldn't resist promoting
the use of Prediction/Regression/Linear Models for ALL STUDENTS.

--- Joe
   
   
   
   
   
   
   
   
   
   


Re: ANN vs. nonlinear regression: forecasting

2000-02-11 Thread Joe Ward



John --
 
Sounds very interesting--
 
If you mean the "classical" least-squares model, there are no
assumptions involved in fitting least-squares.  It's only the
"statistics" part that brings in the extra "assumptions".

PREDICTION is the important thing.

Compare the PREDICTIVE accuracy/costs/etc. of various approaches.

You may wish to include RESAMPLING/BOOTSTRAP/CROSS-VALIDATION in your
research.

The proof of the "best" is how well it PREDICTS.

I will be interested in what you learn.
 
-- Joe
 
 
- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, February 11, 2000 7:01 AM
Subject: ANN vs. nonlinear regression: forecasting

| I'm working on a study that compares neural networks to classical
| non-linear statistical estimators in forecasting time series.  My thesis is
| that the NN would be robust under conditions where the assumptions of
| the classical model are not met, and the nn would be inferior where the
| classical assumptions are satisfied.
|
| What would be a good classical model to compare a neural network to?
| Does anyone know of any papers/sources on this subject?
|
| I sincerely appreciate any help/suggestions.
|
| John Carrier
| [EMAIL PROTECTED]
|
|
| Sent via Deja.com http://www.deja.com/
| Before you buy.
|
|



Re: adjusting marks; W. Edwards Deming

2000-02-09 Thread Joe Ward



Robert Knodt writes in response to the message at http://www.remarq.com
The Internet's Discussion Network (SEE BELOW) ---

Re: adjusting marks; W. Edwards Deming

It would be nice if those sending to the mailing list would clearly
identify themselves. It would also be nice if they used an e-mail
address so individuals might send them e-mail directly.

Thanks,

Dr. Robert C. Knodt
4949 Samish Way, #31
Bellingham, WA 98226
[EMAIL PROTECTED]

--- End of Robert Knodt's message ---

--- Beginning of Joe Ward's comment ---
 
Good comment, Robert --

Perhaps the unidentified writer is a frustrated product of
"Non-mastery" Spelling Education and is intentionally (or
unintentionally) showing the results.

See BOLD items below.

-- Joe
 
--- End of Joe Ward's comment ---
 
- Original Message -
From: Consultantssuck <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, February 07, 2000 5:12 PM
Subject: Re: adjusting marks; W. Edwards Deming

| Dr. Deming Naive? You, sir, are misguided and unfortunately,
| misinformed of the genius of the master Dr. Shewhart, and his
| disiple and messenger to the latter half of the 20th century,
| Dr. Deming.
|
| Humans want to do a good job. Dr. Deming was pellucid on this
| point.   People and school fit nicely into this axiom.
|
| what you fail to understand is the profound knowledge of
| thinking preparing, and continual improvement.  Grading is nice,
| succinct, and above all, usually useless in its existing
| design.  Does grading permit our student to readdress problem or
| slow areas?  In many cases grading only shows how well you did,
| based on varying factors-The next test, completely different.
|
| we have all seen studies where the pretty girl is awarded better
| grades for the same caliber of work as others.  we have all
| seen  reports where teachers are wrong in their suppositions,
| then corrected or challenged by students, ultimately leading
| these educators to hold a grudge for "attitude and behavior"
| when report card time recurs.
|
| Do you want to know why the AFT and the NEA are against teaching
| LOGIC in elementary schools (Logic being the foundation for all
| higher math applications)?
|
| Could it be because some protege will learn to ask the harder
| questions?  Possibly Some "smart alec" will not accept our
| educator's "Because I told you it did."
|
| A recent report found Elementary educators, when pressed for
| answers they did not know, simply "winged it."  This sophristry
| unfortunately happens when our educators are not versed in the
| sciences, history or math, and they wish to appear (to
| themselves and) to their students, smart.
|
| People want to do a good job.  Grading allows teachers to make
| decisions in our children's early years based on mostly the
| faliable educator's emotions toward that one particular budding
| mind.   Grading should be benchmarks for ever improvement based
| on practice, practice practice of the fundementals. Then of
| course moving foward with a keen sence of where the student is
| going.  Any good music teacher will tell you the ones who
| practice the fundemental scales, dilegently, go on to master the
| difficult pieces.
|
| Read the book OUT OF CRISES again, and again.  I assure you, you
| will soon "get it."
|
|
|
| * Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
| The fastest and easiest way to search and participate in Usenet - Free!
|
|
===| 
  This list is open to everyone. Occasionally, people lacking respect| 
  for other members of the list send messages that are inappropriate| 
  or unrelated to the list's discussion topics. Please just delete the| 
  offensive email.| |   For information concerning the list, 
please see the following web page:|   http://jse.stat.ncsu.edu/| 
===| 



Re: Looking for text on resampling...

2000-02-06 Thread Joe Ward

Scott --

Peter Bruce is the contact!!!

-- Joe

- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, February 06, 2000 12:08 PM
Subject: Looking for text on resampling...


|   Our small college library has a collection of basic biostats texts but
| nothing that specifically covers the area of resampling. I am currently
| looking over a 1991 text by Bryan Manly (Randomization and Monte Carlo
| Methods in Biology) - the first two chapters seem quite accessible (to
| someone unfamiliar with the field!)
| 
|   Could anyone suggest other texts that might cover bootstrapping and
| jacknife techniques - I would favour texts that have a biology bent and
| are written so non-specialists can follow...
| 
|   Many thanks!
| 
| Scott
| 
| 
| Sent via Deja.com http://www.deja.com/
| Before you buy.
| 
| 
| 






Re: Course Curriculum

2000-02-04 Thread Joe Ward



Perhaps in the short time you have, it may be appropriate to give your
students the power to ask statisticians the appropriate research
questions "in natural language".

In this regard your medical residents should expect their support
statisticians to use the combined POWER OF COMPUTERS and GENERAL
LINEAR MODELS/REGRESSION (and other computer-aided techniques, e.g.
Resampling, Bootstrap, Simulation) to answer useful, non-trivial
research questions.

Your residents may not need to know how to do it themselves, but it
would be great if they can communicate with those who can help.

-- Joe
********
* Joe Ward                    Health Careers High School *
* 167 East Arrowhead Dr       4646 Hamilton Wolfe        *
* San Antonio, TX 78228-2402  San Antonio, TX 78229      *
* Phone: 210-433-6575         Phone: 210-617-5400        *
* Fax: 210-433-2828           Fax: 210-617-5423          *
* [EMAIL PROTECTED]                                      *
* http://www.ijoa.org/joeward/wardindex.html             *
 
 
- Original Message - 
From: SAlbert <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, February 03, 2000 10:14 PM
Subject: Re: Course Curriculum

| >I'm organizing a BASIC research methods/statistical analysis course for
| >medical residents. The course will be held over multiple sessions for a
| >total of about 15-20 hours.
| >
| >Any suggestions for textbooks, course materials, format for conducting
| >the course, etc?
| >
| >Thanks!
| >SR Millis
| >-- 
| 
| Take a look at Harvey Motulsky's book "Intuitive Biostatistics."  It's
| readable, has good examples, and not so technical as to throw people off.
| While not perfect, it's the best I've seen of its kind.  (I understand Dr.
| Motulsky is doing a revision, but I don't know when that might come out.  The
| book is published by Oxford University Press, if I remember right.)
| 
| Steve Albert
| 



Fw: CORRECTION TO EARLIER MESSAGE-Correlation - Constraints on Variables

2000-01-05 Thread Joe Ward


My Apologies -- "Haste makes waste!"

Notice the serious errors in the previous version!!!

The model (1) below should have read:

(1)Y = a1*U + a2*X + E  
 
and not

(1)Y = a1*U + a1*X + E  

and where there were statements about the
hypothesis "a1=0" it should read "a2=0"

:-(

-- Joe

- Original Message - 
From: Joe Ward <[EMAIL PROTECTED]>
To: bkamen <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: APSTAT-L <[EMAIL PROTECTED]>
Sent: Tuesday, January 04, 2000 3:08 PM
Subject: Re: Correlation - Constraints on Variables


In the beginning all information is BINARY/CATEGORICAL  (not DUMMY).

I refer to models of the very general form:

Y = a1*X1 + a2*X2 + a3*X3 + ... + ap*Xp + E

as Prediction/Regression/Linear Models.

The predictors X1, X2, X3, ...,Xp can be defined in many ways.
a1, a2, a3,...,ap are usually least-squares coefficients that
MINIMIZE THE SUM OF SQUARES OF THE ELEMENTS 
OF THE "Error" 'E'.

If the model is of the form:

(1)  Y = a1*U + a1*X + E
     CORRECTION: SHOULD HAVE READ
(1)  Y = a1*U + a2*X + E
where
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal to 1
X = a continuous variable, e.g. Age, Height, Weight, Test Score
E = "error", sometimes called "residual"

then the model is sometimes called "simple regression".

In this form, a test of the Hypothesis a2=0 is sometimes called a
test of "ZERO CORRELATION" or "SLOPE = 0".

Now consider Model 1 as above:

(1)  Y = a1*U + a1*X + E
     CORRECTION: THIS SHOULD HAVE READ
(1)  Y = a1*U + a2*X + E

and we let
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal to 1
(as above)
but
X = 1 if the Y observation is from a Male; 0 if the Y observation is from a Female.

In this model, a test of the Hypothesis a2=0 is sometimes called a
test of the hypothesis that the
 Expected Value of Y (Mean) for Males = Expected Value of Y (Mean) for Females
or
a "t-test for the difference between two means".

Other special forms of the GENERAL MODEL are called different names, such as
One-way Analysis of Variance (ANOVA), Analysis of Covariance, Two-way Analysis of
Variance, etc.
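
The equivalence described above can be checked numerically. The following is only a minimal sketch in Python, not code from the original message: the data values and the function names (sse_full, sse_restricted, f_statistic, pooled_t) are invented for illustration. It fits the corrected model (1), imposes the restriction a2=0, and compares the resulting F with the square of the classical pooled two-sample t when X is a binary Male/Female predictor.

```python
from statistics import mean

def sse_full(y, x):
    """Error sum of squares for the assumed model Y = a1*U + a2*X + E."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    a2 = sxy / sxx            # least-squares weight on X
    a1 = yb - a2 * xb         # least-squares weight on the unit vector U
    return sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y))

def sse_restricted(y):
    """Error sum of squares after imposing a2 = 0, i.e. Y = a1*U + E."""
    yb = mean(y)
    return sum((yi - yb) ** 2 for yi in y)

def f_statistic(y, x):
    """F with (1, n-2) degrees of freedom for the hypothesis a2 = 0."""
    n = len(y)
    full, restricted = sse_full(y, x), sse_restricted(y)
    return (restricted - full) / (full / (n - 2))

def pooled_t(y, x):
    """Classical equal-variance two-sample t for groups X=1 versus X=0."""
    g1 = [yi for yi, xi in zip(y, x) if xi == 1]
    g0 = [yi for yi, xi in zip(y, x) if xi == 0]
    m1, m0 = mean(g1), mean(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    sp2 = ss / (len(g1) + len(g0) - 2)
    return (m1 - m0) / (sp2 * (1 / len(g1) + 1 / len(g0))) ** 0.5

# Invented mile-run times (minutes); X = 1 for Male, 0 for Female.
times = [7.1, 6.8, 7.5, 6.9, 8.0, 8.4, 7.9, 8.2]
male = [1, 1, 1, 1, 0, 0, 0, 0]

f_binary = f_statistic(times, male)
t_binary = pooled_t(times, male)
print(f_binary, t_binary ** 2)   # the two values agree up to rounding

# The same function with a continuous X (invented ages) tests "slope = 0".
ages = [15, 16, 14, 17, 15, 16, 14, 17]
print(f_statistic(times, ages))
```

Nothing in f_statistic knows whether X is binary or continuous; only the definition of the predictor changes, which is the point of the ONE GENERAL FORM.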

Before we acquired high-speed computers, we needed special easy-to-calculate
computational procedures.  WE SHOULD NOT BE CONSTRAINED NOW THAT WE HAVE THE
COMPUTER POWER.

Many seemingly-different algorithms of statistics can be accomplished under ONE
GENERAL FORM.

But of most importance, the ONE GENERAL FORM can be used to create models that fit
unique research questions.

The items contained in the URL shown below are related to your question.

If you would like to see some detailed examples, you may want to look in a university
library at:
Introduction to Linear Models by Ward & Jennings, Prentice-Hall, 1973.

Copies of this book are available from the 

Institute for Job and Occupation Analysis:

Jimmy L. Mitchell, Ph.D., Director 
[EMAIL PROTECTED]
10010 San Pedro, Suite 440, San Antonio, Texas 78216
(210) 349-8525   Fax: (210) 349-0168


--- Joe
 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *



 






- Original Message - 
From: bkamen <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2000 12:17 PM
Subject: Correlation - Constraints on Variables


| This practical question arose between myself and a colleague at work.
| It concerns whether we can use correlation analysis if one of the
| variables is non-continuous or "categorical."  She believes that both
| variables must be continuous.  However she cannot say why, and I cannot
| find any such constraint in the statistics book I have relied on since
| graduating in Industrial Engineering a few years ago, Miller and Freund,
| 'Probability and Statistics for Engineers.'
| 
| I have been thinking that if x is discrete and can assume only a few
| values compared with y which is continuous, the correlation study may
Re: Correlation - Constraints on Variables

2000-01-04 Thread Joe Ward

In the beginning all information is BINARY/CATEGORICAL  (not DUMMY).

I refer to models of the very general form:

Y = a1*X1 + a2*X2 + a3*X3 + ... + ap*Xp + E

as Prediction/Regression/Linear Models.

The predictors X1, X2, X3, ...,Xp can be defined in many ways.
a1, a2, a3,...,ap are usually least-squares coefficients that
MINIMIZE THE SUM OF SQUARES OF THE ELEMENTS 
OF THE "Error" 'E'.

If the model is of the form:

(1)Y = a1*U + a1*X + E  
where
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal 1 
X = a continuous variable, e.g. Age, Height, Weight, Test Score
E = "error", sometimes called "residual"

then the model is sometimes called "simple regression".

In this form, a test of the Hypothesis a1=0 is sometimes called a
test of "ZERO CORRELATION" or "SLOPE = 0".

Now consider Model 1 as above:

  (1)  Y = a1*U + a1*X + E  

and we let
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal 1 
(as above)
but
X = 1 if the Y observation is from a Male; 0 if the Y observation is from a Female.

In this model, a test of the Hypothesis a1=0 is sometime called a 
test of the hypothesis that the
 Expected Value of Y (Mean) for Males = Expected Value of Y (Mean) for Females
or 
a "t-test for the difference between two means".

Other special forms of the GENERAL MODEL are called different names, such as 
One-way Analysis of Variance (ANOVA), Analysis of Covariance, Two-way Analysis of 
Variance, etc.

Before we acquired high-speed computers, we needed special easy-to-calculate
computational procedures.  WE SHOULD NOT BE CONSTRAINED NOW THAT WE HAVE THE
COMPUTER POWER.

Many seemingly-different algorithms of statistics can be accomplished under ONE 
GENERAL FORM.

But of most importance, the ONE GENERAL FORM can be used to create models that fit
unique research questions.

The items contained in the URL shown below are related to your question.

If you would like to see some detailed examples, you may want to look in a university
library at:
Introduction to Linear Models by Ward & Jennings, Prentice-Hall, 1973.

Copies of this book are available from the 

Institute for Job and Occupation Analysis:

Jimmy L. Mitchell, Ph.D., Director 
[EMAIL PROTECTED]
10010 San Pedro, Suite 440, San Antonio, Texas 78216
(210) 349-8525   Fax: (210) 349-0168


--- Joe
 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *



 






- Original Message - 
From: bkamen <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2000 12:17 PM
Subject: Correlation - Constraints on Variables


| This practical question arose between myself and a colleague at work.
| It concerns whether we can use correlation analysis if one of the
| variables is non-continuous or "categorical."  She believes that both
| variables must be continuous.  However she cannot say why, and I cannot
| find any such constraint in the statistics book I have relied on since
| graduating in Industrial Engineering a few years ago, Miller and Freund,
| 'Probability and Statistics for Engineers.'
| 
| I have been thinking that if x is discrete and can assume only a few
| values compared with y which is continuous, the correlation study may
| yield a high probability of type-one error.  I interpret this as
| providing insufficient evidence with which to reject the null
| hypothesis.  But I have not thought of this as an inappropriate use of
| correlation.
| 
| On the other hand in attempting to probe Miller and Freund I find that
| correlation is based on the "bivariate normal distribution,"  the
| formula for which has numerous parameters including alpha and beta, the
| least squares regression coefficients.  I am aware that to obtain the
| latter requires that the function be differentiable, hence x must also
| be continuous.  This seems to support my friend's view.
| 
| I would appreciate clarification of any such con

Re: Factor analysis

2000-01-02 Thread Joe Ward



Haider --
 
You may want to consider another approach:

1.  Use "Policy Capturing", "Judgment Analysis (JAN)", "Policy Specifying"
or any of your favorite Multi-Attribute Decision Model approaches to obtain
ONE function of your THREE DEPENDENT VARIABLES.

IMHO, only human(s) should make judgments about how to combine multiple
dependent variables.

After that, you now have Y = function of (your THREE DEPENDENT VARIABLES)

2.  Then you use your favorite regression program to predict
Y = function of (your PREDICTOR VARIABLES)

This approach is not involved with factor analysis interpretation.

However, if you want to do a factor analysis on the PREDICTORS, then you
can USE THE FACTOR SCORES AS PREDICTORS. The disadvantage of using factor
scores is that you still have to use ALL OF THE PREDICTOR VARIABLES.  So if
you would like to reduce the number of predictors, then you should NOT use
factor scores but use regression models.

-- Joe
********
* Joe Ward                    Health Careers High School *
* 167 East Arrowhead Dr       4646 Hamilton Wolfe        *
* San Antonio, TX 78228-2402  San Antonio, TX 78229      *
* Phone: 210-433-6575         Phone: 210-617-5400        *
* Fax: 210-433-2828           Fax: 210-617-5423          *
* [EMAIL PROTECTED]                                      *
* http://www.ijoa.org/joeward/wardindex.html             *
 
 
 
 
- Original Message - 
From: Haider Al-Katem <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, December 18, 1999 4:00 AM
Subject: Factor analysis

| Hi,
| 
| I have conducted a factor analysis on some questionnaire items. The
| dependent variables that I am measuring for example ('Intention To Buy',
| 'Attitude towards a product'  and 'Trust in buying the product from a
| merchant' ) seem to load significantly high on two factors which leaves me
| with a NOT SIMPLE FACTOR STRUCTURE.
| 
| I am assuming that since 'Intention To Buy', 'Attitude towards a product'
| and 'Trust in buying the product from a merchant'  all seem to be some type
| of an ATTITUDE , the significantly high factor loadings on the two factors
| may be justifiable.
| 
| My questions are:
| 
| 1. Are my above interpretations of the result correct?
| 
| 2. If not, is there a statistical method that can help me overcome this
| 'non-simple factor structure'?
| 
| Thanks.
| 


Re: grading on the curve

1999-12-23 Thread Joe Ward

Herman --

I liked your last sentence indicating that MASTERY IS IMPORTANT!!

" I do not use a linear grading method; fortunately, early in my
   teaching, I had a student put it all together on the final."
^
Joe
******** 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr   4646 Hamilton Wolfe  *
* San Antonio, TX 78228-2402San Antonio, TX 78229   *
* Phone: 210-433-6575   Phone: 210-617-5400 *
* Fax: 210-433-2828Fax: 210-617-5423   *
* [EMAIL PROTECTED] *
* http://www.ijoa.org/joeward/wardindex.html   *




- Original Message - 
From: Herman Rubin <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, December 23, 1999 7:23 AM
Subject: Re: grading on the curve


| In article <[EMAIL PROTECTED]>,
| dennis roberts <[EMAIL PROTECTED]> wrote:
| >this discussion is interesting ...
| 
| >there seems to be TWO general kinds of "grading" on the curve ... it would
| >be interesting to try to "estimate" how frequently each happens ...
| 
| >1. LOWERing cutoffs ... thus, INcreasing the #s of those getting various
| >higher grades
| 
| >2. making cutoffs such that the distribution of GRADES resembles a normal
| >distribution
| 
| >i assume that #1 occurs much more frequently and, from my perspective,
| >there is NO good rationale for doing #2 ... unless one assumes that ability
| >within a class is normally distributed AND ... and far more crucial ...
| >that achievement SHOULD resemble the distribution of ability ... 
| 
| Something like #2 occurs far too often.  But either one of these
| defeats the value of a grade in indicating anything about what
| the student has accomplished.
| 
| NOTHING is normally distributed, so grades should not be.
| 
| Also, classes are not equal; even different sections of the same
| course in the same term are not equal.  Trying a different approach
| to teaching may well change the distribution of the amount of 
| knowledge, and thus should change the distribution of grades.
| 
| Only absolute grading is a meaningful assessment of what the 
| student has accomplished.  Relative grading almost forces 
| levels to go down.  The American undergraduate grades in the
| strong mathematics courses preparing for graduate work are
| essentially meaningless at this time.
| 
| >in any case ... instructors are suppose to give students some reasonable
| >description of the grading system used ... at the BEginning of a course ...
| >which i assume would include some facimile of a grading scale ... or what
| >one has to do to earn certain grades ... and in this context, i would think
| >that anyone who might 'consider" RAISING cutoffs so that FEWER students get
| >higher grades ... would be challenged from students .. as this appears to
| >border on unethical practice ... 
| 
| One is not required to go that far.  Saying that you will give
| your best assessment of what the student knows and can do, based
| on scores given on various items, meets the legal requirements.
| I do not use a linear grading method; fortunately, early in my
| teaching, I had a student put it all together on the final.
| 
| 
| >At 02:32 PM 12/22/99 -0500, [EMAIL PROTECTED] wrote:
| >>  I never, as a teacher, used any curving 
| >>procedure to lower students grades!
| -- 
| This address is for information only.  I do not claim that these views
| are those of the Statistics Department or of Purdue University.
| Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
| [EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558
| 



Re: teaching statistical methods by rules?

1999-12-20 Thread Joe Ward

Yep!!

As you say:
"Why are people so obsessed with T and Z? "

Perhaps it would be even better (easier?) to focus on F since

F(1,df2) = t^2(df2)

(Reminder: when using a t-table, the p-values usually involve ONE TAIL and
when using the F-table, the p-values involve TWO TAILS of the t distribution.)

Example:  The critical value of t for probability p = .05 at t(18) = 1.734
          The critical value of F for probability p = .10 at F(1,18) = (1.734)^2 = 3.01
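
The arithmetic in this example can be reproduced without a table. The sketch below is an illustration only (the helper names t_pdf and t_upper_tail are invented, and Simpson's rule is just one convenient way to get the tail area): it integrates the t density with 18 degrees of freedom to find the area beyond 1.734, doubles it for the two tails, and squares 1.734 to get the matching point on F(1,18).

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_crit, df2 = 1.734, 18
one_tail = t_upper_tail(t_crit, df2)   # area beyond +1.734
two_tail = 2 * one_tail                # area beyond +1.734 or below -1.734
f_crit = t_crit ** 2                   # matching point on F(1, 18)
print(round(one_tail, 3), round(two_tail, 3), round(f_crit, 2))
```

The event F > t^2 is the same event as |t| > 1.734, which is why the one-tail .05 on the t side pairs with .10 on the F side.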

:-)
-- Joe
******** 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *


 





- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 19, 1999 4:44 PM
Subject: Re: teaching statistical methods by rules?


| In article <[EMAIL PROTECTED]>, 
| [EMAIL PROTECTED] says...
| >
| > 
| >
| >On the other hand, a body of knowledge can be thought of as a set of
| >'rules'. The important thing is that this set is constructed by the
| >individual, so our aim should not be to teach statistics as a set of
| >rules, but in such a way that each student can develop his or her own
| >set of rules. They won't be the same for all, and they will different
| >from the teacher's, but they hopefully will work. (If you like, this is
| >a defintion of a 'good student' - one who manages to construct a
| >successful set of rules for each subject.
| 
| 
| It's either undergraduate students in Australia are much smarter than those 
| living in the United States or you live on a different planet. The last time I 
| taught an undergraduate introductory statistics class, some students couldn't 
| even do fractions and simple algebra. Can you expect them to develop their own 
| rules?
| 
| Why are people so obsessed with T and Z? When the degrees of freedom exceeds 
| say 30, the difference between T and Z is practically negligible. You can use T 
| or Z in such a case. However, the P-value from Z is easier to compute.
| 
| -- 
| Tjen-Sien Lim
| [EMAIL PROTECTED]
| www.Recursive-Partitioning.com
| 
| Get your free Web-based email! http://recursive-partitioning.zzn.com
| 
| 



Re: Prediction Model Question

1999-12-16 Thread Joe Ward

- Original Message - 
From: Burke Johnson <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, December 16, 1999 9:13 AM
Subject: Prediction Model Question

| Hi,
| 
| A student of mine is getting ready to develop a GLM prediction model that will
| include a mixture of categorical and quantitative predictor variables. We will
| probably not include interaction terms in the model (i.e., it will be a main
| effects only model).
| 
| Here's my question: Do you suggest using dummy coding (0,1) or effects coding
| (1,0,-1) for the categorical variables included in the model?
| 
| The reason I'm asking is because dummy coding does not always give the same
| result for a factorial design as does ANOVA and effects coding, and, hence,
| Pedhazur recommends using effects coding rather than dummy coding in the
| factorial case. Do you know if the choice of dummy or effects coding matters for
| a main effects only model with multiple categorical and quantitatively scaled
| predictor variables?
| 
| Thanks in advance,
| Burke Johnson 
| 
--
Hi, Burke --

First, I use the words BINARY (or INDICATOR) predictors -- and NOT "DUMMY" predictors.
In the beginning ALL PREDICTOR INFORMATION IS BINARY!

It is unfortunate that the word DUMMY has become popular.  Students might get the idea
that there is something wrong with using DUMMIES!!  I think that the BINARIES are
really the most BRILLIANT!!

Now to your concern --

Your last paragraph

"The reason I'm asking is because dummy coding does not always give the same result 
for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur 
recommends using effects coding rather than dummy coding in the factorial case. Do you 
know if the choice of dummy or effects coding matters for a main effects only model 
with multiple categorical and quantitatively scaled predictor variables?"

is a very good example of the situation that arises in the use of "packaged"
algorithms.  The user of the "package" may have no idea what questions are being
answered by the "package".

I always suggest that researchers create their own models!  That is the only SAFE WAY!
If a "packaged" procedure is verified to produce the results desired by the researcher
then it certainly should be used.

The researcher should:

1. State their research questions in "natural language" -- avoid terms such as
   "MAIN EFFECTS" and "EFFECTS CODING" since those expressions may mean different
   things to different people.  In some instances the user of those terms may not
   know what is meant when they utter the statement.  Ask someone what they mean if
   they utter something about MAIN EFFECTS in a 3-factor ANOVA with unequal numbers
   of observations in the cells.

2. Create an ASSUMED MODEL that allows the researcher to investigate their research
   questions of interest.

3. Impose restrictions on the parameters of the ASSUMED MODEL that are implied by
   the research questions of interest.  This results in a RESTRICTED MODEL.

4. Compare the Error Sum of Squares between the ASSUMED and RESTRICTED MODELS using
   an F-test and obtain confidence intervals if appropriate.
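
Steps 2 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the student's model: the data, the three-category predictor, and the helper names (solve, sse) are all invented. The question asked in natural language is "for observations with the same score, is the expected Y the same in all three groups?" The sketch also fits the ASSUMED MODEL twice, once with binary (0,1) predictors and once with (1,0,-1) predictors, to show that for a model without interaction terms the two codings give the same Error Sum of Squares, because their predictor columns span the same space.

```python
def solve(a, b):
    """Solve the square system a*w = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sse(columns, y):
    """Error sum of squares of the least-squares fit of y on the predictor columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    w = solve(xtx, xty)
    fitted = [sum(wk * col[i] for wk, col in zip(w, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Invented data: one categorical predictor (A, B, C) and one quantitative predictor.
group = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
score = [2.0, 3.5, 5.0, 1.0, 4.0, 6.0, 2.5, 3.0, 5.5]
y = [10.1, 12.0, 13.8, 8.2, 11.9, 14.5, 12.2, 12.4, 15.9]
unit = [1.0] * len(y)

# ASSUMED MODEL with binary (0,1) predictors for A and B.
is_a = [1.0 if g == "A" else 0.0 for g in group]
is_b = [1.0 if g == "B" else 0.0 for g in group]
sse_binary = sse([unit, is_a, is_b, score], y)

# The same ASSUMED MODEL with (1,0,-1) predictors.
e_a = [1.0 if g == "A" else -1.0 if g == "C" else 0.0 for g in group]
e_b = [1.0 if g == "B" else -1.0 if g == "C" else 0.0 for g in group]
sse_effects = sse([unit, e_a, e_b, score], y)

# RESTRICTED MODEL: no group differences at any given score.
sse_restricted = sse([unit, score], y)

n, p_assumed, p_restricted = len(y), 4, 2
f_groups = ((sse_restricted - sse_binary) / (p_assumed - p_restricted)) / (
    sse_binary / (n - p_assumed))
print(sse_binary, sse_effects, f_groups)
```

The individual weights differ between the two codings because they answer different questions; the fitted values, the Error Sum of Squares, and this F do not.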

I assume there must be a reason for assuming that there is NO INTERACTION among the
predictors.  Many researchers would test for NO INTERACTION first.  Then, if
appropriate, switch to the NO INTERACTION MODEL.

I would be interested in seeing the models that your student develops to investigate
his/her OWN QUESTIONS OF INTEREST!!

:-)

-- Joe
** 
* Joe Ward  Health Careers High School 
* 167 East Arrowhead Dr  4646 Hamilton Wolfe   
* San Antonio, TX 78228-2402   San Antonio, TX 78229
* Phone: 210-433-6575 Phone: 210-617-5400   
* Fax: 210-433-2828 Fax: 210-617-5423
* [EMAIL PROTECTED] 
* http://www.ijoa.org/joeward/wardindex.html   




Re: could someone help me with this intro to stat. problem

1999-12-08 Thread Joe Ward

- Original Message - 
From: Donald F. Burrill <[EMAIL PROTECTED]>
To: Mike Wogan <[EMAIL PROTECTED]>
Cc: Luv 2 muah 143 <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, December 08, 1999 12:41 PM
Subject: Re: could someone help me with this intro to stat. problem


| On Wed, 8 Dec 1999, Mike Wogan wrote, in response to Luv 2 muah 143's 
| question:
|  
| > > 5 of 10 volunteers are randomly selected to receive self-defense
| > > training.  The other 5 receive no training.  At the end of the 
| > > training period, all subjects complete a self-confidence 
| > > questionnaire. 
| 
| > > a.)  Is there a difference in self-confidence between the 2 groups 
| > > (p<.01)?
|  
| > > b.)  What are the effects of self-defense training on self-confidence 
| > > (I'm assuming a two-tailed test?).  Explain analysis
|  
| > Without a pre-test measure of self-confidence, taken prior to the
| > training, even if there is a significant difference post-training, it's 
| > not possible to tell whether the difference is the result of the 
| > training or was there to begin with.  
| 
| Oh, come on, Mike.  What did you think "randomly selected" was 
| in there for?  (Or were you trying to confuse the querent because he 
| had the effrontery to ask a homework (or perhaps exam) question of this 
| list?)
| 
| > If there is a pre-post measurement of self-confidence, then you need a
| > mixed model Anova, with Training vs. No Training as the between groups
| > factor and Pre-Post as the within groups factor.
| 
| This sure must sound scary to someone who's having trouble with 
| the first semester of an elementary stats course!
| -- DFB.
|  
|  Donald F. Burrill [EMAIL PROTECTED]
|  348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
|  MSC #29, Plymouth, NH 03264     603-535-2597
|  184 Nashua Road, Bedford, NH 03110  603-471-7128  
| 
| 
--  Joe Ward writes --
Hi Don, et al --

While it seems that the question is stimulated from a student's assignment,
it seems to me that  students should be given the "power they deserve"
to do something useful when they complete their course of instruction.

You indicated that--

"This sure must sound scary to someone who's having trouble with 
 the first semester of an elementary stats course!"
IT SHOULD NOT BE SCARY. 
If students can't "control for the uncontrollable" such as "PRE-TEST", or
GENDER, etc. then they are not being given what they deserve in
A NON-CALCULUS ELEMENTARY STATS COURSE.

I realize that I am an "outlier" in what I believe to be a lack of SALESMANSHIP
about the power that statistics can give students -- before they are "turned off".

But talented high school students can do it -- so why not college students?

But I get more cynical in my old age! 

-- Joe
*  
Joe Ward   Health Careers High School 
167 East Arrowhead Dr  4646 Hamilton Wolfe   
San Antonio, TX 78228-2402 San Antonio, TX 78229  
Phone:  210-433-6575   Phone: 210-617-5400
Fax: 210-433-2828  Fax: 210-617-5423 
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html   
*





Re: Coefficient of Determination Question

1999-12-08 Thread Joe Ward

Hi, GM --

We always have trouble trying to give "names" to things. Usually
we increase misunderstanding as we give ambiguous names to things.

For example, how many folks know what is meant when they hear
someone say "In a 3-factor ANOVA (A,B,C) there is a "significant
'A' MAIN EFFECT."  The "someone" should just say what they 
really mean -- if they know!
  
r^2 should have been "unnamed" since it's as easy to say "r square"
as it is to say "coefficient of determination".

However, if someone insists on giving a name to (1-r^2) then why not call
it the "coefficient of non-determination".  But "one minus r square"
is about as easy to say as "coefficient of non-determination".
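
For readers who want to see what "one minus r square" measures, here is a small Python sketch with invented numbers (the variable names are illustrative only). It computes r^2 from the usual sums of squares and, separately, the ratio of the Error Sum of Squares around the fitted line to the Error Sum of Squares around the mean of Y; that ratio equals 1 - r^2.

```python
from statistics import mean

# Invented data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

xb, yb = mean(x), mean(y)
sxx = sum((v - xb) ** 2 for v in x)
syy = sum((v - yb) ** 2 for v in y)          # error SS around the mean of Y
sxy = sum((u - xb) * (v - yb) for u, v in zip(x, y))

r_square = sxy ** 2 / (sxx * syy)

slope = sxy / sxx
intercept = yb - slope * xb
sse_line = sum((v - intercept - slope * u) ** 2 for u, v in zip(x, y))

one_minus_r_square = sse_line / syy          # proportion of variation NOT accounted for
print(r_square, one_minus_r_square)
```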

-- Joe
*  
Joe Ward   Health Careers High School 
167 East Arrowhead Dr  4646 Hamilton Wolfe   
San Antonio, TX 78228-2402 San Antonio, TX 78229  
Phone:  210-433-6575   Phone: 210-617-5400
Fax: 210-433-2828  Fax: 210-617-5423 
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html   
*




- Original Message - 
From: Gaurang Mehta <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 08, 1999 10:15 AM
Subject: Coefficient of Determination Question


| I am looking for the coefficient name for (1-r^2).  I know r^2 is the
| Coefficient of Determination, but I do not know the name of the (1-r^2)
| coefficient.
| 
| Any assistance would be greatly appreciated.
| 
| Thanks in advance
| 
| GM
| 
| 
| 



Re: could someone help me with this intro to stat. problem

1999-12-08 Thread Joe Ward

 Mike Wogan writes  --
- Original Message - 
From: Mike Wogan <[EMAIL PROTECTED]>
To: Luv 2 muah 143 <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, December 08, 1999 11:16 AM
Subject: Re: could someone help me with this intro to stat. problem


| On 8 Dec 1999, Luv 2 muah 143 wrote:
| 
| > 5 of 10 volunteers are randomly selected to receive self-defense training.  The
| > other 5 receive no training.  At the end of the training period, all subjects
| > complete a self-confidence questionnaire.  
| > 
| > a.)  Is there a difference in self-confidence between the 2 groups (p<.01)?
| > 
| > 
| > b.)  What are the effects of self-defense training on self-confidence (I'm
| > assuming a two-tailed test?).  Explain analysis
| > 
| > Please help, I can't figure it out...my mind has gone blank
| 
| Without a pre-test measure of self-confidence, taken prior to the
| training, even if there is a significant difference post-training, it's
| not possible to tell whether the difference is the result of the training 
| or was there to begin with.  
| 
| If there is a pre-post measurement of self-confidence, then you need a
| mixed model Anova, with Training vs. No Training as the between groups
| factor and Pre-Post as the within groups factor.
| 
| Mike
|  
--  End of Mike's message  --
Great suggestion, Mike --
 
" Without a pre-test measure of self-confidence, taken prior to the
training, even if there is a significant difference post-training, it's
not possible to tell whether the difference is the result of the training 
 or was there to begin with. " 
 
The question "IN NATURAL LANGUAGE" might be stated slightly differently as:

(1) For subjects who have the SAME PRE-TEST MEASURE OF SELF-CONFIDENCE but
have DIFFERENT TRAINING (i.e., TRAINING vs NO-TRAINING) is there a DIFFERENCE
IN THE EXPECTED POST-TEST MEASURE OF SELF-CONFIDENCE?

or perhaps

(2) If there is a difference between the two groups, is the difference the SAME FOR
ALL VALUES OF THE PRE-TEST MEASURE OF SELF-CONFIDENCE?

In these "NATURAL LANGUAGE FORMS" of the research questions
 the researcher should be able to write an ASSUMED MODEL
that allows for the expression of the hypotheses of interest in terms OF PARAMETERS OF
THE ASSUMED MODEL.  Then the restrictions implied by the questions of interest can be
imposed on the ASSUMED MODEL to obtain a RESTRICTED MODEL to test the hypotheses.
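
A minimal sketch of how questions (1) and (2) might be turned into ASSUMED and RESTRICTED MODELS, written in Python. The data values are invented purely for illustration, and the helper names (error_sum_of_squares, f_statistic) are hypothetical, not part of any package. Question (2) compares a model with separate pre-test slopes for the two groups against one with a common slope; question (1) compares the common-slope model against one with no group difference.

```python
def error_sum_of_squares(predictors, y):
    """Least-squares fit of y on the predictor columns; returns the error SS."""
    p, n = len(predictors), len(y)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(predictors[i][k] * predictors[j][k] for k in range(n)) for j in range(p)]
         + [sum(predictors[i][k] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    weights = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(weights[j] * predictors[j][k] for j in range(p)) for k in range(n)]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


def f_statistic(ess_assumed, ess_restricted, n_assumed, n_restricted, n_obs):
    """F for the restrictions that turn the ASSUMED model into the RESTRICTED model."""
    df1 = n_assumed - n_restricted      # number of independent restrictions
    df2 = n_obs - n_assumed             # error degrees of freedom, ASSUMED model
    return ((ess_restricted - ess_assumed) / df1) / (ess_assumed / df2)


# Invented data: 5 trained and 5 untrained subjects.
pre = [12, 15, 11, 18, 14, 13, 16, 10, 17, 15]
post = [20, 24, 18, 27, 22, 15, 19, 12, 20, 17]
t = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]          # 1 if TRAINING
n = [1 - ti for ti in t]                     # 1 if NO-TRAINING
u = [1] * len(post)                          # unit predictor U
t_pre = [ti * x for ti, x in zip(t, pre)]    # pre-test for trained subjects, else 0
n_pre = [ni * x for ni, x in zip(n, pre)]    # pre-test for untrained subjects, else 0

# Question (2): separate slopes (ASSUMED) vs. a common slope (RESTRICTED).
ess_separate = error_sum_of_squares([t, n, t_pre, n_pre], post)
ess_common = error_sum_of_squares([t, n, pre], post)
f_same_difference = f_statistic(ess_separate, ess_common, 4, 3, len(post))

# Question (1): common slope (ASSUMED) vs. no group difference (RESTRICTED).
ess_no_difference = error_sum_of_squares([u, pre], post)
f_training = f_statistic(ess_common, ess_no_difference, 3, 2, len(post))

print(f"F for question (2): {f_same_difference:.2f}")
print(f"F for question (1): {f_training:.2f}")
```

Writing out the predictors by hand makes it easy to check that each RESTRICTED MODEL really carries the restriction implied by the question.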

AND NOW FOR MY "STANDARD SERMON"!

The approach described as:

" If there is a pre-post measurement of self-confidence, then you need a
mixed model Anova, with Training vs. No Training as the between groups
factor and Pre-Post as the within groups factor." 
DOES NOT COMMUNICATE clearly how to proceed. 

 The reader has to learn the meaning of:

"mixed model Anova"
"between groups factor"
"Pre-Post as within groups factor."

or be able to locate a "packaged" algorithm that sounds similar to:

"Mixed model Anova, with Training vs. No Training as the between groups
factor and Pre-Post as the within groups factor." 

Another "advisor" might suggest:

"Do an Analysis of Covariance, with the Pre-Test Measure of  Self-Confidence
as the Covariable".

As before, the researcher must know the meaning of the advice or locate 
a "package" that is labeled as "COVARIANCE ANALYSIS".

This second approach is dangerous since many "packaged" COVARIANCE
ANALYSIS algorithms may not allow the researcher to answer the questions of
interest, e.g. question #2 above.

And even if such "packages" are located the researcher may not be able to 
verify that the answers produced by the "package" are related to the natural
language questions of interest. 

In summary, statistics instruction should give students (researchers) the power to:

1. State their research questions in NATURAL LANGUAGE so
   that normal humans can understand.

2. Create models that allow the researcher to express hypotheses of interest.

3. Translate NATURAL LANGUAGE questions into RESTRICTIONS on
   parameters of the ASSUMED MODEL.

4. Impose the RESTRICTIONS to obtain a RESTRICTED MODEL.

5. Verify that the RESTRICTED MODEL has the RESTRICTIONS IMPLIED BY THE
   QUESTIONS OF INTEREST.

6. Use information from the ASSUMED and RESTRICTED MODELS to HELP
   make decisions about the questions of interest.

Hopefully, (Luv 2 muah 143) is being provided the opportunity to do the above!!
Reasonably talented high school students should be given the power to do this.

:-)

--- Joe
*  
Joe Ward  Health Careers High School 
167 East Arrowhead Dr  4646 Hamilton Wolfe   
San Antonio, TX 78228-2402   San Antonio, TX 78229  
Phone:  210-433-6575Phone: 210-617-5400
Fax: 210-433-2828 Fax: 210-617-5423 
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html   
*



Re: ancova

1999-12-05 Thread Joe Ward

 DENNIS ROBERTS WRITES -
- Original Message - 
From: dennis roberts <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 05, 1999 7:24 AM
Subject: ancova


| some time ago, i sent out a note about a handout i had re: ancova. now, in 
| that handout, i illustrated a very simple case of how ancova might account 
| for some of the within groups 'error'. in that handout, i showed, near the 
| end ... some minitab output for the analysis. now, in that output ... the 
| adjusted SS adds up to MORE than what the simple anova adds to. NOTE: the 
| dependent measure in the Exp and Cont group example was performance on a 
| test .. and the covariate was IQ.
| 
| the one way shows:
| 
| One-way Analysis of Variance
| 
| Analysis of Variance
| Source   DF     SS     MS      F      P
| Factor    1    252    252   1.54  0.231
| Error    18   2949    164
| Total    19   3201
| 
| and the ancova shows:
| 
| Analysis of Variance for TOTY, using Adjusted SS for Tests
| 
| Source DF Seq SS Adj SS Adj MS   F  P
| TOTIQ   1 1539.9 2057.9 2057.9   39.26  0.000
| Group   1  770.0  770.0  770.0   14.69  0.001
| Error  17  891.0  891.0   52.4
| Total  19 3200.9
| 
| in the handout, i showed that the adjusted SS(TOT) equals the sum of the 
| 770 and 891 values for Group and Error in the Adj SS columns ... but where 
| does the 2057 come from and, when you add to the 770 and 891 values .. you 
| get a much larger value than the original 3201?
| 
| what would be the simplest way to discuss this with students? in what way 
| could you use the original data on the dependent measure ... and show how 
| this new SS(TOT) value could be obtained?
| 
| thanks

| --
| 208 Cedar Bldg., University Park, PA 16802
| AC 814-863-2401Email mailto:[EMAIL PROTECTED]
| WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
| FAX: AC 814-863-1002

  JOE WARD COMMENTS ---
Hi, Dennis --

You probably can predict my comments!!

It is very difficult to try to explain the computer outputs
without knowing (or guessing) what hypotheses are being "tested".

A continuing situation that is observed on various Email lists
involves interpretations of computer outputs without
concern for understanding what questions are being "answered" by
the computer package.  In some situations -- particularly the "case
of the missing cell" --  the answers might be for "uninteresting questions".
The communications provided by the internet continue to reveal the short-comings of
statistics education.
Until we change our statistics education these problems will not go away.

Without going into details of exactly how to proceed your
students should:

1. State their hypotheses of interest in "natural language".

2. Create an ASSUMED MODEL that allows them to express their
"natural language" hypotheses in terms of parameters in their
ASSUMED MODEL.

3. Impose the restrictions on the ASSUMED MODEL to obtain a 
RESTRICTED MODEL.

4. Compare the Error Sum of Squares from the ASSUMED MODEL with
the Error Sum of Squares from the RESTRICTED MODEL using an
F statistic.
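
As a rough check of step 4 against the Minitab output quoted above, here is a short Python sketch. It ASSUMES that each "Adj SS" is the increase in the Error Sum of Squares when that one term is dropped from the model containing both TOTIQ and Group; the numbers are copied from the two outputs, and the function name is made up for this note. Under that assumption the 2057.9 is (within rounding) the one-way Error SS of 2949 minus the ANCOVA Error SS of 891.0, and since each adjusted SS comes from a different RESTRICTED MODEL there is no reason for them to add up to the Total SS.

```python
def f_statistic(ess_assumed, ess_restricted, df_restrictions, df_error_assumed):
    """F = [(ESS_R - ESS_A) / df1] / [ESS_A / df2]."""
    return ((ess_restricted - ess_assumed) / df_restrictions) / (
        ess_assumed / df_error_assumed
    )


# Values copied from the two outputs quoted above.
ESS_ASSUMED = 891.0                 # Error SS with TOTIQ and Group in the model (17 df)
ESS_NO_IQ = 2949.0                  # Error SS of the one-way model (Group only)
ESS_NO_GROUP = 3200.9 - 1539.9      # Total SS minus Seq SS for TOTIQ (TOTIQ only)

ss_iq_adjusted = ESS_NO_IQ - ESS_ASSUMED        # compare with Adj SS for TOTIQ
ss_group_adjusted = ESS_NO_GROUP - ESS_ASSUMED  # compare with Adj SS for Group

f_iq = f_statistic(ESS_ASSUMED, ESS_NO_IQ, 1, 17)
f_group = f_statistic(ESS_ASSUMED, ESS_NO_GROUP, 1, 17)

print(f"Adj SS TOTIQ ~ {ss_iq_adjusted:.1f}, F ~ {f_iq:.2f}")
print(f"Adj SS Group ~ {ss_group_adjusted:.1f}, F ~ {f_group:.2f}")
```

The small differences from the printed 2057.9 and 39.26 are what one would expect from the rounded one-way table.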

In the 1960's, when we presented short courses on
Prediction/Regression/Linear Models at the American
Educational Research Association (AERA) it was indeed rare
to find anyone who knew the meaning of the MAIN EFFECTS HYPOTHESES
(ROW and COLUMN MAIN EFFECTS) in a TWO-FACTOR ANOVA.  Of course,
everyone knows it these days --I hope.

After your students have become acquainted with the
PREDICTION/REGRESSION/LINEAR MODELS approach, then it is fun
to ask them to do some DETECTIVE WORK to indicate the hypotheses
that are being tested in the computer output that you show in your
example -- and for more complicated computer outputs.

The "homework" or "exam" assignment might be as follows:

Indicate the hypotheses that are being "tested" in the computer 
outputs shown below.

1. Explain in as much detail as you can, including
a "natural language" statement and/or in terms of ASSUMED and
RESTRICTED MODELS.

2. How do you use the ANCOVA output to test for (NO)INTERACTION between
   IQ and GROUP? Is it possible? If not, why not? 

3. Create a model that will allow you to test for (NO) INTERACTION
between IQ and GROUP. Impose the restrictions needed to test for 
(NO)INTERACTION. Compare your ASSUMED MODEL with your RESTRICTED MODEL.

Incidentally, talented high school students SHOULD be able to handle this
if they are given the opportunity!

| One-way Analysis of Variance
| 
| Analysis of Variance
| Source   DF     SS     MS      F      P
| Factor    1    252    252   1.54  0.231  (EXPLAIN THIS HYPOTHESIS)
| Error    18   2949    164
| Total    19   3201
| 
| and t

Re: categorical or numerical

1999-12-04 Thread Joe Ward

I've  watched the many thoughtful "categorical or numerical"
messages with interest.

For many, many years I've proposed that IN THE BEGINNING
ALL RECORDED INFORMATION is BINARY, CATEGORICAL, NOMINAL --  (not DUMMY).

AFTER humans associate MEANING with the categories then the information
acquires various NAMES/TYPES.  It is important to know the MEANING
that is associated with the recorded information. This is a problem with
all communication.

--

An interesting example is the discussion about GRADES GIVEN 
FOR ASSESSING PERFORMANCE BY STUDENTS.

In some situations we observe grade names as:
C, D,B,F,A   (CATEGORICAL, NOMINAL, BINARY, ...?)

THEN WE SOMETIMES THINK OF THEM AS BEING IN SOME "ORDER".
A,B,C,D,F 
(we probably use "F" for Failure, but why don't we use "E" for Excellent?)

THEN WE MIGHT LIKE TO COMPUTE GRADE POINT AVERAGES AND 
Let 4 = (another name for) "A"
3 = (another name for) "B"
2 = (another name for) "C"
1 = (another name for) "D"
0 = (another name for) "F"

(Why is the difference between D(1) and F(0) the same as ALL
other adjacent categories?)

And on the other hand we may have "SCORES" on a 100-point scale.
but someone might desire to have some "LETTERS".  So we may 
Let 90-100 = A
80-89  = B
    70-79  = C
60-69  = D
00-59  = F
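
The two "renamings" above can be written down in a few lines of Python, which makes the implied assumptions visible. This is only a sketch: the scores are invented, integer scores from 0 to 100 are assumed, and the function names are hypothetical.

```python
# Letter -> number: "another name" for each category, as in the first table.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def score_to_letter(score):
    """Map a 0-100 score to a letter using the cut points in the second table."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


def grade_point_average(letters):
    """Average of the numeric names; treats adjacent letters as equally spaced."""
    return sum(GRADE_POINTS[letter] for letter in letters) / len(letters)


scores = [95, 89, 90, 59, 60]                    # invented scores
letters = [score_to_letter(s) for s in scores]
gpa = grade_point_average(letters)
print(letters, gpa)
```

Note that 89 and 90 differ by one point but land in different categories, while 60 and 69 would land in the same one: the MEANING attached to the categories is a human choice.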

:-)
-- Joe


- Original Message - 
From: Jan de Leeuw <[EMAIL PROTECTED]>
To: Paul Velleman <[EMAIL PROTECTED]>; Hankins <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: Paul F Velleman <[EMAIL PROTECTED]>
Sent: Saturday, December 04, 1999 7:59 AM
Subject: RE: categorical or numerical


| It's nice to sort of disagree with Paul for a change.
| 
| Students should be taught that ALL measurements are categorical and WHY
| we are usually pretty successful treating data AS IF it were measured on a
| continuous scale EVEN THOUGH WE KNOW IT IS NOT.
| 
| Thus, on a minor point, it is neither a good nor a bad idea to "slice 
| continuous
| measurements into categories".  There are no continuous measurements, so
| the whole notion is irrelevant. We can just choose to make our categories more
| broad, and this is a choice which is part of the analysis.  The argument
| that this "throws away information" seems to suggest that is inherently bad.
| But statistics is the art of throwing away information.
| 
| Think about the shift of emphasis if the normal would be moved back to its
| rightful historical place: as a convenient and widely applicable numerical
| approximation tool. No more nonsense such as "Assume the data are a sample from
| a normal distribution ... ".
| 
| Of course I agree with Paul that unnecessary discretizing is, well, 
| unnecessary.
| 
| At 11:46 PM -0500 12/3/99, Paul Velleman wrote:
| >At 11:14 PM 12/03/1999, Hankins wrote:
| >  >We would not be able to measure anything, then not able to record the
| >  >measurement, if slicing continuous measurements into categories is "almost
| >  >always a bad idea"!
| >
| >Perhaps I should have been more precise. Of course every recorded
| >measurement is discretized to some degree. What I oppose is *unnecessary*
| >discretizing.
| >
| >  >The students should rather be taught that ALL measurements are categorical.
| >
| >On this, however, I disagree. Calling a variable categorical usually
| >suggests a limited range of analysis possibilities. In fact, we are usually
| >pretty successful treating discretized data as if it were measured on a
| >continuous scale even when we know it is not.
| >
| >-- paul
| >
| >Paul F. Velleman
| >Cornell University  Data Description, Inc.
| >358 Ives Hall  Box 4555
| >Ithaca, NY 14853   Ithaca, NY 14852-4555
| >(607) 255-4411  (607) 257-1000
| >(607) 255-8484 fax(607) 257-4146 fax
| >
| >
| >===
| >The Advanced Placement Statistics List
| >To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
| >unsubscribe apstat-l 
| >Discussion archives are at
| >http://forum.

Re: sets of values

1999-11-29 Thread Joe Ward

Bob makes his, as-usual, valuable comments!!
 
- Original Message - 
From: Bob Hayden <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, November 28, 1999 9:17 PM
Subject: Re: sets of values


| > On 27 Nov 1999 18:44:33 -0800, [EMAIL PROTECTED] wrote:
| > 
| > > Obviously the sets are not related in a linear fashion.
| > > 
| > > I would suggest that a 4th degree polynomial equation best fits the data.
| > 
| > Oh!  that should have been obvious
| 
| > Rich Ulrich, [EMAIL PROTECTED]
| > http://www.pitt.edu/~wpilib/index.html
|  
| I was hoping someone else would respond to the polynomial problem.
| Rich did, but I fear his point and his humor may be lost on those who
| need it most.
| 
| Higher order polynomial fits are problematic in many ways.  It would
| be VERY unusual for a polynomial of degree higher than two to be a
| reasonable model (outside of cases where prior theory specifically
| predicts a higher order polynomial).  A software package that
| recommends fitting a slew of higher order polynomials and then
| choosing among them is of dubious statistical quality.  To know what
| to do instead you would need to know more about the context and
| meaning of the data.  For example, in my current regression class we
| had data on electrical consumption of condominium units of different
| sizes.  A parabola gave a considerably better fit than a straight line
| -- but it also predicted that costs would peak out at a size within
| the range of the data and then drop off for larger sizes.  This is not
| very sensible.  My choice was to transform size into 1/size^2.  This
| was not perfect but it was reasonable for the range of sizes studied
| and did not do bizarre things just beyond that range.  It gave a model
| that rose more slowly for large sizes but never went down with
| increasing size.  
| 
| PS I learned about the dangers of fitting higher order polynomials as
| part of a final programming assignment in a Fortran course I took as
| an undergraduate at MIT in about 1970.  If you have n data points with
| distinct x-values, a polynomial of degree n-1 gives a PERFECT fit in
| the sense of going right through each point.  However, for n more than
| a few, it wiggles wildly between points and the matrix algebra croaked
| all the canned packages we had at the time because of multicollinearity
| problems.  The point of the assignment: having a computer is no
| substitute for knowing what you're doing.
|  
| 
|   _
|  | |  Robert W. Hayden
|  | |  Department of Mathematics
| /  |  Plymouth State College MSC#29
||   |  Plymouth, New Hampshire 03264  USA
|| * |  Rural Route 1, Box 10
|   /|  Ashland, NH 03217-9702
|  | )  (603) 968-9914 (home)
|  L_/  [EMAIL PROTECTED]
|   fax (603) 535-2943 (work)
| 
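
Bob's PS can be tried out with a few lines of Python. The sketch below is only an illustration: it evaluates the degree n-1 polynomial through n points by Lagrange's formula (which avoids the matrix algebra he mentions), and the smooth test curve 1/(1 + 25x^2) sampled at 11 equally spaced points is my own choice, not his assignment.

```python
def lagrange_value(xs, ys, x):
    """Evaluate at x the unique degree n-1 polynomial through the n points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def smooth_curve(x):
    """A smooth, bell-shaped curve used only to generate illustrative data."""
    return 1.0 / (1.0 + 25.0 * x * x)


# 11 data points with distinct x-values -> a degree-10 polynomial.
xs = [-1.0 + 0.2 * i for i in range(11)]
ys = [smooth_curve(x) for x in xs]

# The fit is "PERFECT" at the data points...
max_error_at_data = max(abs(lagrange_value(xs, ys, x) - y) for x, y in zip(xs, ys))

# ...but between the points the polynomial strays far from the curve.
grid = [-1.0 + 0.005 * i for i in range(401)]
max_error_between = max(abs(lagrange_value(xs, ys, x) - smooth_curve(x)) for x in grid)

print(f"largest error at the data points: {max_error_at_data:.2e}")
print(f"largest error between the points: {max_error_between:.2f}")
```

The data never exceed 1, yet the error between points comes out larger than the whole range of the data, which is the "wiggles wildly" behavior.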

-  Joe Ward comments --
Hi, Bob  --

Re your first paragraph--

nth-degree polynomials CAN BE USEFUL IN FITTING A WIDE RANGE OF MODELS!

I'm assuming that you are referring to a LINEAR MODEL of the form:

Y = a0*U + a1*X + a2*X^2 + a3*X^3 + ... + an*X^n + E

(where U is a predictor of 1's -- the most neglected and misunderstood
predictor of all time)

By  applying the capabilities acquired in learning to apply restrictions
to investigate hypotheses using LINEAR MODELS it is possible to use
ONLY THOSE PARTS OF A GENERAL POLYNOMIAL THAT DO A GOOD JOB OF FITTING THE DATA.

We might START with a model of the general form shown above and then impose
restrictions so that we can use only THAT PART OF THE FUNCTION THAT HAS a
monotonic increasing or decreasing portion of the more-general form; or, if
desired, use only a portion of the function that has TWO CHANGES OF DIRECTION,
etc.  It isn't necessary to use ALL of the predictors in the general form.

If students are given the capability by their statistics teachers to impose
restrictions on models, these students will have useful tools outside the
statistics world.

A student might want to create a model of the general form:

Y = a0*U + a1*X + a2*X^2 

such that the slope = 0  at X = k

or

such that the slope = s  at X = k

A curious student might want to spend many hours exploring the possibilities.
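
One way a student might work the "slope = s at X = k" case, sketched in Python with invented data and a hypothetical function name. The slope of the quadratic is a1 + 2*a2*X, so the restriction is a1 + 2*a2*k = s, i.e. a1 = s - 2*a2*k. Substituting gives the RESTRICTED MODEL

    Y - s*X = a0*U + a2*(X^2 - 2*k*X) + E

which has only two unknown weights and can be fitted by ordinary least squares.

```python
def fit_restricted_quadratic(x, y, k, s):
    """Least-squares fit of Y = a0 + a1*X + a2*X^2 subject to slope s at X = k."""
    z = [xi * xi - 2.0 * k * xi for xi in x]      # combined predictor X^2 - 2kX
    w = [yi - s * xi for xi, yi in zip(x, y)]     # adjusted dependent variable Y - sX
    n = len(x)
    z_mean, w_mean = sum(z) / n, sum(w) / n
    a2 = (sum((zi - z_mean) * (wi - w_mean) for zi, wi in zip(z, w))
          / sum((zi - z_mean) ** 2 for zi in z))
    a0 = w_mean - a2 * z_mean
    a1 = s - 2.0 * a2 * k                         # recovered from the restriction
    return a0, a1, a2


# Invented data that rise and then level off.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 5.2, 5.8, 6.1, 5.9]
k, s = 5.0, 0.0                                   # require slope 0 at X = 5

a0, a1, a2 = fit_restricted_quadratic(x, y, k, s)

# VERIFY that the restricted model has the desired property.
slope_at_k = a1 + 2.0 * a2 * k
print(f"a0={a0:.3f}, a1={a1:.3f}, a2={a2:.3f}, slope at X={k}: {slope_at_k:.3f}")
```

The last step is the verification: compute the slope of the fitted curve at X = k and confirm it equals s.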

In the SAS system, it is quite easy to have a very general STARTING MODEL, then
use the RESTRICT STATEMENT to create an ASSUMED MODEL and then use the TEST 
STATEMENT to test hypotheses.

When students are first learning to impose restrictions it seems best to have them
actually develop the RESTRICTED MODEL and then to VERIFY THAT THE RESTRICTED MODEL
HAS THE DESIRED PROPERTIES.  Even the SAS system might create "strange" models that
are not what the user has in mind.

Students can apply their "basic" algebra (IF they have "basic" algebra) to some
practical use!


And, referring to your MIT exper

Re: Need to evaluate difference between two R's

1999-11-24 Thread Joe Ward

- Original Message - 
From: Herman Rubin <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 24, 1999 10:07 AM
Subject: Re: Need to evaluate difference between two R's


| In article <[EMAIL PROTECTED]>,
| Rich Ulrich  <[EMAIL PROTECTED]> wrote:
| >On Tue, 23 Nov 1999 04:39:28 GMT, [EMAIL PROTECTED] wrote:
| 
| >> Does any one know how one might test for significant differences
| >> between two multiple R's (or R squares) generated from two sets of data?
| >> I need to determine if two R's generated on two separate occasions
| >> using the same DV and IV's differ significantly from one another.
| 
| >Correlations are not very good candidates for comparisons, since it is
| >so easy to do tests that are more precise.
| > - to test whether the predictive relations are different, you would
| >test the regressions -- do a Chow test or the equivalent, to see if a
| >different set of regressors are needed for a different sampling.
| > - to test whether the variances are different (which is something
| >that would change the correlations), you might test variances
| >directly.
| 
| This is correct.  In fact, it is generally the case that
| correlations, except as measures of how well the model
| fits, do not have any real meaning.
| 
| Even the amount of the variance explained can change
| drastically with a change in design, but the parameters of
| the model do not change, if normalizations are not done.
| For example, if one has a "normal" model with correlation
| coefficient .5, 25% of the variance is explained.  Now 
| suppose that the predictor variable is selected to be
| 2 standard deviations away from the mean, equally likely
| to be in either direction.  Then the correlation becomes
| .756, and the proportion of the variance explained goes
| up to 57%.  But the prediction model is still the same.
| -- 
| This address is for information only.  I do not claim that these views
| are those of the Statistics Department or of Purdue University.
| Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
| [EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558
| 
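
Herman's figures can be reproduced with a few lines of Python. The sketch assumes standardized X and Y in the original "normal" model, so that the slope equals the correlation (.5) and the error variance is 1 - .5^2 = .75; the function name is made up for this note. Selecting X at 2 standard deviations either side of the mean gives X a variance of 4 while the slope and error variance stay the same.

```python
import math


def correlation_under_design(slope, error_variance, predictor_variance):
    """Correlation implied by a fixed linear model when the variance of X changes."""
    explained = slope ** 2 * predictor_variance
    return math.sqrt(explained / (explained + error_variance))


rho = 0.5
slope = rho                       # standardized X and Y: slope equals the correlation
error_variance = 1 - rho ** 2     # unexplained variance in the original design

r_original = correlation_under_design(slope, error_variance, 1.0)
r_selected = correlation_under_design(slope, error_variance, 4.0)  # X = +/- 2 SD

print(f"original design: r = {r_original:.3f}, r^2 = {r_original ** 2:.3f}")
print(f"selected design: r = {r_selected:.3f}, r^2 = {r_selected ** 2:.3f}")
```

The prediction model (slope and error variance) never changes; only the design does.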
-- 
Herman --

Great comment!

Discussions about correlation coefficients arise
periodically on various lists. So when the time seems 
appropriate I resend an old message (see below and the WORD 
attachment) that might be of interest.

IMHO there is too much time spent on the correlation coefficient
since it is of limited and sometimes misleading value
for practical decision-making in the real world.  However,
there are still some folks who are adjusting correlation
coefficients for "restriction of range" in hopes that it
might be useful.

-- Joe



-- Forwarded message --
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation
(his posting is appended below), as well as jogging my own
memory for a previous posting I had made.  A while back I
had posted the Anscombe dataset (in the context of an SPSS
program) which also clearly shows the benefit of plotting
the data:  the four situations produce almost identical
Pearson r values but only one actually shows the classic
scatterplot, the others show a nonlinear pattern and the
influence that a single point has on the calculation of r.
What does the value of r tell us here?  Aren't the basic 
statistical concepts to be learned in this situation far 
more important and most clearly seen through a coordination
of the graphical and numerical information?

-Mike Palij/Psychology Dept/New York University
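
For readers without the earlier SPSS posting, here is a small Python sketch of the same point. The four data sets are the ones published by F. J. Anscombe in 1973; the pearson_r helper is written out by hand for this note and is not taken from the original program. It shows only the numerical half of the story, so the four scatterplots still have to be drawn to see how different the situations are.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in correlations.items():
    print(f"data set {name}: r = {r:.3f}")
```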

Joe H Ward <[EMAIL PROTECTED]> writes:
 To Mike et al --
 
 There have been several messages related to the Simple Correlation
 Coefficient.  IMHO, when out in the "real world" involving practical
 decision-making the correlation coefficient has very limited value and
 sometimes dangerous consequences.  The correlation coefficient may be
 an important topic for the history of statistics to learn the problems 
 associated with i

Re: blocking for variation/confounding

1999-01-16 Thread Joe Ward

Steve --

Your students are asking the good questions!!  This comes up
repeatedly.

I try to minimize reference to unfamiliar statistical terms when
I introduce students to the use of Prediction/Regression/Linear Models
in science research projects.  Without using special, unfamiliar terms
such as "reducing variation","blocking","confounding","main effects"
,etc. students can understand natural language statements, such as:

(Consider Example 10.22, page 774 of M&M 2nd edition)

1.  "Is there a difference between the MEAN HEART RATE AFTER 6 MINUTES
OF TREADMILL EXERCISE of RUNNERS WHO AVERAGED AT LEAST 15 MILES PER WEEK and
a GROUP OF "SEDENTARY" SUBJECTS?"

A shorthand might be the following ONE expression:

Is MEAN PERFORMANCE OF THE EXERCISE GROUP = MEAN PERFORMANCE OF THE "COUCH POTATOES"?
or, equivalently,
Is MEAN PERFORMANCE OF THE EXERCISE GROUP - MEAN PERFORMANCE OF THE "COUCH POTATOES" = 0?

And after an interesting discussion about the problem, students might turn to:

2. "Is there a difference between the MEAN HEART RATE AFTER 6 MINUTES
OF TREADMILL EXERCISE of RUNNERS WHO AVERAGED AT LEAST 15 MILES PER WEEK and
a GROUP OF "SEDENTARY" SUBJECTS WHO ARE OF THE SAME SEX?"

A shorthand might be the TWO expressions:

Is MEAN PERFORMANCE OF THE MALE EXERCISE GROUP = MEAN PERFORMANCE OF THE MALE "COUCH POTATOES"?
and
Is MEAN PERFORMANCE OF THE FEMALE EXERCISE GROUP = MEAN PERFORMANCE OF THE FEMALE "COUCH POTATOES"?
   or, equivalently,
Is MEAN PERFORMANCE OF THE MALE EXERCISE GROUP - MEAN PERFORMANCE OF THE MALE "COUCH POTATOES" = 0?
and
Is MEAN PERFORMANCE OF THE FEMALE EXERCISE GROUP - MEAN PERFORMANCE OF THE FEMALE "COUCH POTATOES" = 0?
And, further discussion might lead to:

3. If there ARE DIFFERENCES in 2. above:

Is
MEAN PERFORMANCE OF THE MALE EXERCISE GROUP - MEAN PERFORMANCE OF THE MALE "COUCH POTATOES"
=
MEAN PERFORMANCE OF THE FEMALE EXERCISE GROUP - MEAN PERFORMANCE OF THE FEMALE "COUCH POTATOES"?

Students can easily discuss these questions without special terminology.  Also,
they can discuss "controlling for", "holding fixed" or other expressions that make
sense to them -- WITHOUT SPECIAL TERMINOLOGY. 
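
The three questions can be looked at numerically with nothing more than group means. The Python sketch below uses invented heart rates (NOT the M&M data) and deliberately unequal group sizes; it only computes the quantities named in the questions and does not test any hypothesis.

```python
from statistics import mean

# Invented heart rates after 6 minutes on the treadmill, by sex and group.
heart_rates = {
    ("male", "runner"): [100, 104, 108],
    ("male", "sedentary"): [128, 130, 132, 134],
    ("female", "runner"): [112, 116],
    ("female", "sedentary"): [146, 148, 150],
}

cell_means = {group: mean(values) for group, values in heart_rates.items()}

# Question 2: runner minus sedentary, separately for subjects of the SAME SEX.
diff_male = cell_means[("male", "runner")] - cell_means[("male", "sedentary")]
diff_female = cell_means[("female", "runner")] - cell_means[("female", "sedentary")]

# Question 3: is the difference the same for males and females?
difference_of_differences = diff_male - diff_female

print(f"male difference:   {diff_male}")
print(f"female difference: {diff_female}")
print(f"difference of differences: {difference_of_differences}")
```

A difference of differences that is not zero is what question 3 is asking about.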

These TWO-ATTRIBUTE PROBLEMS CAN BE DISCUSSED DURING THE FIRST DAY OF CLASS
TO SHOW THE POWERFUL QUESTIONS THAT CAN BE INVESTIGATED IF THE STUDENTS 
STICK WITH THEIR STUDY OF STATISTICS. 

Also, it is easy to enter additional predictor attributes if students feel 
that such attributes are relevant.  Notice that there is no need to discuss
whether or not the FOUR MUTUALLY EXCLUSIVE GROUPS MUST HAVE EQUAL NUMBERS OF
OBSERVATIONS since computers eliminate the computational problems presented
by UNEQUAL N's.

Even if the curriculum does not allow time to actually analyze these questions --
students should be aware of WHAT THEY MIGHT BE ABLE TO DO IF GIVEN THE OPPORTUNITY.

I strongly suggest that students are HIGHLY MOTIVATED by their POWER TO CONTROL FOR
THE UNCONTROLLABLE. They really like the motto: IF YOU CAN'T CONTROL IT -- MEASURE IT AND
PUT IT IN THE MODEL.  In addition, if their future research leads to an interest in the
possible INTERACTION (a special term associated with question #3 above) between
attributes, then the attributes MUST BE MEASURED AND INCLUDED IN THE MODEL.

And the sermon comes to the end!!  Amen!!  :-)

-- Joe
- Original Message - 
From: SUGHRUE, STEVE <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, November 18, 1999 7:15 PM
Subject: blocking for variation/confounding


| Hi everyone!
| In the AP course description booklet, multiple choice question
| number 13 asks for the primary reason for blocking when designing an
| experiment. My students and I agree that reducing variation is a good
| answer, but isn't reducing confounding also pretty good? Are we missing
| something here?? 
| Thanks to anyone who can help .
| 
| Steve Sughrue
| Tabor Academy
| Marion, MA
| 
| ===
| The Advanced Placement Statistics List
| To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
| unsubscribe apstat-l 
| Discussion archives are at
| http://forum.swarthmore.edu/epigone/apstat-l
| Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
| ===
|