Re: Schmid-Leiman
Julie --

I worked with Jack Schmid and John Leiman at the Air Force Personnel and Training Research Center at Lackland AFB. I communicate with Jack Schmid occasionally, but I'm not sure where John Leiman is located now. Perhaps Jack can point you to someone who can help.

-- Joe

- Original Message -
From: "Penley, Julie" <[EMAIL PROTECTED]>
To: "edstat (E-mail) (E-mail)" <[EMAIL PROTECTED]>
Sent: Wednesday, October 10, 2001 9:42 AM
Subject: Schmid-Leiman

> Could someone please tell me how to perform a Schmid-Leiman transformation
> in SPSS? Thanks very much.
> Julie
>
> Julie A. Penley, M.A.
> Evaluation Coordinator
> Partnership in Teacher Preparation
> The University of Texas at El Paso
> El Paso, TX 79968
> phone: (915) 747-5642

=
Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/
=
Re: Analysis of covariance
Paolo --

Here comes my usual response to messages similar to yours.

Following the use of Regression/Linear Models:

1. State your research question in "NATURAL LANGUAGE", not in terms of a "canned statistical name" that may or may not be relevant to your question.
2. Create an ASSUMED MODEL that allows you to translate your "NATURAL LANGUAGE" questions into RESTRICTIONS on your ASSUMED MODEL.
3. Impose the restrictions on your ASSUMED MODEL to obtain your RESTRICTED MODEL, and then you have the essentials to test your hypotheses.

If this procedure is IDENTICAL to someone's COVARIANCE ANALYSIS, then you might want to call yours a COVARIANCE ANALYSIS.

-- Joe

*** Joe H. Ward, Jr.
*** 167 East Arrowhead Dr.
*** San Antonio, TX 78228-2402
*** Phone: 210-433-6575
*** Fax: 210-433-2828
*** Email: [EMAIL PROTECTED]
*** http://www.northside.isd.tenet.edu/healthww/biostatistics/wardindex.html
*** ---
*** Health Careers High School
*** 4646 Hamilton-Wolfe
*** San Antonio, TX 78229

- Original Message -
From: "Morelli Paolo" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, September 25, 2001 5:26 AM
Subject: Analysis of covariance

> Hi all,
> I have to analyse some clinical data. In particular, the analysis is a
> comparison between two groups of the mean change from baseline to endpoint
> of a score. The statistician who planned the analysis used ANCOVA on the
> mean change, using the baseline values of the scores as the covariate.
> Do you think this analysis is correct?
> I think that in this way we are correcting twice. I think that the right
> analysis is an ANOVA on the mean change.
> Please let me know your opinion.
> Thanks,
> Paolo
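A minimal sketch of Joe's three steps applied to Paolo's question, assuming the research question is "do the two groups differ in mean change for patients with the same baseline score?". The data are simulated and every name here (`sse`, `f_test`, the coefficients used to generate the data) is made up for illustration. The ASSUMED MODEL has a constant, a group indicator and the baseline score; the RESTRICTION sets the group coefficient to zero. The sketch also fits the same two models to the endpoint score, since change = endpoint - baseline and baseline is already a predictor.

```python
import numpy as np

def sse(X, y):
    """Least-squares fit of y on X; returns (residual sum of squares, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta

def f_test(X_full, X_restricted, y):
    """F statistic comparing the RESTRICTED MODEL against the ASSUMED (full) MODEL."""
    sse_full, _ = sse(X_full, y)
    sse_res, _ = sse(X_restricted, y)
    df_num = X_full.shape[1] - X_restricted.shape[1]   # number of restrictions
    df_den = len(y) - X_full.shape[1]                  # error degrees of freedom
    return ((sse_res - sse_full) / df_num) / (sse_full / df_den)

# Simulated (hypothetical) trial: 20 patients per group.
rng = np.random.default_rng(0)
n = 40
group = np.repeat([0.0, 1.0], n // 2)
baseline = rng.normal(50, 10, n)
endpoint = 0.6 * baseline + 5.0 * group + rng.normal(0, 5, n)
change = endpoint - baseline

ones = np.ones(n)
X_full = np.column_stack([ones, group, baseline])   # ASSUMED MODEL
X_restricted = np.column_stack([ones, baseline])    # group coefficient restricted to 0

F_change = f_test(X_full, X_restricted, change)
F_endpoint = f_test(X_full, X_restricted, endpoint)
_, beta_change = sse(X_full, change)
_, beta_endpoint = sse(X_full, endpoint)

print("F (change as response):  ", F_change)
print("F (endpoint as response):", F_endpoint)
print("group coefficient:", beta_change[1], beta_endpoint[1])
```

In this sketch the F statistic and the group coefficient come out the same whether the response is the change score or the endpoint score; only the baseline coefficient shifts by exactly 1. Under these assumptions, putting baseline in the model of the change score does not adjust for baseline twice.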
Re: adjusted r-square
If the least-squares regression algorithm does not "REQUIRE THE NUMBER OF OBSERVATIONS TO EXCEED THE NUMBER OF PREDICTORS", THEN THE REGRESSION ALGORITHM COULD BE USED TO SOLVE A SYSTEM OF SIMULTANEOUS EQUATIONS THAT WOULD HAVE NO ERRORS.

Another "interesting" characteristic of Excel Regression is that it "requires the number of observations to exceed the number of predictors". Fortunately, Colin Bell is working with the Excel folks at Microsoft to improve the numerous "interesting" characteristics of Statistics in Excel.

-- Joe

*** Joe H. Ward, Jr.
*** 167 East Arrowhead Dr.
*** San Antonio, TX 78228-2402
*** Phone: 210-433-6575
*** Fax: 210-433-2828
*** Email: [EMAIL PROTECTED]
*** http://www.ijoa.org/resumes/ward.html
*** ---
*** Health Careers High School
*** 4646 Hamilton-Wolfe
*** San Antonio, TX 78229

- Original Message -
From: "Graeme Byrne" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, August 22, 2001 4:42 AM
Subject: Re: adjusted r-square

> In short, you don't. If the number of terms in the model equals the number
> of observations, you have much bigger problems than not being able to compute
> adjusted R^2. It should always be the case that the number of observations
> exceeds the number of terms in the model; otherwise you cannot calculate any
> of the standard regression diagnostics (F-stats, t-stats etc). My advice is
> get more data or remove terms from the model. If neither of these is an
> option, you are stuck.
>
> "Atul" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > I have a doubt regarding adjusted r-square
> >
> > How do we calculate the adjusted r-square when the error degrees of
> > freedom are zero?
> > (or in other words, number of samples is equal to the number of
> > regression terms including the constant)
> >
> > Such a situation leads to a zero in the denominator in the expression
> > for calculating adjusted r-square.
> >
> > Your help is highly appreciated.
> >
> > Thanks
> > Atul
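A small sketch of the situation Atul describes, using the usual textbook form of adjusted R-square, 1 - (1 - R^2)(n - 1)/(n - p), where p counts every term including the constant. The function name `adjusted_r2` and the three data points are made up for illustration. The second half shows Joe's point: with as many terms as observations, least squares simply solves the simultaneous equations and leaves no errors.

```python
import numpy as np

def adjusted_r2(r2, n_obs, n_terms):
    """Adjusted R^2; n_terms counts every fitted term, including the constant.

    Returns None when the error degrees of freedom are zero or negative,
    because the formula then divides by zero (or is meaningless).
    """
    error_df = n_obs - n_terms
    if error_df <= 0:
        return None
    return 1.0 - (1.0 - r2) * (n_obs - 1) / error_df

print(adjusted_r2(0.9, n_obs=10, n_terms=3))   # well defined: 7 error df
print(adjusted_r2(1.0, n_obs=3, n_terms=3))    # None: zero error df

# Three observations, three terms (constant, x, x^2): an exact solve.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 7.0])
X = np.column_stack([np.ones(3), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
print(beta)        # the unique solution of the 3 equations
print(residuals)   # all (numerically) zero, so R^2 = 1 and nothing is left to estimate error
```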
Re: Experimental Design Text Advice
DENNIS ROBERTS WRITES:

- Original Message -
From: "dennis roberts" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 18, 2001 1:31 PM
Subject: Re: Experimental Design Text Advice

> At 10:49 AM 1/18/01 -0600, Ken K. wrote:
> >I find BH&H to be quite good, but a little hard to read and getting a little
> >dated. I much prefer "Design and Analysis of Experiments" by Douglas C.
> >Montgomery, John Wiley & Sons, ISBN 0-471-52000-4
> >
> >I really like the simple style Montgomery uses in all his books
>
> not disagreeing with the above but, one of the big problems in selecting a
> book on experimental design is ... that appropriate designs DEPEND upon the
> problem(s) being investigated ...
>
> in addition, i have sensed that most "design" books are not really about
> designing experiments but, how to analyze data FROM particular designs ...
> and there IS a large difference
>
> while it may not be too difficult to talk about 1 and 2 and 3 or more
> factor designs ... with blocking variables or not ... with repeated
> measures or not (etc.) ... whether any of these are appropriate again
> depends on the question(s) being asked

=== Joe Ward Comments ==

Dennis -- You said it well!!!

I might add that a good approach is to develop capabilities to create models and impose restrictions to answer the RESEARCH QUESTIONS OF INTEREST TO THE RESEARCHER.

It is difficult to teach students to create models if the teachers have not developed their own capability to create models appropriate to the research questions of interest.

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax: 210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax: 210-617-5423
Re: Re: topic?
Happy New Year --

Perhaps Laurie Snell will make a good start through the future CHANCE issues.

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax: 210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax: 210-617-5423

- Original Message -
From: "Bokhorst, Frank" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, January 02, 2001 4:41 AM
Subject: Re: topic?

> Bob Hayden asked:
>
> > Anybody have anything to say about statistical education???
>
> I would like to turn the question round, and ask if it might be
> possible to summarize relevant material from the recent discussion
> on the forum about the US election saga into a form suitable for
> teaching purposes?
>
> In particular, to sift through the EDSTAT archive and edit a
> resource text.
>
> There was much off-topic discussion, but there was also a huge
> volume of generally polite and reasonable talk with many good
> points illustrating key issues relevant to education. The topic
> itself was extremely pertinent and interesting to a wide audience.
> For example, someone recently asked for examples of the misuse of
> statistics - surely many examples could be found in the US election
> saga? What we need is a good summary.
>
> As another example, I note that Herman Rubin frequently argues
> the need for proper understanding of statistics: Could he, or
> someone else on the EDSTAT forum, perhaps help educators
> by compiling some examples that arose in the recent discussion?
> What kind of understanding of statistics might be required of
> lawyers, politicians, voters, media editors?
>
> Maybe someone could list key points that came out of these EDSTAT
> discussions?
>
> Frank Bokhorst
> http://www.uct.ac.za/depts/psychology/bok
>                                  _O
> tel: 021 650-3708               -\<,
> fax: 021 689-7572   One car less (.)/(.)
> Psychology Dept.,   The owner of this bicycle
> University of       takes responsibility for
> Cape Town,          the shape of his drawing
> Rondebosch 7701,    only if you use a fixed
> South Africa.       size font such as Courier.
Re: Statistical penalties for sequential analyses
Rich -

You might want to consider doing some Resampling (Cross-Validation, Bootstrap) as you continue through your analyses.

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 08, 2000 3:30 PM
Subject: Statistical penalties for sequential analyses

> Need some advice. We are doing a series of tests looking for correlations
> among age-sensitive variables in a population of mice. We will have about
> 600 mice in all, and it will take 3 years to test every mouse, at about 200
> mice tested each year.
>
> We are considering three strategies:
>
> A) Wait 3 years until all the data are in; then do the analyses.
>
> B) Analyze the data on the first 300 mice, and publish anything that looks
> exciting and meets conventional significance criteria. When the second set
> of mice is finished, we can use these second 300 animals as a replicate
> sample to (try to) confirm the significant findings we reported on the first
> set. And we can also pool all 600 mice to obtain higher statistical power
> than we had for the initial analysis with N = 300.
>
> Of course this represents testing some hypotheses twice, and thus increases
> the Type I error rate. I suspect that there are theoretically justified
> methods for adjusting significance criteria to "adjust" for taking two looks
> at the data, but I don't know how to do this. Anyone have a recipe, or a
> reference to get me started?
>
> Thanks.
>
> Rich Miller
> University of Michigan
>
> Reply to: [EMAIL PROTECTED]
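Rich's worry about "two looks" can be checked by simulation. The sketch below is only an illustration under stated assumptions: two uncorrelated normal variables, a first look at N = 300 and a pooled look at N = 600, a Fisher z test of the correlation, and a simple Bonferroni split of the 0.05 level across the two looks. The Bonferroni split is a swapped-in, conservative recipe chosen for simplicity; it is not something proposed in this thread, and group-sequential boundaries (for example Pocock or O'Brien-Fleming) are the less conservative standard alternative. All function names are made up for the example.

```python
import numpy as np
from statistics import NormalDist

def row_corr(x, y):
    """Pearson correlation of each row of x with the matching row of y."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

def fisher_z(r, n):
    """Approximately standard normal under the null of zero correlation."""
    return np.arctanh(r) * np.sqrt(n - 3)

def overall_type1(crit, n_sims=4000, n_first=300, n_total=600, seed=1):
    """Share of null data sets declared 'significant' at either look."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_sims, n_total))
    y = rng.standard_normal((n_sims, n_total))      # independent of x: the null is true
    z1 = fisher_z(row_corr(x[:, :n_first], y[:, :n_first]), n_first)  # look 1
    z2 = fisher_z(row_corr(x, y), n_total)                            # look 2, pooled
    return float(((np.abs(z1) > crit) | (np.abs(z2) > crit)).mean())

alpha = 0.05
naive_crit = NormalDist().inv_cdf(1 - alpha / 2)        # 0.05 at each look
bonf_crit = NormalDist().inv_cdf(1 - (alpha / 2) / 2)   # 0.025 at each look

naive_rate = overall_type1(naive_crit)
bonf_rate = overall_type1(bonf_crit)
print("overall Type I rate, 0.05 at both looks: ", naive_rate)
print("overall Type I rate, 0.025 at both looks:", bonf_rate)
```

The same simulation frame can be reused with resampled real data in place of the simulated normals, which is closer in spirit to Joe's suggestion.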
Re: [ap-stat] Textbook for "regular" statistics vs. AP Statistics
- Original Message -
From: "Carole Black" <[EMAIL PROTECTED]>
To: "AP Statistics" <[EMAIL PROTECTED]>
Sent: Wednesday, November 29, 2000 12:58 PM
Subject: [ap-stat] Textbook for "regular" statistics vs. AP Statistics

> I have taught a "regular" statistics class at my high school for the
> last 3 years using Elementary Statistics by Mario Triola. (This was
> the book I inherited.) This is textbook adoption year for Georgia and
> I have the privilege of picking out Statistics books for both the
> "regular" stat class as well as a new AP class that will be offered
> for the first time next year. (I will be teaching both classes). My
> first question is, should I go with 2 different textbooks or the same
> textbook?
>
> My second question is much the same as many others posted on this
> site, which book? I am seriously considering the Yates, Moore and
> McCabe "The Practice of Statistics" for the AP class. I am
> considering either Moore's "Basic Practice of Statistics" or the
> "Elementary Statistics" book published by McGraw Hill for the regular
> statistics class.
>
> Any comments would be greatly appreciated.
> Carole Black

= Joe Ward Comments ==

Hi, Carole --

Your opportunity of having an AP-Statistics class and a "regular" Statistics class can allow you the freedom of using the "regular" class to give students the capability to use the combined power of Regression/Linear Models and Computers to investigate some interesting and practical research questions.

You might recruit some of your science students to give them useful techniques to support their research projects. You can give your students the power to create models to answer their research questions.

It is certainly reasonable that you must give your AP-Statistics students the objectives that tend to match the corresponding college course. For the "regular" Statistics course you can make the course both interesting and practical without the constraints of AP-Statistics. There probably are many AP teachers who can accomplish the AP-Statistics objectives AND have extra time to give their students some more powerful capabilities.

Try to make your "regular" statistics course available for ALL students. Frequently, the "regular" course is designed for the less talented. You CAN make the regular course the more popular, since your students might be able to do some powerful research. Students who are involved with Science Fairs, Jr. Academy of Science and the ASA Project/Poster competitions should be your target population for the "regular" course.

Be sure to have access to books that contain ideas of how to use Regression/Linear models to create models to answer the students' research questions of interest.

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
[ap-stat] RE: election proposal
Does anyone know WHY so many states DON'T DO IT THIS WAY?

Perhaps the Political Science/History folks can comment.

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: "Lee Creighton" <[EMAIL PROTECTED]>
To: "AP Statistics" <[EMAIL PROTECTED]>
Sent: Monday, November 13, 2000 8:11 AM
Subject: [ap-stat] RE: election proposal

> People are listening! This is exactly how Nebraska and Maine vote, as we speak.
>
> It was decided after the disastrous 1824 election that the states would have the power to manage how they pick electors, and *not* the federal government.
>
> > -Original Message-
> > From: Jon Graetz [mailto:[EMAIL PROTECTED]]
> > Sent: Sunday, November 12, 2000 11:30 PM
> > To: AP Statistics
> > Subject: [ap-stat] RE: election proposal
> >
> > I like it! Now, to get anyone else to listen...
> >
> > Jon Graetz
> > The Miami Valley School
> > 5151 Denise Drive
> > Dayton, OH 45429
> > (937)434-
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> >
> > -Original Message-
> > From: Reba Taylor [mailto:[EMAIL PROTECTED]]
> > Sent: Sunday, November 12, 2000 11:00 PM
> > To: AP Statistics
> > Subject: [ap-stat] election proposal
> >
> > I've been toying with this idea:
> >
> > Each state has the same number of electors as their congressional
> > delegation: e.g. in VA, we have 11 congressional districts
> > + 2 senators = 13 electors.
> >
> > Let's keep the electors, but have the ones representing the
> > congressional districts vote the way their district votes. Then the 2
> > at-large electors will vote the way the state as a whole votes.
> >
> > I think this is more equitable than winner-take-all. I also
> > think it would be a more representative sample of the popular vote, but
> > still giving the smaller states as much clout as the larger ones.
> >
> > Reba Taylor
> >
> > * Reba Taylor [EMAIL PROTECTED]
> > *
> > * Home:                  School:
> > *                        Blacksburg High School
> > * 2418 Ridge Road        520 Patrick Henry Drive
> > * Blacksburg, VA 24060   Blacksburg, VA 24060
> > * 540-953-2421           540-951-5706
> > *
> > * AP Computer Science, AP Statistics, Math
> > *
> > * Black holes are where God divided by zero.
> > *
> > * "Can't never could, till it tried!" -- S.C. Taylor

---
You are currently subscribed to ap-stat as: [EMAIL PROTECTED]
To unsubscribe send a blank email to [EMAIL PROTECTED]
Frequently Asked Questions(FAQ) Site is at http://www.ncssm.edu/statsteachers
AP Statistics Archives are at http://forum.swarthmore.edu/epigone/apstat-l
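The arithmetic of Reba's proposal is easy to sketch. The function name `district_method` and the district results below are hypothetical; only the counts (11 districts + 2 at-large electors = 13 for Virginia) come from her message.

```python
from collections import Counter

def district_method(district_winners, statewide_winner, at_large=2):
    """One elector per congressional district, cast for that district's winner,
    plus the at-large electors cast for the statewide winner."""
    tally = Counter(district_winners)
    tally[statewide_winner] += at_large
    return dict(tally)

def winner_take_all(district_winners, statewide_winner, at_large=2):
    """Every elector goes to the statewide winner."""
    return {statewide_winner: len(district_winners) + at_large}

# Hypothetical Virginia result: candidate A carries 7 of the 11 districts and the state.
districts = ["A"] * 7 + ["B"] * 4
print(district_method(districts, "A"))   # {'A': 9, 'B': 4}
print(winner_take_all(districts, "A"))   # {'A': 13}
```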
Re: Help needed ... :-(
Well said, Bob --

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: "Bob Hayden" <[EMAIL PROTECTED]>
To: "EdStat-L" <[EMAIL PROTECTED]>
Sent: Monday, November 13, 2000 9:46 PM
Subject: Re: Help needed ...

> - Forwarded message from David Heiser -
>
> - Original Message -
> From: Dennis <[EMAIL PROTECTED]>
>
> > Hello Newsgroup, I'm searching for real good books on stats. I'm a
> > student of psychology and we've been taught very much stats. But I
> > read all the time your postings and wonder why I've never heard
> > about that what I read.
> ...
> > Hopefully and with much regards
> > yours Dennis
>
> ---
>
> What you need is a good class in written English
> DAH
>
> - End of forwarded message from David Heiser -
>
> From the email address, it appears that Dennis lives in a European
> country where English is not the predominant language. The written
> English here far surpasses my written French, German or Latin, to
> mention only languages I have studied. I note that, unlike most
> Americans, Dennis uses the word "hopefully" correctly. Of course, if
> Americans were as good with other people's languages as Europeans are,
> Dennis could have sent us a native-language posting, and then
> criticized us when we tried to respond in that language.
>
> I think this list can benefit greatly from being an INTERNATIONAL
> list. Let's make folks from other countries feel welcome.
>
>       _
>      | |         Robert W. Hayden
>      | |         Work: Department of Mathematics
>     /  |         Plymouth State College MSC#29
>    |   |         Plymouth, New Hampshire 03264 USA
>    | * |         fax (603) 535-2943
>   /    |         Home: 82 River Street (use this in the summer)
>  |     )         Ashland, NH 03217
>  L_/             (603) 968-9914 (use this year-round)
> Map of New       [EMAIL PROTECTED] (works year-round)
> Hampshire        http://mathpc04.plymouth.edu (works year-round)
>
> The State of New Hampshire takes no responsibility for what this map
> looks like if you are not using a fixed-width font such as Courier.
>
> "Opportunity is missed by most people because it is dressed in
> overalls and looks like work." --Thomas Edison
Re: [ap-stat] revote and Accuracy and Design of Voting Forms
Bob Hayden wrote to the AP list: == > > - Original Message - > > From: "Bob Hayden" <[EMAIL PROTECTED]> > > To: "AP Statistics" <[EMAIL PROTECTED]> > > Sent: Friday, November 10, 2000 10:01 AM > > Subject: [ap-stat] revote > > > > > > > After considering all the issues raised on the lists regarding the > > > election, I think the best solution would be a revote in every state > > > of the union -- but with NEW CANDIDATES!-) > > > -- > > > | | Robert W. Hayden > > > | | Work: Department of Mathematics > > > / | Plymouth State College MSC#29 > > >| | Plymouth, New Hampshire 03264 USA > > >| * | fax (603) 535-2943 > > > /| Home: 82 River Street (use this in the summer) > > > | ) Ashland, NH 03217 > > > L_/ (603) 968-9914 (use this year-round) > > > Map of New[EMAIL PROTECTED] (works year-round) > > > Hampshire http://mathpc04.plymouth.edu (works year-round) > > > > > > The State of New Hampshire takes no responsibility for what this map > > > looks like if you are not using a fixed-width font such as Courier. > > > > > > "Opportunity is missed by most people because it is dressed in > > > overalls and looks like work." --Thomas Edison === Joe Ward replied to Bob Hayden === > > Hey, Bob -- > > > > THAT really brought some hearty chuckles to > > Bettie and I. > > > > -- Joe > > > > Joe Ward.Health Careers High School > > 167 East Arrowhead Dr4646 Hamilton Wolfe > > San Antonio, TX 78228-2402...San Antonio, TX 78229 > > Phone: 210-433-6575...Phone: 210-617-5400 > > Fax: 210-433-2828....Fax: 210-617-5423 > > Email: [EMAIL PROTECTED] > > http://www.ijoa.org/joeward/wardindex.html > > *** == Then Bob Hayden wrote: = - Original Message - From: "Bob Hayden" <[EMAIL PROTECTED]> To: "Joe Ward" <[EMAIL PROTECTED]> Sent: Friday, November 10, 2000 10:41 AM Subject: Re: [ap-stat] revote > > Their post-election bickering did not endear them to me. I think they > should both go home, return to their jobs, and SHUT UP. 
> Joe Ward Comments about Accuracy of Voting Responses == Is there research on the Design of Voting Forms? = Hi, Bob -- In ANY election, the format for obtaining voting responses should be designed to minimize the chances for inaccurate responses. It is surprising that the "format-approval folks" in Palm Beach did not redesign the form. It looks like the form was designed for convenience of the computer folks or the print shop or others--but not for the accuracy of responses. No matter who is the winner in any election, there probably are some local voting systems that need "fine tuning". In San Antonio, we have gone through numerous varieties of voting formats. Some seem better than others. I'm not sure how the final forms are "approved". In this recent election we used felt-tip markers!!! The ink soaked through to the back side of the paper but when my wife mentioned it, the "judges" said that it had been checked and "did not interfere with the markings on the other side". But do we know what happens if there is a SMEAR of the wet ink? Does THAT BALLOT COUNT, or is it rejected? If I were running for election in our county and the voting was close, then I certainly would ask for a "hand" recount to find out how many votes were rejected by the scan machine because of "smear" or because the wet ink soaked through the paper (probably cheap paper) and was "sensed" on the back. Perhaps there should be a research project designed by a TASK FORCE of some ASA members to evaluate the many different forms to find out which form(s) MINIMIZE INACCURACY OF RESPONSE. It is likely that such research has been done since it is such an important activity. The studies should consider age, education, language and other variables. 
-- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html
Re: 2 factor ANOVA with empty cells
Right you are, Elliot. However, when one finds "no-interaction" among all of those cells that are present, then one can feel "better" about estimating the "missing" cell values. Of course, there could be a surprising explosion!! The more interaction that is detected the more dangerous it can be. When there is little or no interaction it is possible to design the study to save money and time. There is no need to fill in all the cells all the time -- particularly when the cost is great. The real experimental design "experts" can get lots of information from a small study that might have missing cells "strategically located". - Joe **** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Elliot Cramer" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, November 01, 2000 8:43 PM Subject: Re: 2 factor ANOVA with empty cells > Jeff E. Houlahan <[EMAIL PROTECTED]> wrote: > : Is it ever appropriate to do a 2-factor unreplicated ANOVA with > : empty cells if you aren't sure there is no interaction between the > ^ > you can test the part of the interaction that is testable, but of course > you can never know about the rest. > > > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = > = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
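The idea of estimating a "missing" cell once no interaction has been found among the cells that are present can be sketched in a few lines. This is only an illustration, assuming an unreplicated two-factor layout with exactly one empty cell and an additive (no-interaction) model; the function name and the numbers are made up. For that special case the least-squares estimate has the classical closed form (r*R + c*C - G) / ((r-1)*(c-1)), where R and C are the observed totals of the affected row and column and G is the observed grand total.

```python
def estimate_missing_cell(table, row, col):
    """Least-squares estimate of ONE missing cell under a no-interaction model.

    table is a list of rows; the entry at (row, col) is ignored.
    """
    r = len(table)       # number of rows
    c = len(table[0])    # number of columns
    row_total = sum(v for j, v in enumerate(table[row]) if j != col)
    col_total = sum(table[i][col] for i in range(r) if i != row)
    grand_total = sum(
        v
        for i in range(r)
        for j, v in enumerate(table[i])
        if (i, j) != (row, col)
    )
    return (r * row_total + c * col_total - grand_total) / ((r - 1) * (c - 1))


# Hypothetical, perfectly additive data: cell = row effect + column effect.
table = [
    [10.0, 11.0, 13.0, 16.0],
    [12.0, 13.0, None, 18.0],   # the empty cell
    [15.0, 16.0, 18.0, 21.0],
]
estimate = estimate_missing_cell(table, 1, 2)
print(estimate)  # 15.0, the value the additive pattern implies
```

With real data the estimate is only as trustworthy as the no-interaction assumption, which is the danger mentioned above: the more interaction is detected, the less such a filled-in value should be believed.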
[ap-stat] Independent-Dependent Variable Discussion--Inverse Estimation
Hi Dan and all -- I had intended to comment about the independent-dependent variable discussion earlier but I got side-tracked. Since Dan reminded us with his comment: "> This problem statement also brings back the independent-dependent variable > discussion. In the real context, the activity level of the crickets depends > upon the temperature, so temperature is the independent variable and number > of chirps the dependent variable. However, if you want to predict the > temperature using the number of chirps, you must consider the number of > chirps as the "independent" variable and temperature as the "dependent" > variable." I have inserted some comments below: === Joe Ward writes == In the ancient past (1950s), for calibration studies -- Let Y be a reading from a measuring instrument, SUBJECT TO "ERRORS OF MEASUREMENT", and X be a KNOWN STANDARD, ASSUMED TO BE "WITHOUT ERROR" (FIXED). Then the least-squares regression model used to PREDICT THE "STANDARD" (X) from the measurement Y WAS computed as: Y = b0 + b1*X + Error. Then, to estimate (predict) the KNOWN STANDARD (X) from the measurement (Y), the past procedure was to solve for X in the above equation (leaving off the Error): Y = b0 + b1*X, or X = (Y-b0)/b1, which is used to PREDICT X from Y. Dan, you probably are better acquainted with the most recent approach from the Bureau of Standards since I have not kept up with any changes in the Standards calibration policy. Furthermore, in the distant past, it is interesting to note that simultaneous regression equations were solved to estimate unknown amounts of chemical compositions in a solution. An interesting study by Fisher, Hans, R.G. Hansen, and H.W. Norton (1955), "Quantitative determination of glucose and galactose", Anal. Chem. 27, 857-859, is discussed in E.J. Williams' book Regression Analysis, Wiley, 1959, page 163. Williams refers to this topic as INVERSE ESTIMATION.
Even though the goal is to ESTIMATE (PREDICT) the values of X, the dependent variables (Y's) are the MEASURES SUBJECT TO ERROR. After the least-squares solutions are computed then the simultaneous regression equations are solved, INVERSELY, for unknown X values from measured(observed) values of Y (which are subject to ERRORS). It would be interesting to know if this approach is still used. Is the INVERSE method BETTER? Have there been recent studies comparing the REGULAR approach with the INVERSE approach? Comments from experienced "experts" in this area are welcome. -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** === End of Joe Ward's message = - Original Message - From: "Teague, Dan" <[EMAIL PROTECTED]> To: "AP Statistics" <[EMAIL PROTECTED]> Sent: Friday, October 20, 2000 10:42 AM Subject: [ap-stat] RE: effect on LSRL > Rebecca, > > If your student chose values of the independent variable that were very > large (250-450) and found the y-values that correspond to these x-values > using y = 56.212 + 0.1356x, then he could increase the slope. For these > data, the point (249, 55) is below that portion of the regression line on > the left. The regression line would be pulled towards the point, just as > you said, but in this situation, it would cause the slope to increase. > > The student's argument is flawed to the extent that these values of the > independent variable do not match the summary statistics (xbar = 167 and s = > 31). We expect to find the number of chirps between 70 and 290 and the > temperature roughly between 50 and 100. For these values of x, the slope > will be pulled down by the addition of this point. > > This problem statement also brings back the independent-dependent variable > discussion. 
In the real context, the activity level of the crickets depends > upon the temperature, so temperature is the independent variable and number > of chirps the dependent variable. However, if you want to predict the > temperature using the number of chirps, you must consider the number of > chirps as the "independent" variable and temperature as the "dependent" > variable. > > > Daniel J. Teague > NC School of Science and Mathematics > 1219 Broad Street > Durham, NC 27705 > [EMAIL PROTECTED] > &g
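The two approaches discussed in this thread can be put side by side in a small sketch. The calibration numbers below are invented and the helper name fit_line is hypothetical. The first estimate fits Y = b0 + b1*X to the known standards and then solves for X, as in Joe's description. The second regresses X on Y directly, treating the reading as the "independent" variable, as in Dan's comment. With noisy readings the two answers are close but not identical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical calibration run: known standards X, instrument readings Y.
standards = [0.0, 1.0, 2.0, 3.0, 4.0]
readings = [1.1, 2.9, 5.2, 6.8, 9.0]
new_reading = 6.0

# Approach 1: fit Y on X, then solve Y = b0 + b1*X for X.
b0, b1 = fit_line(standards, readings)
x_inverse = (new_reading - b0) / b1

# Approach 2: fit X on Y and predict directly.
c0, c1 = fit_line(readings, standards)
x_direct = c0 + c1 * new_reading

print(x_inverse, x_direct)
```

Which of the two is "better" depends on which variable carries the error and on how far the new reading lies from the centre of the calibration data, so the sketch is only meant to make the question concrete.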
Re: How to Pool Slopes
Hi, Stan -- I've inserted a reply at the end of your message. Let me know how things turn out. -- Joe **** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Stanley110" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, October 08, 2000 1:59 PM Subject: Q: How to Pool Slopes > Assume I have three sets of x,y data. I fit each by least-squares to a straight > line. I determine that the three fitted lines are homogeneous and > indistinguishable at a certain significance level. I want to express the slope > (of the three) as a single point estimate and as a confidence interval. What is > the formula for doing this? > > Please reply to this newsgroup and to the writer at <[EMAIL PROTECTED]>. > > Thank you for your help. > > stan alekman == JOE WARD REPLIES === Hi, Stan -- Your Title says (1) "How to Pool Slopes" and you indicate later that (2) "I determine that the three fitted lines are homogeneous and indistinguishable". For (1) it sounds like you will want THREE DIFFERENT INTERCEPTS, but for case (2) it sounds like you may want only ONE INTERCEPT. This is a good example of the use of the "NOINT" option in SAS or "Y-intercept = zero" in other packages. The reason that this appears to be a difficult problem is that most statistics packages include an intercept by DEFAULT. The approach used below for your THREE GROUP DATA is shown for TWO groups of data in the Prentice-Hall published book (1973) -- "Introduction to Linear Models" by Ward and Jennings. Chapter 8, page 143.
I don't know which Regression Software you are using, but you should be sure to FORCE THE FIT THROUGH THE ORIGIN (no separate Y-intercept term), because D1, D2 and D3 below already carry the intercepts. First, it is important to put ALL THREE SETS OF DATA in the same model. Let
Y = dependent variable (containing ALL THREE SETS OF DATA)
D1 = 1 if the corresponding element of Y is from DATA SET #1; 0 otherwise
D2 = 1 if the corresponding element of Y is from DATA SET #2; 0 otherwise
D3 = 1 if the corresponding element of Y is from DATA SET #3; 0 otherwise
X1 = Value of x if the corresponding element of Y is from DATA SET #1; 0 otherwise
X2 = Value of x if the corresponding element of Y is from DATA SET #2; 0 otherwise
X3 = Value of x if the corresponding element of Y is from DATA SET #3; 0 otherwise
X = Value of x for ALL corresponding elements of Y.
U = 1 for every element.
Then your ASSUMED MODEL is shown below (this should give you the same regression coefficients that you already have computed -- a check that your new model is correct):
Y = a1*D1 + b1*X1 + a2*D2 + b2*X2 + a3*D3 + b3*X3 + E1 (Model #1)
After you have computed this ASSUMED MODEL you may want to TEST THE HYPOTHESIS that you imply in CASE (1) above, that the THREE SLOPES ARE EQUAL, i.e., b1 = b2 = b3 = bc (THE COMMON SLOPE). Substituting these restrictions into Model #1 produces the RESTRICTED MODEL FOR CASE (1):
Y = a1*D1 + bc*X1 + a2*D2 + bc*X2 + a3*D3 + bc*X3 + E2 (Model #2)
Factoring (or collecting terms), and noting that X1 + X2 + X3 = X, produces:
Y = a1*D1 + a2*D2 + a3*D3 + bc*X + E2 (Model #2)
(Note that the values of a1, a2, and a3 in Model #2 are NOT numerically equal to the values in Model #1.) From Model #2, bc is the least-squares SINGLE POINT estimate of the COMMON SLOPE. Your favorite Regression procedure should give what you need to compute a confidence interval (such as the standard error of bc).
Now for CASE (2) above you may want to test that: THREE SLOPES ARE EQUAL, i.e., b1=b2= b3=bc ( THE COMMON SLOPE) and THREE INTERCEPTS ARE EQUAL, i.e., a1=a2=a3=ac (THE COMMON INTERCEPT) In which case, the RESTRICTED MODEL becomes: Y = ac*D1 + bc*X1 + ac*D2 + bc*X2 + ac*D3 + bc*X3 + E3 (Model #3) Factoring (or collecting terms) produces: Y = ac*U + bc*X + E3 (Model #3) (Note that the value of bc in Model #3 is NOT numerically equal to the value in Model #2) And, as before, your favorite Regression procedure should give what you need to compute a confidence interval (such as the standard error of bc). Let me k
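For readers who want to see the arithmetic, here is a minimal sketch of Model #1 and Model #2 for three made-up data sets. It does not build the D and X vectors explicitly. Instead it uses the closed forms that the least-squares fits of those two models reduce to: separate slopes Sxy/Sxx per data set for Model #1, and the pooled within-set slope bc = sum(Sxy) / sum(Sxx) for Model #2. The data, the variable names and the choice of a 95% interval are assumptions for illustration, and the interval relies on the usual assumption of a common error variance across the three sets.

```python
from math import sqrt


def centered_sums(x, y):
    """Return n, Sxx, Sxy, Syy for one data set (sums about its own means)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return n, sxx, sxy, syy


# Made-up data: three (x, y) sets with similar slopes, different intercepts.
data_sets = [
    ([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]),
    ([2, 3, 4, 5], [7.0, 9.1, 10.8, 13.1]),
    ([1, 3, 5, 7], [7.2, 10.9, 15.1, 18.8]),
]
sums = [centered_sums(x, y) for x, y in data_sets]
n_total = sum(n for n, _, _, _ in sums)
n_sets = len(data_sets)

# Model #1: separate intercept and slope for each data set.
slopes = [sxy / sxx for _, sxx, sxy, _ in sums]
sse_model1 = sum(syy - sxy ** 2 / sxx for _, sxx, sxy, syy in sums)
df_model1 = n_total - 2 * n_sets

# Model #2: separate intercepts, one common slope bc.
sxx_w = sum(s[1] for s in sums)
sxy_w = sum(s[2] for s in sums)
syy_w = sum(s[3] for s in sums)
bc = sxy_w / sxx_w
sse_model2 = syy_w - sxy_w ** 2 / sxx_w
df_model2 = n_total - (n_sets + 1)
se_bc = sqrt(sse_model2 / df_model2 / sxx_w)

# F statistic for the restriction b1 = b2 = b3 (Model #2 versus Model #1).
f_equal_slopes = ((sse_model2 - sse_model1) / (n_sets - 1)) / (sse_model1 / df_model1)

# 95% confidence interval for bc; 2.306 is the tabled t value for 8 df,
# which matches df_model2 for this particular example only.
t_crit = 2.306
ci = (bc - t_crit * se_bc, bc + t_crit * se_bc)
print(bc, se_bc, ci, f_equal_slopes)
```

For other sample sizes the t value has to be looked up for df_model2 degrees of freedom, and f_equal_slopes is referred to an F distribution with (n_sets - 1, df_model1) degrees of freedom.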
Re: How many Olympic Medals should Great Britain have won?
Hi, Paige -- Good comments about "There are so many different factors..." and "To say that half the observations should have positive errors and half should have negative errors is to confuse median with mean." I used the word ABOUT intentionally to distinguish from EXACTLY. --Joe - Original Message - From: "Paige Miller" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, October 03, 2000 10:19 AM Subject: Re: How many Olympic Medals should Great Britain have won? > > Hi, Graham -- > > It's been a long time since I've heard any discussion about > > UNDERACHIEVERS and OVERACHIEVERS. I've never been able to understand > > the discussions. > > NO MATTER WHAT VALUE THE CORRELATION (SLOPE OF THE REGRESSION LINE) HAS we > > know that the ALGEBRAIC SUM OF THE ERRORS IS ZERO. Now that says that > > the SUM OF THE ABSOLUTE VALUES OF THE POSITIVE ERRORS IS EQUAL TO THE > > SUM OF THE ABSOLUTE VALUES OF THE NEGATIVE ERRORS. THEN WE WOULD EXPECT > > TO OBSERVE ABOUT ONE-HALF OF THE OBSERVATIONS TO HAVE POSITIVE ERRORS AND > > ONE-HALF TO HAVE NEGATIVE VALUES. > > THEREFORE, FOR ALL CORRELATIONS (ZERO INCLUDED) WE SHOULD EXPECT TO > > CONCLUDE THAT ABOUT ONE-HALF OF ALL CASES > > WOULD BE CALLED "OVER-ACHIEVERS" AND ABOUT ONE-HALF WOULD BE CALLED > > "UNDER-ACHIEVERS". DOES THAT DESIGNATION HAVE ANY OPERATIONALLY USEFUL > > MEANING? Paige writes > There are so many different factors that go into the amount of medals > won that it seems silly to perform a regression based upon population > and GDP to use as predictors. Organization of Olympic Committees, > training facility quality, programs for youths, weather, etc. all can > affect the number of medals won, and then there is the factor of > injuries, which to me seems like it cannot be modelled except as > random noise. > > To say that half the observations should have positive errors and half > should have negative errors is to confuse median with mean.
> > -- > Paige Miller > Eastman Kodak Company > [EMAIL PROTECTED] > > "It's nothing until I call it!" -- Bill Klem, NL Umpire > "Those black-eyed peas tasted all right to me" -- Dixie Chicks
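The mean-versus-median point in this exchange is easy to check numerically. The sketch below uses invented data, not the Olympic medal table: nine points that lie on a straight line except for one large positive departure. The least-squares residuals still sum to zero, but only one of the nine is positive, so a zero sum of errors does not by itself imply a roughly even split between "over-achievers" and "under-achievers".

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: y = x everywhere except one point lifted by 30.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 4, 35, 6, 7, 8, 9]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

n_positive = sum(1 for e in residuals if e > 0)
n_negative = sum(1 for e in residuals if e < 0)
print(sum(residuals))          # zero up to rounding error
print(n_positive, n_negative)  # 1 positive residual, 8 negative
```

With roughly symmetric errors the split will indeed be ABOUT half and half; the skewed case simply shows that this depends on the shape of the error distribution rather than on the zero-sum property.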
Re: How many Olympic Medals should Great Britain have won?
Hi, Graham -- It's been a long time since I've heard any discussion about UNDERACHIEVERS and OVERACHIEVERS. I've never been able to understand the discussions. NO MATTER WHAT VALUE THE CORRELATION (SLOPE OF THE REGRESSION LINE) HAS we know that the ALGEBRAIC SUM OF THE ERRORS IS ZERO. Now that says that the SUM OF THE ABSOLUTE VALUES OF THE POSITIVE ERRORS IS EQUAL TO THE SUM OF THE ABSOLUTE VALUES OF THE NEGATIVE ERRORS. THEN WE WOULD EXPECT TO OBSERVE ABOUT ONE-HALF OF THE OBSERVATIONS TO HAVE POSITIVE ERRORS AND ONE-HALF TO HAVE NEGATIVE VALUES. THEREFORE, FOR ALL CORRELATIONS (ZERO INCLUDED) WE SHOULD EXPECT TO CONCLUDE THAT ABOUT ONE-HALF OF ALL CASES WOULD BE CALLED "OVER-ACHIEVERS" AND ABOUT ONE-HALF WOULD BE CALLED "UNDER-ACHIEVERS". DOES THAT DESIGNATION HAVE ANY OPERATIONALLY USEFUL MEANING? --Joe ********Joe Ward.Health Careers High School167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229Phone: 210-433-6575...Phone: 210-617-5400Fax: 210-433-2828Fax: 210-617-5423Email: [EMAIL PROTECTED]http://www.ijoa.org/joeward/wardindex.html*** - Original Message - From: Dr Graham D Smith To: Edstat Sent: Monday, October 02, 2000 11:40 AM Subject: How many Olympic Medals should Great Britain have won? How many Olympic Medals should Great Britain have won? British Olympians won a grand total of 28 medals at the Sydney 2000 Games, our best medal haul for 80 years. Many commentators have suggested that the big improvement in British fortunes compared to the Atlanta 1996 Games is due to the use of Lottery funding to help our top sportsmen and sportswomen. But how many medals should Britain expect to win? Did we fulfil our potential or fall short of it? One important determinant of a country's Olympic success is the size of its population. USA, China and Russia head the Sydney 2000 medal table, they also have large populations. However, population size does not fully account for the number of medals won. 
Both India and China have much larger populations than USA but won fewer medals. Another important predictor of a nation's Olympic performance is economic prosperity. Richer nations often outperform poorer nations of the same size. Gross domestic product (GDP) is an economic index that reflects both economic success and population size. A scatterplot of the number of medals won and GDP of the 80 medal winning countries at the 2000 Olympics shows a positive correlation; r = 0.595, p < 0.01 (see attached). GDP accounts for 35.4% of the variance of medals won. A regression analysis was performed on the data to estimate the number of medals Team GB should expect. Given that the UK GDP is equivalent to US$ 1.29 trillion the expected number of medals is 15. It seems that our Olympians did far better than we could have expected. Well done team GB! And well done too to Team USA, their expected medal count is 26.5. However, the top overachiever was Russia (followed by USA and Australia). The top underachiever was India. *Dr Graham D. SmithPsychology DivisionPark CampusUniversity College NorthamptonBoughton Green Rd.NorthamptonNN2 7AL Tel: +44 (0) 1604 735500 Ext 2393E-mail: [EMAIL PROTECTED]* *Dr Graham D. SmithPsychology DivisionPark CampusUniversity College NorthamptonBoughton Green Rd.NorthamptonNN2 7AL Tel: +44 (0) 1604 735500 Ext 2393E-mail: [EMAIL PROTECTED]*
Re: What is today's Hogg & Craig?
Hi, Gary, Jerry et al -- Here is a message from Bob Hogg. -- Joe - Original Message - From: "Robert V. Hogg" <[EMAIL PROTECTED]> To: "Joe Ward" <[EMAIL PROTECTED]> Sent: Friday, September 22, 2000 9:19 AM Subject: Re: Fw: What is today's Hogg & Craig? > joe, HOGG AND TANIS is used more for undergrads. CASELLA AND BERGER for > first year grad students in stat. HOGG AND CRAIG for good seniors and > first year grad students in other areas [like actuarial sci]. bob > > > > At 11:24 PM 9/21/00 -0500, Joe Ward wrote: > >Bob -- > > > >Any suggestions for Jerry? > > > >-- Joe > >*** * > > > >Joe Ward.Health Careers High School > >167 East Arrowhead Dr4646 Hamilton Wolfe > >San Antonio, TX 78228-2402...San Antonio, TX 78229 > >Phone: 210-433-6575...Phone: 210-617-5400 > >Fax: 210-433-2828Fax: 210-617-5423 > >Email: [EMAIL PROTECTED] > >http://www.ijoa.org/joeward/wardindex.html > >*** > >- Original Message - > >From: "Jerry Dallal" <[EMAIL PROTECTED]> > >To: <[EMAIL PROTECTED]> > >Sent: Thursday, September 21, 2000 9:32 PM > >Subject: What is today's Hogg & Craig? > > > > > >> Back in the "old days", the standard text for an undergraduate math stat > >> course was Hogg & Craig. I had some fondness for Lindgren. I haven't > >> taught this course in nearly 20 years. Which texts occupy their position > >> today? > >> > >> Thanks. > >> - Original Message - From: "Gary McClelland" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Friday, September 22, 2000 11:49 AM Subject: Re: What is today's Hogg & Craig? > in article [EMAIL PROTECTED], Jerry Dallal at [EMAIL PROTECTED] > wrote on 9/21/00 8:32 PM: > > > Back in the "old days", the standard text for an undergraduate math stat > > course was Hogg & Craig. I had some fondness for Lindgren. I haven't > > taught this course in nearly 20 years. Which texts occupy their position > > today? > > > > Thanks. > > According to amazon.com, the 1994 5th edition is still in print. > I keep my much earlier edition closely guarded.
But I too would be > interested in hearing what the kids learn with today. > > gary > -- > [EMAIL PROTECTED]
Re: cluster
Hi, Thomas -- If you have a SAS Manual the McQuitty method is described briefly in the CLUSTER Chapter. Also, I think the original article is: McQuitty, L.L. (1966) "Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data" Ed and Psy Meas, 17, 207-229. Look at: Anderberg, M.R. (1973) "Cluster Analysis for Applications" New York, Academic Press. --- Joe ******** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Thomas Pesl" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Friday, September 22, 2000 4:19 AM Subject: cluster > Does anyone know the formula of the McQuitty clustering method? > > Thanks, > Thomas > > > > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = > = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
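Since the original question asked for the formula, here is a small sketch of the update rule as it is usually stated for McQuitty's method (weighted average linkage, also called WPGMA): when clusters K and L are merged into M, the distance from any other cluster J to M is D(J,M) = (D(J,K) + D(J,L)) / 2. The function name, the data layout and the four-item distance table are assumptions made up for illustration; the manual and the references above remain the authority on the exact definition.

```python
def mcquitty_linkage(distances):
    """Agglomerate items using D(J,M) = (D(J,K) + D(J,L)) / 2.

    distances maps frozenset({a, b}) to the distance between items a and b.
    Returns a list of (cluster_k, cluster_l, merge_height) tuples.
    """
    d = dict(distances)
    clusters = set().union(*d)            # all item labels
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        k, l = sorted(pair, key=str)
        height = d[pair]
        merged = (k, l)
        # Distance from every remaining cluster to the new cluster.
        for j in clusters - {k, l}:
            d[frozenset({j, merged})] = (
                d[frozenset({j, k})] + d[frozenset({j, l})]
            ) / 2
        # Drop every entry that still refers to the old clusters.
        d = {key: v for key, v in d.items() if k not in key and l not in key}
        clusters = (clusters - {k, l}) | {merged}
        merges.append((k, l, height))
    return merges


# Hypothetical distances among four items.
example = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 4.0,
    frozenset({"A", "D"}): 6.0,
    frozenset({"B", "C"}): 2.0,
    frozenset({"B", "D"}): 8.0,
    frozenset({"C", "D"}): 5.0,
}
merges = mcquitty_linkage(example)
heights = [h for _, _, h in merges]
print(heights)  # [1.0, 3.0, 6.0]
```

For real work a library routine is the safer choice; for example, SciPy's scipy.cluster.hierarchy.linkage with method="weighted" implements this same WPGMA rule.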
Statistics for Visually Impaired
Those of you who are teaching statistics to visually impaired (blind) students may find some helpful ideas from Bob Bottenberg's comments to Jay Thomas, included at the end of this message. Bob received his Ph.D. from Stanford after he was blinded in WWII. He developed a strong statistics background from courses with Z.W. Birnbaum, Al Bowker, Meyer Gershick and George Polya and an unusual memory for everything he has HEARD. I have had the pleasure to work with Bob for many years and he can be an inspiration to anyone with whom he associates - blind or with full vision. Now that he is retired from his work as a civilian researcher for the U.S. Air Force, Bob is getting into the internet action. Bob would be happy to share any of his procedures for hearing and reading about stat concepts at [EMAIL PROTECTED] Bob and I wrote a 140 page document on "Applied Multiple Linear Regression" in 1963 in order to bring the combined power of Regression/Linear models and Computers to the researchers with whom we worked. The reference is Bottenberg, R.A. and Ward, J.H. "Applied Multiple Linear Regression", PRL-TDR-63-6, AD-413- 128 -- originally available from the Clearinghouse for Federal Scientific and Technical Information, Dept. of Commerce, Wash. D.C. A few of the "old-timers" who are lurking on the internet occasionally mention having a copy. The approach was expanded in 1973 in the Prentice-Hall-published book "Introduction to Linear Models" by Ward and Jennings. -- JAY THOMAS WRITES: From: "Thomas, Jay" <[EMAIL PROTECTED]> To: "'Earl Jennings'" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, September 05, 2000 11:27 AM Subject: RE: Visually impaired students Dr. Jennings, et al, > > Thanks very much for getting my inquiry to Dr. Bottenberg, and of course to > Dr. Bottenberg for his detailed reply. Several people have given > suggestions, none as extensive as these were. 
I hope to compile the > suggestions after the chaos of the first week or two of school and send them > out. > > Incidentally, I was reading a history of statistics over the summer (I lead > an exciting life) and learned that one of the early important figures in the > field was Nicholas Saunderson, who held the Lucasian Chair at Cambridge after > Newton and was blind from the age of 12 months. Oddly, one of his major > mathematical contributions was in the field of optics. > Again, thanks for your advice. > > Jay Thomas --- JAY THOMAS' MESSAGE RECEIVED BY PAUL KELLEY - Delivered-To: [EMAIL PROTECTED] Date: Tue, 29 Aug 2000 14:15:37 -0700 Reply-To: "Thomas, Jay" <[EMAIL PROTECTED]> Sender: APA Division 5 Members <[EMAIL PROTECTED]> From: "Thomas, Jay" <[EMAIL PROTECTED]> Subject: [APA] visually impaired statistics students To: [EMAIL PROTECTED] I have a couple of visually impaired students in my upcoming basic statistics course this fall. I normally stress visualization and drawing sketches to understand statistics, but expect that tactic won't work with these students. Has anyone found effective ways of presenting statistical concepts to blind students? Jay Thomas - BOB BOTTENBERG REPLIES TO JAY THOMAS Hi Jay, Joe Ward passed on to me a note you sent about techniques for teaching statistics to visually impaired students. I've been totally blind since 1945, and took some undergraduate statistics courses in the psychology department at the U. of Missouri in the late 40s. Then at Stanford in 1952-1953, I enrolled in five or six courses in probability and mathematical statistics. This background is offered by way of apology for not having many suggestions for teaching in a contemporary environment. Graphs, charts and figures have always been troublesome, and, as I recall (45 years back), I absorbed that material in a quite tedious way. A reader, outside of a classroom setting, would describe a graph by saying the names of the axes, horizontal, vertical.
Then indicating in a very general way the path of the line from left to right, have first provided a word or two about the units on each axis -- lower and upper. Then, the really slow part -- pick a point on the line and read the approximate coordinates. Do that for a few points, and the mental picture of the graph would begin to emerge. Of course, the pace of classroom activity makes it impractical to do anything like that there. Charts were handled in a similar manner -- the reader reads the column headers, then the row headers, then reads a row at a time, or a column at a time. Of course, the real
Re: How can I analyze split-design by SPSS v9.0?
Anuvat -- Here comes my "standard" comment! 1. State your research question(s) in "natural language". 2. Create a model that enables you to answer the "natural language" questions that YOU WANT. 3. Impose restrictions on YOUR MODEL that answer YOUR questions of interest. 4. Use the computer to get YOUR DESIRED RESULTS. Then, AFTER YOU HAVE VERIFIED THAT THERE EXISTS A "PACKAGED" ALGORITHM THAT ANSWERS YOUR QUESTIONS OF INTEREST, USE THE "PACKAGED" ALGORITHM. Since many "interesting" research questions involve creating models for unique problems, it can be more efficient to create your OWN MODELS rather than searching for "packaged" algorithms that MAY fit YOUR research questions of interest. IMHO it seems best to take time to develop "model-creation" skills so that you can have the POWER that is available. If you have time to take a look at the URL below, Slides 7 and 8 of the PowerPoint presentation on "Using Calculators and Computers in Statistics" - Laura Niland & Joe Ward, CAMT98 45th Annual Conference, San Antonio, July 23, 1998 - give a pictorial view of "Forcing" vs. "Creating" Models. Good luck-- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Anuvat Jangchud" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, September 06, 2000 10:32 PM Subject: How can I analyze split-design by SPSS v9.0? > I would like to use SPSS v.9.0 for SPLIT Design analysis. Could you help me > out?
Re: Regression books
Copies of INTRODUCTION TO LINEAR MODELS by Ward and Jennings are available by contacting: Dr. Jimmy Mitchell The Institute for Job and Occupational Analysis (IJOA) 10010 San Pedro, Suite 440, San Antonio, Texas 78216 (210) 349-8525 Fax: (210) 349-0168 [EMAIL PROTECTED] Bottenberg, R.A., & Ward, J.H., Jr. (1963, March). Applied multiple linear regression. PRL-TDR-63-6, AD-413 128 Lackland AFB, TX: 6570th Personnel Research Laboratory, Aerospace Medical Division. This might be available from: National Technical Information Service Technology Administration U.S. Department of Commerce Springfield, VA 22161 703-605-6000 Web: www.ntis.gov - Original Message - From: "Christopher Tong" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Saturday, August 05, 2000 5:33 PM Subject: Regression books > I posted my request for recommended regression books a couple > weeks ago, and I appreciate everyone who has replied, > both on the newsgroup and privately. > For those interested, here is a summary of the recommendations. > > The most popularly recommended books are Draper & Smith > and Cohen & Cohen. Honorable mention goes to Montgomery & Peck, > Acton's out-of-print "Analysis of Straight Line Data", and the Sage Press > monographs. > > The other books that were mentioned were > Bottenberg & Ward (*) > Daniel & Wood > Darlington > Edwards (*) > Hamilton > Judd & McClelland (*) > Neter, et al.
> Pedhazur > Rawlings > Ward & Jennings (*) > > Nonlinear regression books that were recommended were > Bard (*) > Bates & Watts > Seber & Wild > > Econometrics books with good coverage of regression were > Greene > Gujarati > Pindyck & Rubinfield > > (*) = out of print, according to amazon.com > > > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = > = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Math Education of Mathematics Teachers
Dick --

I'm staying 'til Friday to attend THAT SESSION. The discussions should be of interest to secondary teachers in the Indianapolis area. It would be great if arrangements could be made for teachers to attend THAT session without needing to register for the JSM.

I think it is Session 281, Thursday, Aug. 17, 10:30 a.m. - 12:30.

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: "Richard L. Scheaffer" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, August 01, 2000 1:22 PM
Subject: Math Education of Mathematics Teachers

> I would like to call your attention to a session at the Joint Statistics
> Meetings that those of you interested in statistics education might have
> overlooked. Session 279, The Importance of Statistics in the Education of
> Future Teachers, reports on a project of the Conference Board of the
> Mathematical Sciences, funded by NSF and DoEd, that will attempt to get
> departments of mathematical sciences more involved in the education of future
> teachers. Teachers coming out of colleges of education are ill equipped to
> teach in the modern math curriculum - a curriculum that includes much
> statistics. This project makes a series of recommendations on how to solve
> this problem. Among the recommendations are strong statements about the
> importance of statistics.
>
> The panel consists of Alan Tucker, mathematician and lead writer of the CBMS
> report, Judy Sowder, math educator responsible for the middle school section
> of the report, Gail Burrill, former president of NCTM and now head of the Math
> Sciences Education Board at the NAS, and Jerry Moreno, a well-known statistics
> educator.
>
> Unfortunately, this session is in the last time slot of the meeting, 10:30
> Thursday morning. So, I hope some of you will have the time and interest to
> stop by. It should be a lively discussion of a very important topic.
>
> Hope to see you there!
>
> Dick Scheaffer
>
> ps A draft of the report is on the web.
>
> CBMS Math Education of Teachers Project Draft Report on the Web
>
> --
> Richard L. Scheaffer [EMAIL PROTECTED]
> Department of Statistics phone 352-392-1941 (#224)
> Box 118545 fax 352-392-5175
> University of Florida
> Gainesville, FL 32611
>
> 907 NW 21 Terrace 352-378-1996
> Gainesville, FL 32603
Re: regression books?
If you are near a university library you may want to take a look at INTRODUCTION TO LINEAR MODELS by Ward and Jennings. The Purdue library might have a copy.

Also, the Fountain-Ward JSE article shown at the URL below is related to your interest.

http://www.ijoa.org/joeward/wardindex.html
http://www.amstat.org/publications/jse/v4n3/ward.html

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: "Christopher Tong" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, July 22, 2000 2:12 PM
Subject: regression books?

> Does anyone have recommendations for introductory
> books on regression analysis? I posted this question on
> sci.stat.math and got only one reply so far.
>
> I am currently using Neter, Kutner, Nachtsheim, and
> Wasserman, which I find unwieldy and not very concise.
> I have my eye on Montgomery & Peck, but am wondering what anyone
> else would recommend. My one reply so far suggested Cohen & Cohen.
Re: bump hunting in nonlinear regression
Daniela --

Does "nonlinear" refer to a LINEAR MODEL of the form:

Y = a1*X1 + a2*X2 + a3*X3 + a4*X4 + ... + ap*Xp + E

where

X1 = U, a predictor of all 1s
X2 = X, any numerical predictor
X3 = X^2, the "squares" of the elements in X
X4 = X^3, the "cubes" of the elements in X
etc.?

If this is the situation you can do wonderful things with a general polynomial form. You can use an nth degree polynomial and impose restrictions that allow you much flexibility about the shape of your curve.

For example, you might choose to start with a 6th degree form and then impose restrictions FOR THE RANGE OF INTEREST ON THE X VARIABLE that allow you to use part of the function that has ONE wiggle (hump), TWO wiggles (humps), etc. You can FORCE any "undesired" wiggles (humps) to occur OUTSIDE your RANGE OF INTEREST.

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/watdindex.html

----- Original Message -----
From: "Daniela Ichim" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, July 18, 2000 12:17 PM
Subject: bump hunting in nonlinear regression

> In a nonlinear (univariate) regression problem, specifically
> a calibration problem in thermometrics, I have the problem of
> testing whether a curve expressing a relationship between
> Electrical Resistance and Temperature is monotone
> versus the possibility of it having bumps inverting the monotonicity.
>
> The problem of checking the existence of bumps becomes difficult,
> especially in the regions of sparse data.
>
> I would like directions to the existing related statistical literature.
> Thanks.
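[Editorial sketch] The polynomial form in Joe's reply is linear in its coefficients, so it can be fitted by ordinary least squares. The snippet below is a minimal illustration of that point, assuming Python with NumPy (neither appears in the thread). The function names fit_polynomial and is_monotone_on are hypothetical. It only checks the sign of the fitted slope over a range of interest; it does not impose the restrictions Joe describes.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_polynomial(x, y, degree):
    """Least-squares fit of Y = a1*U + a2*X + a3*X^2 + ... (linear in a1..ap)."""
    # Columns of the design matrix: 1 (the unit vector U), x, x^2, ..., x^degree
    design = np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def is_monotone_on(coef, lo, hi, num=1001):
    """True if the fitted curve's slope keeps one sign on a grid over [lo, hi]."""
    slope = P.polyval(np.linspace(lo, hi, num), P.polyder(coef))
    return bool(np.all(slope > 0) or np.all(slope < 0))


if __name__ == "__main__":
    # Made-up data for illustration only: a curve with one hump at x = 0.5
    x = np.linspace(0.0, 1.0, 50)
    y = (x - 0.5) ** 2
    coef = fit_polynomial(x, y, degree=2)
    print(is_monotone_on(coef, 0.0, 1.0))   # hump inside the range of interest
    print(is_monotone_on(coef, 0.6, 1.0))   # hump outside the range of interest
```

A grid check like this can miss a very narrow bump between grid points, and in regions of sparse data the fitted slope is poorly determined, which is the difficulty Daniela raises.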
Re: bivariate normality and correlation
Hi, Znarf --

Every so often I find an occasion to include (SEE THE END OF THIS MESSAGE) an earlier message from Mike Palij related to the results of a study by Jack Schmid about the RESTRICTION OF RANGE EFFECT ON THE CORRELATION COEFFICIENT.

After many years of being around folks who were concerned about RESTRICTION OF RANGE it became obvious to me that the correlation coefficient should be used with EXTREME CAUTION.

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/watdindex.html

----- Original Message -----
From: "Znarf Akfak" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 10, 2000 2:41 AM
Subject: bivariate normality and correlation

> I'm considering reporting Pearson's correlation coefficient with a
> confidence interval for several bivariate associations. As bivariate
> normality is assumed under the computation of the confidence interval,
> I have two questions.
>
> 1. What is a good way to examine the assumption of bivariate normality
> for a given data set?
>
> 2. To what extent are such confidence intervals robust to departures
> from bivariate normality?
>
> References to publications would be much appreciated as I don't have
> access to CIS, as would other suggestions and comments.
>
> Cheers,
>
> --
> Znarf

== INSERT BY JOE WARD OF MESSAGE FROM MIKE PALIJ ==

-- Forwarded message --
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation (his posting is appended below), as well as jogging my own memory for a previous posting I had made. A while back I had posted the Anscombe dataset (in the context of an SPSS program) which also clearly shows the benefit of plotting the data: the four situations produce almost identical Pearson r values but only one actually shows the classic scatterplot; the others show a nonlinear pattern and the influence that a single point has on the calculation of r. What does the value of r tell us here? Aren't the basic statistical concepts to be learned in this situation far more important and most clearly seen through a coordination of the graphical and numerical information?

-Mike Palij/Psychology Dept/New York University

Joe H Ward <[EMAIL PROTECTED]> writes:

To Mike et al --

There have been several messages related to the Simple Correlation Coefficient. IMHO, when out in the "real world" involving practical decision-making, the correlation coefficient has very limited value and sometimes dangerous consequences. The correlation coefficient may be an important topic for the history of statistics, to learn the problems associated with its use.

Attached below is an item that I submitted a long time ago, and it may be of interest to those following the discussion of "r".

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400
[EMAIL PROTECTED]

NON-RANDOM SAMPLING AND REGRESSION -- PROVIDED (MANY YEARS AGO) BY JACK SCHMID, UNIV. OF NORTHERN COLORADO, GREELEY, COLORADO

y from (MU=0, SIGMA = 1.25)
x from (MU=0, SIGMA = 1.00)
RHOxy = .60

Sample 10,000 cases at each level of progressive TRUNCATION ON x.

Regression equation: y = bx + a

%Remaining  mean y  mean x  sigmay  sigmax  r=BETA    b     a   Syx
100%           .01     .02    1.25    1.00     .60  .75  -.01  1.00
90%
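[Editorial sketch] The Schmid table is cut off in this archive after its first row, so the remaining rows are not reproduced here. The simulation below is only a rough way to see the same kind of effect for oneself, assuming Python with NumPy. Two things are assumptions rather than facts from the table: that "truncation on x" means keeping the cases with the highest x values, and that one sample is truncated repeatedly instead of drawing 10,000 fresh cases at each level. The function name truncation_table is hypothetical.

```python
import numpy as np


def truncation_table(n=10_000, rho=0.60, sigma_y=1.25, sigma_x=1.00,
                     keep_fractions=(1.0, 0.9, 0.5, 0.1), seed=0):
    """Return (fraction kept, r, b, a) after truncating the sample on x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)
    # Build y so that corr(x, y) = rho and SD(y) = sigma_y in the full population
    b_true = rho * sigma_y / sigma_x
    y = b_true * x + rng.normal(0.0, sigma_y * np.sqrt(1.0 - rho ** 2), n)

    rows = []
    for frac in keep_fractions:
        # ASSUMPTION: truncation keeps the top `frac` of cases on x
        keep = x >= np.quantile(x, 1.0 - frac)
        xs, ys = x[keep], y[keep]
        r = np.corrcoef(xs, ys)[0, 1]
        b, a = np.polyfit(xs, ys, 1)   # slope, intercept of y = b*x + a
        rows.append((frac, r, b, a))
    return rows


if __name__ == "__main__":
    for frac, r, b, a in truncation_table():
        print(f"{frac:5.0%}  r={r:5.2f}  b={b:5.2f}  a={a:5.2f}")
```

Under these assumptions r should shrink as the range of x is restricted while the slope b stays near .75, which is one way to read Joe's caution about relying on r.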
Re: Novice questions about regression analysis.
Good comment, Paige --

"> A well-designed experiment will yield regression estimates with more
> desirable properties than a poorly-designed experiment will.
> Specifically, the parameter estimates may have smaller variance in a
> well-designed experiment, and the parameters will be less correlated (or
> uncorrelated) with each other. The predicted values of the responses
> likewise will have smaller variance in a well-designed experiment."

However, it is safest to be sure that the "packaged" analyses do what the researcher wants. Do many "packaged COVARIANCE algorithms" still assume NO INTERACTION? Does SAS (or other stat packages) warn us when there is a "missing cell" in an ANOVA-LIKE GLM computation?

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/watdindex.html

----- Original Message -----
From: "Paige Miller" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 28, 2000 11:08 AM
Subject: Re: Novice questions about regression analysis.

> Wen-Feng Hsiao wrote:
> >
> > Dear listers,
> >
> > I am stuck with the experiment design of my dissertation. My experiment
> > would like to investigate the influences of different factors of stimuli
> > on the subject's response (each factor is a continuous variable), and
> > further build a regression model for these relations. My questions are:
> >
> > 1. It seems that no experiment-design issues related to Regression
> > Analysis are discussed in the usual statistics textbook. Why? Does it
> > mean one needn't consider the experiment design if he uses Regression
> > Analysis to analyze his data?
>
> A well-designed experiment will yield regression estimates with more
> desirable properties than a poorly-designed experiment will.
> Specifically, the parameter estimates may have smaller variance in a
> well-designed experiment, and the parameters will be less correlated (or
> uncorrelated) with each other. The predicted values of the responses
> likewise will have smaller variance in a well-designed experiment.
>
> > 2. Because the dependent variable is measured by the participants'
> > subjective responses, I am considering a within-subject design to
> > remove unrelated subject-specific variables. But there seem to be no
> > statistical packages ready for dealing with a within-subject design in
> > Regression Analysis?
>
> SAS and JMP will perform these analyses, although the manual may not
> specifically call them 'within-subject' analyses. Other packages
> probably will handle them as well, but I cannot advise you of specifics.
>
> > Suppose a design in which each of the n subjects gives rise to a Y
> > observation under each of c different conditions, then a total of N = nc Y
> > observations could be obtained. How can I use Regression Analysis to
> > analyze these observations?
>
> The model will predict the response Y as a function of the subject and
> each of the design variables, plus any desired interactions between
> design variables, interactions between subject and design variables, and
> polynomial terms (if desired) involving design variables.
>
> --
> Paige Miller
> Eastman Kodak Company
> [EMAIL PROTECTED]
>
> "It's nothing until I call it!" -- Bill Klem, NL Umpire
> "Those black-eyed peas tasted all right to me" -- Dixie Chicks
Re: Novice questions about regression analysis.
Wen-Feng --

Briefly --

1. While planning your experimental design, state your research questions in "natural language" -- before you start collecting data.
2. Create a Prediction/Regression/Linear Model that allows you to translate your "natural language" research questions in terms of your Model -- your ASSUMED MODEL. You may need to cycle through this process several times to get an appropriate model.
3. Impose Restrictions implied by your research questions on your ASSUMED MODEL to obtain your RESTRICTED MODEL.
4. Compare your ASSUMED and RESTRICTED MODELS.

You can do much PLANNING BEFORE YOU BEGIN YOUR COLLECTION AND ANALYSES.

If some high school students can do it, I feel confident that you can do it, too. But be careful! If your committee members can't do it, then you may not "pass".

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/watdindex.html

----- Original Message -----
From: "Wen-Feng Hsiao" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 28, 2000 10:10 AM
Subject: Novice questions about regression analysis.

> Dear listers,
>
> I am stuck with the experiment design of my dissertation. My experiment
> would like to investigate the influences of different factors of stimuli
> on the subject's response (each factor is a continuous variable), and
> further build a regression model for these relations. My questions are:
>
> 1. It seems that no experiment-design issues related to Regression
> Analysis are discussed in the usual statistics textbook. Why? Does it
> mean one needn't consider the experiment design if he uses Regression
> Analysis to analyze his data?
>
> 2. Because the dependent variable is measured by the participants'
> subjective responses, I am considering a within-subject design to
> remove unrelated subject-specific variables. But there seem to be no
> statistical packages ready for dealing with a within-subject design in
> Regression Analysis?
>
> Suppose a design in which each of the n subjects gives rise to a Y
> observation under each of c different conditions, then a total of N = nc Y
> observations could be obtained. How can I use Regression Analysis to
> analyze these observations?
>
> Thanks for your help.
>
> Wen-Feng
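[Editorial sketch] Steps 2 through 4 of Joe's list (build an ASSUMED MODEL, impose restrictions to get a RESTRICTED MODEL, compare the two) can be illustrated with a small least-squares comparison. This is a minimal sketch assuming Python with NumPy; the function names and the example data are hypothetical, and the F statistic is written in terms of error sums of squares, which is a standard form but not something spelled out in this message.

```python
import numpy as np


def error_sum_of_squares(design, y):
    """Residual sum of squares from a least-squares fit of y on the design columns."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def compare_models(assumed, restricted, y):
    """F statistic and degrees of freedom for ASSUMED vs. RESTRICTED models."""
    # Count linearly independent predictor vectors in each model
    n_assumed = np.linalg.matrix_rank(assumed)
    n_restricted = np.linalg.matrix_rank(restricted)
    df1 = n_assumed - n_restricted
    df2 = len(y) - n_assumed
    sse_a = error_sum_of_squares(assumed, y)
    sse_r = error_sum_of_squares(restricted, y)
    f_stat = ((sse_r - sse_a) / df1) / (sse_a / df2)
    return f_stat, df1, df2


if __name__ == "__main__":
    # Hypothetical question in natural language: "Does the slope of Y on X
    # differ between two conditions?"  Restriction: the two slopes are equal.
    x = np.arange(10.0)
    g = np.tile([0.0, 1.0], 5)                  # condition indicator
    y = 1 + 2 * x + 3 * g * x + 0.1 * np.sin(np.arange(10))
    u = np.ones_like(x)                         # unit vector
    assumed = np.column_stack([u, x, g * x])    # separate slopes
    restricted = np.column_stack([u, x])        # common slope
    print(compare_models(assumed, restricted, y))
```

The design matrices are written out by hand on purpose: that is the "model-creation" step the message argues for, as opposed to picking a named procedure first.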
Re: Stupid question on relationship of r and t
Jason --

t^2 = r^2*(n-2) / (1 - r^2)

is a special case of the more general case of using R^2 to compute the F statistic in a Prediction/Regression/Linear Models approach to research studies.

Letting

R^2(Assumed) = R^2 for the ASSUMED MODEL
R^2(Restricted) = R^2 for the RESTRICTED MODEL
NA = number of linearly independent predictor vectors (i.e., the number of parameters) in the ASSUMED MODEL
NR = number of linearly independent predictor vectors (i.e., the number of parameters) in the RESTRICTED MODEL
N = total number of observations (cases)
df1 = NA - NR = numerator degrees of freedom
df2 = N - NA = denominator degrees of freedom

F(df1,df2) = [(R^2(Assumed) - R^2(Restricted)) / df1] / [(1 - R^2(Assumed)) / df2]

Now consider your special case, when the ASSUMED MODEL CONTAINS ONLY TWO PREDICTORS:

Y = b0*U + b1*X + Ea

and the Hypothesis is "b1 = 0". Then the RESTRICTED MODEL is:

Y = b0*U + Er

In this special case, R^2(Restricted) = 0 and then

F(df1,df2) = [R^2(Assumed) / df1] / [(1 - R^2(Assumed)) / df2]

and you can easily solve for R^2 if desired:

R^2(Assumed) = F*df1 / (df2 + F*df1)

In your special case of only ONE predictor (in addition to U), sometimes called "simple regression",

df1 = 2 - 1 = 1 and df2 = N - 2

R^2(Assumed) = r^2 = F / (N - 2 + F)

but since t^2(df2) = F(1,df2), we have

r^2 = t^2 / (N - 2 + t^2)

which is what you obtain from Bob's suggestion --

> > t = r * sqrt(n-2) / sqrt(1-r^2)
> >
> > I want to be able to calculate r from t. I tried algebraically
> > manipulating the formula, but never quite got it to where I could do
> > this. Any advice?
> >
> Try squaring both sides and re-arranging.

(Joe Ward's comment: "GOOD SUGGESTION BY BOB")

> Bob

----- Original Message -----
From: "Anon." <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, June 24, 2000 7:20 AM
Subject: Re: Stupid question on relationship of r and t

> "Jason Osborne, Ph.D." wrote:
> >
> > I am working on a power analysis project - we are reviewing old journal
> > articles to calculate observed effect sizes and power. Some of these
> > articles, for example reporting t-test results, only give means and
> > t-test, no standard deviation. Thus, no effect size calculation is
> > possible. I was hoping to estimate an effect size by converting a t to
> > an r. I seem to remember a formula that relates the two, but am having
> > a dickens of a time tracking one down. The one I did track down, for
> > calculating t from r, is not that helpful:
> >
> > t = r * sqrt(n-2) / sqrt(1-r^2)
> >
> > I want to be able to calculate r from t. I tried algebraically
> > manipulating the formula, but never quite got it to where I could do
> > this. Any advice?
> >
> Try squaring both sides and re-arranging.
>
> Bob
>
> --
> Bob O'Hara
> Metapopulation Research Group
> Division of Population Biology
> Department of Ecology and Systematics
> PO Box 17 (Arkadiankatu 7)
> FIN-00014 University of Helsinki
> Finland
>
> tel: +358 9 191 7382 fax: +358 9 191 7301
> email: [EMAIL PROTECTED]
> To induce catatonia, visit:
> http://www.helsinki.fi/science/metapop/
>
> I have yet to see any problem, however complicated, which, when you
> looked at it in the right way, did not become still more complicated. -
> Poul Anderson
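[Editorial sketch] The conversions in this thread are easy to check numerically. The snippet below is a minimal sketch in Python (not used in the thread); the function names are hypothetical. Note that squaring loses the sign of r, so the sketch restores it from the sign of t.

```python
import math


def t_from_r(r, n):
    """t = r * sqrt(n-2) / sqrt(1 - r^2), the formula Jason started from."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


def r_from_t(t, n):
    """Invert the formula: r^2 = t^2 / (n - 2 + t^2), with r taking the sign of t."""
    return math.copysign(math.sqrt(t ** 2 / (n - 2 + t ** 2)), t)


def f_from_r_squared(r2_assumed, r2_restricted, n_assumed, n_restricted, n_obs):
    """General F statistic from the R^2 values of the ASSUMED and RESTRICTED models."""
    df1 = n_assumed - n_restricted
    df2 = n_obs - n_assumed
    return ((r2_assumed - r2_restricted) / df1) / ((1.0 - r2_assumed) / df2)


if __name__ == "__main__":
    n, r = 27, 0.5
    t = t_from_r(r, n)
    print(t, r_from_t(t, n))
    # Simple regression: NA = 2, NR = 1, R^2(Restricted) = 0, so F should equal t^2
    print(f_from_r_squared(r ** 2, 0.0, 2, 1, n), t ** 2)
```

For Jason's purpose the n here is the total number of cases in the reported t-test, so that n - 2 matches the test's degrees of freedom; that reading is an assumption about the articles he is reviewing.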
Re: differences between groups/treatments ?
Great comments, Don --

You are right on target again.

Yep, the way to investigate this type of question is via PREDICTION/REGRESSION/LINEAR MODELS.

By coincidence, I am working with some local high school students this summer, preparing them to "attack" their science fair projects. The example we are doing, at this very moment, involves predicting Final Performance of Students (Dependent or Response Attribute) from knowledge of Their Teacher's Name and the Students' Prior Performance.

NOTICE THAT THIS IS A FIRST SHOT AT A "NATURAL LANGUAGE" STATEMENT OF THE QUESTION OF INTEREST. A more frequent approach is to talk about the TYPE OF ANALYSIS before stating the research questions in "NATURAL LANGUAGE".

I will elaborate on this in detail later since I'm in the process of preparing an Email Activity for the students that asks them to investigate the INTERACTION between TEACHER and PRIOR PERFORMANCE.

More later.

-- Joe

Joe Ward
167 East Arrowhead Dr., San Antonio, TX 78228-2402
Phone: 210-433-6575, Fax: 210-433-2828
Health Careers High School
4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400, Fax: 210-617-5423
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/watdindex.html

----- Original Message -----
From: "Donald Burrill" <[EMAIL PROTECTED]>
To: "Donal" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, June 19, 2000 4:30 PM
Subject: Re: differences between groups/treatments ?

> On Mon, 19 Jun 2000, Donal wrote:
>
> > I'm currently analysing data resulting from a study of children's
> > reading ability.
>
> I shall resist the temptation to quibble over your inability to observe
> reading ability (as distinct from some indeterminate lower bound on that
> ability) ...
>
> As you describe the study, you have an unspecified number of children
> divided into four groups in a two-way design of Treatments (2 levels)
> by Prior Performance (2 levels). This would naturally lend itself to
> a two-way analysis of variance, or equivalently (pace Joe Ward) to a
> multiple regression analysis with three predictors: Treatment,
> Performance, and Treatment*Performance. If there are indeed effects
> attributable to Treatment and Performance, this analysis will be more
> sensitive to them than the two separate t-tests you propose. And if
> there is an interaction between Treatment and Performance, as there may
> well be, the sensitivity to possible effects increases.
>
> Whether this is the best analysis available is another question entirely.
>
> 1. If there are children of different sexes, you may be able to
> consider a three-way design, although I suspect it would be unbalanced,
> which (I also suspect!) may induce serious difficulties for you.
>
> 2. Your Performance information you have chosen to dichotomize,
> although it is presumably (quasi-)continuous to start with. You might
> find out something useful by treating it as a continuous predictor
> rather than as a dichotomy: in effect carrying out an analysis of
> covariance with pre-treatment reading score as the covariate, whether you
> used an "Analysis of Covariance" program or a "Multiple Regression"
> program or a "General Linear Model" (GLM) program to do the arithmetic.
>
> 3. In addition to sex, there may be other lurking variables in your data
> that could be used as predictors. Whether it is sensible to consider
> including them in a hypothetical model depends partly on how many
> children you have all together, and partly on the distribution of any
> such candidate variable among _these_ children.
>
> > The study involves two treatments and each child's reading ability was
> > measured before and after the application of one of the treatments.
> > Thus, each child received one or the other (but not both) of two
> > possible treatments. The children are divided into two groups:
>
> Well, that's not quite true. You chose to categorize them into two
> groups, but they could equally well have been divided into three, or
> four, or six (depending on the number of children available and one's
> degree of interest in fine-tuning the "Weak/Strong" dimension).
> And if you have both boys and girls, you have two sexes as well, and
> it would not be surprising if they differed in their responses to the
> two treatments. And how about the ages of the children?
>
> > Weak readers: those whose pre-treatment reading score was less than
> > the mean pre-treatment reading score
> > Strong r
Re: Beginner requests for help on ANOVA and T-tests (n SYSTAT97 --CAUTION)
Thanks, Richard --

Yes, there IS A PROBLEM! I called the IJOA folks about the situation.

http://www.ijoa.org/joeward/wardindex.html

IJOA is in the process of changing computer systems, so their URL will be down until sometime next week. Of course, those system changes are unpredictable. Thanks for your interest.

By the way, we checked the EXCEL2000 Regression program yesterday and it still has the WRONG TOTAL and REGRESSION SUM OF SQUARES WHEN THE "CONSTANT IS ZERO" BOX IS CHECKED. Of course, that makes the F statistic wrong, too. Fortunately, the RESIDUAL SUM OF SQUARES IS CORRECT.

I'm sure that previous users must have mentioned this to the EXCEL REGRESSION folks. Perhaps no one cares.

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Phone: 210-433-6575
Fax: 210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax: 210-617-5423

-----Original Message-----
From: R.C. <[EMAIL PROTECTED]>
To: Joe Ward <[EMAIL PROTECTED]>
Date: Friday, June 16, 2000 10:08 AM
Subject: Re: Beginner requests for help on ANOVA and T-tests (n SYSTAT97 --CAUTION)

>IS THERE A PROBLEM WITH THE LINK PROVIDED HERE? OR IS
>IT ME?
>
>THANKS,
>RICHARD
>
>--- Joe Ward <[EMAIL PROTECTED]> wrote:
>> Edmund --
>>
>> You may want to use the REGRESSION program in Excel (WITH CAUTION).
>> That way you can create your own models to do what YOU WANT TO DO.
>> You might want to contact a statistician to help you use REGRESSION
>> models. You don't need to use some of the Pre-Computer algorithms if
>> you know how to create your models to answer YOUR QUESTIONS.
>>
>> The URL below has a few articles related to this message:
>> http://www.ijoa.org/joeward/wardindex.html
>>
>> If the "packaged" algorithms answer the questions of interest,
>> then you can use them.
>>
>> I am using Excel 97 with three high school students this summer
>> (2 Sophomores and 1 Senior) in preparation for their Science Fair
>> Research Projects. I usually use SYSTAT. However, these students
>> already have Excel, so we are "testing" the use of REGRESSION in Excel.
>>
>> Incidentally, when you use REGRESSION models that FORCE THE LINE
>> THROUGH THE ORIGIN (NO Y-INTERCEPT TERM), THE REGRESSION SUM OF
>> SQUARES IS NOT CORRECT.
>>
>> So be careful when you use REGRESSION in Excel 97.
>>
>> The Excel97 Error is due to the fact that the REGRESSION SUM OF SQUARES
>> IS CALCULATED FROM THE "TOTAL SUM OF SQUARES" MINUS THE "RESIDUAL
>> SUM OF SQUARES". THE "TOTAL SUM OF SQUARES" IS NOT CORRECT
>> WHEN YOU INDICATE THAT YOU WANT THE LINE TO PASS THROUGH
>> THE ORIGIN.
>>
>> THE EXCEL PROGRAM USES THE "ADJUSTED SUM OF SQUARES"
>> (REMOVING the SUM OF SQUARES ACCOUNTED FOR BY THE
>> UNIT VECTOR, the "MEAN"). The REAL TOTAL SUM OF SQUARES IN THIS
>> CASE SHOULD BE THE SUM OF SQUARES FOR THE DEPENDENT VARIABLE.
>>
>> Apparently the programmer of the REGRESSION procedure did not know how to
>> compute the REAL TOTAL SUM OF SQUARES.
>>
>> As some of the users and creators of Statistical Software Packages
>> frequently mention:
>>
>> "Using the statistical routines in Excel can be risky."
>>
>> Of course, ALL statistical packages should be used with caution.
>>
>> We have not had time to check on Excel2000 to find out if it still
>> has the same problem.
>>
>> Keep in touch.
>>
>> -- JHW
>>
>> Joe Ward
>> 167 East Arrowhead Dr.
>> San Antonio, TX 78228-2402
>> Phone: 210-433-6575
>> Fax: 210-433-2828
>> Email: [EMAIL PROTECTED]
>> http://www.ijoa.org/joeward/wardindex.html
>>
>> Health Careers High School
>> 4646 Hamilton Wolfe
>> San Antonio, TX 78229
>> Phone: 210-617-5400
>> Fax: 210-617-5423
>>
>> -----Original Message-----
Re: Beginner requests for help on ANOVA and T-tests (n SYSTAT97 --CAUTION)
Edmond-- You may want to use the REGRESSION program in Excel (WITH CAUTION). That way you can create your own models to do what YOU WANT TO DO. You might want to contact a statistician to help you use REGRESSION models. You don't need to use some of the Pre-Computer algorithms if you know who to create your models to answere YOUR QUESTIONS. The URL below has a few articles related to this message: http://www.ijoa.org/joeward/wardindex.html If the "packaged" algorithms answer the questions of interest, then you can use them. I am using Excel 97 with three high school students this summer. 2 Sophomores and 1 Senior in preparation for their Science Fair Research Projects. I usually use SYSTAT. However, these students already have Excel, so we are "testing" the use of REGRESSION in Excel. Incidentally, when you use REGRESSION models that need to: NOT HAVE THE Y-INTERCEPT TO PASS THROUGH ZERO, THE REGRESSION SUM OF SQUARES ARE NOT CORRECT. So be careful when you use REGRESSION in Excel 97. The Excel97 Error is due to the fact that the REGRESSION SUM OF SQUARES IS CALCULATED FROM THE "TOTAL SUM OF SQUARES" MINUS THE "RESIDUAL SUM OF SQUARES". THE "TOTAL SUM OF SQUARES" IS NOT CORRECT WHEN YOU INDICATED THAT YOU DO NOT WANT THE INTERCEPT TO PASS THROUGH THE ORIGIN. THE EXCEL PROGRAM USES THE "ADJUSTED SUM OF SQUARES" (REMOVING the REGRESSION SUM OF SQUARES ACCOUNTED FOR BY THE UNIT VECTOR (the "MEAN"). The REAL TOTAL SUM OF SQUARES IN THIS CASE SHOULD BE THE SUM OF SQUARES FOR THE DEPENDENT VARIABLE. Apparently the programmer of the REGRESSION procedure did not know how to compute the REAL TOTAL SUM OF SQUARES. As some of the users and creators of Statistical Software Packages frequently mention: "Using the statistical routines in Excel can be risky." Of course, ALL statistical packages should be used with caution. We have not had time to check on the Excel2000 to find out if it is still has the same problem. Keep in touch. -- JHW * Joe Ward * 167 East Arrowhead Dr. 
* San Antonio, TX 78228-2402 * Phone: 210-433-6575 * Fax: 210-433-2828 * Email: [EMAIL PROTECTED] * http://www.ijoa.org/joeward/wardindex.html * * Health Careers High School * 4646 Hamilton Wolfe * San Antonio, TX 78229 * Phone: 210-617-5400 * Fax: 210-617-5423 ** -Original Message- From: [EMAIL PROTECTED] <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Date: Thursday, June 15, 2000 9:37 AM Subject: Beginner requests for help on ANOVA and T-tests >Hello, I am a 16 year old student and a beginner to statistics. >I'm lost. >Currently I only have Microsoft Excel 97. And I would like to know the >differences between the following ANOVA tests (in Excel): > >ANOVA Single Factor >ANOVA Two-Factors with replication >ANOVA Two-Factors without replication > >What do all these mean? Where and when should they be applied? And can >anyone please use simple english terms to explain? I am only a beginner. >What is one-way or two-way ANOVA? > >How about for T-Test? >T-Test: Paired two samples for means >T-Test: Two-sample assuming equal variances >T-Test: Two-sample assuming unequal variances > >Also, can I use ANOVA instead of T-test when testing null hypothesis? >Between 2 groups? > >Thanks for your help, >Edmund > >=== >This list is open to everyone. Occasionally, less thoughtful >people send inappropriate messages. Please DO NOT COMPLAIN TO >THE POSTMASTER about these messages because the postmaster has no >way of controlling them, and excessive complaints will result in >termination of the list. > >For information about this list, including information about the >problem of inappropriate messages and information about how to >unsubscribe, please see the web page at >http://jse.stat.ncsu.edu/ >===
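The sum-of-squares point in the message above can be sketched in a few lines. This is only an illustration with made-up data, assuming the no-intercept case (a fit forced through the origin); it is not a reproduction of what Excel 97 actually computes internally. The variable names are hypothetical.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope for the no-intercept model y = b*x + e.
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

residual_ss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

# Total SS for a no-intercept model: the raw sum of squares of y.
raw_total_ss = sum(yi * yi for yi in y)

# Mean-adjusted total SS: only appropriate when the model contains
# the unit vector (an intercept term).
mean_y = sum(y) / len(y)
adjusted_total_ss = sum((yi - mean_y) ** 2 for yi in y)

# Regression SS obtained as "total minus residual", using the raw total.
regression_ss = raw_total_ss - residual_ss

# Independent check: sum of squares of the fitted values b*x.
fitted_ss = sum((b * xi) ** 2 for xi in x)

print("raw total - residual     :", regression_ss)
print("sum of squared fits      :", fitted_ss)
print("adjusted total - residual:", adjusted_total_ss - residual_ss)
```

With the raw total, "total minus residual" agrees with the sum of squared fitted values; with the mean-adjusted total it does not, which is the kind of discrepancy the message warns about.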
Re: MANOVA
If the 'ZERO' or 'DOT' means that you have some missing cells then that is a good time to "CREATE YOUR OWN MODEL". -- Joe ******** * Joe Ward * 167 East Arrowhead Dr. * San Antonio, TX 78228-2402 * Phone: 210-433-6575 * Fax: 210-433-2828 * Email: [EMAIL PROTECTED] * http://www.ijoa.org/joeward/wardindex.html * * Health Careers High School * 4646 Hamilton Wolfe * San Antonio, TX 78229 * Phone: 210-617-5400 * Fax: 210-617-5423 ** -Original Message- From: HAideren <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Date: Wednesday, June 14, 2000 8:12 PM Subject: MANOVA >Hi, > >I have run a MANOVA and in the 'Parameter Estimates' section of the results, >some of the cells are filled with a zero or a dot (.). Is there a way to >overcome this problem? If no, should I run a different multivariate test and >what would be the appropriate substitute test? > >Cheers.
Re: Inequalities constrains on the coefficients
I asked Lee Wilkinson how this is done in SYSTAT. Here is his reply. -- Joe * Joe Ward * 167 East Arrowhead Dr. * San Antonio, TX 78228-2402 * Phone: 210-433-6575 * Fax: 210-433-2828 * Email: [EMAIL PROTECTED] * http://www.ijoa.org/joeward/wardindex * * Health Careers High School * 4646 Hamilton Wolfe * San Antonio, TX 78229 * Phone: 210-617-5400 * Fax: 210-617-5423 ** -Original Message- From: Wilkinson, Leland <[EMAIL PROTECTED]> To: 'Joe Ward' <[EMAIL PROTECTED]> Date: Thursday, June 08, 2000 9:34 AM Subject: RE: Inequalities constrains on the coefficients >The SYSTAT procedure NONLIN does the same with the LOSS option and FUNPAR. >Could you perhaps post this to Ed-Stat in the same thread? >Thanks, >Lee > >-----Original Message- >From: Joe Ward [mailto:[EMAIL PROTECTED]] >Sent: Tuesday, June 06, 2000 11:56 AM >To: Wilkinson, Leland (SYSTAT >Subject: Fw: Inequalities constrains on the coefficients > > >Lee -- > >Is this available in any version of SYSTAT? >What about SYSTAT8-Student Version? > >-- Joe > = -Original Message- From: Jonathan Fry <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Date: Tuesday, June 06, 2000 11:05 AM Subject: Re: Inequalities constrains on the coefficients >Arie Beresteanu wrote: >> >> Hi, >> >> Estimation of linear (multivariate) regression with equality constrains >> on the coefficients is a well known problem (at least for me). What >> about if the constrains are inequalities? More specifically: >> >> Y=Xb+e >> s.t. >> Qb<=q >> >> where Q is a matrix and q is a vector. (for example Y=b0+b1*X1+b2*X2+e >> s.t. b1+2*b2>=0 ) >> >> How do I solve that? How do I test the constrain? Is there something on >> MatLab/STATA/SAS for that? >> >> Thank you, >> Arie. > >The SPSS procedure CNLR (constrained non-linear regression) handles this >kind of problem directly, using a quadratic programming solver. > >Jonathan Fry >SPSS Inc.
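For the single-constraint example in Arie's question (Y = b0 + b1*X1 + b2*X2 + e subject to b1 + 2*b2 >= 0), one possible approach can be sketched without any special software. This is not what SPSS CNLR or SYSTAT NONLIN do internally; it is a minimal sketch that assumes only ONE linear inequality, in which case the constrained least-squares solution is either the ordinary fit (if it already satisfies the constraint) or the fit of the restricted model with b1 = -2*b2. The general case Qb <= q with several rows needs a quadratic programming solver. The data and function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(columns, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def fit_with_constraint(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 subject to b1 + 2*b2 >= 0."""
    ones = [1.0] * len(y)
    b0, b1, b2 = least_squares([ones, x1, x2], y)
    if b1 + 2 * b2 >= 0:
        return b0, b1, b2  # unconstrained fit is already feasible
    # Constraint is active: impose b1 = -2*b2, which gives the restricted
    # model y = b0 + b2*(x2 - 2*x1) + e.
    z = [v2 - 2 * v1 for v1, v2 in zip(x1, x2)]
    b0, b2 = least_squares([ones, z], y)
    return b0, -2 * b2, b2


# Hypothetical data generated from y = 1 + x1 - x2, so the ordinary
# fit gives b1 + 2*b2 = -1 and violates the constraint.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [1 + a - b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_with_constraint(x1, x2, y)
print(b0, b1, b2, "constraint value:", b1 + 2 * b2)
```

Testing whether the constraint is "real" is a separate question that this sketch does not address.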
Re: Inequalities constrains on the coefficients
Lee Wilkinson indicates how this is done in SYSTAT. --- Joe -Original Message- From: Wilkinson, Leland <[EMAIL PROTECTED]> To: 'Joe Ward' <[EMAIL PROTECTED]> Date: Thursday, June 08, 2000 9:34 AM Subject: RE: Inequalities constrains on the coefficients >The SYSTAT procedure NONLIN does the same with the LOSS option and FUNPAR. >Could you perhaps post this to Ed-Stat in the same thread? >Thanks, >Lee > >-----Original Message- >From: Joe Ward [mailto:[EMAIL PROTECTED]] >Sent: Tuesday, June 06, 2000 11:56 AM >To: Wilkinson, Leland (SYSTAT >Subject: Fw: Inequalities constrains on the coefficients > > >Lee -- > >Is this available in any version of SYSTAT? >What about SYSTAT8-Student Version? > >-- Joe > -Original Message- From: Jonathan Fry <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Date: Tuesday, June 06, 2000 11:05 AM Subject: Re: Inequalities constrains on the coefficients >Arie Beresteanu wrote: >> >> Hi, >> >> Estimation of linear (multivariate) regression with equality constrains >> on the coefficients is a well known problem (at least for me). What >> about if the constrains are inequalities? More specifically: >> >> Y=Xb+e >> s.t. >> Qb<=q >> >> where Q is a matrix and q is a vector. (for example Y=b0+b1*X1+b2*X2+e >> s.t. b1+2*b2>=0 ) >> >> How do I solve that? How do I test the constrain? Is there something on >> MatLab/STATA/SAS for that? >> >> Thank you, >> Arie. > >The SPSS procedure CNLR (constrained non-linear regression) handles this >kind of problem directly, using a quadratic programming solver. > >Jonathan Fry >SPSS Inc.
Re: Regression and Correlation (Was Correlation)
Hi Brett, Herman et al -- Occasionally it seems appropriate to send some results that help reinforce the idea that the correlation coefficient can be of limited value in some situations. The table shown below illustrates what happens when the range of X is restricted. -- Joe * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Magill, Brett <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Friday, May 19, 2000 12:46 PM Subject: Regression and Correlation (Was Correlation) | I am no statistician, so let me make sure I am understanding what you are | saying. Your point is that you may have an identical regression equation | despite the fact that the correlation may vary depending on the amount of | variation in X. If this is your point, I agree and recognize this--r is a | measure of the fit about the regression line. | | Nonetheless, regression and correlation are the same in the bivariate case | with the exception of scale. In a bivariate regression, the standardized | Beta coefficient is equal to the Pearson r. As with any standardization, it | removes the scale of the variation and the result is that the slope | describes the relationship or B = r. | | Brett | BEGIN HERMAN'S MESSAGE | -Original Message- | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] | Sent: Friday, May 19, 2000 11:43 AM | To: [EMAIL PROTECTED] | Subject: Re: Correlation | | | Magill, Brett <[EMAIL PROTECTED]> wrote: | >Mike, | | >In the bivariate case, regression and correlation are identical. | | This is false. 
Correlation is the measure of the | proportion of the variance of one variable explained by a | linear function of the other in a joint distribution, while | linear regression is the linear relation itself. One can | have non-linear versions as well. | | If in fact E(Y|X) = aX + b, this will also be the case no | matter how selection is made on X, whereas the correlation | can vary greatly. | --- END OF HERMAN'S MESSAGE ----- Beginning of insert by Joe Ward - -- Forwarded message -- Date: Fri, 23 May 1997 09:30:20 -0400 (EDT) From: Mike Palij <[EMAIL PROTECTED]> To: [EMAIL PROTECTED], [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Testing basic statistical concepts I'd like to thank Joe Ward for reminding us of this situation (his posting is appended below), as well as jogging my own memory for a previous posting I had made. A while back I had posted the Anscombe dataset (in the context of an SPSS program) which also clearly shows the benefit of plotting the data: the four situations produce almost identical Pearson r values but only one actually shows the classic scatterplot, the others show a nonlinear pattern and the influence that a single point has on the calculation of r. What does the value of r tell us here? Aren't the basic statistical concepts to be learned in this situation far more important and most clearly seen through a coordination of the graphical and numerical information? -Mike Palij/Psychology Dept/New York University Joe H Ward <[EMAIL PROTECTED]> writes: To Mike et al -- There have been several message related to the Simple Correlation Coefficient. IMHO, when out in the "real world" involving practical decision-making the correlation coefficient has very limited value and sometimes dangerous consequences. The correlation coefficient may be an important topic for the history of statistics to learn the problems associated with its use . 
Attached below is an item that I submitted a long time ago, and it may be of interest to those following the discussion of "r". -- Joe *** * Joe WardHealth Careers High School * * 167 East Arrowhead Dr.4646 Hamilton Wolfe * * San Antonio, TX 78228-2402 San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400 * * [EMAIL PROTECTED] Fax : 210-617-5423 *
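The restriction-of-range point discussed in this thread can be illustrated with a small sketch. The table Joe mentions is not reproduced in this archive, and the numbers below are NOT that table; they are made-up data constructed so that the fitted slope is the same on the full range of X and on a narrow slice of it, while the correlation changes a great deal. It also checks Brett's remark that, in the bivariate case, the standardized slope equals r.

```python
import math


def slope_and_r(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical data: y = 2*x plus a fixed "noise" pattern chosen so that it
# is uncorrelated with x inside every block of four consecutive x values.
x = list(range(1, 21))
noise = [3, -3, -3, 3] * 5
y = [2 * a + e for a, e in zip(x, noise)]

full_slope, full_r = slope_and_r(x, y)

# Restrict the range of X to 9..12 (one block of four).
keep = [i for i, a in enumerate(x) if 9 <= a <= 12]
x_small = [x[i] for i in keep]
y_small = [y[i] for i in keep]
restricted_slope, restricted_r = slope_and_r(x_small, y_small)

# Standardized slope = slope * sd(x) / sd(y); in the bivariate case this is r.
n = len(x)
sd_x = math.sqrt(sum((a - sum(x) / n) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - sum(y) / n) ** 2 for b in y) / (n - 1))
standardized_slope = full_slope * sd_x / sd_y

print("full range:       slope", full_slope, "r", full_r)
print("restricted range: slope", restricted_slope, "r", restricted_r)
print("standardized slope on full range:", standardized_slope)
```

The regression line is unchanged by the selection on X, but r is not, which is Herman's point about E(Y|X) versus correlation.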
Re: R sq vs r sq
Good message, Jon -- :-) -- Joe - Original Message - From: Jon Cryer <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Friday, May 05, 2000 11:13 AM Subject: Re: R sq vs r sq | But the important issue, statistically, is that the model is | linear in the _parameters_ (not the predictor variables). When this is the | case | the equations from which the least squares estimates of the parameters | are obtained are linear equations (the so-called normal equations). | This is true even when fitting a quadratic (or higher order) equation. | | Statisticians always talk about linear models in this way. Statistically | speaking, response = quadratic curve in x + random error is a _linear_ model. | Statisticians use the term nonlinear model for more complex models that are | not linear in the parameters. | | Jon Cryer | | At 07:46 PM 5/5/00 +0200, you wrote: | >Joe, | > | >Well by linear *I* meant what we mean in algebra 2 class y = mx + b, | >but I do not object to calling y = a0 + a1 x1 + a2 x2 + a3 x3 + ... linear. | >I certainly DO object to your definition of linear, although I suppose | >it *is* used by some people, I find it very confusing. | > | >Cheers, | >Bill Larson | >Geneva, Switzerland | > | >- Original Message - | >From: Joe Ward <[EMAIL PROTECTED]> | >To: William J. Larson <[EMAIL PROTECTED]>; Paul Velleman | ><[EMAIL PROTECTED]> | >Cc: <[EMAIL PROTECTED]> | >Sent: 2000 May 05 9:07 PM | >Subject: Re: R sq vs r sq | > | > | > | >Hi Paul, William et al.-- | > | >This may be ANOTHER GOOD TIME TO COMMENT ON | >THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO). | > | >I suggest that when we use the terms LINEAR and NONLINEAR that we | >tell the reader what the SENDER means by those terms. | > | >When I write: | > | >Y = b1*X1 + b2*X2 + ... + bp*Xp + E | > | >where bi (i = 1,2,...p) are least-squares regression coefficients, I | >will refer to this as a LINEAR MODEL. 
| > | >The Xs can be any numbers that I choose-- log(z), ln(z), z^3, cos(z), 1/z, | >binary (1or 0), ... | > | >If a person writes the form: | > | >Y = a0 + a1*X + a2*X^2 + a3*X ^3 + E | > | >then they might say that this is a NONLINEAR model. | > | >As long as the reader knows exactly what the model is-- then we are | >communicating. | > | >In these days of fancy 3D graphic displays, it is interesting to picture the | >function: | > | >Y = a0 + a1*X + a2*X^2 | > | >in the 2D space of Y and X -- which appears as a CURVE. | > | >and then picture the function in the 3D space of Y, X and X^2 or | >re-designating X^2 as Z | > | >Y = a0 + a1*X + a2*Z | > | >We notice that the 3D function lies in a PLANE -- reminding us that | >we have a "LINEAR MODEL". | > | >If we hurriedly say to someone that "this function is NONLINEAR in the 2D | >space of Y and X, but | >LINEAR in the 3D space of Y,X and Z", then we might even cause more | >frustration. :-( | > | >"COMMUNICATION" IS A PROBLEM EVERYWHERE! | > | >DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"? | >:-) | > | >--- Joe | > | >* Joe Ward Health Careers High School * | >* 167 East Arrowhead Dr 4646 Hamilton Wolfe* | >* San Antonio, TX 78228-2402San Antonio, TX 78229 * | >* Phone: 210-433-6575 Phone: 210-617-5400* | >* Fax: 210-433-2828 Fax: 210-617-5423 * | >* [EMAIL PROTECTED]* | >* http://www.ijoa.org/joeward/wardindex.html * | > | > | > | > | > | >- Original Message - | >From: Paul Velleman <[EMAIL PROTECTED]> | >To: William J. Larson <[EMAIL PROTECTED]> | >Cc: <[EMAIL PROTECTED]> | >Sent: Friday, May 05, 2000 6:43 AM | >Subject: Re: R sq vs r sq | > | > | >| At 11:18 AM +0200 05/05/2000, William J. Larson wrote: | >| > | >| >It appears that R sq is some sort of generalization of r sq | >| >for nonlinear cases. True? | >| > | >| Not really. common convention is to capitalize the R for multiple | >| correlation. 
The R sqr reported in regressions allows for the | >| generalization of simple regression to a multiple regression (2 or | >| more predictors). In both cases R sqr is the squared correlation | >| between y and y-hat. Y-hat represents the best (in the least squares | >| sense) fit to y among all linear combinations of the x's. All of | >| these are
Re: R sq vs r sq
Bill -- You are so right!! The term NONLINEAR is very confusing. As I indicated in the earlier message, most folks in the statistics world refer to a LINEAR MODEL as I indicated. Y = b1*X1 + b2*X2 + ... + bp*Xp + E and some folks will write UNFORTUNATELY -- Y = b0 + b1*X1 + b2 * X2 + ... + bp*Xp + E that leads to more confusion!! The main point is that the functions are LINEAR IN THE UNKNOWN COEFFICIENTS. This is why we sometimes take the logs of the function so that the new function is LINEAR IN THE UNKNOWN COEFFICIENTS -- AND THE SOLUTIONS ARE EASIER. A "REAL" NONLINEAR MODEL NEEDS SOME SPECIAL ALGORITHMS FOR SOLUTION. --- Someday -- long after I'm out of this world -- the AP-Statistics objectives WILL ALLOW OUR STUDENTS TO HAVE -- "The power they deserve to use REGRESSION/LINEAR MODELS and COMPUTERS/CALCULATORS to their fullest". Perhaps the secondary teachers can speed up improvements through the NCTM "Principles and Standards for School Mathematics". Perhaps there should be an Applied Research Statistics course that has few restrictions on the content -- focusing on those topics that help students do what they NEED to accomplish practical results -- leading to more enthusiasm for statistics and data analysis. Change is slow!! :-) -- Joe ******** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: William J. Larson <[EMAIL PROTECTED]> To: Joe Ward <[EMAIL PROTECTED]>; Paul Velleman <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Friday, May 05, 2000 10:46 AM Subject: Re: R sq vs r sq | Joe, | | Well by linear *I* meant what we mean in algebra 2 class y = mx + b, | but I do not object to calling y = a0 + a1 x1 + a2 x2 + a3 x3 + ... linear. 
| I certainly DO object to your definition of linear, although I suppose | it *is* used by some people, I find it very confusing. | | Cheers, | Bill Larson | Geneva, Switzerland | | - Original Message - | From: Joe Ward <[EMAIL PROTECTED]> | To: William J. Larson <[EMAIL PROTECTED]>; Paul Velleman | <[EMAIL PROTECTED]> | Cc: <[EMAIL PROTECTED]> | Sent: 2000 May 05 9:07 PM | Subject: Re: R sq vs r sq | | | | Hi Paul, William et al.-- | | This may be ANOTHER GOOD TIME TO COMMENT ON | THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO). | | I suggest that when we use the terms LINEAR and NONLINEAR that we | tell the reader what the SENDER means by those terms. | | When I write: | | Y = b1*X1 + b2*X2 + ... + bp*Xp + E | | where bi (i = 1,2,...p) are least-squares regression coefficients, I | will refer to this as a LINEAR MODEL. | | The Xs can be any numbers that I choose-- log(z), ln(z), z^3, cos(z), 1/z, | binary (1or 0), ... | | If a person writes the form: | | Y = a0 + a1*X + a2*X^2 + a3*X ^3 + E | | then they might say that this is a NONLINEAR model. | | As long as the reader knows exactly what the model is-- then we are | communicating. | | In these days of fancy 3D graphic displays, it is interesting to picture the | function: | | Y = a0 + a1*X + a2*X^2 | | in the 2D space of Y and X -- which appears as a CURVE. | | and then picture the function in the 3D space of Y, X and X^2 or | re-designating X^2 as Z | | Y = a0 + a1*X + a2*Z | | We notice that the 3D function lies in a PLANE -- reminding us that | we have a "LINEAR MODEL". | | If we hurriedly say to someone that "this function is NONLINEAR in the 2D | space of Y and X, but | LINEAR in the 3D space of Y,X and Z", then we might even cause more | frustration. :-( | | "COMMUNICATION" IS A PROBLEM EVERYWHERE! | | DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"? 
| :-) | | --- Joe | | * Joe Ward Health Careers High School * | * 167 East Arrowhead Dr 4646 Hamilton Wolfe* | * San Antonio, TX 78228-2402San Antonio, TX 78229 * | * Phone: 210-433-6575 Phone: 210-617-5400* | * Fax: 210-433-2828 Fax: 210-617-5423 * | * [EMAIL PROTECTED]* | * http://www.ijoa.org/joeward/wardindex.html * | *
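The remark about taking logs so that a function becomes LINEAR IN THE UNKNOWN COEFFICIENTS can be sketched as follows. The model Y = a*exp(b*X) is only an assumed example (it does not appear in the thread), and the data are hypothetical and noise-free. Taking logs gives ln(Y) = ln(a) + b*X, which is an ordinary straight-line fit.

```python
import math

# Hypothetical, noise-free data generated from y = 2 * exp(0.5 * x).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * v) for v in x]

# After the log transform the model is linear in ln(a) and b.
log_y = [math.log(v) for v in y]

n = len(x)
mean_x = sum(x) / n
mean_log_y = sum(log_y) / n

slope = (sum((a - mean_x) * (b - mean_log_y) for a, b in zip(x, log_y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_log_y - slope * mean_x

a_hat = math.exp(intercept)  # back-transform ln(a) to a
b_hat = slope

print("a =", a_hat, "b =", b_hat)
```

One caution: with noisy data this minimizes squared error in ln(Y), not in Y, so the answer generally differs from a direct nonlinear least-squares fit of the original function.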
Re: R sq vs r sq
Hi Paul, William et al.-- This may be ANOTHER GOOD TIME TO COMMENT ON THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO). I suggest that when we use the terms LINEAR and NONLINEAR that we tell the reader what the SENDER means by those terms. When I write: Y = b1*X1 + b2*X2 + ... + bp*Xp + E where bi (i = 1,2,...p) are least-squares regression coefficients, I will refer to this as a LINEAR MODEL. The Xs can be any numbers that I choose-- log(z), ln(z), z^3, cos(z), 1/z, binary (1or 0), ... If a person writes the form: Y = a0 + a1*X + a2*X^2 + a3*X ^3 + E then they might say that this is a NONLINEAR model. As long as the reader knows exactly what the model is-- then we are communicating. In these days of fancy 3D graphic displays, it is interesting to picture the function: Y = a0 + a1*X + a2*X^2 in the 2D space of Y and X -- which appears as a CURVE. and then picture the function in the 3D space of Y, X and X^2 or re-designating X^2 as Z Y = a0 + a1*X + a2*Z We notice that the 3D function lies in a PLANE -- reminding us that we have a "LINEAR MODEL". If we hurriedly say to someone that "this function is NONLINEAR in the 2D space of Y and X, but LINEAR in the 3D space of Y,X and Z", then we might even cause more frustration. :-( "COMMUNICATION" IS A PROBLEM EVERYWHERE! DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"? :-) --- Joe **** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Paul Velleman <[EMAIL PROTECTED]> To: William J. Larson <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Friday, May 05, 2000 6:43 AM Subject: Re: R sq vs r sq | At 11:18 AM +0200 05/05/2000, William J. Larson wrote: | > | >It appears that R sq is some sort of generalization of r sq | >for nonlinear cases. 
True? | > | Not really. common convention is to capitalize the R for multiple | correlation. The R sqr reported in regressions allows for the | generalization of simple regression to a multiple regression (2 or | more predictors). In both cases R sqr is the squared correlation | between y and y-hat. Y-hat represents the best (in the least squares | sense) fit to y among all linear combinations of the x's. All of | these are statistics for linear models. It is dangerous to apply them | to nonlinear models. | | -- Paul | -- | Paul F. Velleman | Cornell University Data Description, Inc. | 358 Ives Hall Box 4555 | Ithaca, NY 14853 Ithaca, NY 14852-4555 | (607) 255-4411 (607) 257-1000 | (607) 255-8484 fax(607) 257-4146 fax | === | The Advanced Placement Statistics List | To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing: | unsubscribe apstat-l | Discussion archives are at | http://forum.swarthmore.edu/epigone/apstat-l | Problems with the list or your subscription? mailto:[EMAIL PROTECTED] | ===
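Two statements from this thread can be checked together in a short sketch: (1) Y = a0 + a1*X + a2*X^2 + E is a LINEAR MODEL once X^2 is re-designated as Z, so ordinary least squares applies; (2) R sq is the squared correlation between y and y-hat. The data are hypothetical, and the small normal-equations solver is just one convenient way to do the fit, not anyone's recommended algorithm.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(columns, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


# Hypothetical data lying roughly on a parabola.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [9.2, 4.1, 0.8, 0.3, 1.1, 3.8, 9.1]

ones = [1.0] * len(x)
z = [v * v for v in x]  # re-designate X^2 as Z

# A curve in (X, Y), but a plane in (X, Z, Y): linear in a0, a1, a2.
a0, a1, a2 = least_squares([ones, x, z], y)
y_hat = [a0 + a1 * xv + a2 * zv for xv, zv in zip(x, z)]

mean_y = sum(y) / len(y)
mean_hat = sum(y_hat) / len(y_hat)
sse = sum((yv - hv) ** 2 for yv, hv in zip(y, y_hat))
sst = sum((yv - mean_y) ** 2 for yv in y)
r_squared = 1 - sse / sst

# Correlation between y and y-hat.
cov = sum((yv - mean_y) * (hv - mean_hat) for yv, hv in zip(y, y_hat))
corr = cov / math.sqrt(sst * sum((hv - mean_hat) ** 2 for hv in y_hat))

print("R squared        :", r_squared)
print("corr(y, y-hat)^2 :", corr ** 2)
```

The agreement between the two printed values relies on the model containing the unit vector (an intercept) and being fitted by least squares.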
Re: no correlation assumption among X's in MLR
And in addition to: 1. A Correlation Matrix and 2. A Covariance Matrix another person may simply use 3. An X'X matrix of inner products of the "raw" vectors. Numerical accuracy is always an important consideration. -- Joe ******** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: David A. Heiser <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; Warren Sarle <[EMAIL PROTECTED]> Sent: Thursday, May 04, 2000 7:29 PM Subject: Re: no correlation assumption among X's in MLR | | - Original Message - | From: Warren Sarle <[EMAIL PROTECTED]> | To: <[EMAIL PROTECTED]> | Sent: Thursday, May 04, 2000 12:23 PM | Subject: Re: no correlation assumption among X's in MLR | | | > Of course Herman is right (as usual)! Where are people getting this | > ridiculous idea that correlation and collinearity are the same thing? | | .. | Statistics is one field that has almost no agreed on usage of terms. | Everybody is independent. | | In one of my books, "Applied Linear Regression Models", by Neter, Wasserman | and Kutner (1989) says "When the independent variables are correlated among | themselves, intercorrelation or multicollinearity among them is said to | exist. (Sometimes the latter term is reserved for those instances when the | correlation among independent variables is very high.)..." The authors use | multicolinearlity to refer to the correlation between X variables. (Who is | right?) | | From a numerical analysis viewpoint the basic matrix in OLS is the | normalized X matrix which is called the correlation matrix. If | standardization is not applied, the matrix is the covariance matrix. | | It is clear then that there is a numerical difference between the covariance | and correlation matricies. | | ... 
| > | > Assuming you're using an intercept, a pair of variables is | > collinear if and only if their correlation is 1.0 or -1.0. | > Three or more variables are collinear if and only if there | > is at least one of the variables that has a multiple | > correlation of 1.0 with the other variables. | | ... | This may be your interpretation, but it is not universal. | ... | > | > If the independent variables in a multiple linear regression are | > collinear, there are infinitely many sets of least-squares | > regression coefficients that produce the same predictions, MSE, | > R-squared, etc. | .. | This is only true when the correlation matrix has off diagonals with | 1.0. If it is slightly | different because of numerical representations in the computer, there will | be a finite set of apparent identical solutions. | ... | | Although least squares does not produce unique | > estimates, if you have prior information, you may be able to get | > meaningful and useful Bayesian estimates. Regardless of whether you | > have prior information, you can get useful predictions for new | > cases lying in the same subspace as the original sample. Without | > prior information, you cannot get useful extrapolations outside of | > that subspace. Statisticians who are not data miners sometimes | > forget the distinction between estimation and prediction. :-) | | | For many years the method of ridge analysis (non-Bayesian) has been | extensively used in industry to get valid and workable extrapolations (i.e. | predictions) beyond the range of the data used. The technique of varying | lambda to reduce the variance inflation factor is a very good way to obtain | useful and valid predictions. (All non-Baysian). | | > Collinearity generally will NOT cause different machines or | > different
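The distinction drawn above between correlated and collinear predictors can be sketched with the X'X matrix of inner products of the "raw" vectors mentioned in Joe's reply. The data are hypothetical: x2 is exactly 2*x1 (collinear), while x3 is merely correlated with x1. Integer data are used so the determinant is exact; with real-valued data a near-zero determinant is a numerical-accuracy question, which this sketch does not address.

```python
def gram(columns):
    """X'X: inner products of every pair of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def predict(coefs, columns):
    """Predicted values for coefficients applied to the given columns."""
    return [sum(c * col[i] for c, col in zip(coefs, columns))
            for i in range(len(columns[0]))]


ones = [1, 1, 1, 1, 1]
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]  # exactly 2 * x1: collinear (correlation 1.0)
x3 = [2, 3, 7, 8, 9]   # correlated with x1, but not collinear

collinear_det = det3(gram([ones, x1, x2]))
correlated_det = det3(gram([ones, x1, x3]))

# Two different coefficient sets that give identical predictions
# when the predictors are collinear.
pred_a = predict([1, 2, 0], [ones, x1, x2])
pred_b = predict([1, 0, 1], [ones, x1, x2])

print("det X'X, collinear case :", collinear_det)
print("det X'X, correlated case:", correlated_det)
print("same predictions        :", pred_a == pred_b)
```

A zero determinant means the normal equations have no unique solution, which matches the statement that many coefficient sets produce the same predictions in the collinear case.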
STATISTICS AT ISEF2000- International Science & Engineering Fair -- Detroit May 7-13 --Summer Workshop in San Antonio
Topic #1 -- The directory of finalists for ISEF2000 is now available at: http://www.sciserv.org/isef/finaldir.pdf There are finalists from all U.S. states and over 40 nations. I did a brief search for MICHIGAN, and a few of the schools represented are: Renaissance HS, Saginaw Arts & Science Academy, Western High School, Redford HS. It is easy to find finalists near your location. If you know any finalists, teachers, parents or others who might be interested, please tell them that I will present the annual Shop Talk titled: "Combining the Power of Statistics and Computers to Enhance Science Fair Projects" at 9:00-10:00 a.m. on Monday, May 8, 2000 in Cobo Hall Room O2-41. The purpose of this session is to provide guidance to Science Fair students, teachers and others to help them acquire statistics advice, and to suggest the kinds of questions they might ask their statistical advisors. As you can guess, I will encourage the participants to get assistance from those who can teach them to create the models needed to answer their, possibly unique, questions of interest. You can tell your friends that there will be some valuable drawing prizes for those who get there early and stay 'til the end. This year, none of the students whom I advised on their data analysis made it to ISEF2000. ---sigh :-( == Topic #2 -- We have decided to open our Summer Workshop, emphasizing the Power of Statistics and Computers in Science Research, to a select few folks who may want to attend FROM OUTSIDE THE SAN ANTONIO REGION. The application form with detailed information can be seen at the web site shown below. This may be of interest to those who work with student research projects and to those AP-Statistics teachers who have some extra school time AFTER THE MAY AP-EXAM to introduce their students to some additional data-analysis ideas. http://www.ijoa.org/joeward/wardindex.html The dates are May 29 - June 9. If a participant can stay for only the first week, that's OK. Those who may be interested can call me to discuss details.
--- Joe **** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html *
Re: hyp testing -Reply
Hi, Robert and all -- Yes, there occasionally were discussions in our Air Force research whether or not we were working with the POPULATION or a SAMPLE. As Dennis comments: | | > the flaw here is that ... she has population data i presume ... or about | as | > close as one can come to it ... within the institution ... via the budget | > or comptroller's office ... THE salary data are known ... so, whatever | > differences are found ... DEMS are it! | > One of my Professors used to use the Invertebrate Paleontologists as his example of a POPULATION. I think at that time there were less than 20 people who were Invertebrate Paleontologists. -- Joe **** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Robert Dawson <[EMAIL PROTECTED]> To: dennis roberts <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, April 17, 2000 9:54 AM Subject: Re: hyp testing -Reply | | - Original Message - | From: dennis roberts | > At 10:32 AM 4/17/00 -0300, Robert Dawson wrote: | > | > > There's a chapter in J. Utts' mostly wonderful but flawed low-math | intro | > >text "Seeing Through Statistics", in which she does much the same. She | > >presents a case study based on some of her own work in which she looked | at | > >the question of gender discrimination in pay at her own university, and | > >fails to reject the null hypothesis [no systemic difference in pay | between | > >male and female faculty]. 
She heads the example "Important, but not | > >significant, differences in salaries"; comments (_perhaps_ technically | > >correctly but misleadingly) that "a statistically naive reader could | > >conclude that there is no problem" and in closing states: | | and Dennis Roberts replied: | | > the flaw here is that ... she has population data i presume ... or about | as | > close as one can come to it ... within the institution ... via the budget | > or comptroller's office ... THE salary data are known ... so, whatever | > differences are found ... DEMS are it! | > | > the notion of statistical significance in this case seems IRRELEVANT ... | > the real issue is ... given that there are a variety of factors that might | > account for such differences (numbers in ranks, time in ranks, etc. etc.) | > is the remaining difference (if there is one) IMPORTANT TO DEAL WITH | ... | | | If one can totally explain all contributing factors, so that a model | with significantly fewer parameters than there are faculty fits everybody to | within a practically significant margin of error, then yes, either the model | continues to work with gender removed or it doesn't. | | If, on the other hand, there are unknown sources of variation (a | reasonable assumption in any situation involving people), or more sources of | variation than there are data (another good bet if one thought hard enough), | one cannot automatically go from the observation | | (*) "The average pay of female faculty members here is less than that of | male faculty members" | | to the apparently desired conclusion | | (**) "There is a gender-based _pattern_ of discrimination in faculty | salaries" | | without considering the study as a pseudo-experiment, and analyzing it as | such. 
One would be trying to decide: is the difference between mean male | and female faculty salaries greater than one would expect if one took N1 | males and N2 females and assigned factors such as experience, rank, | skill/luck at negotiating a first contract, demand for specialties, merit | pay actually deserved [as opposed to given on a gender basis], etc. at | random? | | This is what Utts and her coauthors were, it seems, trying to do. | However, when the tests were not significant at the chosen level they seem | to have fallen back on inferring (**) directly from (*). | | -Robert Dawson
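Robert's question above (is the observed difference larger than one would expect if the other factors were assigned at random?) is the question a randomization (permutation) test tries to answer. The following is only a minimal sketch, assuming NumPy is available. The salary figures are made-up numbers for illustration, not data from the Utts study, and `perm_test_mean_diff` is a hypothetical helper name.

```python
import numpy as np

def perm_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)          # reassign group labels at random
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed) - 1e-12:      # at least as extreme as observed
            count += 1
    # add-one correction keeps the estimated p-value strictly positive
    return observed, (count + 1) / (n_perm + 1)

# Made-up salaries (in $1000s), for illustration only
male = [62, 71, 58, 80, 66, 75, 69]
female = [60, 64, 57, 72, 61]
obs, p = perm_test_mean_diff(male, female)
print(f"observed difference = {obs:.2f}, permutation p = {p:.3f}")
```

This sketch shuffles only the group labels. It does not adjust for rank, experience, or the other factors Robert lists, so it addresses statement (*) rather than statement (**).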
Re: linear model or interactive model?
Good message, Alan --

As you indicate, the model is LINEAR in the coefficients b0, b1, b2, b3, and in the 4-D space of y, x1, x2, x3 (i.e., x3 = x1*x2) the function lies in a PLANE. But in the 3-D space of y, x1, x2 the surface is TWISTED (not in a PLANE).

-- Joe

- Original Message -
From: Alan McLean <[EMAIL PROTECTED]>
To: Wen-Feng Hsiao <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, April 16, 2000 4:01 PM
Subject: Re: linear model or interactive model?

| The model
|
| y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2
|
| is a nonlinear model, just as in engineering. However, it is 'linear in the
| variables'. In statistics this is useful, because in estimating the model from a
| data set, one can define a 'new' variable x3 = x1*x2 and apply, for example, a
| linear regression algorithm.
|
| But in interpreting the results you have to remember that the model is nonlinear!
|
| Regards,
| Alan
|
| Wen-Feng Hsiao wrote:
|
| > Dear Hartig,
| >
| > Thanks for your reply. I am sorry for my poor knowledge in statistics.
| > But I wonder why the definition of 'linearity' of statistics is different
| > from that of engineering mathematics, which defines 'linear' as:
| >
| > Each unknown xj appears to the first power only, and that there are no
| > cross product terms xi*xj with i!=j.
| >
| > Wen-Feng
| >
| > In article <[EMAIL PROTECTED]>,
| > [EMAIL PROTECTED] says...
| > > Generally, you can include an interaction (or moderator) term in a linear
| > > model, like
| > > y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2,
| > > and the model still is linear. If you decide not to include x1 and x2, like
| > > y = b0 + b1 * x1*x2,
| > > you still have a linear model.
|
| --
| Alan McLean ([EMAIL PROTECTED])
| Department of Econometrics and Business Statistics
| Monash University, Caulfield Campus, Melbourne
| Tel: +61 03 9903 2102 Fax: +61 03 9903 2007
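The point made in this exchange, that the interaction model can be estimated with an ordinary linear regression once the product x1*x2 is treated as one more predictor, can be sketched as follows. This is an illustration only, assuming NumPy; the coefficient values and the simulated, noise-free data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, size=50)
x2 = rng.uniform(0, 5, size=50)
b_true = np.array([1.0, 2.0, -1.0, 0.5])        # b0, b1, b2, b3 (illustrative)
y = b_true[0] + b_true[1] * x1 + b_true[2] * x2 + b_true[3] * x1 * x2  # noise-free

# Define the "new" variable x3 = x1*x2 and fit by ordinary linear least squares
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated b0..b3:", np.round(b_hat, 6))

# In the space of (y, x1, x2) the surface is twisted:
# the slope along x1 changes with the value of x2.
def slope_in_x1(x2_value, b=b_hat):
    return b[1] + b[3] * x2_value

print("slope in x1 at x2=0:", slope_in_x1(0.0))
print("slope in x1 at x2=4:", slope_in_x1(4.0))
```

The fit is linear in b0..b3, which is why a linear solver recovers them. The changing slope is the "twist" that has to be kept in mind when interpreting the coefficients.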
Re: linear model or interactive model?
Wen-Feng --

The term LINEAR is a difficult term. As I mentioned to you in an earlier message (included for reference at the end of this message), a LINEAR STATISTICAL MODEL is "LINEAR" in the unknown coefficients a1, a2, ..., ap in the model:

Y = a1*X1 + a2*X2 + ... + ap*Xp + E

The X predictors can be ANY NUMBERS THAT WE LIKE. If we write --

Y = a1*U + a2*X + a3*X^2 + E

where
U = 1
X = a continuous predictor
X^2 = X*X
E = error or residual

we might say that the function is NON-LINEAR in the two-dimensional Y-X plane, but it is LINEAR in the three-dimensional space of Y-X-X^2. With 3-D displays that we can rotate as we would like, it is enlightening to observe that the CURVE seen in the two-dimensional space lies in a PLANE in the three-dimensional space of Y-X-X^2.

-- Joe

- Original Message -
From: Wen-Feng Hsiao <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, April 15, 2000 5:14 AM
Subject: Re: linear model or interactive model?

| Dear Hartig,
|
| Thanks for your reply. I am sorry for my poor knowledge in statistics.
| But I wonder why the definition of 'linearity' of statistics is different
| from that of engineering mathematics, which defines 'linear' as:
|
| Each unknown xj appears to the first power only, and that there are no
| cross product terms xi*xj with i!=j.
|
| Wen-Feng
|
| In article <[EMAIL PROTECTED]>,
| [EMAIL PROTECTED] says...
| > Generally, you can include an interaction (or moderator) term in a linear
| > model, like
| > y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2,
| > and the model still is linear. If you decide not to include x1 and x2, like
| > y = b0 + b1 * x1*x2,
| > you still have a linear model.
| ==== - Original Message - From: Joe Ward <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; Wen-Feng Hsiao <[EMAIL PROTECTED]> Sent: Thursday, April 13, 2000 10:30 AM Subject: Re: linear model or interactive model? - Original Message - From: Wen-Feng Hsiao <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, April 13, 2000 3:06 AM Subject: linear model or interactive model? | Dear all, | | Suppose I have an aggregation model which is in the following form: | Y = c1*(X11 * X12) + c2*(X21 * X22)? | | This model could be thought as an aggregation of two knowledge, namely | X1. and X2.. Each knowledge contains two pieces of information | (attributes). For example, X1 contains X11 ans X12. Now if X.1 is the | height, and X.2 is the weight of a person. Then, the aggregation of any | two persons, say, Student1(height=170cm, weight=60kg), | Student2(height=180cm, weight=68kg) can be represented by | | Y = 170*60+180*68=22440. | | My question: a model as the above form is linear or interactive? I doubt | it is not a linear model. Since it is not in this form: Y= c1 X1 + c2 X2, | where c1 and c2 are constant. I doubt it is not a pure interactive form, | since X.1 and X.2 are dependent. Sorry for this stupid question. | | Wen-Feng | Joe Ward writes| === Wen-Feng--- Your model -- Y = X11 * X12 + X21 * X22. does not have any unknowns. Did you mean to write: Y = c1*(X11 * X12) + c2*(X21 * X22)? All models of the form: Y = c1*X1 + c2*X2 + ... + cp*Xp + E are LINEAR MODELS. It does not matter what NUMBERS are included in the Xs. Y = c1*X1 + c2*X2 + c3*(X1*X2) + c4*(X1^2) + c5*(lnX1) + E is LINEAR in the unknown coefficients c1, c2, ... The most useful Xs are the BINARY( 1 or 0) predictors. 
--- Joe * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html *
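Joe's remark in this thread that a second-degree curve in the Y-X plane lies in a PLANE in the space of Y, X and X^2 can be checked numerically. The sketch below is only an illustration, assuming NumPy; the coefficients are arbitrary and the error term E is set to zero so that the geometry is exact.

```python
import numpy as np

a1, a2, a3 = 2.0, -3.0, 0.5             # illustrative coefficients
X = np.linspace(-2.0, 4.0, 25)
Y = a1 + a2 * X + a3 * X**2             # noise-free, E = 0

# Points in the 3-D space (X, X^2, Y)
P = np.column_stack([X, X**2, Y])
centered = P - P.mean(axis=0)
rank = np.linalg.matrix_rank(centered, tol=1e-8)
print("rank of centered 3-D point cloud:", rank)    # rank 2 means the points are coplanar

# The same data in the 2-D space (X, Y) do not lie on a straight line:
# on an evenly spaced grid a straight line would have zero second differences.
second_diff = np.diff(Y, n=2)
curved = bool(np.any(np.abs(second_diff) > 1e-9))
print("curved in the Y-X plane:", curved)
```

With real data the error term E moves the points off the plane, and least squares picks the plane that is closest in the Y direction.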
Re: linear model or interactive model?
- Original Message - From: Wen-Feng Hsiao <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, April 13, 2000 3:06 AM Subject: linear model or interactive model? | Dear all, | | Suppose I have an aggregation model which is in the following form: | Y = c1*(X11 * X12) + c2*(X21 * X22)? | | This model could be thought as an aggregation of two knowledge, namely | X1. and X2.. Each knowledge contains two pieces of information | (attributes). For example, X1 contains X11 ans X12. Now if X.1 is the | height, and X.2 is the weight of a person. Then, the aggregation of any | two persons, say, Student1(height=170cm, weight=60kg), | Student2(height=180cm, weight=68kg) can be represented by | | Y = 170*60+180*68=22440. | | My question: a model as the above form is linear or interactive? I doubt | it is not a linear model. Since it is not in this form: Y= c1 X1 + c2 X2, | where c1 and c2 are constant. I doubt it is not a pure interactive form, | since X.1 and X.2 are dependent. Sorry for this stupid question. | | Wen-Feng | ==== Joe Ward writes| === Wen-Feng--- Your model -- Y = X11 * X12 + X21 * X22. does not have any unknowns. Did you mean to write: Y = c1*(X11 * X12) + c2*(X21 * X22)? All models of the form: Y = c1*X1 + c2*X2 + ... + cp*Xp + E are LINEAR MODELS. It does not matter what NUMBERS are included in the Xs. Y = c1*X1 + c2*X2 + c3*(X1*X2) + c4*(X1^2) + c5*(lnX1) + E is LINEAR in the unknown coefficients c1, c2, ... The most useful Xs are the BINARY( 1 or 0) predictors. --- Joe **** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe* * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400* * Fax: 210-433-2828 Fax: 210-617-5423 * * [EMAIL PROTECTED]* * http://www.ijoa.org/joeward/wardindex.html * === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. 
Re: hyp testing
Hi, Dennis --

Yes, a "LOT of years!" ago (the 1950s), when I first started into the real applied world, our main job was to PREDICT, PREDICT, PREDICT outcomes. We had some real cost figures to evaluate our predictions. Before the term Bootstrap arrived on the scene, we were Cross-Validating like mad. We would divide those punched cards into "random?" groups and shuffle them over and over again and "re-group". Then we would apply the predictions developed from one data set to the others to see how well we were doing. Hypothesis testing -- in the classical sense -- was not involved.

I still believe that TWO important ideas in life are:

- PREDICTION and
- OPTIMIZATION (choosing among alternative PREDICTIONS to MAXIMIZE or MINIMIZE one or more OBJECTIVE FUNCTIONS).

If "Hypothesis testing" helps improve PREDICTION and OPTIMIZATION then that's great.

One of the difficulties in academia may be due to the lack of practical, decision-making opportunities. What PRACTICAL ACTIONS do we take as a result of analyzing a two-way table with a Chi-Square "test" if we find a "statistically significant" outcome?

I imagine we will get some suggestions from our readers! :-)

-- Joe

- Original Message -
From: dennis roberts <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, April 07, 2000 6:41 AM
Subject: hyp testing

| let's say that today ... we as the statistical community decided, by
| democratic vote, that the concept of 'hypothesis testing' ... which has
| essentially dominated statistical work for as long as i can remember
| (which, er um ... is a LOT of years!) ... is relegated to the 'we USED
| to do this stuff' category
|
| just THINK about this
|
| what would the vast majority of folks who either do inferential work and/or
| teach it ... DO
| what analyses would they be doing? what would they be teaching?
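The card-shuffling routine Joe describes in this message (split the cases into random groups, develop the prediction on some groups, apply it to the others) is what is now usually called k-fold cross-validation. A minimal sketch follows, assuming NumPy; the data are simulated for illustration and `kfold_mse` is a hypothetical helper name, not anything from the original Air Force work.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Mean squared prediction error from k-fold cross-validation of a least-squares fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))           # "shuffle the cards"
    folds = np.array_split(idx, k)          # deal them into k groups
    sq_errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_errors.append((y[test] - X[test] @ coef) ** 2)   # errors on held-out cases
    return float(np.mean(np.concatenate(sq_errors)))

# Simulated data, for illustration only
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

ones = np.ones_like(x)
mse_mean_only = kfold_mse(np.column_stack([ones]), y)       # predict with the mean only
mse_linear = kfold_mse(np.column_stack([ones, x]), y)       # predict with a line in x
print(f"mean-only: {mse_mean_only:.2f}   linear: {mse_linear:.2f}")
```

Choosing the model with the smaller held-out error is one simple instance of the PREDICTION-then-OPTIMIZATION idea in the message; a real application would use the cost figures that matter to the decision rather than squared error.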
Reference for "regression discontinuity"
Hi, Carl --

If you still have your copy of Introduction to Linear Models (Ward & Jennings) you will find many examples in Chapters 10 and 11. An interesting example is on page 217, 11.9 Discontinuity Between Two Second-Degree Polynomials. With the facility to create linear models appropriate to the research questions of interest, many seemingly unique problems can be handled easily, e.g., Cubic Splines.

-- Joe

- Original Message -
From: Carl J Huberty <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, March 22, 2000 8:30 AM

| Will someone give me a (readable) reference for "regression
| discontinuity"? Thanks in advance.
|
| Carl Huberty
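A discontinuity between two second-degree polynomials can be set up as a linear model with a binary (1 or 0) predictor for the side of the cutoff, and the "no jump" hypothesis tested by comparing an assumed and a restricted model. The sketch below is an assumption-laden illustration in Python with NumPy, not the worked example from Ward & Jennings; the cutoff, coefficients and simulated data are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
cutoff = 5.0
x = rng.uniform(0, 10, size=300)
z = x - cutoff                        # center the predictor at the cutoff
D = (z >= 0).astype(float)            # binary predictor: 1 to the right of the cutoff

# Simulated response with a true jump of 2.0 at the cutoff (illustrative values)
y = (1.0 + 0.8 * z + 0.1 * z**2) + D * (2.0 - 0.5 * z + 0.05 * z**2) \
    + rng.normal(0, 0.5, size=x.size)

def sse(X, y):
    """Return the error sum of squares and the least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), coef

ones = np.ones_like(z)
# ASSUMED model: a separate second-degree polynomial on each side of the cutoff
X_full = np.column_stack([ones, z, z**2, D, D * z, D * z**2])
# RESTRICTED model: the two polynomials meet at the cutoff (coefficient of D is 0)
X_restr = np.column_stack([ones, z, z**2, D * z, D * z**2])

sse_full, coef_full = sse(X_full, y)
sse_restr, _ = sse(X_restr, y)
df1 = X_full.shape[1] - X_restr.shape[1]
df2 = len(y) - X_full.shape[1]
F = ((sse_restr - sse_full) / df1) / (sse_full / df2)
print(f"estimated jump = {coef_full[3]:.2f}, F({df1}, {df2}) = {F:.1f}")
```

Because z is centered at the cutoff, the coefficient of D is the estimated size of the discontinuity. The same full-versus-restricted pattern extends to continuity of slopes, which is how spline-type models arise.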
Re: Matrix multiplication
David --

Great message!! One of the most "revealing" numerical analysis problems arises when there is interest in "POWERING" a transition matrix in a Markov model. PRE-MULTIPLYING to "POWER" the matrix compared to POST-MULTIPLYING can give quite different results. This is due to the different order of accumulation of the sum of products of numbers between 0 and 1. Numerical analysts can have lots of challenging problems.

-- Joe

- Original Message -
From: David A. Heiser <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; Anthony Pleticos <[EMAIL PROTECTED]>
Sent: Friday, March 17, 2000 2:27 PM
Subject: Re: Matrix multiplication

| - Original Message -
| From: Anthony Pleticos <[EMAIL PROTECTED]>
| To: <[EMAIL PROTECTED]>
| Sent: Wednesday, March 15, 2000 4:24 PM
| Subject: Matrix multiplication
|
| > I don't know if I hit the correct site but would be grateful for an answer -
| > it is a fundamental one. We all know that linear regression can be
| > accomplished by matrix multiplication and that there are packages which will
| > do it for you. I am teaching myself C++ and for the purposes of the
| > exercise I would like to know how to create a matrix or obtain ready made
| > code (ie "numerical recipe") class so I could declare in a program:
| >
| > #include
| > #include
| > #include /* if there is such a file */
|
| The basic problem is that there are enormous differences between real
| world matrices. There is no one method for numerical matrix reductions. For
| example, note the very large number of Fortran subroutines that focus on
| peculiar aspects (banded, complex, sparse, near singular, positive definite,
| not positive definite, triangular, rank deficient, etc., etc.). Note the large
| number of free Fortran subroutines devoted to matrices in "NETLIB". There
| are other free Fortran libraries available from the web.
|
| Matrix multiplication is not numerically straightforward given a finite
| computer environment. One can get very misleading results doing the standard
| multiply and add method using standard single precision.
|
| I would suggest you get familiar with numerical analysis methods. I
| personally prefer the works of G. W. Stewart as a source.
|
| DAHeiser
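The order-of-accumulation effect mentioned in this thread can be explored with a small experiment: power a transition matrix in single precision by pre-multiplying and by post-multiplying, and compare both with a double-precision reference. This is a sketch only, assuming NumPy; the 3-state matrix is invented, and how large the discrepancy becomes depends on the matrix, the power, and the precision, so no particular size of difference is claimed here.

```python
import numpy as np

# Illustrative 3-state transition matrix (each row sums to 1)
P64 = np.array([[0.90, 0.07, 0.03],
                [0.20, 0.50, 0.30],
                [0.10, 0.15, 0.75]])
P32 = P64.astype(np.float32)              # single-precision copy

def power_pre(P, n):
    """P^n by repeated PRE-multiplication: result = P @ result."""
    result = P.copy()
    for _ in range(n - 1):
        result = P @ result
    return result

def power_post(P, n):
    """P^n by repeated POST-multiplication: result = result @ P."""
    result = P.copy()
    for _ in range(n - 1):
        result = result @ P
    return result

n = 200
reference = np.linalg.matrix_power(P64, n)        # double-precision reference
pre, post = power_pre(P32, n), power_post(P32, n)
print("max |pre - post|      :", np.abs(pre - post).max())
print("max |pre - reference| :", np.abs(pre - reference).max())
print("max |post - reference|:", np.abs(post - reference).max())
print("row sums (pre)        :", pre.sum(axis=1))
```

Mathematically the two products are identical. Any difference printed is purely a floating-point effect, which is the reason for David's advice to learn some numerical analysis before writing one's own matrix code.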
Re: Looking for text on resampling...
Scott -- Peter Bruce should be able to give us the latest "word". -- Joe ******** * Joe Ward Health Careers High School ** 167 East Arrowhead Dr 4646 Hamilton Wolfe ** San Antonio, TX 78228-2402 San Antonio, TX 78229 ** Phone: 210-433-6575 Phone: 210-617-5400 ** Fax: 210-433-2828 Fax: 210-617-5423 ** [EMAIL PROTECTED] ** http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, February 06, 2000 12:08 PM Subject: Looking for text on resampling... | Our small college library has a collection of basic biostats texts but| nothing that specifically covers the area of resampling. I am currently| looking over a 1991 text by Bryan Manly (Randomization and Monte Carlo| Methods in Biology) - the first two chapters seem quite accessible (to| someone unfamiliar with the field!)| | Could anyone suggest other texts that might cover bootstrapping and| jacknife techniques - I would favour texts that have a biology bent and| are written so non-specialists can follow...| | Many thanks!| | Scott| | | Sent via Deja.com http://www.deja.com/| Before you buy.| | | ===| This list is open to everyone. Occasionally, people lacking respect| for other members of the list send messages that are inappropriate| or unrelated to the list's discussion topics. Please just delete the| offensive email.| | For information concerning the list, please see the following web page:| http://jse.stat.ncsu.edu/| ===|
Re: Why do we use and teach z?
Josh, Bill, et al -- I can't resist!! Yes, those who have invested much of their life in acquiring certain knowledge tend to want future generations to have those "exciting" historical experiences. It is rather unfortunate that we have a hard time making changes to give future generations some of the power they deserve. I experienced some difficulty in the 1950's with those folks who had become "masters" of the various analysis of variance algorithms that were developed before computers became available. My first major job in the 1950s was to "get us off of Frieden, Marchant and Monroe desk calculators onto the IBM 602A followed by IBM 607, then IBM 650 etc." The biggest difficulty was to get researchers to take advantage of the computer power that allowed them the freedom to create their own models to answer their questions of interest. It was very difficult for persons with Ph.D. degrees to give up that for which they had invested so much time to learn. It was a little "traumatic" in the 1950s when a Ph.D. was told that "you don't need to have equal or proportional Ns in a two-way ANOVA". And it was really interesting to see the reaction when they were told that "you don't need a response in every cell". As a matter of fact, the managers of our Air Force research organization assembled a panel of experts to come in to find out what Bob Bottenberg and I were up to when we were promoting the use of a more general approach to creating models to answer research questions of interest. It is indeed amazing that, 40 years later, many first-course statistics students are told that "IT IS BEYOND THE SCOPE OF THIS TEXT TO DEAL WITH SITUATIONS IN WHICH SAMPLE SIZES ARE UNEQUAL IN THE CELLS OF TWO-WAY ANOVA". It is little wonder that these students can do very little data analysis in support of practical research. A few of you have heard this "sermon" before!! 
By the way, those of you who have six weeks of school after the exam might want to give your students some power to use Prediction/Regression/Linear Models and Computers. They might be able to do some useful data analysis and appreciate your efforts!!

Well, that's enough from a "NON-INFLUENTIAL OUTLIER".

-- Joe

- Original Message -
From: Joshua Tabor
To: William J. Larson; AP Stats. list
Sent: Friday, March 17, 2000 9:11 AM
Subject: RE: Why do we use and teach z?

Reply to: RE: Why do we use and teach z?

I agree with you completely. The only explanation I received for why it is still in most books is that it is a nice stepping stone to a full fledged t-test (of course, it is very likely I am misinformed). Anyway, this year I have decided to teach inference for proportions first (as the stepping stone) and then go straight into t-tests, eliminating z-tests for means. It helps make the course more realistic, and it saves me precious time (we start the second week of September and have 6 weeks of school after the AP!).

I am curious to hear what the college folks (and textbook authors) have to say.

josh

Josh Tabor
Wilson HS
Hacienda Heights, CA
[EMAIL PROTECTED]

William J. Larson wrote:
>Why do we use and teach z?
>
>As I continually tell my students, normally (no pun intended) we do
>not know sigma, so we should use t not z. Indeed can we ever know
>sigma? If not why do we even bother to mention z? Is it historical
>reasons? Or because in the real world lots of people ignore the above
>fact & use z anyway, so we are conscientiously preparing our students
>for the real world? Or (more likely) am I missing something?
>
>Dr. William J. Larson
>[EMAIL PROTECTED]
>Institut Monte Rosa
>Montreux, Switzerland
>
>===
>The Advanced Placement Statistics List
>To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
>unsubscribe apstat-l
>Discussion archives are at
>http://forum.swarthmore.edu/epigone/apstat-l
>Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
>==
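Joe's remark in this thread that a two-way ANOVA does not need equal or proportional Ns can be illustrated with the regression approach he advocates: an assumed model with one binary predictor per cell, a restricted model without interaction, and an F statistic comparing the two. The sketch below assumes NumPy; the cell sizes, cell means and simulated responses are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
# Unequal cell sizes in a 2 x 3 layout (illustrative counts and cell means)
cell_n = {(0, 0): 4, (0, 1): 7, (0, 2): 3, (1, 0): 6, (1, 1): 2, (1, 2): 9}
cell_mean = {(0, 0): 10, (0, 1): 12, (0, 2): 15, (1, 0): 11, (1, 1): 13, (1, 2): 16}

A, B, y = [], [], []
for (a, b), n in cell_n.items():
    A += [a] * n
    B += [b] * n
    y += list(cell_mean[(a, b)] + rng.normal(0, 1.0, size=n))
A, B, y = np.array(A), np.array(B), np.array(y)

def sse(X, y):
    """Error sum of squares from a least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

# ASSUMED model: one binary (1/0) predictor per cell, i.e. a separate mean for each cell
X_full = np.column_stack([((A == a) & (B == b)).astype(float) for (a, b) in cell_n])
# RESTRICTED model: no interaction (row effect plus column effects only)
X_restr = np.column_stack([np.ones(len(y)), A == 1, B == 1, B == 2]).astype(float)

sse_full, sse_restr = sse(X_full, y), sse(X_restr, y)
df1 = X_full.shape[1] - X_restr.shape[1]      # number of restrictions
df2 = len(y) - X_full.shape[1]
F = ((sse_restr - sse_full) / df1) / (sse_full / df2)
print(f"F({df1}, {df2}) for 'no interaction' = {F:.3f}")
```

Nothing in the computation depends on the cell counts being equal. An empty cell would simply contribute no column to the assumed model, although the question being tested then has to be restated with care.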
Re: When *must* use weighted LS?
John-- If you are interested in PREDICTION then the way YOU use your information is up to YOU. By Cross-validation, Resampling etc. you can determine which prediction method seems to be "best" for your situation. -- Joe ******** * Joe Ward Health Careers High School ** 167 East Arrowhead Dr 4646 Hamilton Wolfe ** San Antonio, TX 78228-2402 San Antonio, TX 78229 ** Phone: 210-433-6575 Phone: 210-617-5400 ** Fax: 210-433-2828 Fax: 210-617-5423 ** [EMAIL PROTECTED] ** http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: John Hendrickx <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, March 15, 2000 1:22 AM Subject: Re: When *must* use weighted LS? | In article <8am7d1$hqj$[EMAIL PROTECTED]">8am7d1$hqj$[EMAIL PROTECTED]>, | [EMAIL PROTECTED] says...| > | > I think I made the formulation too wordy in previous| > post. | > | > Let me try this simple question:| > | > When one wishes to do a (multi)linear regression on a set of | > observed data, and one is in the (unusual) position of possessing| > a set of sample standard deviations (of varying degrees of f.) | > at each value of the "explanatory" variable, how does one| > determine whether one ought or ought not to solve the weighted| > least squares problem using those sample standard deviations?| > | > What is the usual decision test for "heterscedasticity" *before* one| > solves the regression system? What do people do in practise?| > | Most social scientists don't worry very much about the assumptions of OLS | regression, noting that OLS estimates are fairly robust and can give | unbiased estimates even if those assumptions aren't fulfilled. Exceptions | are multilevel models and time series data, data for which the assumption | of uncorrelated error terms is violated. But these require special | programs, not weighted least squares.| | There is also some debate on using weights for stratified sampling and/or | to correct for sampling bias. 
Weighting leads to correct estimates but
| incorrect standard errors. One solution is to include the design
| variables in the model instead of weighting. Stata and Wesvar are two
| programs that can take weighting into account when calculating standard
| errors of estimates. But a quite common approach is to use weights for
| descriptive statistics, but not in multivariate models.
|
| Weights can also be used for certain dependent variables that will
| violate the assumption of homoscedasticity, e.g. a dichotomous
| dependent. I recently did a weighted least squares analysis for a
| co-worker to replicate an analysis in another paper. The weight was
| groupn*pct*(1-pct), where groupn was the number of cases per group and
| pct was the proportion with a positive response within each group. But
| this basically amounts to a poor approximation of a logit model. Programs
| like GLIM that use iteratively reweighted least squares use pct*(1-pct)
| as the weight when estimating the model, but now pct is the predicted
| probability from the previous iteration.
|
| As for a test for heteroscedasticity, Stata has a "hettest", which
| performs a Cook-Weisberg test and produces a chi-square statistic. They
| wrote a book in 1982, "Residuals and influence in regression". I've never
| used it though.
|
| Hope this helps,
| John Hendrickx
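Joe's suggestion in this thread, to let held-out prediction error decide between methods, can be sketched for the weighted versus unweighted least squares question. The example below assumes NumPy and assumes that a standard deviation is available for every observation, as in the original question; the data are simulated and `wls`, `ols` and `holdout_mse` are hypothetical helper names.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def wls(X, y, sd):
    """Weighted least squares with weights 1/sd^2, done by rescaling each row by 1/sd."""
    w = 1.0 / np.asarray(sd, dtype=float)
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef

def holdout_mse(fit, X, y, sd, train, test):
    """Fit on the training cases, then score squared prediction error on the test cases."""
    coef = fit(X[train], y[train], sd[train])
    return float(np.mean((y[test] - X[test] @ coef) ** 2))

# Simulated data whose error SD grows with x (illustrative only)
rng = np.random.default_rng(11)
x = rng.uniform(1, 10, size=400)
sd = 0.3 * x
y = 2.0 + 1.5 * x + rng.normal(0, sd)
X = np.column_stack([np.ones_like(x), x])

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]
mse_ols = holdout_mse(lambda X, y, sd: ols(X, y), X, y, sd, train, test)
mse_wls = holdout_mse(wls, X, y, sd, train, test)
print(f"hold-out MSE  OLS: {mse_ols:.3f}   WLS: {mse_wls:.3f}")
```

A single split is noisy, so in practice the comparison would be repeated over many splits. Note also that this compares predictions only; it says nothing about whether the reported standard errors are trustworthy, which is the concern John raises.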
Re: Cluster and outliers
Nicolas --

Most statistical software systems have clustering algorithms with a variety of objective functions. It is certainly reasonable to use several approaches to help identify "OUTLIERS" or "INFLUENTIAL" observations. The identification AND definition of an "OUTLIER" or "INFLUENTIAL" observation should be the responsibility of the researcher who KNOWS the context of the analysis.

Also, regression models can be used to provide information to help the researcher identify "OUTLIERS" or "INFLUENTIAL" observations. One approach is to LEAVE EACH OBSERVATION OUT OF THE ANALYSIS and test the hypothesis that the observed Y value for each "left-out" observation is equal to the PREDICTED value from the other N-1 observations. The output of these N hypothesis tests can be helpful.

The Classification Society of North America has a web site at http://www.pitt.edu/~csna/ that might be helpful in your search about Clustering. A good contact is the Secretary/Treasurer of CSNA:

Stanley L. Sclove
Department of Information and Decision Sciences M/C 294
College of Business Administration
University of Illinois at Chicago
601 S. Morgan Street
Chicago, IL 60607-7124
www.uic.edu/~slsclove

Have fun!!

--Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: Nicolas MEYER <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, March 12, 2000 4:52 AM
Subject: Cluster and outliers

| Hi everybody !!
|
| I'm desperately looking for books or papers on possible links between
| cluster analysis and outliers, cluster analysis being of course used to
| detect outlier(s).
| Does anybody know anything about this?
| Thanks !!
|
| Nicolas MEYER
| Interne en Santé Publique
| CHU Strasbourg-FRANCE
|
| ===
| This list is open to everyone. Occasionally, less thoughtful
| people send inappropriate messages. Please DO NOT COMPLAIN TO
| THE POSTMASTER about these messages because the postmaster has no
| way of controlling them, and excessive complaints will result in
| termination of the list.
|
| For information about this list, including information about the
| problem of inappropriate messages and information about how to
| unsubscribe, please see the web page at
| http://jse.stat.ncsu.edu/
| ===
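The leave-one-out idea in Joe's reply can be sketched in a few lines. This is a minimal illustration only, assuming a simple one-predictor regression and the usual normal-theory prediction error; the function names (fit_line, leave_one_out_t) and the data are hypothetical and do not come from the thread.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def leave_one_out_t(xs, ys):
    """For each observation i, fit the line on the other N-1 points and
    return t_i = (observed y_i - predicted y_i) / (std. error of prediction).
    Each t_i has N-3 degrees of freedom under the usual assumptions."""
    n = len(xs)
    stats = []
    for i in range(n):
        xr = xs[:i] + xs[i + 1:]          # the N-1 retained x values
        yr = ys[:i] + ys[i + 1:]          # the N-1 retained y values
        a, b = fit_line(xr, yr)
        m = n - 1
        mx = sum(xr) / m
        sxx = sum((x - mx) ** 2 for x in xr)
        sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xr, yr))
        s2 = sse / (m - 2)                # residual variance from the N-1 points
        se_pred = math.sqrt(s2 * (1 + 1 / m + (xs[i] - mx) ** 2 / sxx))
        stats.append((ys[i] - (a + b * xs[i])) / se_pred)
    return stats

# Hypothetical data: roughly y = 1 + 2x, with the fifth point shifted upward.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 8.8, 21.0, 13.1, 14.9, 17.0]
for i, t in enumerate(leave_one_out_t(xs, ys)):
    print(i, round(t, 2))
```

A large |t| flags an observation that the remaining data predict poorly; whether to call it an outlier remains the researcher's decision, as the message says.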
Re: Repeated measures
Hi, Kasper --

The CORRECT model is the one that allows YOU to answer YOUR OWN questions of interest. If the "packaged" PROCs have been verified to do what YOU want, then that's good. It is sometimes difficult to know what question a "packaged" PROC is attempting to answer. Be careful -- especially if there may be "missing cells". :-)

--Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: Kasper Hornbæk <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 09, 2000 1:41 AM
Subject: Q: Repeated measures

| Hi everybody.
| I have a question concerning repeated measures analysis. I am not sure
| whether a linear model with a factor that varies as repeated measures are
| taken (e.g., order or session) is identical to a repeated measures analysis.
| I'll detail the question below.
|
| I have a within-subject study in which subjects used three methods to solve
| six different tasks. The experiment is run in three sessions, each
| consisting of two tasks. Three of the tasks are very different from the
| other three tasks.
|
| For analysing this experiment, I plan to use a model like
|   Y[ijkl] := u + subject[i] + task[j] + session[k] + method[l] + e[ijkl],
| possibly adding interactions between task, method and session. Is this a
| repeated measures analysis or equivalent to a repeated measures analysis?
|
| If not, how should I analyse these data using SAS's repeated measures
| option?
|
| Kind regards,
| Kasper Hornbæk
| kash(at)diku.dk
Fw: other uses for Minitab
Hi, Tim --

It's good to hear that some folks think it is useful to fit a least-squares line through the origin. Of course it is even better to be able to "force" a least-squares model to have a wide range of properties (restrictions). Without any connection to statistics, students should be given the opportunity to use their algebra "savvy" to impose restrictions on math models.

For example, given a model of the form:

Y = a0 + a1*X + a2*X^2 + E

it might be of interest to "restrict" the model to:

-- Pass through the origin
or
-- Pass through X=1 and Y=2
or
-- Slope = 0 at X=5 (for the calculus crowd)
or
Many others!

Using Algebra, Geometry and Trig. the "least-squares story" can be presented to students WITHOUT CALCULUS. Minimizing "distance" from a point to a line, or plane, or hyper-plane seems to be more appealing than taking partial derivatives. Connecting "perpendicularity" to "orthogonality" seems to work well.

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: Tim Erickson <[EMAIL PROTECTED]>
To: Joe Ward <[EMAIL PROTECTED]>
Sent: Sunday, March 05, 2000 3:28 PM
Subject: Re: other uses for Minitab

| on 00.03.03 10:51 PM, Joe Ward at [EMAIL PROTECTED] wrote:
|
| > A Bob, you remembered.
| >
| > I've been "bugging" the calculator makers for many years about including
| > the least-squares model of the form:
| >
| > LinReg(bx), Letting the function pass through the origin.
|
| just a note -- Fathom has a "lock Intercept at Zero" command for its least
| squares regression, which amounts to the same thing.
|
| I think it's also an interesting exercise for a (calculus?) student to
| derive a formula for "b" given an arbitrary set of data and the constraint
| that b must minimize the sum of squares of the residuals. At least it was
| interesting to me!
|
| Tim
|
| Earl Jennings            Phone: (512) 345-0628
| 6917 Thorncliffe Dr.     e-mail address:
| Austin, TX 78731-2955    [EMAIL PROTECTED]
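The formula for "b" that Tim mentions, and one of the restrictions Joe lists, can be sketched as follows. This is only an illustration for a straight line (not the quadratic model above); the function name slope_through_point and the data are hypothetical. Forcing the line through a known point (x0, y0) means fitting Y - y0 = b*(X - x0) + E, a one-predictor model with no constant, so b = Sum((x-x0)*(y-y0)) / Sum((x-x0)^2). Setting x0 = y0 = 0 gives the line through the origin.

```python
def slope_through_point(xs, ys, x0=0.0, y0=0.0):
    """Least-squares slope of a line forced to pass through (x0, y0).

    The restriction is imposed with algebra alone: substitute the known
    point into Y = a + b*X, which leaves the one-predictor model
    (Y - y0) = b*(X - x0) + E, and the least-squares b is a ratio of sums.
    """
    num = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    den = sum((x - x0) ** 2 for x in xs)
    return num / den

# Hypothetical data.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 4.0]

b_origin = slope_through_point(xs, ys)            # line through (0, 0)
b_point = slope_through_point(xs, ys, 1.0, 2.0)   # line through X=1, Y=2
print(b_origin, b_point)
```

The same substitution trick extends to the other restrictions in the list, at the cost of a little more algebra.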
Re: Howto interpret interactions in an ANOVA
Hi all -- Again -- I'm jumping on the band wagon in support of these messages that advocate-- what I call -- a PREDICTION/REGRESSION/LINEAR MODELS approach. I was attracted to Lee Wilkinson and SYSTAT many years ago when Lee had a sign at one of his SYSTAT BOOTHS that said: "Ask me about Cell Means Analysis" (May not be Lee's exact words) I was so excited to see a software package that required the user to insert the word CONSTANT in the regression model when the user wanted it -- NOT AS THE DEFAULT. When using SAS at Clemson in 1985-86, I had to tell students that they must use the NOINT OPTION until I explained why. A most misunderstood and troublesome idea is the lack of understanding of the predictor, U, a vector of 1's. If students would -- in the beginning -- insert THEIR OWN U, when needed, then they might have a better understanding of the "efficiency" of having the CONSTANT or INTERCEPT as the DEFAULT. This lack of understanding about the CONSTANT or INTERCEPT is revealed by the many Email messages we see related to "What is RSQ WHEN there is NO CONSTANT or INTERCEPT". It is interesting that the more "modern" versions of SYSTAT require the user to REMOVE THE CONSTANT when appropriate. It would be really great if the statistics education folks would advocate the introduction of PREDICTION/REGRESSION/LINEAR MODELS early so that the students would have something useful in their experience and perhaps continue their study of statistics. I'm afraid that many FIRST STATISTICS COURSES have little "selling/marketing" effect on students. The "Cell-Means Approach" is easy to introduce to high school students, since these students have experiences with AVERAGES, MEANS, GPAs. And the "Missing Cells Problem?" is really not a problem until the students are told that some folks don't know what to do about "Missing Cells". Enough "preaching to the choir"!! 
--Joe * Joe Ward Health Careers High School ** 167 East Arrowhead Dr 4646 Hamilton Wolfe ** San Antonio, TX 78228-2402 San Antonio, TX 78229 ** Phone: 210-433-6575 Phone: 210-617-5400 ** Fax: 210-433-2828 Fax: 210-617-5423 ** [EMAIL PROTECTED] ** http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Gregory C. Mayer <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, February 29, 2000 6:46 AM Subject: Re: Howto interpret interactions in an ANOVA | R.R. Sokal & F.J. Rohlf in Biometry (1995, Freeman) emphasize the unity of| anova, ancova and regression (and in their shorter Introduction to| Biostatistics, anova and regression). They introduce them in turn,| however; I agree that a text that began with glm and then took up anova,| ancova and regression as instances of the general approach would be| preferable. This is especially so when using Systat, as the model| statements closely parallel the models, allowing more complex| models to be grasped and implemented immediately, instead of being treated| as some new technique.| | Gregory C. Mayer| [EMAIL PROTECTED]| | | | | On Mon, 28 Feb 2000, Bob Madden wrote:| | > I agree. In fact, I have sought in vain for an introductory level statistics| > text that does not treat ANOVA and regression as two totally separate,| > disconnected techniques.| > With disconcerting monotony, they all monkey each other in this respect. I| > think students| > would be better served by being shown early on that regression, ANOVA, and for| > that| > matter, ANCOVA, are all special cases of the glm.| > | > --Bob Madden| > | > James Friedrich wrote:| > | > > Let me ad to the speculation regarding why interaction effects are often| > > omitted from multiple regression. I think the reality is that people are| > > generally trained in one "mode" or the other (ANOVA or Regression) without| > > a sense of their connectedness (a point already alluded to in previoous| > > posts). 
In an in-press national survey of undergraduate statistical| > > instruction for psychology majors, I found that ANOVA dominates, with| > > little attention to regression (except "simple"). The specialties of| > > those teaching the stats / methods courses tends to be in laboratory -| > > experimental areas where ANOVAs are the norm. The bottom line is that i| > > don't think budding psychologists, at least,
Re: Linear Regression with known intercept (Long Message)
Mark writes:

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, February 12, 2000 4:51 PM
Subject: Linear Regression with known intercept

| Hi,
|
| If I want to find the least squares estimator of the slope of a simple
| linear regression model where my intercept is known, will this
| estimator be the same as if I did not know my intercept (= Sxy/Sxx)?
| How about the variance and the confidence interval of my estimator?
| Will they be bigger or smaller than for the case where
| both my intercept and slope are unknown?
|
| Thank you for your help.
|
| Mark
|
| Sent via Deja.com http://www.deja.com/

Hi, Mark --

Glad you sent this Email. It is a nice and simple example of the use of Prediction/Regression/Linear Models -- which should be one of the important objectives of a FIRST NON-CALCULUS-BASED STATISTICS COURSE.

Consider, first, the Simple Regression Model:

(Model 1)  Y = a1*U + a2*X + E1

where
Y  = a vector containing observations on a dependent or response variable.
U  = a predictor (vector) containing all 1's. (THE MOST NEGLECTED AND LEAST UNDERSTOOD PREDICTOR OF ALL)
X  = another predictor with any elements -- could be BINARY (0,1).
E1 = the Error or Residual vector.
a1 = least-squares regression coefficient of U (this is frequently referred to as the "Y-intercept").
a2 = least-squares regression coefficient of X (this is frequently referred to as the "Slope").

A powerful capability to give students who are comfortable with Algebra is to be able to IMPOSE ANY DESIRED LINEAR RESTRICTIONS ON A LINEAR MODEL OF THE FORM:

Y = a1*X1 + a2*X2 + ... + ap*Xp + E

This capability is useful in many applications BESIDES STATISTICS.

Now, to your neat example:

"If I want to find the least squares estimator of the slope of a simple linear regression model where my intercept is known, ..."

You wish to impose the restriction that

a1 = k (a known value)

Imposing that restriction on Model 1 above gives:

Y = k*U + a2*X + E2

The only unknown regression coefficient is a2, which I will rename b2 to remind us that the numerical value of the coefficient of X in Model 1 is most likely different from the value in Model 2. Then,

(Model 2)  Y = k*U + b2*X + E2

Since k*U is known, the least-squares value for b2 is obtained from:

Y - k*U = b2*X + E2

or, letting Y - k*U be designated by a single symbol, W:

W = b2*X + E2

and the least-squares value of b2 for Model 2 (and for any ONE-PREDICTOR model) is:

b2 = (W'X)/(X'X) = Sum(wi*xi)/Sum(xi*xi)

b2 is the slope of the line which is "forced by the restriction" a1 = k. Most software now allows one to find the value of b2 (with W as the dependent variable) by choosing an option that omits the vector U as a predictor.

If you have good software available, the software will produce the standard errors of a1 and a2 by solving Model 1 and the standard error of b2 by solving Model 2.

Now, if it is "interesting" to TEST THE HYPOTHESIS THAT

a1 = k

then a statistics student may want to compute, with SSQE1 and SSQE2 the error sums of squares of Models 1 and 2:

F = [(SSQE2 - SSQE1)/(2-1)] / [SSQE1/(n-2)]
  = [(SSQE2 - SSQE1)/1] / [SSQE1/(n-2)]

and since F(1,df2) = t^2(df2), with df2 = n-2,

t(df2) = sqrt(F(1,df2))

This IS a "t-test".

And, perhaps, from this value of "t" another statistics student might want to compute the Standard Error of a1, and then compute a Confidence Interval. The astute student can compute the Standard Error from:

t = Statistic/Standard Error

but since the numerical values of t and the "Statistic" are known, we have:

Standard Error = Statistic/t

In this particular case the Statistic is the difference between the estimate and the hypothesized value, so

Standard Error = (a1 - k)/t

taken in absolute value, which reduces to a1/t when k = 0.

This procedure allows for easy computation of the "Standard Error" of any of the 'weights' (intercept or slope) in a regression model and, in the more general case, of any linear combination of the weights in a multiple linear regression model.

Sorry for the length of this message, but I couldn't resist promoting the use of Prediction/Regression/Linear Models for ALL STUDENTS.

--- Joe
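The steps in this message can be traced numerically. The sketch below is an illustration under stated assumptions, not software from the thread: the function names (full_model, restricted_model, intercept_f_test) and the data are hypothetical, and the F and t statistics carry their usual interpretation only under the standard normal-error assumptions.

```python
import math

def full_model(xs, ys):
    """Model 1: Y = a1*U + a2*X + E1. Returns (a1, a2, SSQE1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a2 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a1 = my - a2 * mx
    ssqe1 = sum((y - a1 - a2 * x) ** 2 for x, y in zip(xs, ys))
    return a1, a2, ssqe1

def restricted_model(xs, ys, k):
    """Model 2: W = Y - k*U = b2*X + E2. Returns (b2, SSQE2)."""
    b2 = sum((y - k) * x for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    ssqe2 = sum((y - k - b2 * x) ** 2 for x, y in zip(xs, ys))
    return b2, ssqe2

def intercept_f_test(xs, ys, k):
    """Test a1 = k by comparing Models 1 and 2.

    Returns (F, t, standard error of a1), with t = sqrt(F) and
    standard error = |a1 - k| / t. Undefined when a1 equals k exactly.
    """
    n = len(xs)
    a1, _, ssqe1 = full_model(xs, ys)
    _, ssqe2 = restricted_model(xs, ys, k)
    f = ((ssqe2 - ssqe1) / 1) / (ssqe1 / (n - 2))
    t = math.sqrt(f)
    return f, t, abs(a1 - k) / t

# Hypothetical data and a hypothetical known intercept k = 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a1, a2, ssqe1 = full_model(xs, ys)
b2, ssqe2 = restricted_model(xs, ys, 1.0)
f, t, se_a1 = intercept_f_test(xs, ys, 1.0)
print("Model 1 slope a2:", a2, " Model 2 slope b2:", b2)
print("F:", f, " t:", t, " SE(a1):", se_a1)
```

On these data the two slopes differ, which is the point of renaming a2 to b2. The standard error recovered from F agrees with the textbook formula s*sqrt(1/n + xbar^2/Sxx), which gives a simple check on the arithmetic.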
Re: ANN vs. nonlinear regression: forecasting
John --

Sounds very interesting.

If you mean the "classical" least-squares model, there are no assumptions involved in fitting least-squares. It's only the "statistics" that add the extra "assumptions".

PREDICTION is the important thing. Compare the PREDICTIVE accuracy/costs/etc. of various approaches. You may wish to include RESAMPLING/BOOTSTRAP/CROSS-VALIDATION in your research. The proof of the "best" is how well it PREDICTS.

I will be interested in what you learn.

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, February 11, 2000 7:01 AM
Subject: ANN vs. nonlinear regression: forecasting

| I'm working on a study that compares neural networks to classical non-
| linear statistical estimators in forecasting time series. My thesis is
| that the NN would be robust under conditions where the assumptions of
| the classical model are not met, and the NN would be inferior where the
| classical assumptions are satisfied.
|
| What would be a good classical model to compare a neural network to?
| Does anyone know of any papers/sources on this subject?
|
| I sincerely appreciate any help/suggestions.
|
| John Carrier
| [EMAIL PROTECTED]
|
| Sent via Deja.com http://www.deja.com/
| Before you buy.
|
| ===
| This list is open to everyone. Occasionally, people lacking respect
| for other members of the list send messages that are inappropriate
| or unrelated to the list's discussion topics. Please just delete the
| offensive email.
|
| For information concerning the list, please see the following web page:
| http://jse.stat.ncsu.edu/
| ===
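One way to "compare the predictive accuracy" of competing approaches is k-fold cross-validation. The sketch below is a generic illustration, not the study described in the thread: the two candidate predictors (a constant and a straight line), the function names, and the data are all hypothetical, and folds that take every k-th observation ignore the time ordering that a real forecasting comparison would need to respect.

```python
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean regardless of x."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def cv_mse(fit, xs, ys, k=5):
    """Mean squared prediction error over k folds (every k-th point held out)."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)     # fitted without the held-out points
        total += sum((ys[i] - model(xs[i])) ** 2 for i in held_out)
    return total / n

# Hypothetical data: a linear trend plus a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [3.0 + 2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
print("mean-only CV MSE:", cv_mse(fit_mean, xs, ys))
print("line CV MSE:     ", cv_mse(fit_line, xs, ys))
```

Any other predictor, a neural network included, can be dropped in as another fit function returning a callable, so that all candidates are judged on the same held-out observations.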
Re: adjusting marks; W. Edwards Deming
Robert Knodt writes in response to the message at http://www.remarq.com The Internet's Discussion Network (SEE BELOW) --- Re: adjusting marks; W. Edwards Deming It would be nice if those sending to the mailing list would clearly identify themselves. It would also be nice if they used an e-mail address so individuals might send them e-mail directly. Thanks, Dr. Robert C. Knodt 4949 Samish Way, #31 Bellingham, WA 98226 [EMAIL PROTECTED] End of Robert Knodt's message Beginning of Joe Ward's comment -- Good comment, Robert -- Perhaps the unidentified writer is a frustrated product of "Non-mastery" Spelling Education and is intentionally (or unintentionally) showing the results. See BOLD items below. -- Joe ******** * Joe Ward Health Careers High School ** 167 East Arrowhead Dr 4646 Hamilton Wolfe ** San Antonio, TX 78228-2402 San Antonio, TX 78229 ** Phone: 210-433-6575 Phone: 210-617-5400 ** Fax: 210-433-2828 Fax: 210-617-5423 ** [EMAIL PROTECTED] ** http://www.ijoa.org/joeward/wardindex.html * - End of Joe Ward's comment -- - Original Message - From: Consultantssuck <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Monday, February 07, 2000 5:12 PM Subject: Re: adjusting marks; W. Edwards Deming | Dr. Deming Naive? You, sir, are misguided and unfortunately,| misinformed of the genius of the master Dr. Shewhart, and his| disiple and messenger to the latter half of the 20th century,| Dr. Deming.| | Humans want to do a good job. Dr. Deming was pellucid on this| point. People and school fit nicely into this axiom.| | what you fail to understand is the profound knowledge of| thinking preparing, and continual improvement. Grading is nice,| succinct, and above all, usually useless in its existing| design. Does grading permit our student to readdress problem or| slow areas? 
In many cases grading only shows how well you did,| based on varying factors-The next test, completely different.| | we have all seen studies where the pretty girl is awarded better| grades for the same caliber of work as others. we have all| seen reports where teachers are wrong in their suppositions,| then corrected or challenged by students, ultimately leading| these educators to hold a grudge for "attitude and behavior"| when report card time recurs.| | Do you want to know why the AFT and the NEA are against teaching| LOGIC in elementary schools (Logic being the foundation for all| higher math applications)?| | Could it be because some protege will learn to ask the harder| questions? Possibly Some "smart alec" will not accept our| educator's "Because I told you it did."| | A recent report found Elementary educators, when pressed for| answers they did not know, simply "winged it." This sophristry| unfortunately happens when our educators are not versed in the| sciences, history or math, and they wish to appear (to| themselves and) to their students, smart.| | People want to do a good job. Grading allows teachers to make| decisions in our children's early years based on mostly the| faliable educator's emotions toward that one particular budding| mind. Grading should be benchmarks for ever improvement based| on practice, practice practice of the fundementals. Then of| course moving foward with a keen sence of where the student is| going. Any good music teacher will tell you the ones who| practice the fundemental scales, dilegently, go on to master the| difficult pieces.| | Read the book OUT OF CRISES again, and again. I assure you, you| will soon "get it."| | | | * Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *| The fastest and easiest way to search and participate in Usenet - Free!| | | | ===| This list is open to everyone. 
Occasionally, people lacking respect| for other members of the list send messages that are inappropriate| or unrelated to the list's discussion topics. Please just delete the| offensive email.| | For information concerning the list, please see the following web page:| http://jse.stat.ncsu.edu/| ===|
Re: Looking for text on resampling...
Scott --

Peter Bruce is the contact!!!

-- Joe

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, February 06, 2000 12:08 PM
Subject: Looking for text on resampling...

| Our small college library has a collection of basic biostats texts but
| nothing that specifically covers the area of resampling. I am currently
| looking over a 1991 text by Bryan Manly (Randomization and Monte Carlo
| Methods in Biology) - the first two chapters seem quite accessible (to
| someone unfamiliar with the field!)
|
| Could anyone suggest other texts that might cover bootstrapping and
| jackknife techniques - I would favour texts that have a biology bent and
| are written so non-specialists can follow...
|
| Many thanks!
|
| Scott
|
| Sent via Deja.com http://www.deja.com/
| Before you buy.
Re: Course Curriculum
Perhaps in the short time you have, it may be appropriate to give your students the power to ask statisticians the appropriate research questions "in natural language". In this regard your medical residents should expect their support statisticians to use the combined POWER OF COMPUTERS and GENERAL LINEAR MODELS/REGRESSION (and other computer-aided techniques, e.g. Resampling, Bootstrap, Simulation) to answer useful, non-trivial research questions.

Your residents may not need to know how to do it themselves, but it would be great if they can communicate with those who can help.

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: SAlbert <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, February 03, 2000 10:14 PM
Subject: Re: Course Curriculum

| >I'm organizing a BASIC research methods/statistical analysis course for
| >medical residents. The course will be held over multiple sessions for a
| >total of about 15-20 hours.
| >
| >Any suggestions for textbooks, course materials, format for conducting
| >the course, etc?
| >
| >Thanks!
| >SR Millis
|
| Take a look at Harvey Motulsky's book "Intuitive Biostatistics." It's
| readable, has good examples, and not so technical as to throw people off.
| While not perfect, it's the best I've seen of its kind. (I understand Dr.
| Motulsky is doing a revision, but I don't know when that might come out. The
| book is published by Oxford University Press, if I remember right.)
|
| Steve Albert
Fw: CORRECTION TO EARLIER MESSAGE-Correlation - Constraints on Variables
My Apologies -- "Haste makes waste!"

Notice the serious errors in the previous version!!!

The model (1) below should have read:

(1) Y = a1*U + a2*X + E

and not

(1) Y = a1*U + a1*X + E

and where there were statements about the hypothesis "a1=0" it should read "a2=0"

:-(

-- Joe

----- Original Message -----
From: Joe Ward <[EMAIL PROTECTED]>
To: bkamen <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: APSTAT-L <[EMAIL PROTECTED]>
Sent: Tuesday, January 04, 2000 3:08 PM
Subject: Re: Correlation - Constraints on Variables

In the beginning all information is BINARY/CATEGORICAL (not DUMMY).

I refer to models of the very general form:

Y = a1*X1 + a2*X2 + a3*X3 + ... + ap*Xp + E

as Prediction/Regression/Linear Models.

The predictors X1, X2, X3, ..., Xp can be defined in many ways. a1, a2, a3, ..., ap are usually least-squares coefficients that MINIMIZE THE SUM OF SQUARES OF THE ELEMENTS OF THE "Error" 'E'.

If the model is of the form:

(1) Y = a1*U + a1*X + E
CORRECTION: SHOULD HAVE READ
(1) Y = a1*U + a2*X + E

where
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal to 1
X = a continuous variable, e.g. Age, Height, Weight, Test Score
E = "error", sometimes called "residual"

then the model is sometimes called "simple regression". In this form, a test of the Hypothesis a2=0 is sometimes called a test of "ZERO CORRELATION" or "SLOPE = 0".

Now consider Model 1 as above:

(1) Y = a1*U + a1*X + E
CORRECTION: THIS SHOULD HAVE READ
(1) Y = a1*U + a2*X + E

and we let
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal to 1 (as above)
but
X = 1 if the Y observation is from a Male; 0 if the Y observation is from a Female.

In this model, a test of the Hypothesis a2=0 is sometimes called a test of the hypothesis that the

Expected Value of Y (Mean) for Males = Expected Value of Y (Mean) for Females

or a "t-test for the difference between two means".

Other special forms of the GENERAL MODEL are called different names, such as One-way Analysis of Variance (ANOVA), Analysis of Covariance, Two-way Analysis of Variance, etc.

Before we acquired high-speed computers, we needed special easy-to-calculate computational procedures. WE SHOULD NOT BE CONSTRAINED NOW THAT WE HAVE THE COMPUTER POWER.

Many seemingly-different algorithms of statistics can be accomplished under ONE GENERAL FORM. But of most importance, the ONE GENERAL FORM can be used to create models that fit unique research questions.

The items contained in the URL shown below are related to your question. If you would like to see some detailed examples, you may want to look in a university library at:

Introduction to Linear Models by Ward & Jennings, Prentice-Hall, 1973.

Copies of this book are available from the Institute for Job and Occupation Analysis:

Jimmy L. Mitchell, Ph.D., Director
[EMAIL PROTECTED]
10010 San Pedro, Suite 440, San Antonio, Texas 78216
(210) 349-8525  Fax: (210) 349-0168

--- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

----- Original Message -----
From: bkamen <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2000 12:17 PM
Subject: Correlation - Constraints on Variables

| This practical question arose between myself and a colleague at work.
| It concerns whether we can use correlation analysis if one of the
| variables is non-continuous or "categorical." She believes that both
| variables must be continuous. However she cannot say why, and I cannot
| find any such constraint in the statistics book I have relied on since
| graduating in Industrial Engineering a few years ago, Miller and Freund,
| 'Probability and Statistics for Engineers.'
|
| I have been thinking that if x is discrete and can assume only a few
| values compared with y which is continuous, the correlation study may
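The claim that a binary (0,1) predictor turns Model (1) into a two-means comparison can be checked numerically. The following is a minimal sketch with hypothetical data and hypothetical function names: it fits Y = a1*U + a2*X + E by least squares, compares it with the restricted model a2 = 0, and confirms that the resulting F equals the square of the pooled two-sample t.

```python
import math

def fit_model_1(x, y):
    """Least-squares fit of Y = a1*U + a2*X + E. Returns (a1, a2, SSQE)."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    a2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a1 = my - a2 * mx
    ssqe = sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y))
    return a1, a2, ssqe

def f_for_a2_equals_zero(x, y):
    """F statistic comparing Model (1) with the restricted model Y = a1*U + E."""
    n = len(y)
    _, _, ssqe_full = fit_model_1(x, y)
    my = sum(y) / n
    ssqe_restricted = sum((yi - my) ** 2 for yi in y)   # a2 forced to 0
    return (ssqe_restricted - ssqe_full) / (ssqe_full / (n - 2))

def pooled_t(group1, group0):
    """Classical two-sample t with pooled variance."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m0) ** 2 for v in group0)
    sp2 = ss / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))

# Hypothetical data: X = 1 for one group, 0 for the other.
y = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
x = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
a1, a2, ssqe = fit_model_1(x, y)
f = f_for_a2_equals_zero(x, y)
t = pooled_t(y[:3], y[3:])
print("a1 (mean of X=0 group):", a1)
print("a2 (difference of means):", a2)
print("F:", f, " t squared:", t * t)
```

Here a1 is the mean of the X = 0 group and a2 is the difference between the two group means, so nothing in the least-squares fit requires X to be continuous.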
Re: Correlation - Constraints on Variables
In the beginning all information is BINARY/CATEGORICAL (not DUMMY). I refer to models of the very general form: Y = a1*X1 + a2*X2 + a3*X3 + ... + ap*Xp + E as Prediction/Regression/Linear Models. The predictors X1, X2, X3, ...,Xp can be defined in many ways. a1, a2, a3,...,ap are usually least-squares coefficients that MINIMIZE THE SUM OF SQUARES OF THE ELEMENTS OF THE "Error" 'E'. If the model is of the form: (1)Y = a1*U + a1*X + E where Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure) U = a predictor with every element equal 1 X = a continuous variable, e.g. Age, Height, Weight, Test Score E = "error" or sometime called "residual" then the model is sometimes called "simple regression". In this form, a test of the Hypothesis a1=0 is sometimes called a test of "ZERO CORRELATION" or "SLOPE = 0". Now consider Model 1 as above: (1) Y = a1*U + a1*X + E and we let Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure) U = a predictor with every element equal 1 (as above) but X = 1 if the Y observation is from a Male; 0 if the Y observation is from a Female. In this model, a test of the Hypothesis a1=0 is sometime called a test of the hypothesis that the Expected Value of Y (Mean) for Males = Expected Value of Y (Mean) for Females or a "t-test for the difference between two means". Other special forms of the GENERAL MODEL are called different names, such as One-way Analysis of Variance (ANOVA), Analysis of Covariance, Two-way Analysis of Variance, etc. Before we acquired high-speed computers, we needed special easy-to-calculate computational procedures. We SHOULD NOT BE CONSTRAINED NOW THAT WE HAVE THE COMPUTER POWER. Many seemingly-different algorithms of statistics can be accomplished under ONE GENERAL FORM. But of most importance, the ONE GENERAL FORM can be used to create models that fit unique research questions. The items contained in the URL shown below are related to your question. 
If you would like to see some detailed examples, you may want to look in a university library at: Introduction to Linear Models by Ward & Jennings, Prentice-Hall, 1973. Copies of this book are available from the Institute for Job and Occupation Analysis:

Jimmy L. Mitchell, Ph.D., Director
[EMAIL PROTECTED]
10010 San Pedro, Suite 440, San Antonio, Texas 78216
(210) 349-8525 Fax: (210) 349-0168

--- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: bkamen <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2000 12:17 PM
Subject: Correlation - Constraints on Variables

| This practical question arose between myself and a colleague at work.
| It concerns whether we can use correlation analysis if one of the
| variables is non-continuous or "categorical." She believes that both
| variables must be continuous. However she cannot say why, and I cannot
| find any such constraint in the statistics book I have relied on since
| graduating in Industrial Engineering a few years ago, Miller and Freund,
| 'Probability and Statistics for Engineers.'
|
| I have been thinking that if x is discrete and can assume only a few
| values compared with y which is continuous, the correlation study may
| yield a high probability of type-one error. I interpret this as
| providing insufficient evidence with which to reject the null
| hypothesis. But I have not thought of this as an inappropriate use of
| correlation.
|
| On the other hand in attempting to probe Miller and Freund I find that
| correlation is based on the "bivariate normal distribution," the
| formula for which has numerous parameters including alpha and beta, the
| least squares regression coefficients. I am aware that to obtain the
| latter requires that the function be differentiable, hence x must also
| be continuous. This seems to support my friend's view.
|
| I would appreciate clarification of any such con
Re: Factor analysis
Haider -- You may want to consider another approach: 1. Use "Policy Capturing", "Judgment Analysis (JAN)", "Policy Specifying" or any of your favorite Multi-Attribute Decision Model approaches to obtain ONE function of your THREE DEPENDENT VARIABLES. IMHO, Only human(s) should make judgments about how to combine multiple dependent variables. After that, you now have Y = function of (your THREE DEPENDENT VARIABLES) 2. Then you use your favorite regression program to predict Y = function of (your PREDICTOR VARIABLES) This approach is not involved with factor analysis interpretation. However, if you want to do a factor analysis on the PREDICTORS, then you can USE THE FACTOR SCORES AS PREDICTORS. The disadvantage of using factor scores is that you still have to use ALL OF THE PREDICTOR VARIABLES. So if you would like to reduce the number of predictors, then you should NOT use factor scores but use regression models. -- Joe ******** * Joe Ward Health Careers High School ** 167 East Arrowhead Dr 4646 Hamilton Wolfe ** San Antonio, TX 78228-2402 San Antonio, TX 78229 ** Phone: 210-433-6575 Phone: 210-617-5400 ** Fax: 210-433-2828 Fax: 210-617-5423 ** [EMAIL PROTECTED] ** http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Haider Al-Katem <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Saturday, December 18, 1999 4:00 AM Subject: Factor analysis | Hi,| | I have conducted a factor analysis on some questionnaire items. The| dependent variables that I am measuring for example ('Intention To Buy',| 'Attitude towards a product' and 'Trust in buying the product from a| merchant' ) seem to load significantly high on two factors which leaves me| with a NOT SIMPLE FACTOR STRUCTURE.| | I am assuming that since 'Intention To Buy', 'Attitude towards a product'| and 'Trust in buying the product from a merchant' all seem to be some type| of an ATTITUDE , the significantly high factor loadings on the two factors| may be justifiable.| | My questions are:| | 1. 
Are my above interpretations of the result correct?| | 2. If not, is there a statistical method that can help me overcome this| 'non-simple factor structure'?| | Thanks.| | |
Re: grading on the curve
Herman -- I liked your last sentence indicating that MASTERY IS IMPORTANT!! " I do not use a linear grading method; fortunately, early in my teaching, I had a student put it all together on the final." ^ Joe ******** * Joe Ward Health Careers High School * * 167 East Arrowhead Dr 4646 Hamilton Wolfe * * San Antonio, TX 78228-2402San Antonio, TX 78229 * * Phone: 210-433-6575 Phone: 210-617-5400 * * Fax: 210-433-2828Fax: 210-617-5423 * * [EMAIL PROTECTED] * * http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Herman Rubin <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, December 23, 1999 7:23 AM Subject: Re: grading on the curve | In article <[EMAIL PROTECTED]>, | dennis roberts <[EMAIL PROTECTED]> wrote: | >this discussion is interesting ... | | >there seems to be TWO general kinds of "grading" on the curve ... it would | >be interesting to try to "estimate" how frequently each happens ... | | >1. LOWERing cutoffs ... thus, INcreasing the #s of those getting various | >higher grades | | >2. making cutoffs such that the distribution of GRADES resembles a normal | >distribution | | >i assume that #1 occurs much more frequently and, from my perspective, | >there is NO good rationale for doing #2 ... unless one assumes that ability | >within a class is normally distributed AND ... and far more crucial ... | >that achievement SHOULD resemble the distribution of ability ... | | Something like #2 occurs far too often. But either one of these | defeats the value of a grade in indicating anything about what | the student has accomplished. | | NOTHING is normally distributed, so grades should not be. | | Also, classes are not equal; even different sections of the same | course in the same term are not equal. Trying a different approach | to teaching may well change the distribution of the amount of | knowledge, and thus should change the distribution of grades. 
| | Only absolute grading is a meaningful assessment of what the | student has accomplished. Relative grading almost forces | levels to go down. The American undergraduate grades in the | strong mathematics courses preparing for graduate work are | essentially meaningless at this time. | | >in any case ... instructors are suppose to give students some reasonable | >description of the grading system used ... at the BEginning of a course ... | >which i assume would include some facimile of a grading scale ... or what | >one has to do to earn certain grades ... and in this context, i would think | >that anyone who might 'consider" RAISING cutoffs so that FEWER students get | >higher grades ... would be challenged from students .. as this appears to | >border on unethical practice ... | | One is not required to go that far. Saying that you will give | your best assessment of what the student knows and can do, based | on scores given on various items, meets the legal requirements. | I do not use a linear grading method; fortunately, early in my | teaching, I had a student put it all together on the final. | | | >At 02:32 PM 12/22/99 -0500, [EMAIL PROTECTED] wrote: | >> I never, as a teacher, used any curving | >>procedure to lower students grades! | -- | This address is for information only. I do not claim that these views | are those of the Statistics Department or of Purdue University. | Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 | [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558 |
Re: teaching statistical methods by rules?
Yep!! As you say: "Why are people so obsessed with T and Z?"

Perhaps it would be even better (easier?) to focus on F, since F(1,df2) = t^2(df2).

(Reminder: when using a t-table, the p-values usually involve ONE TAIL, and when using the F-table, the p-values involve TWO TAILS.)

Example:
The critical value of t for a probability of p = .05 at t(18) = 1.734
The critical value of F for a probability of p = .10 at F(1,18) = (1.734)^2 = 3.01

:-) -- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 19, 1999 4:44 PM
Subject: Re: teaching statistical methods by rules?

| In article <[EMAIL PROTECTED]>,
| [EMAIL PROTECTED] says...
| >
| >On the other hand, a body of knowledge can be thought of as a set of
| >'rules'. The important thing is that this set is constructed by the
| >individual, so our aim should not be to teach statistics as a set of
| >rules, but in such a way that each student can develop his or her own
| >set of rules. They won't be the same for all, and they will be different
| >from the teacher's, but they hopefully will work. (If you like, this is
| >a definition of a 'good student' - one who manages to construct a
| >successful set of rules for each subject.)
|
| Either undergraduate students in Australia are much smarter than those
| living in the United States or you live on a different planet. The last time I
| taught an undergraduate introductory statistics class, some students couldn't
| even do fractions and simple algebra. Can you expect them to develop their own
| rules?
|
| Why are people so obsessed with T and Z? When the degrees of freedom exceed
| say 30, the difference between T and Z is practically negligible. You can use T
| or Z in such a case. However, the P-value from Z is easier to compute.
|
| --
| Tjen-Sien Lim
| [EMAIL PROTECTED]
| www.Recursive-Partitioning.com
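The identity F(1,df2) = t^2(df2) in the reply above can be checked numerically. The sketch below assumes two small invented groups of scores; it computes the pooled-variance two-sample t statistic and, separately, the F statistic obtained by comparing an ASSUMED model (each group has its own mean) with a RESTRICTED model (one common mean). The helper names `mean` and `ss_within` are illustrative only.

```python
import math

# Hypothetical scores for two groups (invented for illustration).
g1 = [12.0, 15.0, 11.0, 14.0, 13.0]
g2 = [16.0, 18.0, 15.0, 19.0, 17.0]

def mean(v):
    return sum(v) / len(v)

def ss_within(v):
    """Sum of squared deviations of the values in v about their own mean."""
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v)

n1, n2 = len(g1), len(g2)
df2 = n1 + n2 - 2

# Pooled-variance two-sample t statistic.
pooled_var = (ss_within(g1) + ss_within(g2)) / df2
t = (mean(g1) - mean(g2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# F statistic from comparing error sums of squares of two models:
#   ASSUMED MODEL:    each group has its own mean
#   RESTRICTED MODEL: both groups share one mean (one restriction, so df1 = 1)
ess_assumed = ss_within(g1) + ss_within(g2)
ess_restricted = ss_within(g1 + g2)
f = ((ess_restricted - ess_assumed) / 1) / (ess_assumed / df2)

print(t ** 2, f)              # the two values agree up to rounding error
print(round(1.734 ** 2, 2))   # the table values quoted in the message: 3.01
```

Because the F statistic is the square of t, a two-tailed t test at level p and the F(1,df2) test at level p reject in exactly the same cases.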
Re: Prediction Model Question
- Original Message -
From: Burke Johnson <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, December 16, 1999 9:13 AM
Subject: Prediction Model Question

| Hi,
|
| A student of mine is getting ready to develop a GLM prediction model that will
| include a mixture of categorical and quantitative predictor variables. We will
| probably not include interaction terms in the model (i.e., it will be a main effects
| only model).
|
| Here's my question: Do you suggest using dummy coding (0,1) or effects coding
| (1,0,-1) for the categorical variables included in the model?
|
| The reason I'm asking is because dummy coding does not always give the same result
| for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur
| recommends using effects coding rather than dummy coding in the factorial case. Do
| you know if the choice of dummy or effects coding matters for a main effects only
| model with multiple categorical and quantitatively scaled predictor variables?
|
| Thanks in advance,
| Burke Johnson
| --

Hi, Burke --

First, I use the words BINARY (or INDICATOR) predictors -- and NOT "DUMMY" predictors. In the beginning ALL PREDICTOR INFORMATION IS BINARY! It is unfortunate that the word DUMMY has become popular. Students might get the idea that there is something wrong with using DUMMIES!! I think that the BINARIES are really the most BRILLIANT!!

Now to your concern -- Your last paragraph

"The reason I'm asking is because dummy coding does not always give the same result for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur recommends using effects coding rather than dummy coding in the factorial case. Do you know if the choice of dummy or effects coding matters for a main effects only model with multiple categorical and quantitatively scaled predictor variables?"

is a very good example of the situation that arises in the use of "packaged" algorithms. The user of the "package" may have no idea what questions are being answered by the "package". I always suggest that researchers create their own models! That is the only SAFE WAY! If a "packaged" procedure is verified to produce the results desired by the researcher then it certainly should be used.

The researcher should:

1. State their research questions in "natural language" -- avoid terms such as "MAIN EFFECTS" and "EFFECTS CODING" since those expressions may mean different things to different people. In some instances the user of those terms may not know what is meant when they utter the statement. Ask someone what they mean if they utter something about MAIN EFFECTS in a 3-factor ANOVA with unequal numbers of observations in the cells.

2. Create an ASSUMED MODEL that allows the researcher to investigate their research questions of interest.

3. Impose restrictions on the parameters of the ASSUMED MODEL that are implied by the research questions of interest. This results in a RESTRICTED MODEL.

4. Compare the Error Sum of Squares between the ASSUMED and RESTRICTED MODELS using an F-test and obtain confidence intervals if appropriate.

I assume there must be a reason for assuming that there is NO INTERACTION among the predictors. Many researchers would test for NO INTERACTION first. Then, if appropriate, switch to the NO INTERACTION MODEL.

I would be interested in seeing the models that your student develops to investigate his/her OWN QUESTIONS OF INTEREST!! :-)

-- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
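On the narrow question of whether the coding matters in a model without interaction terms, a rough check is possible. The sketch below assumes invented data and two hypothetical helpers (`least_squares`, `error_ss`) that solve the normal equations directly; it is not the output of any package. For a model without interaction terms, (0,1) coding and (1,0,-1) coding of the same categories span the same set of predicted values, so the Error Sum of Squares is the same; only the meaning of the individual coefficients changes.

```python
# Hypothetical data (invented): a 3-level category plus one quantitative predictor.
group = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
q = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 6.0]
y = [5.0, 7.0, 8.0, 6.0, 9.0, 10.0, 9.0, 10.0, 15.0]

def least_squares(columns, y):
    """Solve the normal equations (X'X) a = X'y by Gauss-Jordan elimination."""
    p = len(columns)
    # Augmented matrix [X'X | X'y].
    m = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))]
         for i in range(p)]
    for k in range(p):
        # Partial pivoting: bring the largest remaining entry in column k to row k.
        pivot = max(range(k, p), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        pv = m[k][k]
        m[k] = [v / pv for v in m[k]]
        for r in range(p):
            if r != k:
                factor = m[r][k]
                m[r] = [vr - factor * vk for vr, vk in zip(m[r], m[k])]
    return [row[p] for row in m]

def error_ss(columns, y):
    """Error Sum of Squares of the least-squares fit of y on the given predictors."""
    a = least_squares(columns, y)
    pred = [sum(aj * col[i] for aj, col in zip(a, columns)) for i in range(len(y))]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred))

u = [1.0] * len(y)   # the predictor of all 1's

# Binary (0,1) coding, with category C as the omitted category.
d_a = [1.0 if g == "A" else 0.0 for g in group]
d_b = [1.0 if g == "B" else 0.0 for g in group]

# Effects (1,0,-1) coding, with category C coded -1 on both predictors.
e_a = [1.0 if g == "A" else -1.0 if g == "C" else 0.0 for g in group]
e_b = [1.0 if g == "B" else -1.0 if g == "C" else 0.0 for g in group]

ess_binary = error_ss([u, d_a, d_b, q], y)
ess_effects = error_ss([u, e_a, e_b, q], y)
print(ess_binary, ess_effects)   # same fit under either coding
```

This says nothing about which hypotheses a given package tests with each coding, which is the concern raised in the reply; it only shows that the fitted model itself is unchanged.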
Re: could someone help me with this intro to stat. problem
- Original Message - From: Donald F. Burrill <[EMAIL PROTECTED]> To: Mike Wogan <[EMAIL PROTECTED]> Cc: Luv 2 muah 143 <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Wednesday, December 08, 1999 12:41 PM Subject: Re: could someone help me with this intro to stat. problem | On Wed, 8 Dec 1999, Mike Wogan wrote, in response to Luv 2 muah 143's | question: | | > > 5 of 10 volunteers are randomly selected to receive self-defense | > > training. The other 5 receive no training. At the end of the | > > training period, all subjects complete a self-confidence | > > questionnaire. | | > > a.) Is there a difference in self-confidence between the 2 groups | > > (p<.01)? | | > > b.) What are the effects of self-defense traing on self-confidence | > > (I'm assuming a two-tailed test?). Explain analysis | | > Without a pre-test measure of self-confidence, taken prior to the | > training, even if there is a significant difference post-training, it's | > not possible to tell whether the difference is the result of the | > training or was there to begin with. | | Oh, come on, Mike. What did you think "randomly selected" was | in there for? (Or were you trying to confuse the querent because he | had the effrontery to ask a homework (or perhaps exam) question of this | list?) | | > If there is a pre-post measurement of self-confidence, then you need a | > mixed model Anova, with Training vs. No Training as the between groups | > factor and Pre-Post as the within groups factor. | | This sure must sound scary to someone who's having trouble with | the first semester of an elementary stats course! | -- DFB. | | Donald F. 
Burrill [EMAIL PROTECTED] | 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] | MSC #29, Plymouth, NH 03264 603-535-2597 | 184 Nashua Road, Bedford, NH 03110 603-471-7128 | | -- Joe Ward writes -- Hi Don, et al -- While it seems that the question is stimulated from a student's assignment, it seems to me that students should be given the "power they deserve" to do something useful when they complete their course of instruction. You indicated that-- "This sure must sound scary to someone who's having trouble with the first semester of an elementary stats course!" IT SHOULD NOT BE SCARY. If students can't "control for the uncontrollable" such as "PRE-TEST", or GENDER, etc. then they are not being given what they deserve in A NON-CALCULUS ELEMENTARY STATS COURSE. I realize that I am an "outlier" in what I believe to be a lack of SALESMANSHIP about the power that statistics can give students -- before they are "turned off". But talented high school students can do it -- so why not college students? But I get more cynical in my old age! -- Joe * Joe Ward Health Careers High School 167 East Arrowhead Dr 4646 Hamilton Wolfe San Antonio, TX 78228-2402 San Antonio, TX 78229 Phone: 210-433-6575 Phone: 210-617-5400 Fax: 210-433-2828 Fax: 210-617-5423 [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *
Re: Coefficient of Determination Question
Hi, GM -- We always have trouble trying to give "names" to things. Usually we increase misunderstanding as we give ambiguous names to things. For example, how many folks know what is meant when they hear someone say "In a 3-factor ANOVA (A,B,C) there is a "significant 'A' MAIN EFFECT." The "someone" should just say what they really mean -- if they know! r^2 should have been "unnamed" since it's as easy to say "r square" as it is to say "coefficient of determination". However, if someone insists on giving a name to (1-r^2) then why not call it the "coefficient of non-determination". But "one minus r square" is about as easy to say as "coefficient of non-determination". -- Joe * Joe Ward Health Careers High School 167 East Arrowhead Dr 4646 Hamilton Wolfe San Antonio, TX 78228-2402 San Antonio, TX 78229 Phone: 210-433-6575 Phone: 210-617-5400 Fax: 210-433-2828 Fax: 210-617-5423 [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Gaurang Mehta <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, December 08, 1999 10:15 AM Subject: Coefficient of Determination Question | I am looking for the coefficient name for (1-r^2). I know r^2 is the | Coefficient of Determination, but I do not know the name of the (1-r^2) | coefficient. | | Any assistance would be greatly appreciated. | | Thanks in advance | | GM | | |
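Whatever name is used, the quantity itself is easy to show. The sketch below assumes five invented (x, y) pairs and checks that, for a least-squares line, "one minus r square" equals the Error Sum of Squares divided by the total sum of squares of Y, that is, the share of the variation in Y not accounted for by X.

```python
import math

# Hypothetical paired observations (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)

# Least-squares line and its error (residual) sum of squares.
slope = sxy / sxx
intercept = my - slope * mx
error_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

print(r ** 2)           # "r square"
print(1 - r ** 2)       # "one minus r square"
print(error_ss / syy)   # the same quantity, computed from the residuals
```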
Re: could someone help me with this intro to stat. problem
Mike Wogan writes --

- Original Message -
From: Mike Wogan <[EMAIL PROTECTED]>
To: Luv 2 muah 143 <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, December 08, 1999 11:16 AM
Subject: Re: could someone help me with this intro to stat. problem

| On 8 Dec 1999, Luv 2 muah 143 wrote:
|
| > 5 of 10 volunteers are randomly selected to receive self-defense training. The
| > other 5 receive no training. At the end of the training period, all subjects
| > complete a self-confidence questionnaire.
| >
| > a.) Is there a difference in self-confidence between the 2 groups (p<.01)?
| >
| > b.) What are the effects of self-defense training on self-confidence (I'm
| > assuming a two-tailed test?). Explain analysis
| >
| > Please help, I can't figure it out...my mind has gone blank
|
| Without a pre-test measure of self-confidence, taken prior to the
| training, even if there is a significant difference post-training, it's
| not possible to tell whether the difference is the result of the training
| or was there to begin with.
|
| If there is a pre-post measurement of self-confidence, then you need a
| mixed model Anova, with Training vs. No Training as the between groups
| factor and Pre-Post as the within groups factor.
|
| Mike
|
-- End of Mike's message --

Great suggestion, Mike --

"Without a pre-test measure of self-confidence, taken prior to the training, even if there is a significant difference post-training, it's not possible to tell whether the difference is the result of the training or was there to begin with."

The question "IN NATURAL LANGUAGE" might be stated slightly differently as:

(1) For subjects who have the SAME PRE-TEST MEASURE OF SELF-CONFIDENCE but have DIFFERENT TRAINING (i.e., TRAINING vs NO-TRAINING), is there a DIFFERENCE IN THE EXPECTED POST-TEST MEASURE OF SELF-CONFIDENCE?

or perhaps

(2) If there is a difference between the two groups, is the difference the SAME FOR ALL VALUES OF THE PRE-TEST MEASURE OF SELF-CONFIDENCE?

In these "NATURAL LANGUAGE FORMS" of the research questions the researcher should be able to write an ASSUMED MODEL that allows for the expression of the hypotheses of interest in terms OF PARAMETERS OF THE ASSUMED MODEL. Then the restrictions implied by the questions of interest can be imposed on the ASSUMED MODEL to obtain a RESTRICTED MODEL to test the hypotheses.

AND NOW FOR MY "STANDARD SERMON"!

The approach described as:

"If there is a pre-post measurement of self-confidence, then you need a mixed model Anova, with Training vs. No Training as the between groups factor and Pre-Post as the within groups factor."

DOES NOT COMMUNICATE clearly how to proceed. The reader has to learn the meaning of:

"mixed model Anova"
"between groups factor"
"Pre-Post as within groups factor"

or be able to locate a "packaged" algorithm that sounds similar to: "Mixed model Anova, with Training vs. No Training as the between groups factor and Pre-Post as the within groups factor."

Another "advisor" might suggest: "Do an Analysis of Covariance, with the Pre-Test Measure of Self-Confidence as the Covariable". As before, the researcher must know the meaning of the advice or locate a "package" that is labeled as "COVARIANCE ANALYSIS". This second approach is dangerous since many "packaged" "COVARIANCE ANALYSIS" algorithms may not allow the researcher to answer the questions of interest, e.g. question #2 above. And even if such "packages" are located the researcher may not be able to verify that the answers produced by the "package" are related to the natural language questions of interest.

In summary, statistics instruction should give students (researchers) the power to:

1. State their research questions in NATURAL LANGUAGE so that normal humans can understand.
2. Create models that allow the researcher to express hypotheses of interest.
3. Translate NATURAL LANGUAGE questions into RESTRICTIONS on parameters of the ASSUMED MODEL.
4. Impose the RESTRICTIONS to obtain a RESTRICTED MODEL.
5. Verify that the RESTRICTED MODEL has the RESTRICTIONS IMPLIED BY THE QUESTIONS OF INTEREST.
6. Use information from the ASSUMED and RESTRICTED MODELS to HELP make decisions about the questions of interest.

Hopefully, (Luv 2 muah 143) is being provided the opportunity to do the above!! Reasonably talented high school students should be given the power to do this. :-)

--- Joe

Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
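The two natural-language questions above can be turned into model comparisons along the lines of the summary list. The sketch below is only one way to do it, assuming invented pre-test and post-test scores and a hypothetical helper `sums`. The ASSUMED MODEL gives each group its own straight line relating post-test to pre-test; question (2) restricts the two slopes to be equal; question (1), asked within the common-slope model, further restricts the two intercepts to be equal.

```python
# Hypothetical pre/post self-confidence scores (invented for illustration).
pre_trained = [10.0, 12.0, 14.0, 16.0, 18.0]
post_trained = [15.0, 18.0, 19.0, 23.0, 25.0]
pre_control = [11.0, 13.0, 15.0, 17.0, 19.0]
post_control = [12.0, 13.0, 16.0, 17.0, 20.0]

def sums(x, y):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy

n = len(pre_trained) + len(pre_control)
t_xx, t_xy, t_yy = sums(pre_trained, post_trained)
c_xx, c_xy, c_yy = sums(pre_control, post_control)

# ASSUMED MODEL: separate intercept and separate slope per group (4 parameters).
ess_assumed = (t_yy - t_xy ** 2 / t_xx) + (c_yy - c_xy ** 2 / c_xx)

# RESTRICTED MODEL for question (2): common slope, separate intercepts (3 parameters).
w_xx, w_xy, w_yy = t_xx + c_xx, t_xy + c_xy, t_yy + c_yy
ess_common_slope = w_yy - w_xy ** 2 / w_xx

# Further restriction for question (1): common slope AND common intercept (2 parameters).
a_xx, a_xy, a_yy = sums(pre_trained + pre_control, post_trained + post_control)
ess_one_line = a_yy - a_xy ** 2 / a_xx

# F statistics: (increase in error SS per restriction) / (error mean square of the fuller model).
f_interaction = ((ess_common_slope - ess_assumed) / 1) / (ess_assumed / (n - 4))
f_training = ((ess_one_line - ess_common_slope) / 1) / (ess_common_slope / (n - 3))
print(f_interaction, f_training)
```

The second F is what many packages report as the "covariance analysis" test for groups; the first is the test that some packages do not offer, which is the danger described in the message.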
Re: ancova
DENNIS ROBERTS WRITES -

- Original Message -
From: dennis roberts <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 05, 1999 7:24 AM
Subject: ancova

| some time ago, i sent out a note about a handout i had re: ancova. now, in
| that handout, i illustrated a very simple case of how ancova might account
| for some of the within groups 'error'. in that handout, i showed, near the
| end ... some minitab output for the analysis. now, in that output ... the
| adjusted SS adds up to MORE than what the simple anova adds to. NOTE: the
| dependent measure in the Exp and Cont group example was performance on a
| test .. and the covariate was IQ.
|
| the one way shows:
|
| One-way Analysis of Variance
|
| Analysis of Variance
| Source   DF     SS     MS      F      P
| Factor    1    252    252   1.54  0.231
| Error    18   2949    164
| Total    19   3201
|
| and the ancova shows:
|
| Analysis of Variance for TOTY, using Adjusted SS for Tests
|
| Source   DF   Seq SS   Adj SS   Adj MS       F      P
| TOTIQ     1   1539.9   2057.9   2057.9   39.26  0.000
| Group     1    770.0    770.0    770.0   14.69  0.001
| Error    17    891.0    891.0     52.4
| Total    19   3200.9
|
| in the handout, i showed that the adjusted SS(TOT) equals the sum of the
| 770 and 891 values for Group and Error in the Adj SS columns ... but where
| does the 2057 come from and, when you add to the 770 and 891 values .. you
| get a much larger value than the original 3201?
|
| what would be the simplest way to discuss this with students? in what way
| could you use the original data on the dependent measure ... and show how
| this new SS(TOT) value could be obtained?
|
| thanks
| --
| 208 Cedar Bldg., University Park, PA 16802
| AC 814-863-2401 Email mailto:[EMAIL PROTECTED]
| WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
| FAX: AC 814-863-1002

JOE WARD COMMENTS ---

Hi, Dennis --

You probably can predict my comments!! It is very difficult to try to explain the computer outputs without knowing (or guessing) what hypotheses are being "tested". A continuing situation that is observed on various Email lists involves interpretations of computer outputs without concern for understanding what questions are being "answered" by the computer package. In some situations -- particularly the "case of the missing cell" -- the answers might be for "uninteresting questions". The communications provided by the internet continue to reveal the shortcomings of statistics education. Until we change our statistics education these problems will not go away.

Without going into details of exactly how to proceed, your students should:

1. State their hypotheses of interest in "natural language".
2. Create an ASSUMED MODEL that allows them to express their "natural language" hypotheses in terms of parameters in their ASSUMED MODEL.
3. Impose the restrictions on the ASSUMED MODEL to obtain a RESTRICTED MODEL.
4. Compare the Error Sum of Squares from the ASSUMED MODEL with the Error Sum of Squares from the RESTRICTED MODEL using an F statistic.

In the 1960's, when we presented short courses on Prediction/Regression/Linear Models at the American Educational Research Association (AERA) it was indeed rare to find anyone who knew the meaning of the MAIN EFFECTS HYPOTHESES (ROW and COLUMN MAIN EFFECTS) in a TWO-FACTOR ANOVA. Of course, everyone knows it these days -- I hope.

After your students have become acquainted with the PREDICTION/REGRESSION/LINEAR MODELS approach, then it is fun to ask them to do some DETECTIVE WORK to indicate the hypotheses that are being tested in the computer output that you show in your example -- and for more complicated computer outputs.

The "homework" or "exam" assignment might be as follows:

Indicate the hypotheses that are being "tested" in the computer outputs shown below.

1. Explain in as much detail as you can, including a "natural language" statement and/or in terms of ASSUMED and RESTRICTED MODELS.
2. How do you use the ANCOVA output to test for (NO) INTERACTION between IQ and GROUP? Is it possible? If not, why not?
3. Create a model that will allow you to test for (NO) INTERACTION between IQ and GROUP. Impose the restrictions needed to test for (NO) INTERACTION. Compare your ASSUMED MODEL with your RESTRICTED MODEL.

Incidentally, talented high school students SHOULD be able to handle this if they are given the opportunity!

| One-way Analysis of Variance
|
| Analysis of Variance
| Source   DF     SS     MS      F      P
| Factor    1    252    252   1.54  0.231   (EXPLAIN THIS HYPOTHESIS)
| Error    18   2949    164
| Total    19   3201
|
| and t
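As a small piece of the "detective work", the arithmetic in the Minitab output quoted above can be checked directly. The sketch below uses only the numbers printed in the two tables; the variable names are illustrative. It recomputes each F as a mean square over the error mean square, and shows that the sequential SS add up to the total while the adjusted SS do not, because each adjusted SS measures what a predictor adds after the other predictor is already in the model.

```python
# Values copied from the quoted Minitab output: (DF, SS) for the one-way table.
oneway = {"Factor": (1, 252.0), "Error": (18, 2949.0)}

# ANCOVA table: sequential and adjusted sums of squares.
ancova_seq = {"TOTIQ": 1539.9, "Group": 770.0, "Error": 891.0}
ancova_adj = {"TOTIQ": 2057.9, "Group": 770.0, "Error": 891.0}
ancova_error_df = 17

# One-way F = MS(Factor) / MS(Error).
factor_df, factor_ss = oneway["Factor"]
err_df, err_ss = oneway["Error"]
f_oneway = (factor_ss / factor_df) / (err_ss / err_df)

# ANCOVA F ratios use the adjusted SS (1 df each) over the ANCOVA error mean square.
ms_error = ancova_adj["Error"] / ancova_error_df
f_group = ancova_adj["Group"] / ms_error
f_iq = ancova_adj["TOTIQ"] / ms_error

seq_sum = sum(ancova_seq.values())   # sequential SS: add up to the printed total
adj_sum = sum(ancova_adj.values())   # adjusted SS: need not add up to the total

print(round(f_oneway, 2), round(f_group, 2), round(f_iq, 2))
print(round(seq_sum, 1), round(adj_sum, 1))
```

The recomputed ratios match the printed F values to rounding, and the sequential column reproduces the Total of 3200.9. The 2057.9 for TOTIQ is therefore not a piece of the total; it is the increase in error SS when IQ is dropped from the model that already contains Group.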
Re: categorical or numerical
I've watched the many thoughtful "categorical or numerical" messages with interest. For many, many years I've proposed that IN THE BEGINNING ALL RECORDED INFORMATION is BINARY, CATEGORICAL, NOMINAL -- (not DUMMY). AFTER humans associate MEANING with the categories then the information acquires various NAMES/TYPES. It is important to know the MEANING that is associated with the recorded information. This is a problem with all communication. -- An interesting example is the discussion about GRADES GIVEN FOR ASSESSING PERFORMANCE BY STUDENTS. In some situations we observe grade names as: C, D,B,F,A (CATEGORICAL, NOMINAL, BINARY, ...?) THEN WE SOMETIMES THINK OF THEM AS BEING IN SOME "ORDER". A,B,C,D,F (we probably use "F" for Failure, but why don't we use "E" for Excellent?) THEN WE MIGHT LIKE TO COMPUTE GRADE POINT AVERAGES AND Let 4 = (another name for) "A" 3 = (another name for) "B" 2 = (another name for) "C" 1 = (another name for) "D" 0 = (another name for) "F" (Why is the difference between D(1) and F(0) the same as ALL other adjacent categories?) And on the other hand we may have "SCORES" on a 100-point scale. but someone might desire to have some "LETTERS". So we may Let 90-100 = A 80-89 = B 70-79 = C 60-69 = D 00-59 = F :-) -- Joe * Joe Ward Health Careers High School 167 East Arrowhead Dr 4646 Hamilton Wolfe San Antonio, TX 78228-2402 San Antonio, TX 78229 Phone: 210-433-6575 Phone: 210-617-5400 Fax: 210-433-2828 Fax: 210-617-5423 [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: Jan de Leeuw <[EMAIL PROTECTED]> To: Paul Velleman <[EMAIL PROTECTED]>; Hankins <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Cc: Paul F Velleman <[EMAIL PROTECTED]> Sent: Saturday, December 04, 1999 7:59 AM Subject: RE: categorical or numerical | It's nice to sort of disagree with Paul for a change. 
| | Students should be taught that ALL measurements are categorical and WHY | we are usually pretty succesfull treating data AS IF it were measured on a | continuous scale EVEN THOUGH WE KNOW IT IS NOT. | | Thus, on a minor point, it is neither a good nor a bad idea to "slice | continuous | measurements into categories". There are no continuous measurements, so | the whole notion is irrelevant. We can just choose to make our categories more | broad, and this is a choice which is part of the analysis. The argument | that this "throws away information" seems to suggest that is inherently bad. | But statistics is the art of throwing away information. | | Think about the shift of emphasis if the normal would be moved back to its | rightful historical place: as a convenient and widely applicable numerical | approximation tool. No more nonsense such as "Assume the data are a sample from | a normal distribution ... ". | | Of course I agree with Paul that unnecessary discretizing is, well, | unnecessary. | | At 11:46 PM -0500 12/3/99, Paul Velleman wrote: | >At 11:14 PM 12/03/1999, Hankins wrote: | > >We would not be able to measure anything, then not able to record the | > >measurement, if slicing continuous measurements into categories is "almost | > >always a bad idea"! | > | >Perhaps I should have been more precise. Of course every recorded | >measurement is discretized to some degree. What I oppose is *unnecessary* | >discretizing. | > | > >The students should rather be taught that ALL measurements are categorical. | > | >On this, however, I disagree. Calling a variable categorical usually | >suggests a limited range of analysis possibilities. In fact, we are usually | >pretty successful treating discretized data as if it were meausred on a | >continuous scale even when we know it is not. | > | >-- paul | > | >Paul F. Velleman | >Cornell University Data Description, Inc. 
| >358 Ives Hall Box 4555 | >Ithaca, NY 14853 Ithaca, NY 14852-4555 | >(607) 255-4411 (607) 257-1000 | >(607) 255-8484 fax(607) 257-4146 fax | > | > | >=== | >The Advanced Placement Statistics List | >To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing: | >unsubscribe apstat-l | >Discussion archives are at | >http://forum.
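The grade example earlier in this thread can be written out as a sequence of recodings. The sketch below assumes five invented scores and hypothetical helper names; it starts from a 100-point score, applies the cutoffs listed in the message to get a letter, then shows the same letter both as a set of binary (0/1) predictors and as grade points. Each step is a human decision about MEANING, not something contained in the recorded symbols.

```python
# Grade points as listed in the message: another "name" for each letter.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Score cutoffs as listed in the message (lower bound of each band).
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def letter_from_score(score):
    """Map a 0-100 score to a letter grade using the cutoffs above."""
    for lower, letter in CUTOFFS:
        if score >= lower:
            return letter
    raise ValueError("score must be between 0 and 100")

def binary_predictors(letter):
    """One 0/1 indicator per category: the purely categorical form of the grade."""
    return {name: int(name == letter) for name in GRADE_POINTS}

scores = [95, 83, 71, 64, 42]   # hypothetical scores, invented for illustration
letters = [letter_from_score(s) for s in scores]
points = [GRADE_POINTS[letter] for letter in letters]
gpa = sum(points) / len(points)

print(letters)
print(binary_predictors("B"))
print(points, gpa)
```

Averaging `points` treats the step from D to F as the same size as every other step, which is exactly the assumption the message questions; the binary form makes no such assumption.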
Re: sets of values
Bob makes his, as-usual, valuable comments!! - Original Message - From: Bob Hayden <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Sunday, November 28, 1999 9:17 PM Subject: Re: sets of values | > On 27 Nov 1999 18:44:33 -0800, [EMAIL PROTECTED] wrote: | > | > > Obviously the sets are not related in a linear fashion. | > > | > > I would suggest that a 4th degree polynomial equation best fits the data. | > | > Oh! that should have been obvious | | > Rich Ulrich, [EMAIL PROTECTED] | > http://www.pitt.edu/~wpilib/index.html | | I was hoping someone else would respond to the polynomial problem. | Rich did, but I fear his point and his humor may be lost on those who | need it most. | | Higher order polynomial fits are problematic in many ways. It would | be VERY unusual for a polynomial of degree higher than two to be a | reasonable model (outside of cases where prior theory specifically | predicts a higher order polynomial). A software package that | recommends fitting a slew of higher order polynomials and then | choosing among them is of dubious statistical quality. To know what | to do instead you would need to know more about the context and | meaning of the data. For example, in my current regression class we | had data on electrical consumption of condominium units of different | sizes. A parabola gave a considerably better fit than a straight line | -- but it also predicted that costs would peak out at a size within | the range of the data and then drop off for larger sizes. This is not | very sensible. My choice was to transform size into 1/size^2. This | was not perfect but it was reasonable for the range of sizes studied | and did not do bizarre things just beyond that range. It gave a model | that rose more slowly for large sizes but never went down with | increasing size. 
| | PS I learned about the dangers of fitting higher order polynomials as | part of a final programming assignment in a Fortran course I took as | an undergraduate at MIT in about 1970. If you have n data points with | distinct x-values, a polynomial of degree n-1 gives a PERFECT fit in | the sense of going right through each point. However, for n more than | a few, it wiggles wildly between points and the matrix algebra croaked | all the canned packages we had at the time because of multicollinearity | problems. The point of the assignment: having a computer is no | substitute for knowing what you're doing. | | | _ | | | Robert W. Hayden | | | Department of Mathematics | / | Plymouth State College MSC#29 || | Plymouth, New Hampshire 03264 USA || * | Rural Route 1, Box 10 | /| Ashland, NH 03217-9702 | | ) (603) 968-9914 (home) | L_/ [EMAIL PROTECTED] | fax (603) 535-2943 (work) | - Joe Ward comments -- Hi, Bob -- Re your first paragraph-- nth-degree polynomials CAN BE USEFUL IN FITTING A WIDE RANGE OF MODELS! I'm assuming that you are referring to a LINEAR MODEL of the form: Y = a0*U + a1*X + a2*X^2 + a3*X^3 + ... + an*X^n + E (where U is a predictor of 1's -- the most neglected and misunderstood predictor of all time) By applying the capabilities acquired in learning to apply restrictions to investigate hypotheses using LINEAR MODELS it is possible to use ONLY THOSE PARTS OF A GENERAL POLYNOMIAL THAT DO A GOOD JOB OF FITTING THE DATA. We might START with a model of the general form shown above and then impose restrictions so that we can use only THAT PART OF THE FUNCTION THAT HAS a monotonic increasing or decreasing portion of the more-general form; or, if desired, use only a portion of the function that has TWO CHANGES OF DIRECTION, etc. It isn't necessary to use ALL of the predictors in the general form. 
If students are given the capability by their statistics teachers to impose restrictions on models, these students will have useful tools outside the statistics world. A student might want to create a model of the general form:

    Y = a0*U + a1*X + a2*X^2

such that the slope = 0 at X = k, or such that the slope = s at X = k.

A curious student might want to spend many hours exploring the possibilities. In the SAS system, it is quite easy to have a very general STARTING MODEL, then use the RESTRICT STATEMENT to create an ASSUMED MODEL and then use the TEST STATEMENT to test hypotheses. When students are first learning to impose restrictions it seems best to have them actually develop the RESTRICTED MODEL and then to VERIFY THAT THE RESTRICTED MODEL HAS THE DESIRED PROPERTIES. Even the SAS system might create "strange" models that are not what the user has in mind. Students can apply their "basic" algebra (IF they have "basic" algebra) to some practical use!

And, referring to your MIT exper
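The quadratic example above (slope = s at X = k) can be worked by hand and then checked, in the spirit of "develop the RESTRICTED MODEL and VERIFY it". The sketch below is an illustration only, not the SAS RESTRICT statement: it assumes NumPy, uses made-up data, and the function name is hypothetical. The algebra is that the slope of the model is a1 + 2*a2*X, so the restriction is a1 + 2*a2*k = s; substituting a1 = s - 2*a2*k gives the restricted model Y - s*X = a0*U + a2*(X^2 - 2*k*X) + E, which has only two predictors.

```python
import numpy as np


def fit_quadratic_with_slope(x, y, k, s=0.0):
    """Least-squares fit of Y = a0*U + a1*X + a2*X^2 with slope s at X = k.

    The restriction a1 + 2*a2*k = s is imposed by substitution, so the
    restricted model is  Y - s*X = a0*U + a2*(X^2 - 2*k*X) + E.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    U = np.ones_like(x)                  # the predictor of 1's
    Z = x**2 - 2.0 * k * x               # the single restricted predictor
    design = np.column_stack([U, Z])
    (a0, a2), *_ = np.linalg.lstsq(design, y - s * x, rcond=None)
    a1 = s - 2.0 * k * a2                # recover a1 from the restriction
    return a0, a1, a2


# Hypothetical data: a curve that peaks near X = 5, plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 25)
y = 3.0 + 4.0 * x - 0.4 * x**2 + rng.normal(scale=0.5, size=x.size)

a0, a1, a2 = fit_quadratic_with_slope(x, y, k=5.0, s=0.0)

# VERIFY that the restricted model has the desired property.
print("coefficients:", a0, a1, a2)
print("slope at X = 5:", a1 + 2.0 * a2 * 5.0)
```

Changing k and s and plotting the result is an easy way to explore the family of restricted models.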
Re: Need to evaluate difference between two R's
- Original Message - From: Herman Rubin <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, November 24, 1999 10:07 AM Subject: Re: Need to evaluate difference between two R's | In article <[EMAIL PROTECTED]>, | Rich Ulrich <[EMAIL PROTECTED]> wrote: | >On Tue, 23 Nov 1999 04:39:28 GMT, [EMAIL PROTECTED] wrote: | | >> Does any one know how one might test for significant differences | >> between two multiple R's (or R squar's)generated from two sets of data? | >> I need to determine if two R's generated on two separate occasions | >> using the same DV and IV's differ significantly from one another. | | >Correlations are not very good candidates for comparisons, since it is | >so easy to do tests that are more precise. | > - to test whether the predictive relations are different, you would | >test the regressions -- do a Chow test or the equivalent, to see if a | >different set of regressors are needed for a different sampling. | > - to test whether the variances are different (which is something | >that would change the correlations), you might test variances | >directly. | | This is correct. In fact, it is generally the case that | correlations, except as measures of how well the model | fits, do not have any real meaning. | | Even the amount of the variance explained can change | drastically with a change in design, but the parameters of | the model do not change, if normalizations are not done. | For example, if one has a "normal" model with correlation | coefficient .5, 25% of the variance is explained. Now | suppose that the predictor variable is selected to be | 2 standard deviations away from the mean, equally likely | to be in either direction. Then the correlation becomes | .756, and the proportion of the variance explained goes | up to 57%. But the prediction model is still the same. | -- | This address is for information only. I do not claim that these views | are those of the Statistics Department or of Purdue University. | Herman Rubin, Dept. 
of Statistics, Purdue Univ., West Lafayette IN 47907-1399
| [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
| --

Herman -- Great comment!

Discussions about correlation coefficients arise periodically on various lists. So when the time seems appropriate I resend an old message (see below and the WORD attachment) that might be of interest.

IMHO there is too much time spent on the correlation coefficient, since it is of limited and sometimes misleading value for practical decision-making in the real world. However, there are still some folks who are adjusting correlation coefficients for "restriction of range" in hopes that it might be useful.

-- Joe

*
Joe Ward                      Health Careers High School
167 East Arrowhead Dr         4646 Hamilton Wolfe
San Antonio, TX 78228-2402    San Antonio, TX 78229
Phone: 210-433-6575           Phone: 210-617-5400
Fax: 210-433-2828             Fax: 210-617-5423
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
*

-- Forwarded message --
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation (his posting is appended below), as well as jogging my own memory for a previous posting I had made. A while back I had posted the Anscombe dataset (in the context of an SPSS program) which also clearly shows the benefit of plotting the data: the four situations produce almost identical Pearson r values, but only one actually shows the classic scatterplot; the others show a nonlinear pattern and the influence that a single point has on the calculation of r. What does the value of r tell us here? Aren't the basic statistical concepts to be learned in this situation far more important and most clearly seen through a coordination of the graphical and numerical information?
-Mike Palij/Psychology Dept/New York University

Joe H Ward <[EMAIL PROTECTED]> writes:

To Mike et al -- There have been several messages related to the Simple Correlation Coefficient. IMHO, out in the "real world" of practical decision-making the correlation coefficient has very limited value and sometimes dangerous consequences. The correlation coefficient may be an important topic for the history of statistics to learn the problems associated with i
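Two numerical claims in this thread are easy to check. The sketch below assumes NumPy; the function names are invented for illustration. The first part redoes Herman Rubin's arithmetic: working in units where Var(X) = Var(Y) = 1, the slope is rho and the residual variance is 1 - rho^2, and both stay fixed while Var(X) is changed by the design (X placed at +/- 2 standard deviations has variance 4). The second part computes Pearson r for the four Anscombe data sets that Mike Palij mentions, using the values as published by Anscombe (1973).

```python
import math
import numpy as np


def correlation_after_spreading_x(rho, variance_ratio):
    """Correlation when the prediction model is unchanged but Var(X) is
    multiplied by variance_ratio (slope and residual variance held fixed)."""
    explained = rho**2 * variance_ratio
    residual = 1.0 - rho**2
    return math.copysign(math.sqrt(explained / (explained + residual)), rho)


def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


# Rubin's example: rho = .5, X selected at +/- 2 SD, so Var(X) becomes 4.
r_new = correlation_after_spreading_x(0.5, 4.0)
print("new r:", round(r_new, 3), " variance explained:", round(r_new**2, 2))

# Anscombe's four data sets.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}
for name, (xs, ys) in anscombe.items():
    print("Anscombe", name, "r =", round(pearson_r(xs, ys), 3))
```

The r values say nothing about which of the four sets is roughly linear, which is curved, and which is driven by a single point; only a plot of each set shows that.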
Re: blocking for variation/confounding
Steve -- Your students are asking the good questions!! This comes up repeatedly. I try to minimize reference to unfamiliar statistical terms when I introduce students to the use of Prediction/Regression/Linear Models in science research projects. Without using special, unfamiliar terms such as "reducing variation", "blocking", "confounding", "main effects", etc., students can understand natural language statements, such as:

(Consider Example 10.22, page 774 of M&M 2nd edition)

1. "Is there a difference between the MEAN HEART RATE AFTER 6 MINUTES OF TREADMILL EXERCISE of RUNNERS WHO AVERAGED AT LEAST 15 MILES PER WEEK and a GROUP OF "SEDENTARY" SUBJECTS?"

   A shorthand might be the following ONE expression:

     Is MEAN PERFORMANCE OF THE EXERCISE GROUP = MEAN PERFORMANCE OF THE "COUCH POTATOES"?

   or, equivalently,

     Is MEAN PERFORMANCE OF THE EXERCISE GROUP - MEAN PERFORMANCE OF THE "COUCH POTATOES" = 0?

And after an interesting discussion about the problem, students might turn to:

2. "Is there a difference between the MEAN HEART RATE AFTER 6 MINUTES OF TREADMILL EXERCISE of RUNNERS WHO AVERAGED AT LEAST 15 MILES PER WEEK and a GROUP OF "SEDENTARY" SUBJECTS WHO ARE OF THE SAME SEX?"

   A shorthand might be the TWO expressions:

     Is MEAN PERFORMANCE OF THE MALE EXERCISE GROUP = MEAN PERFORMANCE OF THE MALE "COUCH POTATOES"?
     and
     Is MEAN PERFORMANCE OF THE FEMALE EXERCISE GROUP = MEAN PERFORMANCE OF THE FEMALE "COUCH POTATOES"?

   or, equivalently,

     Is MEAN PERFORMANCE OF THE MALE EXERCISE GROUP - MEAN PERFORMANCE OF THE MALE "COUCH POTATOES" = 0?
     and
     Is MEAN PERFORMANCE OF THE FEMALE EXERCISE GROUP - MEAN PERFORMANCE OF THE FEMALE "COUCH POTATOES" = 0?

And, further discussion might lead to:

3. If there ARE DIFFERENCES in 2. above:

     Is MEAN PERFORMANCE OF THE MALE EXERCISE GROUP - MEAN PERFORMANCE OF THE MALE "COUCH POTATOES" = MEAN PERFORMANCE OF THE FEMALE EXERCISE GROUP - MEAN PERFORMANCE OF THE FEMALE "COUCH POTATOES"?
Students can easily discuss these questions without special terminology. Also, they can discuss "controlling for", "holding fixed" or other expressions that make sense to them -- WITHOUT SPECIAL TERMINOLOGY. These TWO-ATTRIBUTE PROBLEMS CAN BE DISCUSSED DURING THE FIRST DAY OF CLASS TO SHOW THE POWERFUL QUESTIONS THAT CAN BE INVESTIGATED IF THE STUDENTS STICK WITH THEIR STUDY OF STATISTICS. Also, it is easy to enter additional predictor attributes if students feel that such attributes are relevant. Notice that there is no need to discuss whether or not the FOUR MUTUALLY EXCLUSIVE GROUPS MUST HAVE EQUAL NUMBERS OF OBSERVATIONS since computers eliminate the computational problems presented by UNEQUAL N's. Even if the curriculum does not allow time to actually analyze these questions -- students should be aware of WHAT THEY MIGHT BE ABLE TO DO IF GIVEN THE OPPORTUNITY. I strongly suggest that students are HIGHLY MOTIVATED by their POWER TO CONTROL FOR THE UNCONTROLLABLE. They really like the motto: IF YOU CAN'T CONTROL IT -- MEASURE IT AND PUT IT IN THE MODEL. In addition, if their future research leads to an interest in the possible INTERACTION (a special term associated with question #3 above) between attributes, then the attributes MUST BE MEASURED AND INCLUDED IN THE MODEL. And the sermon comes to the end!! Amen!! :-) -- Joe * Joe Ward Health Careers High School 167 East Arrowhead Dr 4646 Hamilton Wolfe San Antonio, TX 78228-2402 San Antonio, TX 78229 Phone: 210-433-6575 Phone: 210-617-5400 Fax: 210-433-2828 Fax: 210-617-5423 [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html * - Original Message - From: SUGHRUE, STEVE <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, November 18, 1999 7:15 PM Subject: blocking for variation/confounding | Hi everyone! | In the AP course description booklet, multiple choice question | number 13 asks for the primary reason for blocking when designing an | experiment. 
My students and I agree that reducing variation is a good
| answer, but isn't reducing confounding also pretty good? Are we missing
| something here??
| Thanks to anyone who can help.
|
| Steve Sughrue
| Tabor Academy
| Marion, MA
|
| ===
| The Advanced Placement Statistics List
| To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
| unsubscribe apstat-l
| Discussion archives are at
| http://forum.swarthmore.edu/epigone/apstat-l
| Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
| ===
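Joe's three natural-language questions in this thread can be turned into a small linear-model calculation. The sketch below is only an illustration under stated assumptions: the heart rates are invented (they are NOT the data from the textbook example), the group sizes are deliberately unequal, and NumPy is assumed. The ASSUMED MODEL has one 0/1 predictor for each of the four mutually exclusive groups, so its least-squares coefficients are the four group means. Question 3 becomes the restriction (male runner - male sedentary) = (female runner - female sedentary), which gives a RESTRICTED MODEL with predictors U, sex, and exercise only.

```python
import numpy as np

# Hypothetical heart rates (beats per minute) with unequal N's.
groups = {
    ("male", "runner"): [112, 118, 105, 121, 110],
    ("male", "sedentary"): [142, 150, 138],
    ("female", "runner"): [118, 125, 120, 116],
    ("female", "sedentary"): [150, 158, 147, 155, 149, 152],
}


def fit(design, y):
    """Return least-squares coefficients and the error sum of squares."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta, float(resid @ resid)


y = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
labels = [key for key, values in groups.items() for _ in values]
male = np.array([sex == "male" for sex, _ in labels], dtype=float)
runner = np.array([act == "runner" for _, act in labels], dtype=float)

# ASSUMED MODEL: one predictor per group; coefficients are the group means.
full = np.column_stack([male * runner, male * (1 - runner),
                        (1 - male) * runner, (1 - male) * (1 - runner)])
(mr, ms, fr, fs), sse_full = fit(full, y)

# Question 2: the difference within each sex.
diff_male = mr - ms
diff_female = fr - fs

# Question 3: impose (mr - ms) = (fr - fs) to get the RESTRICTED MODEL.
U = np.ones_like(y)
restricted = np.column_stack([U, male, runner])
_, sse_restricted = fit(restricted, y)

df_full = len(y) - 4                     # observations minus group means
F = (sse_restricted - sse_full) / (sse_full / df_full)  # 1 restriction

print("runner minus sedentary, males:  ", round(diff_male, 2))
print("runner minus sedentary, females:", round(diff_female, 2))
print("F for question 3 (1 and", df_full, "df):", round(F, 3))
```

Nothing in the calculation requires equal group sizes, and an additional attribute that "can't be controlled" would simply add predictors to both models.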