Re: adjusted r-square
If the least-squares regression algorithm does not REQUIRE THE NUMBER OF OBSERVATIONS TO EXCEED THE NUMBER OF PREDICTORS, THEN THE REGRESSION ALGORITHM COULD BE USED TO SOLVE A SYSTEM OF SIMULTANEOUS EQUATIONS THAT WOULD HAVE NO ERRORS. Another interesting characteristic of Excel Regression is that it requires the number of observations to exceed the number of predictors. Fortunately, Colin Bell is working with the Excel folks at Microsoft to improve the numerous interesting characteristics of Statistics in Excel.
-- Joe
*** Joe H. Ward, Jr. *** 167 East Arrowhead Dr. *** San Antonio, TX 78228-2402 *** Phone: 210-433-6575 *** Fax: 210-433-2828 *** Email: [EMAIL PROTECTED] *** http://www.ijoa.org/resumes/ward.html *** --- *** Health Careers High School *** 4646 Hamilton-Wolfe *** San Antonio, TX 78229 ***

- Original Message -
From: Graeme Byrne [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, August 22, 2001 4:42 AM
Subject: Re: adjusted r-square

In short, you don't. If the number of terms in the model equals the number of observations you have much bigger problems than not being able to compute adjusted R^2. It should always be the case that the number of observations exceeds the number of terms in the model; otherwise you cannot calculate any of the standard regression diagnostics (F-stats, t-stats etc). My advice is to get more data or remove terms from the model. If neither of these is an option you are stuck.

Atul [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

I have a doubt regarding adjusted r-square. How do we calculate the adjusted r-square when the error degrees of freedom are zero (or in other words, when the number of samples is equal to the number of regression terms including the constant)? Such a situation leads to a zero in the denominator in the expression for calculating adjusted r-square. Your help is highly appreciated.
Thanks Atul
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
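The zero denominator Atul describes can be seen directly from the usual textbook formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p), where p counts the regression terms including the constant. The following is only a minimal sketch; the function name and the example numbers are made up for illustration and do not come from the thread.

```python
def adjusted_r_squared(r2, n_obs, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p).

    n_terms (p) counts every regression term, including the constant.
    Returns None when the error degrees of freedom (n - p) are not
    positive, because the formula then divides by zero or is meaningless.
    """
    error_df = n_obs - n_terms
    if error_df <= 0:
        return None  # adjusted R^2 is undefined here
    return 1.0 - (1.0 - r2) * (n_obs - 1) / error_df


# Hypothetical case: 20 observations, 4 terms, R^2 = 0.90.
print(adjusted_r_squared(0.90, 20, 4))

# Atul's situation: as many terms as observations, so error df = 0.
print(adjusted_r_squared(1.00, 4, 4))
```

With zero error degrees of freedom the fit reproduces the data exactly, so there is nothing left over from which to estimate error, which is the "much bigger problem" Graeme refers to.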
Re: Student's t vs. z tests
Eric -- Good comment! Also, it is helpful to keep in mind that:

t^2(df2) = F(1, df2)

-- Joe
Joe Ward 167 East Arrowhead Dr. San Antonio, TX 78228-2402 Home phone: 210-433-6575 Home fax: 210-433-2828 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html Health Careers High School 4646 Hamilton Wolfe San Antonio, TX 78229 Phone: 210-617-5400 Fax: 210-617-5423

- Original Message -
From: "Eric Bohlman" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, April 16, 2001 3:43 PM
Subject: Re: Student's t vs. z tests

Mark W. Humphries [EMAIL PROTECTED] wrote:

Hi, I am attempting to self-study basic multivariate statistics using Kachigan's "Statistical Analysis" (which I find excellent btw). Perhaps someone would be kind enough to clarify a point for me: If I understand correctly the t test, since it takes into account degrees of freedom, is applicable whatever the sample size might be, and has no drawbacks that I could find compared to the z test. Have I misunderstood something?

You're running into a historical artifact: in pre-computer days, using the normal distribution rather than the t distribution reduced the size of the tables you had to work with. Nowadays, a computer can compute a t probability just as easily as a z probability, so unless you're in the rare situation Karl mentioned, there's no reason not to use a t test.
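Joe's identity t^2(df2) = F(1, df2) can be checked numerically. The sketch below computes the pooled two-sample t statistic and the one-way ANOVA F statistic for the same two groups and compares t squared with F. The data values and function names are invented for illustration only.

```python
def _mean(values):
    return sum(values) / len(values)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df2)."""
    na, nb = len(a), len(b)
    ma, mb = _mean(a), _mean(b)
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df2 = na + nb - 2
    pooled_var = ss_within / df2
    t = (ma - mb) / (pooled_var * (1.0 / na + 1.0 / nb)) ** 0.5
    return t, df2


def one_way_f(a, b):
    """One-way ANOVA F statistic for two groups: F(1, df2)."""
    na, nb = len(a), len(b)
    ma, mb = _mean(a), _mean(b)
    grand = _mean(a + b)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df2 = na + nb - 2
    return (ss_between / 1) / (ss_within / df2)


# Made-up measurements for two groups.
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [6.9, 7.4, 6.1, 7.0]

t, df2 = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print("t^2 =", t ** 2, " F(1, %d) =" % df2, f)
```

The two printed values agree up to floating-point rounding, which is why a two-sided pooled t test and the two-group F test always give the same p-value.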
Re: cite for using linear regression instead of logistic regression
David -- Logistic Regression is more appealing to some folks since it maps the Predicted values into the range 0-1. If you do a least-squares regression predicting a 0-1 dependent variable, the predicted values may not be mapped into 0-1 (e.g. some predicted values may be less than 0 and some may be greater than 1). However, for "practical" decision-making such as "selection" or "classification" the results will be the same. Since you brought up the question, I'm sure that the "logistic regression" folks can enlighten us on the "practical" advantages of "logistic regression".
-- Joe
Joe Ward 167 East Arrowhead Dr. San Antonio, TX 78228-2402 Home phone: 210-433-6575 Home fax: 210-433-2828 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html Health Careers High School 4646 Hamilton Wolfe San Antonio, TX 78229 Phone: 210-617-5400 Fax: 210-617-5423

- Original Message -
From: "David Duffy" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, March 18, 2001 8:41 PM
Subject: Re: cite for using linear regression instead of logistic regression

Scheltema, Karen [EMAIL PROTECTED] wrote:

I've read several times on this listserve comments from people that when p(y) is not extreme, a logistic regression model can be estimated by a linear regression model.

Some references cited by Harvey (1982): also BFH

Harvey WR (1982). Least squares analysis of discrete data. J Anim Sci 54: 1067-1071.
Cochran WG (1940). The analysis of variance when experimental errors follow the Poisson or binomial laws. Ann Math Statist 11: 335.
Cochran WG (1943). Analysis of variance for percentages based on unequal numbers. JASA 38: 287.
Li JCR (1964). Introduction to statistical inference I. Ann Arbor: Edwards.

--
| David Duffy. email: [EMAIL PROTECTED] ph: INT+61+7+3362-0217 fax: -0101
| Epidemiology Unit, The Queensland Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4029, Australia
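Joe's point about least squares on a 0-1 dependent variable can be illustrated with a tiny example. This is a sketch on made-up data, not an analysis from any of the cited references: a straight line is fitted by ordinary least squares to a 0/1 outcome, some fitted values land below 0 or above 1, and thresholding them at 0.5 (an assumed cutoff) still classifies every case correctly.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data: a 0/1 outcome that switches from 0 to 1 as x increases.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]

# Fitted values are not confined to the 0-1 range.
print("smallest fitted value:", min(predicted))
print("largest fitted value:", max(predicted))

# Thresholding at 0.5 (an assumption for this sketch) gives the classification.
classified = [1 if p >= 0.5 else 0 for p in predicted]
print("classification matches y:", classified == y)
```

A logistic fit would keep every predicted value strictly between 0 and 1; whether that matters depends on whether the fitted values themselves, or only the resulting decisions, are of interest.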
Re: Re: topic?
Happy New Year -- Perhaps Laurie Snell will make a good start through the future CHANCE issues. -- Joe Joe Ward 167 East Arrowhead Dr. San Antonio, TX 78228-2402 Home phone: 210-433-6575 Home fax: 210-433-2828 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html Health Careers High School 4646 Hamilton Wolfe San Antonio, TX 78229 Phone: 210-617-5400 Fax: 210-617-5423 - Original Message - From: "Bokhorst, Frank" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, January 02, 2001 4:41 AM Subject: Re: topic? Bob Hayden asked: Anybody have anything to say about statistical education??? I would like to turn the question round, and ask if it might be possible to summarize relevant material from the recent discussion on the forum about the US election saga into a form suitable for teaching purposes? In particular, to sift through the EDSTAT archive and edit a resource text. There was much off-topic discussion, but there was also a huge volume of generally polite and reasonable talk with many good points illustrating key issues relevant to education. The topic itself was extremely pertinent and interesting to a wide audience. For example, someone recently asked for examples of the misuse of statistics - surely many examples could be found in the US election saga? What we need is a good summary. As another example, I note that Herman Rubin frequently argues the need for proper understanding of statistics: Could he, or someone anybody else on the EDSTAT forum, perhaps help educators by compiling some examples that arose in the recent discussion? What kind of understanding of statistics might be required of lawyers, politicians, voters, media editors? Maybe someone could list key points that came out of these EDSTAT discussions? Frank Bokhorst http://www.uct.ac.za/depts/psychology/bok _O tel: 021 650-3708 -\, fax: 021 689-7572 One car less (.)/(.) 
Psychology Dept., University of Cape Town, Rondebosch 7701, South Africa.
The owner of this bicycle takes responsibility for the shape of his drawing only if you use a fixed size font such as Courier.
Re: Statistical penalties for sequential analyses
Rich - You might want to consider doing some Resampling (Cross-Validation, Bootstrap) as you continue through your analyses.
-- Joe
Joe Ward 167 East Arrowhead Dr. San Antonio, TX 78228-2402 Phone: 210-433-6575 Fax: 210-433-2828 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html Health Careers High School 4646 Hamilton Wolfe San Antonio, TX 78229 Phone: 210-617-5400 Fax: 210-617-5423

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, December 08, 2000 3:30 PM
Subject: Statistical penalties for sequential analyses

Need some advice. We are doing a series of tests looking for correlations among age-sensitive variables in a population of mice. We will have about 600 mice in all, and it will take 3 years to test them all, at about 200 mice tested each year. We are considering three strategies:

A) Wait 3 years until all the data are in; then do the analyses.

B) Analyze the data on the first 300 mice, and publish anything that looks exciting and meets conventional significance criteria. When the second set of mice is finished, we can use these second 300 animals as a replicate sample to (try to) confirm the significant findings we reported on the first set. And we can also pool all 600 mice to obtain higher statistical power than we had for the initial analysis with N = 300.

Of course this represents testing some hypotheses twice, and thus increases the Type I error rate. I suspect that there are theoretically justified methods for adjusting significance criteria to "adjust" for taking two looks at the data, but I don't know how to do this. Anyone have a recipe, or a reference to get me started? Thanks.

Rich Miller University of Michigan Reply to: [EMAIL PROTECTED]
Sent via Deja.com http://www.deja.com/ Before you buy.
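Rich's concern about taking two looks at the data can be made concrete with a small simulation. The sketch below is a simplified stand-in, not his actual correlation analysis: it assumes a two-sided z test of a mean with known unit variance, looks once at the first half of the sample and once at the pooled sample, and counts how often a true null hypothesis is rejected at either look. The sample sizes, repetition count, seed, and function name are all assumptions chosen to keep the run short. The second call uses a Bonferroni-style critical value (0.025 per look), which is one simple and conservative adjustment, not necessarily the best one.

```python
import random


def two_look_error_rate(n_first=50, n_total=100, reps=4000, z_crit=1.96, seed=1):
    """Estimate the overall Type I error rate when a true null hypothesis
    (mean = 0, variance = 1) is tested after n_first observations and
    again after all n_total observations."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
        z_first = (sum(data[:n_first]) / n_first) * n_first ** 0.5
        z_all = (sum(data) / n_total) * n_total ** 0.5
        if abs(z_first) > z_crit or abs(z_all) > z_crit:
            rejections += 1
    return rejections / reps


# Nominal 0.05 at each look: the overall rate is inflated above 0.05.
naive_rate = two_look_error_rate(z_crit=1.96)

# Bonferroni-style split, 0.025 per look (two-sided z of about 2.2414).
adjusted_rate = two_look_error_rate(z_crit=2.2414)

print("two looks at 0.05 each:", naive_rate)
print("two looks at 0.025 each:", adjusted_rate)
```

Group-sequential boundaries from the clinical-trials literature are less conservative than this simple split, and Joe's resampling suggestion is another route; the simulation is only meant to show the size of the inflation being adjusted for.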
Re: [ap-stat] Textbook for regular statistics vs. AP Statistics
- Original Message -
From: "Carole Black" [EMAIL PROTECTED]
To: "AP Statistics" [EMAIL PROTECTED]
Sent: Wednesday, November 29, 2000 12:58 PM
Subject: [ap-stat] Textbook for "regular" statistics vs. AP Statistics

I have taught a "regular" statistics class at my high school for the last 3 years using Elementary Statistics by Mario Triola. (This was the book I inherited.) This is textbook adoption year for Georgia and I have the privilege of picking out Statistics books for both the "regular" stat class as well as a new AP class that will be offered for the first time next year. (I will be teaching both classes.) My first question is, should I go with 2 different textbooks or the same textbook? My second question is much the same as many others posted on this site: which book? I am seriously considering the Yates, Moore and McCabe "The Practice of Statistics" for the AP class. I am considering either Moore's "Basic Practice of Statistics" or the "Elementary Statistics" book published by McGraw Hill for the regular statistics class. Any comments would be greatly appreciated.

Carole Black

=== Joe Ward Comments ===

Hi, Carole -- Your opportunity of having an AP-Statistics class and a "regular" Statistics class can allow you the freedom of using the "regular" class to give students the capability to use the combined power of Regression/Linear Models and Computers to investigate some interesting and practical research questions. You might recruit some of your science students to give them useful techniques to support their research projects. You can give your students the power to create models to answer their research questions. It is certainly reasonable that you must give your AP-Statistics students the objectives that tend to match the corresponding college course. For the "regular" Statistics course you can make the course both interesting and practical without the constraints of AP-Statistics.
There probably are many AP teachers who can accomplish the AP-Statistics objectives AND have extra time to give their students some more powerful capabilities. Try to make your "regular" statistics course available for ALL students. Frequently, the "regular" course is designed for the less talented. You CAN make the regular course the more popular since your students might be able to do some powerful research. Students who are involved with Science Fairs, Jr. Academy of Science and the ASA Project/Poster competitions should be your target population for the "regular" course. Be sure to have access to books that contain ideas of how to use Regression/Linear models to create models to answer the students research questions of interest. -- Joe Joe Ward Health Careers High School 167 East Arrowhead Dr _ 4646 Hamilton Wolfe San Antonio, TX 78228-2402 San Antonio, TX 78229 Phone: 210-433-6575__ Phone: 210-617-5400 Fax: 210-433-2828 Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
[ap-stat] RE: election proposal
Does anyone know WHY so many states DON'T DO IT THIS WAY? Perhaps the Political Science/History folks can comment. -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Lee Creighton" [EMAIL PROTECTED] To: "AP Statistics" [EMAIL PROTECTED] Sent: Monday, November 13, 2000 8:11 AM Subject: [ap-stat] RE: election proposal People are listening! This is exactly how Nebraska and Maine vote, as we speak. It was decided after the disastrous 1824 election that the states would have the power to manage how they pick electors, and *not* the federal government. -Original Message- From: Jon Graetz [mailto:[EMAIL PROTECTED]] Sent: Sunday, November 12, 2000 11:30 PM To: AP Statistics Subject: [ap-stat] RE: election proposal I like it! Now, to get anyone else to listen... Jon Graetz The Miami Valley School 5151 Denise Drive Dayton, OH 45429 (937)434- [EMAIL PROTECTED] [EMAIL PROTECTED] -Original Message- From: Reba Taylor [mailto:[EMAIL PROTECTED]] Sent: Sunday, November 12, 2000 11:00 PM To: AP Statistics Subject: [ap-stat] election proposal I've been toying with this idea: Each state has the same number of electors as their congressional delegation: e.g. in VA, we have 11 congressional districts + 2 senators = 13 electors. Let's keep the electors, but have the ones representing the congressional districts vote the way their district votes. Then the 2 at-large electors will vote the way the state as a whole votes. I think this is more equable than winner-take-all. I also think it would be a more representative sample of the popular vote, but still giving the smaller states as much clout as the larger ones. 
Reba Taylor

Reba Taylor [EMAIL PROTECTED]
Home: 2418 Ridge Road, Blacksburg, VA 24060, 540-953-2421
School: Blacksburg High School, 520 Patrick Henry Drive, Blacksburg, VA 24060, 540-951-5706
AP Computer Science, AP Statistics, Math
Black holes are where God divided by zero.
"Can't never could, till it tried!" -- S.C. Taylor

--- You are currently subscribed to ap-stat as: [EMAIL PROTECTED] To unsubscribe send a blank email to [EMAIL PROTECTED] Frequently Asked Questions(FAQ) Site is at http://www.ncssm.edu/statsteachers AP Statistics Archives are at http://forum.swarthmore.edu/epigone/apstat-l
Re: Help needed ... :-(
Well said, Bob -- -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Bob Hayden" [EMAIL PROTECTED] To: "EdStat-L" [EMAIL PROTECTED] Sent: Monday, November 13, 2000 9:46 PM Subject: Re: Help needed ... - Forwarded message from David Heiser - - Original Message - From: Dennis [EMAIL PROTECTED] Hello Newsgroup, I'm searching for real good books on stats. I'm a student of psychology and we've been taught very much stats. But I read all the time your postings and wonder why I've never heard about that what I read. ... Hopefully and with much regards yours Dennis --- What you need is a good class in written English DAH - End of forwarded message from David Heiser - From the email address, it appears that Dennis lives in a European country where English is not the predominant language. The written English here far surpasses my written French, German or Latin, to mention only languages I have studied. I note that, unlike most Americans, Dennis uses the word "hopefully" correctly. Of course, if Americans were as good with other people's languages as Europeans are, Dennis could have sent us a native-language posting, and then criticized us when we tried to respond in that language. I think this list can benefit greatly from being an INTERNATIONAL list. Let's make folks from other countries feel welcome. _ | | Robert W. 
Hayden
Work: Department of Mathematics, Plymouth State College MSC#29, Plymouth, New Hampshire 03264 USA, fax (603) 535-2943
Home: 82 River Street (use this in the summer), Ashland, NH 03217, (603) 968-9914 (use this year-round)
[EMAIL PROTECTED] (works year-round)
http://mathpc04.plymouth.edu (works year-round)
[ASCII-art map of New Hampshire] The State of New Hampshire takes no responsibility for what this map looks like if you are not using a fixed-width font such as Courier.
"Opportunity is missed by most people because it is dressed in overalls and looks like work." --Thomas Edison
Re: [ap-stat] revote and Accuracy and Design of Voting Forms
Bob Hayden wrote to the AP list: == - Original Message - From: "Bob Hayden" [EMAIL PROTECTED] To: "AP Statistics" [EMAIL PROTECTED] Sent: Friday, November 10, 2000 10:01 AM Subject: [ap-stat] revote After considering all the issues raised on the lists regarding the election, I think the best solution would be a revote in every state of the union -- but with NEW CANDIDATES!-) -- | | Robert W. Hayden | | Work: Department of Mathematics / | Plymouth State College MSC#29 | | Plymouth, New Hampshire 03264 USA | * | fax (603) 535-2943 /| Home: 82 River Street (use this in the summer) | ) Ashland, NH 03217 L_/ (603) 968-9914 (use this year-round) Map of New[EMAIL PROTECTED] (works year-round) Hampshire http://mathpc04.plymouth.edu (works year-round) The State of New Hampshire takes no responsibility for what this map looks like if you are not using a fixed-width font such as Courier. "Opportunity is missed by most people because it is dressed in overalls and looks like work." --Thomas Edison === Joe Ward replied to Bob Hayden === Hey, Bob -- THAT really brought some hearty chuckles to Bettie and I. -- Joe ******** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** == Then Bob Hayden wrote: = - Original Message - From: "Bob Hayden" [EMAIL PROTECTED] To: "Joe Ward" [EMAIL PROTECTED] Sent: Friday, November 10, 2000 10:41 AM Subject: Re: [ap-stat] revote Their post-election bickering did not endear them to me. I think they should both go home, return to their jobs, and SHUT UP. Joe Ward Comments about Accuracy of Voting Responses == Is there research on the Design of Voting Forms? = Hi, Bob -- In ANY election, the format for obtaining voting responses should be designed to minimize the chances for inaccurate responses. 
It is surprising that the "format-approval folks" in Palm Beach did not redesign the form. It looks like the form was designed for convenience of the computer folks or the print shop or others--but not for the accuracy of responses. No matter who is the winner in any election, there probably are some local voting systems that need "fine tuning". In San Antonio, we have gone through numerous varieties of voting formats. Some seem better than others. I'm not sure how the final forms are "approved". In this recent election we used felt-tip markers!!! The ink soaked through to the back side of the paper but when my wife mentioned it, the "judges" said that it had been checked and "did not interfere with the markings on the other side". But do we know what happens if there is a SMEAR of the wet ink? Does THAT BALLOT COUNT, or is it rejected? If I were running for election in our county and the voting was close, then I certainly would ask for a "hand" recount to find out how many votes were rejected by the scan machine because of "smear" or because the wet ink soaked through the paper (probably cheap paper) and was "sensed" on the back. Perhaps there should be a research project designed by a TASK FORCE of some ASA members to evaluate the many different forms to find out which form(s) MINIMIZE INACCURACY OF RESPONSE. It is likely that such research has been done since it is such an important activity. The studies should consider age, education, language and other variables. -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: 2 factor ANOVA with empty cells
Right you are, Elliot. However, when one finds "no-interaction" among all of those cells that are present, then one can feel "better" about estimating the "missing" cell values. Of course, there could be a surprising explosion!! The more interaction that is detected the more dangerous it can be. When there is little or no interaction it is possible to design the study to save money and time. There is no need to fill in all the cells all the time -- particularly when the cost is great. The real experimental design "experts" can get lots of information from a small study that might have missing cells "strategically located".
-- Joe
Joe Ward 167 East Arrowhead Dr. San Antonio, TX 78228-2402 Phone: 210-433-6575 Fax: 210-433-2828 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html Health Careers High School 4646 Hamilton Wolfe San Antonio, TX 78229 Phone: 210-617-5400 Fax: 210-617-5423

- Original Message -
From: "Elliot Cramer" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, November 01, 2000 8:43 PM
Subject: Re: 2 factor ANOVA with empty cells

Jeff E. Houlahan [EMAIL PROTECTED] wrote:
: Is it ever appropriate to do a 2-factor unreplicated ANOVA with
: empty cells if you aren't sure there is no interaction between the

you can test the part of the interaction that is testable, but of course you can never know about the rest.
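To illustrate what "estimating the missing cell values" can look like when no interaction is assumed, here is a minimal sketch for an unreplicated two-way table with a single empty cell. It uses the classical missing-value formula for an additive (no-interaction) model, x = (r*R + c*C - G) / ((r - 1)(c - 1)), where R and C are the totals of the observed values in the missing cell's row and column and G is the total of all observed values. The table values and function name are invented for illustration, and the estimate is only as trustworthy as the no-interaction assumption, which is exactly the caution raised above.

```python
def estimate_missing_cell(table):
    """Estimate the single missing cell (marked None) in an unreplicated
    r x c table under an additive, no-interaction model."""
    n_rows = len(table)
    n_cols = len(table[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if table[i][j] is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing cell")
    mi, mj = missing[0]
    row_total = sum(v for v in table[mi] if v is not None)
    col_total = sum(table[i][mj] for i in range(n_rows)
                    if table[i][mj] is not None)
    grand_total = sum(v for row in table for v in row if v is not None)
    return (n_rows * row_total + n_cols * col_total - grand_total) / (
        (n_rows - 1) * (n_cols - 1))


# Made-up, perfectly additive table: cell = row effect + column effect,
# with row effects 0, 2, 5 and column effects 10, 11, 13.
table = [
    [10, 11, 13],
    [12, 13, 15],
    [15, 16, None],  # the additive value here would be 5 + 13 = 18
]

print(estimate_missing_cell(table))
```

If interaction is present, the observed cells cannot reveal what happens in the empty one, so the same arithmetic would produce a confident-looking but possibly badly wrong number.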
Independent-Dependent Variable Discussion--Inverse Estimation
Hi Dan and all -- I had intended to comment about the independent-dependent variable discussion earlier but I got side-tracked. Since Dan reminded us with his comment:

"This problem statement also brings back the independent-dependent variable discussion. In the real context, the activity level of the crickets depends upon the temperature, so temperature is the independent variable and number of chirps the dependent variable. However, if you want to predict the temperature using the number of chirps, you must consider the number of chirps as the "independent" variable and temperature as the "dependent" variable."

I have inserted some comments below:

=== Joe Ward writes ===

In the ancient past (1950s), for calibration studies --

Let Y be a reading from a measuring instrument, SUBJECT TO "ERRORS OF MEASUREMENT", and X be a KNOWN STANDARD, ASSUMED TO BE "WITHOUT ERROR" (FIXED).

Then the least-squares regression model used to PREDICT THE "STANDARD" (X) from the measurement Y WAS computed as:

Y = b0 + b1*X + Error

Then, to estimate (predict) the KNOWN STANDARD (X) from the measurement (Y), the past procedure was to solve for X in the above equation (leaving off the Error):

Y = b0 + b1*X

or

X = (Y - b0)/b1

which is used to PREDICT X from Y.

Dan, you probably are better acquainted with the most recent approach from the Bureau of Standards since I have not kept up with any changes in the Standards calibration policy.

Furthermore, in the distant past, it is interesting to note that simultaneous regression equations were solved to estimate unknown amounts of chemical compositions in a solution. An interesting study by Fisher, Hans, R.G. Hansen, and H.W. Norton (1955), Quantitative determination of glucose and galactose, Anal. Chem. 27, 857-859, is discussed in E.J. Williams' book Regression Analysis, Wiley, 1959, page 163. Williams refers to this topic as INVERSE ESTIMATION.
Even though the goal is to ESTIMATE (PREDICT) the values of X, the dependent variables (Y's) are the MEASURES SUBJECT TO ERROR. After the least-squares solutions are computed then the simultaneous regression equations are solved, INVERSELY, for unknown X values from measured(observed) values of Y (which are subject to ERRORS). It would be interesting to know if this approach is still used. Is the INVERSE method BETTER? Have there been recent studies comparing the REGULAR approach with the INVERSE approach? Comments from experienced "experts" in this area are welcome. -- Joe **** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** === End of Joe Ward's message = - Original Message - From: "Teague, Dan" [EMAIL PROTECTED] To: "AP Statistics" [EMAIL PROTECTED] Sent: Friday, October 20, 2000 10:42 AM Subject: [ap-stat] RE: effect on LSRL Rebecca, If your student chose values of the independent variable that were very large (250-450) and found the y-values that correspond to these x-values using y = 56.212 + 0.1356x, then he could increase the slope. For these data, the point (249, 55) is below that portion of the regression line on the left. The regression line would be pulled towards the point, just as you said, but in this situation, it would cause the slope to increase. The student's argument is flawed to the extent that these values of the independent variable do not match the summary statistics (xbar = 167 and s = 31). We expect to find the number of chirps between 70 and 290 and the temperature roughly between 50 and 100. For these values of x, the slope will be pulled down by the addition of this point. This problem statement also brings back the independent-dependent variable discussion. 
In the real context, the activity level of the crickets depends upon the temperature, so temperature is the independent variable and number of chirps the dependent variable. However, if you want to predict the temperature using the number of chirps, you must consider the number of chirps as the "independent" variable and temperature as the "dependent" variable. Daniel J. Teague NC School of Science and Mathematics 1219 Broad Street Durham, NC 27705 [EMAIL PROTECTED] -Original Message- From: Rebecca Brewer [mailto:[EMAIL PROTECTED]] Sent: Friday, October 20, 2000 11:02 AM To: AP Statistics Subject: [ap-stat] effect on LSRL Help!
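The two approaches Joe contrasts in this thread, fitting Y = b0 + b1*X and solving for X versus regressing X on Y directly, can be compared side by side. The sketch below uses invented calibration data (known standards X, instrument readings Y); the variable names and numbers are assumptions for illustration, not values from Williams or from the cricket problem.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up calibration data: known standards and the instrument's readings.
standards = [0.0, 10.0, 20.0, 30.0, 40.0]   # X, assumed without error
readings = [2.1, 6.9, 12.2, 16.8, 22.0]     # Y, subject to measurement error

# INVERSE approach: fit Y on X, then solve X = (Y - b0) / b1.
b0, b1 = fit_line(standards, readings)


def inverse_estimate(y_new):
    return (y_new - b0) / b1


# REGULAR approach: regress X on Y and predict X directly.
c0, c1 = fit_line(readings, standards)


def regular_estimate(y_new):
    return c0 + c1 * y_new


for y_new in (12.0, 17.0, 25.0):
    print(y_new, inverse_estimate(y_new), regular_estimate(y_new))
```

Both lines pass through the point of means, so the two estimates agree there and drift apart as the new reading moves away from it; with a correlation this close to 1 the difference is small, and it grows as the measurement error grows.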
[ap-stat] Independent-Dependent Variable Discussion--Inverse Estimation
Hi Dan and all -- I had intended to comment about the independent-dependent variable discussion earlier but I got side-tracked. Since Dan reminded us with his comment:

"This problem statement also brings back the independent-dependent variable discussion. In the real context, the activity level of the crickets depends upon the temperature, so temperature is the independent variable and number of chirps the dependent variable. However, if you want to predict the temperature using the number of chirps, you must consider the number of chirps as the "independent" variable and temperature as the "dependent" variable."

I have inserted some comments below:

=== Joe Ward writes ==

In the ancient past (1950s), for calibration studies -- Let
Y be a reading from a measuring instrument, SUBJECT TO "ERRORS OF MEASUREMENT", and
X be a KNOWN STANDARD, ASSUMED TO BE "WITHOUT ERROR" (FIXED).

Then the least-squares regression model used to PREDICT THE "STANDARD" (X) from the measurement Y WAS computed as:

Y = b0 + b1*X + Error

Then, to estimate (predict) the KNOWN STANDARD (X) from the measurement (Y), the past procedure was to solve for X in the above equation (leaving off the Error):

Y = b0 + b1*X or X = (Y-b0)/b1

which is used to PREDICT X from Y.

Dan, you probably are better acquainted with the most recent approach from the Bureau of Standards since I have not kept up with any changes in the Standards calibration policy.

Furthermore, in the distant past, it is interesting to note that simultaneous regression equations were solved to estimate unknown amounts of chemical compositions in a solution. An interesting study by Fisher, Hans, R.G. Hansen, and H.W. Norton (1955), Quantitative determination of glucose and galactose, Anal. Chem. 27, 857-859, is discussed in E.J. Williams' book Regression Analysis, Wiley, 1959, page 163. Williams refers to this topic as INVERSE ESTIMATION.
Even though the goal is to ESTIMATE (PREDICT) the values of X, the dependent variables (Y's) are the MEASURES SUBJECT TO ERROR. After the least-squares solutions are computed then the simultaneous regression equations are solved, INVERSELY, for unknown X values from measured(observed) values of Y (which are subject to ERRORS). It would be interesting to know if this approach is still used. Is the INVERSE method BETTER? Have there been recent studies comparing the REGULAR approach with the INVERSE approach? Comments from experienced "experts" in this area are welcome. -- Joe **** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** === End of Joe Ward's message = - Original Message - From: "Teague, Dan" [EMAIL PROTECTED] To: "AP Statistics" [EMAIL PROTECTED] Sent: Friday, October 20, 2000 10:42 AM Subject: [ap-stat] RE: effect on LSRL Rebecca, If your student chose values of the independent variable that were very large (250-450) and found the y-values that correspond to these x-values using y = 56.212 + 0.1356x, then he could increase the slope. For these data, the point (249, 55) is below that portion of the regression line on the left. The regression line would be pulled towards the point, just as you said, but in this situation, it would cause the slope to increase. The student's argument is flawed to the extent that these values of the independent variable do not match the summary statistics (xbar = 167 and s = 31). We expect to find the number of chirps between 70 and 290 and the temperature roughly between 50 and 100. For these values of x, the slope will be pulled down by the addition of this point. This problem statement also brings back the independent-dependent variable discussion. 
In the real context, the activity level of the crickets depends upon the temperature, so temperature is the independent variable and number of chirps the dependent variable. However, if you want to predict the temperature using the number of chirps, you must consider the number of chirps as the "independent" variable and temperature as the "dependent" variable. Daniel J. Teague NC School of Science and Mathematics 1219 Broad Street Durham, NC 27705 [EMAIL PROTECTED] -Original Message- From: Rebecca Brewer [mailto:[EMAIL PROTECTED]] Sent: Friday, October 20, 2000 11:02 AM To: AP Statistics Subject: [ap-stat] effect on LSRL Help!
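A minimal sketch of the two approaches discussed in this thread, assuming a single instrument reading Y and a single known standard X. The INVERSE approach fits Y = b0 + b1*X and solves for X; the REGULAR approach is read here as regressing X on Y directly. The function names and the data are hypothetical and are not from the original messages.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


def inverse_estimate(y_new, standards, readings):
    """INVERSE approach: fit readings on standards, then solve for X."""
    b0, b1 = fit_line(standards, readings)
    return (y_new - b0) / b1


def regular_estimate(y_new, standards, readings):
    """REGULAR approach (as read here): fit standards on readings directly."""
    c0, c1 = fit_line(readings, standards)
    return c0 + c1 * y_new


# Made-up calibration data: known standards X and instrument readings Y.
standards = [1.0, 2.0, 3.0, 4.0, 5.0]
readings = [2.1, 3.9, 6.2, 7.8, 10.1]

print("inverse:", inverse_estimate(7.0, standards, readings))
print("regular:", regular_estimate(7.0, standards, readings))
```

With error-free data the two estimates coincide; with measurement error in Y they generally differ, which is the comparison Joe asks about.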
Re: How to Pool Slopes
Hi, Stan -- I've inserted a reply at the end of your message. Let me know how things turn out. -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Stanley110" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, October 08, 2000 1:59 PM Subject: Q: How to Pool Slopes Assume I have three sets of x,y data. I fit each by least-squares to a straight line. I determine that the three fitted lines are homogeneous and indistinguishable at a certain significance level. I want to express the slope (of the three) as a single point estimate and as a confidence interval. What is the formula for doing this? Please reply to this newsgroup and to the writer at [EMAIL PROTECTED]. Thank you for your help. stan alekman = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ = == JOE WARD REPLIES === Hi, Stan -- Your Title says (1)"How to Pool Slopes" and you indicate later that (2)"I determine that the three fitted lines are homogeneous and indistinguishable. For (1) it sounds like you will want THREE DIFFERENT INTERCEPTS, but for case (2) it sounds like you may want only ONE INTERCEPT. This is good example of the use of the Regression Option of "NO INT" option in SAS or "Y-intercept = zero". The reason that this appears to be a difficult problem is the use of the frequently-used DEFAULT option in most statistics packages. The approach used below for your THREE GROUP DATA is shown for TWO groups of data in the Prentice-Hall published book (1973) -- "Introduction to Linear Models" by Ward and Jennings. Chapter 8, page 143. 
I don't know which Regression Software you are using, but you should be sure to FORCE THE Y-intercept THROUGH THE ORIGIN. First, it is important to put ALL THREE SETS OF DATA in the same model. Let

Y = dependent variable (containing ALL THREE SETS OF DATA)
D1 = 1 if the corresponding element of Y is from DATA SET #1; 0 otherwise
D2 = 1 if the corresponding element of Y is from DATA SET #2; 0 otherwise
D3 = 1 if the corresponding element of Y is from DATA SET #3; 0 otherwise
X1 = Value of x if the corresponding element of Y is from DATA SET #1; 0 otherwise
X2 = Value of x if the corresponding element of Y is from DATA SET #2; 0 otherwise
X3 = Value of x if the corresponding element of Y is from DATA SET #3; 0 otherwise
X = Value of x for ALL corresponding elements of Y.
U = 1 for every element.

Then your ASSUMED MODEL is shown below (this should give you the same regression coefficients that you already have computed -- a check that your new model is correct):

Y = a1*D1 + b1*X1 + a2*D2 + b2*X2 + a3*D3 + b3*X3 + E1 (Model #1)

After you have computed this ASSUMED MODEL you may want to TEST THE HYPOTHESIS that you imply in CASE (1) above, that the THREE SLOPES ARE EQUAL, i.e., b1 = b2 = b3 = bc (THE COMMON SLOPE). Then substituting these restrictions into Model #1 produces the RESTRICTED MODEL FOR CASE (1):

Y = a1*D1 + bc*X1 + a2*D2 + bc*X2 + a3*D3 + bc*X3 + E2 (Model #2)

Factoring (or collecting terms) produces:

Y = a1*D1 + a2*D2 + a3*D3 + bc*X + E2 (Model #2)

(Note that the values of a1, a2, and a3 in Model #2 are NOT numerically equal to the values in Model #1.)

From Model #2, bc is the least-squares SINGLE POINT estimate of the COMMON SLOPE. Your favorite Regression procedure should give what you need to compute a confidence interval (such as the standard error of bc).
Now for CASE (2) above you may want to test that:

THREE SLOPES ARE EQUAL, i.e., b1 = b2 = b3 = bc (THE COMMON SLOPE) and
THREE INTERCEPTS ARE EQUAL, i.e., a1 = a2 = a3 = ac (THE COMMON INTERCEPT)

In which case, the RESTRICTED MODEL becomes:

Y = ac*D1 + bc*X1 + ac*D2 + bc*X2 + ac*D3 + bc*X3 + E3 (Model #3)

Factoring (or collecting terms) produces:

Y = ac*U + bc*X + E3 (Model #3)

(Note that the value of bc in Model #3 is NOT numerically equal to the value in Model #2.)

And, as before, your favorite Regression procedure should give what you need to compute a confidence interval (such as the standard error of bc). Let me know how this works out. If you have any problems with this approach you are welcome
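The three models in Joe's reply can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (solve, least_squares, build_models, f_statistic, coefficient_standard_error) and the data are hypothetical, the fit uses the normal equations with no automatic intercept (the "NO INT" idea), and the F statistic is written with error sums of squares rather than R^2, which is a choice made here and not something stated in the message.

```python
import math


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(columns, y):
    """Fit y on the given predictor columns; NO intercept is added."""
    k = len(columns)
    gram = [[sum(u * v for u, v in zip(columns[i], columns[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    coef = solve(gram, rhs)
    fitted = [sum(coef[j] * columns[j][i] for j in range(k))
              for i in range(len(y))]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, sse, gram


def build_models(x, group):
    """Predictor columns for Model #1, Model #2 and Model #3."""
    labels = sorted(set(group))
    d = [[1.0 if g == lab else 0.0 for g in group] for lab in labels]  # D1..D3
    xg = [[dj * xi for dj, xi in zip(col, x)] for col in d]            # X1..X3
    u = [1.0] * len(x)                                                 # U
    model1 = [c for pair in zip(d, xg) for c in pair]  # a1,b1,a2,b2,a3,b3
    model2 = d + [list(x)]                             # a1,a2,a3,bc
    model3 = [u, list(x)]                              # ac,bc
    return model1, model2, model3


def f_statistic(sse_restricted, sse_assumed, n_assumed, n_restricted, n_obs):
    """F for testing the restrictions; returns (F, df1, df2)."""
    df1 = n_assumed - n_restricted
    df2 = n_obs - n_assumed
    return ((sse_restricted - sse_assumed) / df1) / (sse_assumed / df2), df1, df2


def coefficient_standard_error(gram, sse, n_obs, index):
    """Standard error of one coefficient from the fitted model."""
    k = len(gram)
    unit = [1.0 if i == index else 0.0 for i in range(k)]
    inv_column = solve(gram, unit)  # one column of the inverse Gram matrix
    return math.sqrt(sse / (n_obs - k) * inv_column[index])


# Made-up data: three data sets of three (x, y) pairs each.
x = [1.0, 2.0, 3.0] * 3
group = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [3.1, 4.9, 7.2, 7.0, 9.1, 10.8, 11.2, 12.9, 15.1]

m1, m2, m3 = build_models(x, group)
coef1, sse1, _ = least_squares(m1, y)
coef2, sse2, gram2 = least_squares(m2, y)
bc = coef2[-1]
se_bc = coefficient_standard_error(gram2, sse2, len(y), len(coef2) - 1)
print("common slope bc:", bc, "standard error:", se_bc)
print("F for equal slopes:", f_statistic(sse2, sse1, len(m1), len(m2), len(y)))
```

A confidence interval for bc would then be bc plus or minus a t critical value (on N minus the number of Model #2 predictors degrees of freedom) times the standard error; the critical value is left to a table or a statistics package.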
Re: How many Olympic Medals should Great Britain have won?
Hi, Graham -- It's been a long time since I've heard any discussion about UNDERACHIEVERS and OVERACHIEVERS. I've never been able to understand the discussions. NO MATTER WHAT VALUE THE CORRELATION (SLOPE OF THE REGRESSION LINE) HAS we know that the ALGEBRAIC SUM OF THE ERRORS IS ZERO. Now that says that the SUM OF THE ABSOLUTE VALUES OF THE POSITIVE ERRORS IS EQUAL TO THE SUM OF THE ABSOLUTE VALUES OF THE NEGATIVE ERRORS. THEN WE WOULD EXPECT TO OBSERVE ABOUT ONE-HALF OF THE OBSERVATIONS TO HAVE POSITIVE ERRORS AND ONE-HALF TO HAVE NEGATIVE VALUES. THEREFORE, FOR ALL CORRELATIONS (ZERO INCLUDED) WE SHOULD EXPECT TO CONCLUDE THAT ABOUT ONE-HALF OF ALL CASES WOULD BE CALLED "OVER-ACHIEVERS" AND ABOUT ONE-HALF WOULD BE CALLED "UNDER-ACHIEVERS". DOES THAT DESIGNATION HAVE ANY OPERATIONALLY USEFUL MEANING? --Joe ********Joe Ward.Health Careers High School167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229Phone: 210-433-6575...Phone: 210-617-5400Fax: 210-433-2828Fax: 210-617-5423Email: [EMAIL PROTECTED]http://www.ijoa.org/joeward/wardindex.html*** - Original Message - From: Dr Graham D Smith To: Edstat Sent: Monday, October 02, 2000 11:40 AM Subject: How many Olympic Medals should Great Britain have won? How many Olympic Medals should Great Britain have won? British Olympians won a grand total of 28 medals at the Sydney 2000 Games, our best medal haul for 80 years. Many commentators have suggested that the big improvement in British fortunes compared to the Atlanta 1996 Games is due to the use of Lottery funding to help our top sportsmen and sportswomen. But how many medals should Britain expect to win? Did we fulfil our potential or fall short of it? One important determinant of a country's Olympic success is the size of its population. USA, China and Russia head the Sydney 2000 medal table, they also have large populations. However, population size does not fully account for the number of medals won. 
Both India and China have much larger populations than USA but won fewer medals. Another important predictor of a nation's Olympic performance is economic prosperity. Richer nations often outperform poorer nations of the same size. Gross domestic product (GDP) is an economic index that reflects both economic success and population size. A scatterplot of the number of medals won and GDP of the 80 medal winning countries at the 2000 Olympics shows a positive correlation; r = 0.595, p < 0.01 (see attached). GDP accounts for 35.4% of the variance of medals won. A regression analysis was performed on the data to estimate the number of medals Team GB should expect. Given that the UK GDP is equivalent to US$ 1.29 trillion the expected number of medals is 15. It seems that our Olympians did far better than we could have expected. Well done team GB! And well done too to Team USA, their expected medal count is 26.5. However, the top overachiever was Russia (followed by USA and Australia). The top underachiever was India.

Dr Graham D. Smith
Psychology Division
Park Campus
University College Northampton
Boughton Green Rd.
Northampton NN2 7AL
Tel: +44 (0) 1604 735500 Ext 2393
E-mail: [EMAIL PROTECTED]
Re: How many Olympic Medals should Great Britain have won?
Hi, Paige -- Good comments about "There are so many different factors..." "To say that half the observations should have positive errors and halfshould have negative errors is to confuse median with mean." I used the word ABOUT intentionally to distinguish from EXACTLY. --Joe - Original Message - From: "Paige Miller" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, October 03, 2000 10:19 AM Subject: Re: How many Olympic Medals should Great Britain have won? Hi, Graham -- It's been a long time since I've heard any discussion about UNDERACHIEVERS and OVERACHIEVERS. I've never been able to understand the discussions.NO MATTER WHAT VALUE THE CORRELATION (SLOPE OF THE REGRESSION LINE) HAS we know that the ALGEBRAIC SUM OF THE ERRORS IS ZERO. Now that says that the SUM OF THE ABSOLUTE VALUES OF THE POSITIVE ERRORS IS EQUAL TO THE SUM OF THE ABSOLUTE VALUES OF THE NEGATIVE ERRORS. THEN WE WOULD EXPECT TO OBSERVE ABOUT ONE-HALF OF THE OBSERVATIONS TO HAVE POSITIVE ERRORS AND ONE-HALF TO HAVE NEGATIVE VALUES. THEREFORE, FOR ALL CORRELATIONS (ZERO INCLUDED) WE SHOULD EXPECT TO CONCLUDE THAT ABOUT ONE-HALF OF ALL CASES WOULD BE CALLED "OVER-ACHIEVERS" AND ABOUT ONE-HALF WOULD BE CALLED "UNDER-ACHIEVERS". DOES THAT DESIGNATION HAVE ANY OPERATIONALLY USEFUL MEANING? Paige writes There are so many different factors that go into the amount of medals won that it seems silly to perform a regression based upon population and GDP to use as predictors. Organization of Olympic Committees, training facility quality, programs for youths, weather, etc. all can affect the number of medals won, and then there is the factor of injuries, which to me seems like it cannot be modelled except as random noise. To say that half the observations should have positive errors and half should have negative errors is to confuse median with mean. -- Paige Miller Eastman Kodak Company [EMAIL PROTECTED] "It's nothing until I call it!" 
-- Bill Klem, NL Umpire "Those black-eyed peas tasted all right to me" -- Dixie Chicks
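A small sketch of the point under discussion in this thread: least-squares errors always sum to zero when the model has an intercept, but that alone does not fix how many are positive. The data are made up for illustration (one unusually large y value), and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


def errors(x, y):
    """Observed minus predicted values for the fitted line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Made-up data: eight ordinary cases and one very large value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [1.0, 1.0, 1.0, 1.0, 20.0, 1.0, 1.0, 1.0, 1.0]

e = errors(x, y)
print("sum of errors:", sum(e))
print("number of positive errors:", sum(1 for v in e if v > 0), "of", len(e))
```

Here the errors sum to zero (up to rounding), yet only one case would be labeled an "over-achiever". The "about one-half" split holds when the errors are roughly symmetric, which is the median-versus-mean distinction Paige raises.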
Re: What is today's Hogg & Craig?
Hi, Gary, Jerry et al -- Here is a message from Bob Hogg. -- Joe

- Original Message - From: "Robert V. Hogg" [EMAIL PROTECTED] To: "Joe Ward" [EMAIL PROTECTED] Sent: Friday, September 22, 2000 9:19 AM Subject: Re: Fw: What is today's Hogg & Craig?

joe, HOGG AND TANIS is used more for undergrads. CASELLA AND BERGER for first year grad students in stat. HOGG AND CRAIG for good seniors and first year grad students in other areas [like actuarial sci]. bob

At 11:24 PM 9/21/00 -0500, Joe Ward wrote: Bob -- Any suggestions for Jerry? -- Joe ******* * Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html ***

- Original Message - From: "Jerry Dallal" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, September 21, 2000 9:32 PM Subject: What is today's Hogg & Craig?

Back in the "old days", the standard text for an undergraduate math stat course was Hogg & Craig. I had some fondness for Lindgren. I haven't taught this course in nearly 20 years. Which texts occupy their position today? Thanks.

- Original Message - From: "Gary McClelland" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, September 22, 2000 11:49 AM Subject: Re: What is today's Hogg & Craig?

in article [EMAIL PROTECTED], Jerry Dallal at [EMAIL PROTECTED] wrote on 9/21/00 8:32 PM: Back in the "old days", the standard text for an undergraduate math stat course was Hogg & Craig. I had some fondness for Lindgren. I haven't taught this course in nearly 20 years. Which texts occupy their position today? Thanks.

According to amazon.com, the 1994 5th edition is still in print. I keep my much earlier edition closely guarded. But I too would be interested in hearing what the kids learn with today.
gary -- [EMAIL PROTECTED]
Re: cluster
Hi, Thomas -- If you have a SAS Manual the McQuitty method is described briefly in the CLUSTER Chapter. Also, I think the original article is: McQuitty, L.L. (1966) "Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data" Ed and Psy Meas, 17, 207-229. Look at: Anderberg, M.R. (1973) "Cluster Analysis for Applications" New York, Academic Press. --- Joe ******** Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Thomas Pesl" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, September 22, 2000 4:19 AM Subject: cluster Does anyone know the formula of the McQuitty clustering method? Thanks, Thomas = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ = = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: How can I analyze split-design by SPSS v9.0?
Anuvat -- Here comes my "standard" comment! 1. State your research question(s) in "natural language". 2. Create a model that enables you to answer the "natural language" questions that YOU WANT. 3. Impose restrictions on YOUR MODEL that answers YOUR questions of interest. 4. Use the computer to get YOUR DESIRED RESULTS. Then AFTER YOU HAVE VERIFIED THAT THERE EXISTS A "PACKAGED" ALGORITHM THAT ANSWERS YOUR QUESTIONS OF INTEREST, THEN USE THE "PACKAGED" ALGORITHM. Since many "interesting" research questions involve creating models for unique problems, it can be more efficient to create your OWN MODELS rather than searching for "packaged" algorithms that MAY fit YOUR research questions of interest. IMHO it seems best to take time to develop "model-creation" skills so that you can have the POWER that is available. If you have time to take a look at the URL below, Slides 7 and 8 of the PowerPoint presentation on "Using Calculators and Computers in Statistics" - Laura Niland Joe Ward, CAMT98 45th Annual Conference, San Antonio, July 23, 1998 - give a pictorial view of "Forcing" vs. "Creating" Models. Good luck-- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Anuvat Jangchud" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, September 06, 2000 10:32 PM Subject: How can I analyze split-design by SPSS v9.0? I would like to use SPSS v.9.0 for SPLIT Design anlysis. Could you help me out? = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ = = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Math Education of Mathematics Teachers
Dick -- I'm staying 'til Friday to attend THAT SESSION. The discussions should be of interest to secondary teachers in the Indianapolis area. It would be great if arrangements could be made for teachers to attend THAT session without needing to register for the JSM. I think it is Session 281, Thursday, Aug. 17 10:30 a.m. - 12:30. -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Richard L. Scheaffer" [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, August 01, 2000 1:22 PM Subject: Math Education of Mathematics Teachers I would like to call your attention to a session at the Joint Statistics Meetings that those of you interested in statistics education might have overlooked. Session 279, The Importance of Statistics in the Education of Future Teachers reports on a project of the Conference Board of the Mathematical Sciences, funded by NSF an DoEd, that will attempt to get departments of mathematical sciences more involved in the education of future teachers. Teachers coming out of colleges of education are ill equipped to teach in the modern math curriculum - a curriculum that includes much statistics. This project makes a series of recommendations on how to solve this problem. Among the recommendations are strong statements about the importance of statistics. The panel consists of Alan Tucker, mathematician and lead writer of the CBMS report, Judy Sowder, math educator responsible for the middle school section of the report, Gail Burrill, former president of NCTM and now head of the Math Sciences Education Board at the NAS, and Jerry Moreno, a well-known statistics educator. 
Unfortunately, this session is in the last time slot of the meeting, 10:30 Thursday morning. So, I hope some of you will have the time and interest to stop by. It should be a lively discussion of a very important topic. Hope to see you there! Dick Scheaffe ps A draft of the report is on the web. CBMS Math Education of Teachers Project Draft Report on the Web www.maa.org/cbms -- Richard L. Scheaffer [EMAIL PROTECTED] Department of Statistics phone 352-392-1941 (#224) Box 118545 fax 352-392-5175 University of Florida Gainesville, FL 32611 907 NW 21 Terrace 352-378-1996 Gainesville, FL 32603 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ = = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: regression books?
If you are near a university library you may want to take a look at INTRODUCTION TO LINEAR MODELS by Ward and Jennings. The Purdue library might have a copy. Also, the Fountain-Ward JSE article shown at the URL below is related to your interest. http://www.ijoa.org/joeward/wardindex.html http://www.amstat.org/publications/jse/v4n3/ward.html -- Joe Joe Ward.Health Careers High School 167 East Arrowhead Dr4646 Hamilton Wolfe San Antonio, TX 78228-2402...San Antonio, TX 78229 Phone: 210-433-6575...Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/wardindex.html *** - Original Message - From: "Christopher Tong" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, July 22, 2000 2:12 PM Subject: regression books? Does anyone have recommendations for introductory books on regression analysis? I posted this question on sci.stat.math and got only one reply so far. I am currently using Neter, Kutner, Nachtsheim, and Wasserman, which I find unwieldy and not very concise. I have my eye on Montgomery Peck, but am wondering what anyone else would recommend. My one reply so far suggested Cohen Cohen. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ = = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Novice questions about regression analysis.
Good comment, Paige-- " A well-designed experiment will yield regression estimates with more desirable properties than a poorly-designed experiment will. Specifically, the parameter estimates may have smaller variance in a well-design experiment, and the parameters will be less correlated (or uncorrelated) with each other. The predicted values of the responses likewise will have smaller variance in a well-designed experiment." However, it is safest to be sure that the "packaged" analyses do what the researcher wants.Do many "packaged COVARIANCE algorithms" still assume NO INTERACTION? Does SAS (or other stat packages) warn us when there is a "missing cell" in an ANOVA-LIKE GLM computation? -- Joe ********** Joe Ward Health Careers High School 167 East Arrowhead Dr. 4646 Hamilton Wolfe San Antonio, TX 78228-2402 San Antonio, TX 78229 Phone: 210-433-6575 Phone: 210-617-5400 Fax: 210-433-2828Fax: 210-617-5423 Email: [EMAIL PROTECTED] http://www.ijoa.org/joeward/watdindex.html ** - Original Message - From: "Paige Miller" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, June 28, 2000 11:08 AM Subject: Re: Novice questions about regression analysis. Wen-Feng Hsiao wrote: Dear listers, I am stuck with the experiment design of my dissertation. My experiment would like to investigate the influences of different factors of stimuli on the subject's response (each factor is a continuous variable), and further build a regression model for these relations. My questions are: 1. It seems that no experiment-design issues related to Regression Analysis are discussed in the usual statistics textbook. Why? Does it mean one needn't consider the experiment design if he uses Regression Analysis to analyze his data? A well-designed experiment will yield regression estimates with more desirable properties than a poorly-designed experiment will. 
Specifically, the parameter estimates may have smaller variance in a well-design experiment, and the parameters will be less correlated (or uncorrelated) with each other. The predicted values of the responses likewise will have smaller variance in a well-designed experiment. 2. Due to the measure of the dependent variable is the participants' subjective responses, to remove unrelated subject-specific variables, I am considering to employ a within-subject design. But there seems no statistical packages ready for dealing with within-subject design of Regression Analysis? SAS and JMP will perform these analyses, although the manual may not specifically call them 'within-subject' analyses. Other packages probably will handle them as well, but I cannot advise you of specifics. Suppose a design in which each of the n subjects gives rise to a Y observation under each of c different conditions, then a total of N=ncY observations could be obtained. How can I use Regression Analysis to analyze these observations? The model will predict the response Y as a function of the subject and each of the design variables, plus any desired interactions between design variables, interactions between subject and design variables, and polynomial terms (if desired) involving design variables. -- Paige Miller Eastman Kodak Company [EMAIL PROTECTED] "It's nothing until I call it!" -- Bill Klem, NL Umpire "Those black-eyed peas tasted all right to me" -- Dixie Chicks === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ === === This list is open to everyone. 
Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: Stupid question on relationship of r and t
Jason --

t^2 = r^2*(n-2) / (1 - r^2)

is a special case of the more general case of using R^2 to compute the F statistic in a Prediction/Regression/Linear Models approach to research studies. Letting

R^2(Assumed) = R^2 for the ASSUMED MODEL
R^2(Restricted) = R^2 for the RESTRICTED MODEL
NA = number of linearly independent predictor vectors (i.e., the number of parameters) in the ASSUMED MODEL
NR = number of linearly independent predictor vectors (i.e., the number of parameters) in the RESTRICTED MODEL
N = total number of observations (cases)
df1 = NA - NR = numerator degrees of freedom
df2 = N - NA = denominator degrees of freedom

F(df1,df2) = [(R^2(Assumed) - R^2(Restricted))/df1] / [(1 - R^2(Assumed))/df2]

Now consider your special case, when the ASSUMED MODEL CONTAINS ONLY TWO PREDICTORS:

Y = b0*U + b1*X + Ea

and the Hypothesis is "b1 = 0". Then the RESTRICTED MODEL is:

Y = b0*U + Er

In this special case, R^2(Restricted) = 0 and then

F(df1,df2) = [R^2(Assumed)/df1] / [(1 - R^2(Assumed))/df2]

and you can easily solve for R^2 if desired:

R^2(Assumed) = F*df1 / (df2 + F*df1)

In your special case of only ONE predictor (in addition to U), sometimes called "simple regression", df1 = 2 - 1 = 1 and df2 = N - 2, so

R^2(Assumed) = r^2 = F / (N - 2 + F)

but since t^2(df2) = F(1,df2), we have

r^2 = t^2 / (N - 2 + t^2)

which is what you obtain from Bob's suggestion --

t = r * sqrt(n-2) / sqrt(1-r^2)

I want to be able to calculate r from t. I tried algebraically manipulating the formula, but never quite got it to where I could do this. Any advice?

Try squaring both sides and re-arranging.
( Joe Ward's comment "GOOD SUGGESTION BY BOB") Bob -- Bob O'Hara Metapopulation Research Group Division of Population Biology Department of Ecology and Systematics PO Box 17 (Arkadiankatu 7) FIN-00014 University of Helsinki Finland tel: +358 9 191 7382 fax: +358 9 191 7301 email: [EMAIL PROTECTED] To induce catatonia, visit: http://www.helsinki.fi/science/metapop/ - Original Message - From: "Anon." [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, June 24, 2000 7:20 AM Subject: Re: Stupid question on relationship of r and t "Jason Osborne, Ph.D." wrote: I am working on a power analysis project- we are reviewing old journal articles to calculate observed effect sizes and power. Some of these articles, for example reporting t-test results, only give means and t-test, no standard deviation. thus, no effect size calculation is possible. I was hoping to estimate an effect size by converting a t to an r. I seem to remember a formula that relates the two, but am having a dickens of a time tracking one down. The one I did track down, for calculating t from r, is not that helpful: t= r * sqrt(n-2) - sqrt(1-r^2) I want to be able to calculate r from t. I tried algebraically manipulating the formula, but never quite got it to where I could do this. Any advice? Try squaring both sides and re-arranging. Bob -- Bob O'Hara Metapopulation Research Group Division of Population Biology Department of Ecology and Systematics PO Box 17 (Arkadiankatu 7) FIN-00014 University of Helsinki Finland tel: +358 9 191 7382 fax: +358 9 191 7301 email: [EMAIL PROTECTED] To induce catatonia, visit: http://www.helsinki.fi/science/metapop/ I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated. - Poul Anderson === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. 
Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/
===
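The conversion discussed in this thread can be sketched in a few lines of Python. This is only an illustration, assuming (as in the thread) that t has n - 2 degrees of freedom; the function names are invented here, and the sign of r is simply taken from the sign of t.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation r based on n observations (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def r_from_t(t, n):
    """Invert t_from_r: r^2 = t^2 / (n - 2 + t^2), with r given the sign of t."""
    r_squared = t * t / (n - 2 + t * t)
    return math.copysign(math.sqrt(r_squared), t)


def r_squared_from_f(f, df1, df2):
    """General case from the thread: R^2 = F*df1 / (df2 + F*df1),
    valid when R^2 of the restricted model is 0."""
    return f * df1 / (df2 + f * df1)


if __name__ == "__main__":
    n = 27                      # hypothetical sample size
    t = t_from_r(0.5, n)        # start from a known r ...
    print(t, r_from_t(t, n))    # ... and recover it from t
```

With one predictor, `r_squared_from_f(t * t, 1, n - 2)` gives the square of `r_from_t(t, n)`, which is the special case worked out above.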
Re: Beginner requests for help on ANOVA and T-tests (n SYSTAT97 --CAUTION)
Edmund --

You may want to use the REGRESSION program in Excel (WITH CAUTION). That way you can create your own models to do what YOU WANT TO DO. You might want to contact a statistician to help you use REGRESSION models. You don't need to use some of the Pre-Computer algorithms if you know how to create your models to answer YOUR QUESTIONS. The URL below has a few articles related to this message:

http://www.ijoa.org/joeward/wardindex.html

If the "packaged" algorithms answer the questions of interest, then you can use them.

I am using Excel 97 with three high school students this summer (2 Sophomores and 1 Senior) in preparation for their Science Fair Research Projects. I usually use SYSTAT. However, these students already have Excel, so we are "testing" the use of REGRESSION in Excel.

Incidentally, when you use REGRESSION models that have NO Y-INTERCEPT (the fitted function is FORCED TO PASS THROUGH ZERO), THE REGRESSION SUM OF SQUARES IS NOT CORRECT. So be careful when you use REGRESSION in Excel 97.

The Excel 97 error is due to the fact that the REGRESSION SUM OF SQUARES IS CALCULATED FROM THE "TOTAL SUM OF SQUARES" MINUS THE "RESIDUAL SUM OF SQUARES". THE "TOTAL SUM OF SQUARES" IS NOT CORRECT WHEN YOU INDICATE THAT YOU WANT THE FUNCTION TO PASS THROUGH THE ORIGIN. THE EXCEL PROGRAM USES THE "ADJUSTED SUM OF SQUARES" (REMOVING THE SUM OF SQUARES ACCOUNTED FOR BY THE UNIT VECTOR, the "MEAN"). The REAL TOTAL SUM OF SQUARES IN THIS CASE SHOULD BE THE SUM OF SQUARES FOR THE DEPENDENT VARIABLE. Apparently the programmer of the REGRESSION procedure did not know how to compute the REAL TOTAL SUM OF SQUARES.

As some of the users and creators of Statistical Software Packages frequently mention: "Using the statistical routines in Excel can be risky." Of course, ALL statistical packages should be used with caution.

We have not had time to check Excel 2000 to find out if it still has the same problem.

Keep in touch.

-- JHW

* Joe Ward
* 167 East Arrowhead Dr.
* San Antonio, TX 78228-2402
* Phone: 210-433-6575
* Fax: 210-433-2828
* Email: [EMAIL PROTECTED]
* http://www.ijoa.org/joeward/wardindex.html
*
* Health Careers High School
* 4646 Hamilton Wolfe
* San Antonio, TX 78229
* Phone: 210-617-5400
* Fax: 210-617-5423

-Original Message-
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Thursday, June 15, 2000 9:37 AM
Subject: Beginner requests for help on ANOVA and T-tests

Hello,

I am a 16 year old student and a beginner to statistics. I'm lost. Currently I only have Microsoft Excel 97, and I would like to know the differences between the following ANOVA tests (in Excel):

ANOVA Single Factor
ANOVA Two-Factors with replication
ANOVA Two-Factors without replication

What do all these mean? Where and when should they be applied? And can anyone please use simple English terms to explain? I am only a beginner. What is one-way or two-way ANOVA?

How about for T-Test?

T-Test: Paired two samples for means
T-Test: Two-sample assuming equal variances
T-Test: Two-sample assuming unequal variances

Also, can I use ANOVA instead of a T-test when testing a null hypothesis between 2 groups?

Thanks for your help,
Edmund
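The sum-of-squares issue described in this thread can be illustrated with a small Python sketch. It reproduces only the arithmetic Joe describes (it has not been checked against Excel 97 itself), and the data and function name are made up for the example.

```python
def through_origin_sums_of_squares(x, y):
    """Fit Y = b*X + E (no intercept) and return the sums of squares
    computed two ways: with the uncorrected and the mean-adjusted total."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    residual_ss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

    # Total appropriate for a model with no unit vector: sum of Y^2.
    uncorrected_total_ss = sum(yi * yi for yi in y)

    # Mean-adjusted total, appropriate only when the model has an intercept.
    mean_y = sum(y) / len(y)
    adjusted_total_ss = sum((yi - mean_y) ** 2 for yi in y)

    return {
        "slope": b,
        "residual_ss": residual_ss,
        "regression_ss_correct": uncorrected_total_ss - residual_ss,
        "regression_ss_from_adjusted_total": adjusted_total_ss - residual_ss,
    }


if __name__ == "__main__":
    # Hypothetical data that clearly do not pass through the origin.
    result = through_origin_sums_of_squares([1, 2, 3, 4], [10, 11, 12, 13])
    for name, value in result.items():
        print(name, value)
```

For data like these, subtracting the residual sum of squares from the mean-adjusted total can even give a negative "regression sum of squares", which is a quick warning sign that the wrong total was used.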
Re: MANOVA
If the 'ZERO' or 'DOT' means that you have some missing cells, then that is a good time to "CREATE YOUR OWN MODEL".

-- Joe

-Original Message-
From: HAideren [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Wednesday, June 14, 2000 8:12 PM
Subject: MANOVA

Hi,

I have run a MANOVA and in the 'Parameter Estimates' section of the results, some of the cells are filled with a zero or a dot (.). Is there a way to overcome this problem? If not, should I run a different multivariate test, and what would be the appropriate substitute test?

Cheers.
Re: Inequalities constrains on the coefficients
I asked Lee Wilkinson how this is done in SYSTAT. Here is his reply.

-- Joe

-Original Message-
From: Wilkinson, Leland [EMAIL PROTECTED]
To: 'Joe Ward' [EMAIL PROTECTED]
Date: Thursday, June 08, 2000 9:34 AM
Subject: RE: Inequalities constrains on the coefficients

The SYSTAT procedure NONLIN does the same with the LOSS option and FUNPAR. Could you perhaps post this to Ed-Stat in the same thread?

Thanks,
Lee

-Original Message-
From: Joe Ward [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 06, 2000 11:56 AM
To: Wilkinson, Leland (SYSTAT
Subject: Fw: Inequalities constrains on the coefficients

Lee --

Is this available in any version of SYSTAT? What about SYSTAT8-Student Version?

-- Joe

-Original Message-
From: Jonathan Fry [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Tuesday, June 06, 2000 11:05 AM
Subject: Re: Inequalities constrains on the coefficients

Arie Beresteanu wrote:

Hi,

Estimation of linear (multivariate) regression with equality constraints on the coefficients is a well known problem (at least for me). What about if the constraints are inequalities? More specifically:

Y=Xb+e s.t. Qb=q

where Q is a matrix and q is a vector (for example, Y=b0+b1*X1+b2*X2+e s.t. b1+2*b2=0).

How do I solve that? How do I test the constraint? Is there something in MatLab/STATA/SAS for that?

Thank you,
Arie.

The SPSS procedure CNLR (constrained non-linear regression) handles this kind of problem directly, using a quadratic programming solver.

Jonathan Fry
SPSS Inc.
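For readers without access to the packages named in this thread, the simplest inequality-constrained case can be sketched by hand. The Python below is not the CNLR or NONLIN algorithm; it is a minimal illustration, assuming a single constraint (slope >= 0) on a one-predictor model. Because the least-squares criterion is convex, the constrained solution is either the ordinary fit (when it already satisfies the constraint) or the fit with the constraint held as an equality. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for Y = b0 + b1*X + E; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_line_nonnegative_slope(x, y):
    """Least squares for Y = b0 + b1*X + E subject to b1 >= 0.

    Returns (b0, b1, constraint_active)."""
    b0, b1 = fit_line(x, y)
    if b1 >= 0:
        # Unconstrained optimum is feasible, so it is also the constrained one.
        return b0, b1, False
    # Otherwise the optimum lies on the boundary b1 = 0,
    # where the best intercept is the mean of Y.
    return sum(y) / len(y), 0.0, True


if __name__ == "__main__":
    # Hypothetical data with a downward trend, so the constraint binds.
    print(fit_line_nonnegative_slope([1, 2, 3, 4], [4, 3, 2, 1]))
```

With several constraints (a general Q and q), a quadratic programming or active-set routine is needed, which is what the packages mentioned above provide.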
Re: R sq vs r sq
Hi Paul, William et al. --

This may be ANOTHER GOOD TIME TO COMMENT ON THE COMMUNICATION PROBLEMS OF STATISTICS (AND OTHER AREAS, TOO).

I suggest that when we use the terms LINEAR and NONLINEAR, we tell the reader what the SENDER means by those terms.

When I write:

Y = b1*X1 + b2*X2 + ... + bp*Xp + E

where bi (i = 1,2,...,p) are least-squares regression coefficients, I will refer to this as a LINEAR MODEL.

The Xs can be any numbers that I choose -- log(z), ln(z), z^3, cos(z), 1/z, binary (1 or 0), ...

If a person writes the form:

Y = a0 + a1*X + a2*X^2 + a3*X^3 + E

then they might say that this is a NONLINEAR model.

As long as the reader knows exactly what the model is, we are communicating.

In these days of fancy 3D graphic displays, it is interesting to picture the function:

Y = a0 + a1*X + a2*X^2

in the 2D space of Y and X, where it appears as a CURVE, and then to picture the function in the 3D space of Y, X and X^2 or, re-designating X^2 as Z,

Y = a0 + a1*X + a2*Z

We notice that the 3D function lies in a PLANE -- reminding us that we have a "LINEAR MODEL".

If we hurriedly say to someone that "this function is NONLINEAR in the 2D space of Y and X, but LINEAR in the 3D space of Y, X and Z", then we might even cause more frustration. :-(

"COMMUNICATION" IS A PROBLEM EVERYWHERE!

DO WILLIAM AND PAUL HAVE THE SAME MEANING FOR "NONLINEAR"? :-)

--- Joe

- Original Message -
From: Paul Velleman [EMAIL PROTECTED]
To: William J. Larson [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, May 05, 2000 6:43 AM
Subject: Re: R sq vs r sq

| At 11:18 AM +0200 05/05/2000, William J. Larson wrote:
|
| It appears that R sq is some sort of generalization of r sq
| for nonlinear cases. True?
|
| Not really. Common convention is to capitalize the R for multiple
| correlation. The R sqr reported in regressions allows for the
| generalization of simple regression to a multiple regression (2 or
| more predictors). In both cases R sqr is the squared correlation
| between y and y-hat. Y-hat represents the best (in the least squares
| sense) fit to y among all linear combinations of the x's. All of
| these are statistics for linear models. It is dangerous to apply them
| to nonlinear models.
|
| -- Paul
| --
| Paul F. Velleman
| Cornell University          Data Description, Inc.
| 358 Ives Hall               Box 4555
| Ithaca, NY 14853            Ithaca, NY 14852-4555
| (607) 255-4411              (607) 257-1000
| (607) 255-8484 fax          (607) 257-4146 fax
| ===
| The Advanced Placement Statistics List
| To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
| unsubscribe apstat-l email address used to subscribe
| Discussion archives are at
| http://forum.swarthmore.edu/epigone/apstat-l
| Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
| ===
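The point that a curve in the Y-X plane is still a LINEAR MODEL once X^2 is treated as its own predictor Z can be sketched in Python. This is only an illustration: the small solver and function names are written for this example, the data are made up and noise-free, and a real analysis would use a tested least-squares routine rather than the normal equations.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def least_squares(predictors, y):
    """Coefficients of Y = b1*X1 + ... + bp*Xp + E via the normal equations.

    `predictors` is a list of p columns; any numbers may go in them."""
    p = len(predictors)
    xtx = [[sum(u * v for u, v in zip(predictors[i], predictors[j]))
            for j in range(p)] for i in range(p)]
    xty = [sum(u * v for u, v in zip(predictors[i], y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    x = [-2.0, -1.0, 0.0, 1.0, 2.0]
    y = [1 + 2 * xi + 3 * xi * xi for xi in x]   # made-up data on a parabola
    u = [1.0] * len(x)                           # unit vector
    z = [xi * xi for xi in x]                    # X^2 re-designated as Z
    print(least_squares([u, x, z], y))           # estimates of a0, a1, a2
```

The fitting step never needs to know that Z was computed from X; it only sees three columns of numbers, which is the sense in which the model is linear in its unknown coefficients.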
Re: R sq vs r sq
Bill --

You are so right!! The term NONLINEAR is very confusing.

As I indicated in the earlier message, most folks in the statistics world refer to a LINEAR MODEL as

Y = b1*X1 + b2*X2 + ... + bp*Xp + E

and some folks will, UNFORTUNATELY, write

Y = b0 + b1*X1 + b2*X2 + ... + bp*Xp + E

which leads to more confusion!!

The main point is that the functions are LINEAR IN THE UNKNOWN COEFFICIENTS. This is why we sometimes take the logs of the function so that the new function is LINEAR IN THE UNKNOWN COEFFICIENTS -- AND THE SOLUTIONS ARE EASIER. A "REAL" NONLINEAR MODEL NEEDS SOME SPECIAL ALGORITHMS FOR SOLUTION.

---

Someday -- long after I'm out of this world -- the AP-Statistics objectives WILL ALLOW OUR STUDENTS TO HAVE "the power they deserve to use REGRESSION/LINEAR MODELS and COMPUTERS/CALCULATORS to their fullest".

Perhaps the secondary teachers can speed up improvements through the NCTM "Principles and Standards for School Mathematics". Perhaps there should be an Applied Research Statistics course that has few restrictions on the content, focusing on those topics that help students do what they NEED to accomplish practical results -- leading to more enthusiasm for statistics and data analysis.

Change is slow!! :-)

-- Joe

- Original Message -
From: William J. Larson [EMAIL PROTECTED]
To: Joe Ward [EMAIL PROTECTED]; Paul Velleman [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, May 05, 2000 10:46 AM
Subject: Re: R sq vs r sq

| Joe,
|
| Well by linear *I* meant what we mean in algebra 2 class y = mx + b,
| but I do not object to calling y = a0 + a1 x1 + a2 x2 + a3 x3 + ... linear.
| I certainly DO object to your definition of linear, although I suppose
| it *is* used by some people, I find it very confusing.
|
| Cheers,
| Bill Larson
| Geneva, Switzerland
STATISTICS AT ISEF2000 - International Science and Engineering Fair -- Detroit May 7-13 -- Summer Workshop in San Antonio
Topic #1 -- The directory of finalists for ISEF2000 is now available at:

http://www.sciserv.org/isef/finaldir.pdf

There are finalists from all U.S. states and over 40 nations. I did a brief search for MICHIGAN, and a few of the schools represented are:

Renaissance HS
Saginaw Arts Science Academy
Western High School
Redford HS

It is easy to find finalists near your location. If you know any finalists, teachers, parents or others who might be interested, please tell them that I will present the annual Shop Talk titled:

"Combining the Power of Statistics and Computers to Enhance Science Fair Projects"

at 9:00-10:00 a.m. on Monday, May 8, 2000 in Cobo Hall Room O2-41.

The purpose of this session is to provide guidance to Science Fair students, teachers and others to help them acquire statistics advice, and to suggest the kinds of questions they might ask their statistical advisors. As you can guess, I will encourage the participants to get assistance from those who can teach them to create the models needed to answer their, possibly unique, questions of interest.

You can tell your friends that there will be some valuable drawing prizes for those who get there early and stay 'til the end.

This year, none of the students whom I advised on their data analysis made it to ISEF2000. ---sigh :-(

==

Topic #2 -- We have decided to open our Summer Workshop, emphasizing the Power of Statistics and Computers in Science Research, to a select few folks who may want to attend FROM OUTSIDE THE SAN ANTONIO REGION.

The application form with detailed information can be seen at the web site shown below. This may be of interest to those who work with student research projects and to those AP-Statistics teachers who have some extra school time AFTER THE MAY AP-EXAM to introduce their students to some additional data-analysis ideas.

http://www.ijoa.org/joeward/wardindex.html

The dates are May 29 - June 9. If a participant can stay for only the first week, that's OK. Those who may be interested can call me to discuss details.
--- Joe
Re: hyp testing -Reply
Hi, Robert and all --

Yes, there occasionally were discussions in our Air Force research about whether we were working with the POPULATION or a SAMPLE. As Dennis comments:

| the flaw here is that ... she has population data i presume ... or about as
| close as one can come to it ... within the institution ... via the budget
| or comptroller's office ... THE salary data are known ... so, whatever
| differences are found ... DEMS are it!

One of my Professors used to use the Invertebrate Paleontologists as his example of a POPULATION. I think at that time there were fewer than 20 people who were Invertebrate Paleontologists.

-- Joe

- Original Message -
From: Robert Dawson [EMAIL PROTECTED]
To: dennis roberts [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, April 17, 2000 9:54 AM
Subject: Re: hyp testing -Reply

| - Original Message -
| From: dennis roberts
| At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
|
| There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
| text "Seeing Through Statistics", in which she does much the same. She
| presents a case study based on some of her own work in which she looked at
| the question of gender discrimination in pay at her own university, and
| fails to reject the null hypothesis [no systemic difference in pay between
| male and female faculty]. She heads the example "Important, but not
| significant, differences in salaries"; comments (_perhaps_ technically
| correctly but misleadingly) that "a statistically naive reader could
| conclude that there is no problem" and in closing states:
|
| and Dennis Roberts replied:
|
| the flaw here is that ... she has population data i presume ... or about as
| close as one can come to it ... within the institution ... via the budget
| or comptroller's office ... THE salary data are known ... so, whatever
| differences are found ... DEMS are it!
|
| the notion of statistical significance in this case seems IRRELEVANT ...
| the real issue is ... given that there are a variety of factors that might
| account for such differences (numbers in ranks, time in ranks, etc. etc.)
| is the remaining difference (if there is one) IMPORTANT TO DEAL WITH
| ...
|
| If one can totally explain all contributing factors, so that a model
| with significantly fewer parameters than there are faculty fits everybody to
| within a practically significant margin of error, then yes, either the model
| continues to work with gender removed or it doesn't.
|
| If, on the other hand, there are unknown sources of variation (a
| reasonable assumption in any situation involving people), or more sources of
| variation than there are data (another good bet if one thought hard enough),
| one cannot automatically go from the observation
|
| (*) "The average pay of female faculty members here is less than that of
| male faculty members"
|
| to the apparently desired conclusion
|
| (**) "There is a gender-based _pattern_ of discrimination in faculty
| salaries"
|
| without considering the study as a pseudo-experiment, and analyzing it as
| such. One would be trying to decide: is the difference between mean male
| and female faculty salaries greater than one would expect if one took N1
| males and N2 females and assigned factors such as experience, rank,
| skill/luck at negotiating a first contract, demand for specialties, merit
| pay actually deserved [as opposed to given on a gender basis], etc. at
| random?
|
| This is what Utts and her coauthors were, it seems, trying to do.
| However, when the tests were not significant at the chosen level they seem
| to have fallen back on inferring (**) directly from (*).
|
| -Robert Dawson
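The "pseudo-experiment" question in this thread (is the observed difference in means larger than random assignment would produce?) can be sketched as an exact permutation test in Python. This is a deliberately stripped-down illustration: it ignores the covariates (rank, experience, and so on) that the thread says matter, it is not the analysis Utts used, and the salary figures and function name are invented.

```python
from itertools import combinations


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Enumerates every way of relabelling the pooled values into groups of the
    original sizes and returns the share of relabellings whose absolute mean
    difference is at least as large as the observed one."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_difference(sum_a):
        return abs((total - sum_a) / n_b - sum_a / n_a)

    observed = abs_mean_difference(sum(group_a))
    at_least_as_large = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        relabelled = abs_mean_difference(sum(pooled[i] for i in indices))
        if relabelled >= observed - 1e-12:   # tolerance for rounding error
            at_least_as_large += 1
    return at_least_as_large / splits


if __name__ == "__main__":
    # Hypothetical salaries (in thousands); not data from any real study.
    women = [52, 55, 58, 61]
    men = [54, 59, 63, 66]
    print(exact_permutation_p_value(women, men))
```

Full enumeration is only practical for small groups; with realistic faculty sizes one would sample random relabellings instead.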
Re: linear model or interactive model?
Wen-Feng --

The term LINEAR is a difficult term. As I mentioned to you in an earlier message (included for reference at the end of this message), a LINEAR STATISTICAL MODEL is "LINEAR" in the unknown coefficients a1, a2, ..., ap in the model:

Y = a1*X1 + a2*X2 + ... + ap*Xp + E

The X predictors can be ANY NUMBERS THAT WE LIKE. If we write

Y = a1*U + a2*X + a3*X^2 + E

where

U = 1
X = a continuous predictor
X^2 = X*X
E = error or residual

we might say that the function is NON-LINEAR in the two-dimensional Y-X plane, but it is LINEAR in the three-dimensional space of Y-X-X^2.

With 3-D displays that we can rotate as we like, it is enlightening to observe that the CURVE seen in the two-dimensional space lies in a PLANE in the three-dimensional space of Y-X-X^2.

-- Joe

- Original Message -
From: Wen-Feng Hsiao [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, April 15, 2000 5:14 AM
Subject: Re: linear model or interactive model?

| Dear Hartig,
|
| Thanks for your reply. I am sorry for my poor knowledge in statistics.
| But I wonder why the definition of 'linearity' of statistics is different
| from that of engineering mathematics, which defines 'linear' as:
|
| Each unknown xj appears to the first power only, and that there are no
| cross product terms xi*xj with i!=j.
|
| Wen-Feng
|
| In article [EMAIL PROTECTED],
| [EMAIL PROTECTED] says...
| Generally, you can include an interaction (or moderator) term in a linear
| model, like
| y = b0 + b1 * x1 + b2 * x2 + b3 * x1*x2,
| and the model still is linear. If you decide not to include x1 and x2, like
| y = b0 + b1 * x1*x2,
| you still have a linear model.
|

- Original Message -
From: Joe Ward [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; Wen-Feng Hsiao [EMAIL PROTECTED]
Sent: Thursday, April 13, 2000 10:30 AM
Subject: Re: linear model or interactive model?

- Original Message -
From: Wen-Feng Hsiao [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, April 13, 2000 3:06 AM
Subject: linear model or interactive model?

| Dear all,
|
| Suppose I have an aggregation model which is in the following form:
| Y = X11 * X12 + X21 * X22.
|
| This model could be thought of as an aggregation of two pieces of knowledge,
| namely X1. and X2.. Each piece of knowledge contains two pieces of information
| (attributes). For example, X1. contains X11 and X12. Now suppose X.1 is the
| height and X.2 is the weight of a person. Then the aggregation of any
| two persons, say, Student1(height=170cm, weight=60kg) and
| Student2(height=180cm, weight=68kg), can be represented by
|
| Y = 170*60+180*68=22440.
|
| My question: is a model of the above form linear or interactive? I suspect
| it is not a linear model, since it is not in this form: Y = c1 X1 + c2 X2,
| where c1 and c2 are constants. I also suspect it is not a purely interactive
| form, since X.1 and X.2 are dependent. Sorry for this stupid question.
|
| Wen-Feng
|

Joe Ward writes:

Wen-Feng ---

Your model --

Y = X11 * X12 + X21 * X22

does not have any unknowns. Did you mean to write:

Y = c1*(X11 * X12) + c2*(X21 * X22)?

All models of the form:

Y = c1*X1 + c2*X2 + ... + cp*Xp + E

are LINEAR MODELS. It does not matter what NUMBERS are included in the Xs.

Y = c1*X1 + c2*X2 + c3*(X1*X2) + c4*(X1^2) + c5*(lnX1) + E

is LINEAR in the unknown coefficients c1, c2, ...

The most useful Xs are the BINARY (1 or 0) predictors.
--- Joe
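The distinction drawn in this thread (linear in the unknown coefficients, whatever the Xs contain) can be spot-checked numerically. The Python sketch below tests the superposition property on a few arbitrary coefficient vectors and points; it is an illustration rather than a proof, and the two example models and all names are made up for this purpose.

```python
import math


def interaction_model(c, x1, x2):
    """Y = c1*X1 + c2*X2 + c3*(X1*X2): cross product in the Xs, linear in the c's."""
    c1, c2, c3 = c
    return c1 * x1 + c2 * x2 + c3 * (x1 * x2)


def exponential_model(c, x1, x2):
    """Y = c1*exp(c2*X1): the coefficient c2 sits inside exp, so not linear in the c's."""
    c1, c2, _unused = c
    return c1 * math.exp(c2 * x1)


def looks_linear_in_coefficients(model, points, tol=1e-9):
    """Check model(alpha*c + beta*d) == alpha*model(c) + beta*model(d) at each point."""
    c = (1.5, -2.0, 0.5)
    d = (-0.5, 3.0, 2.0)
    alpha, beta = 2.0, -3.0
    mixed = tuple(alpha * ci + beta * di for ci, di in zip(c, d))
    for x1, x2 in points:
        combined = model(mixed, x1, x2)
        separate = alpha * model(c, x1, x2) + beta * model(d, x1, x2)
        if abs(combined - separate) > tol:
            return False
    return True


if __name__ == "__main__":
    points = [(0.5, 1.0), (1.0, 2.0), (2.0, -1.0)]
    print(looks_linear_in_coefficients(interaction_model, points))
    print(looks_linear_in_coefficients(exponential_model, points))
```

The engineering definition quoted by Wen-Feng applies the same test to the xs instead of the coefficients, which is why the two communities can label the same equation differently.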
Re: hyp testing
Hi, Dennis --

Yes, "LOT of years!" ago (the 1950's), when I first started into the real applied world, our main job was to PREDICT, PREDICT, PREDICT outcomes. We had some real cost figures to evaluate our predictions.

Before the term Bootstrap arrived on the scene, we were Cross-Validating like mad. We would divide those punched cards into "random?" groups and shuffle them over and over again and "re-group". Then we would apply the predictions developed from one data set to the others to see how well we were doing. Hypothesis testing -- in the classical sense -- was not involved.

I still believe that TWO important ideas in life are:

- PREDICTION and
- OPTIMIZATION (choosing among alternative PREDICTIONS to MAXIMIZE or MINIMIZE one or more OBJECTIVE FUNCTIONS).

If "Hypothesis testing" helps improve PREDICTION and OPTIMIZATION, then that's great.

One of the difficulties in academia may be due to the lack of practical, decision-making opportunities. What PRACTICAL ACTIONS do we take as a result of analyzing a two-way table with a Chi-Square "test" if we find a "statistically significant" outcome?

I imagine we will get some suggestions from our readers! :-)

-- Joe

- Original Message -
From: dennis roberts [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, April 07, 2000 6:41 AM
Subject: hyp testing

| let's say that today ... we as the statistical community decided, by
| democratic vote, that the concept of 'hypothesis testing' ... which has
| essentially dominated statistical work for as long as i can remember
| (which, er um ... is a LOT of years!) ... is relegated to the 'we USED
| to do this stuff' category
|
| just THINK about this
|
| what would the vast majority of folks who either do inferential work and/or
| teach it ... DO
| what analyses would they be doing? what would they be teaching?
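The card-shuffling procedure Joe describes (split the cases into random groups, develop predictions on some groups, and check them on the rest) can be sketched in Python. This is a modern stand-in under simple assumptions, not a reconstruction of the original Air Force procedure: it uses a one-predictor line, and the function names, seed, and data are invented.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for Y = b0 + b1*X + E; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def cross_validated_errors(x, y, n_groups=2, seed=0):
    """Shuffle the cases into groups; for each group, fit on the other groups
    and return the mean squared prediction error on the held-out group."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)           # the "shuffled deck of cards"
    groups = [order[g::n_groups] for g in range(n_groups)]
    errors = []
    for held_out in range(n_groups):
        train = [i for g in range(n_groups) if g != held_out for i in groups[g]]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        test = groups[held_out]
        squared = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in test]
        errors.append(sum(squared) / len(test))
    return errors


if __name__ == "__main__":
    # Hypothetical data: a straight line plus a little made-up scatter.
    x = list(range(10))
    y = [2 * xi + 1 + (0.5 if xi % 2 else -0.5) for xi in x]
    print(cross_validated_errors(x, y))
```

The held-out errors, rather than a significance test, are what tell you how well the predictions would do on new cases.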
Reference for regression discontinuity
Hi, Carl ---

If you still have your copy of Introduction to Linear Models (Ward and Jennings), you will find many examples in Chapters 10 and 11. An interesting example is on page 217, 11.9 Discontinuity Between Two Second-Degree Polynomials.

With the facility to create linear models appropriate to the research questions of interest, many seemingly unique problems can be handled easily, e.g. Cubic Splines.

-- Joe

- Original Message -
From: Carl J Huberty [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, March 22, 2000 8:30 AM

| Will someone give me a (readable) reference for "regression
| discontinuity"? Thanks in advance.
|
| Carl Huberty
Re: Matrix multiplication
David --

Great message!!

One of the most "revealing" numerical analysis problems arises when there is interest in "POWERING" a transition matrix in a Markov model. PRE-MULTIPLYING to "POWER" the matrix, compared to POST-MULTIPLYING, can give quite different results. This is due to the different order of accumulation of the sums of products of numbers between 0 and 1.

Numerical analysts can have lots of challenging problems.

-- Joe

- Original Message -
From: David A. Heiser [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; Anthony Pleticos [EMAIL PROTECTED]
Sent: Friday, March 17, 2000 2:27 PM
Subject: Re: Matrix multiplication

| - Original Message -
| From: Anthony Pleticos [EMAIL PROTECTED]
| To: [EMAIL PROTECTED]
| Sent: Wednesday, March 15, 2000 4:24 PM
| Subject: Matrix multiplication
|
| I don't know if I hit the correct site but would be grateful for an
| answer - it is a fundamental one. We all know that linear regression can be
| accomplished by matrix multiplication and that there are packages which will
| do it for you. I am teaching myself C++ and for the purposes of the
| exercise I would like to know how to create a matrix or obtain ready-made
| code (i.e. a "numerical recipe") class so I could declare in a program:
|
| #include <iostream.h>
| #include <math.h>
| #include <matrix.h> /* if there is such a file */
|
| The basic problem is that there are enormous differences between real-world
| matrices. There is no one method for numerical matrix reductions. For
| example, note the very large number of Fortran subroutines that focus on
| peculiar aspects (banded, complex, sparse, near singular, positive definite,
| not positive definite, triangular, rank deficient, etc., etc.). Note the large
| number of free Fortran subroutines devoted to matrices in "NETLIB". There
| are other free Fortran libraries available from the web.
|
| Matrix multiplication is not numerically straightforward given a finite
| computer environment. One can get very misleading results doing the standard
| multiply and add method using standard single precision.
|
| I would suggest you get familiar with numerical analysis methods. I
| personally prefer the works of G. W. Stewart as a source.
|
| DAHeiser
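The point about order of accumulation can be tried out directly. The following is only a minimal sketch in Python (not the C++ class Anthony asked about); the 3x3 transition matrix and the function names are made up for illustration. It powers the same matrix by pre-multiplying and by post-multiplying and reports how far the two results drift apart. In double precision the gap is usually tiny; in single precision, or with many more steps, it may be larger.

```python
# Sketch: power a Markov transition matrix two ways and compare.
# The matrix P and the helper names are illustrative assumptions.

def matmul(a, b):
    """Plain 'multiply and add' product of two square matrices (lists of rows)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_pre(p, steps):
    """P**steps built by pre-multiplying: result = P * result."""
    result = p
    for _ in range(steps - 1):
        result = matmul(p, result)
    return result

def power_post(p, steps):
    """P**steps built by post-multiplying: result = result * P."""
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result

P = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]

pre = power_pre(P, 50)
post = power_post(P, 50)
max_diff = max(abs(pre[i][j] - post[i][j]) for i in range(3) for j in range(3))
row_sums = [sum(row) for row in post]

print("largest elementwise difference:", max_diff)
print("row sums of P^50 (ideally 1.0):", row_sums)

# Order of accumulation matters even for a three-term sum:
print((0.1 + 0.2) + 0.3 == (0.3 + 0.2) + 0.1)
```

Both products are mathematically the same matrix, so any nonzero difference printed here comes purely from rounding.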
Re: Why do we use and teach z?
Josh, Bill, et al --

I can't resist!!

Yes, those who have invested much of their life in acquiring certain knowledge tend to want future generations to have those "exciting" historical experiences. It is rather unfortunate that we have a hard time making changes to give future generations some of the power they deserve.

I experienced some difficulty in the 1950s with those folks who had become "masters" of the various analysis of variance algorithms that were developed before computers became available. My first major job in the 1950s was to "get us off of Friden, Marchant and Monroe desk calculators onto the IBM 602A followed by IBM 607, then IBM 650 etc."

The biggest difficulty was to get researchers to take advantage of the computer power that allowed them the freedom to create their own models to answer their questions of interest. It was very difficult for persons with Ph.D. degrees to give up that for which they had invested so much time to learn.

It was a little "traumatic" in the 1950s when a Ph.D. was told that "you don't need to have equal or proportional Ns in a two-way ANOVA". And it was really interesting to see the reaction when they were told that "you don't need a response in every cell". As a matter of fact, the managers of our Air Force research organization assembled a panel of experts to come in to find out what Bob Bottenberg and I were up to when we were promoting the use of a more general approach to creating models to answer research questions of interest.

It is indeed amazing that, 40 years later, many first-course statistics students are told that "IT IS BEYOND THE SCOPE OF THIS TEXT TO DEAL WITH SITUATIONS IN WHICH SAMPLE SIZES ARE UNEQUAL IN THE CELLS OF TWO-WAY ANOVA". It is little wonder that these students can do very little data analysis in support of practical research.

A few of you have heard this "sermon" before!!
By the way, those of you who have six weeks of school after the exam might want to give your students some power to use Prediction/Regression/Linear Models and Computers. They might be able to do some useful data analysis and appreciate your efforts!!

Well, that's enough from a "NON-INFLUENTIAL OUTLIER".

-- Joe

- Original Message -
From: Joshua Tabor
To: William J. Larson ; AP Stats. list
Sent: Friday, March 17, 2000 9:11 AM
Subject: RE: Why do we use and teach z?

Reply to: RE: Why do we use and teach z?

I agree with you completely. The only explanation I received for why it is still in most books is that it is a nice stepping stone to a full-fledged t-test (of course, it is very likely I am misinformed). Anyway, this year I have decided to teach inference for proportions first (as the stepping stone) and then go straight into t-tests, eliminating z-tests for means. It helps make the course more realistic, and it saves me precious time (we start the second week of September and have 6 weeks of school after the AP!).

I am curious to hear what the college folks (and textbook authors) have to say.

josh

Josh Tabor
Wilson HS
Hacienda Heights, CA
[EMAIL PROTECTED]

William J. Larson wrote:

Why do we use and teach z?

As I continually tell my students, normally (no pun intended) we do not know sigma, so we should use t not z. Indeed can we ever know sigma? If not, why do we even bother to mention z? Is it historical reasons? Or because in the real world lots of people ignore the above fact and use z anyway, so we are conscientiously preparing our students for the real world? Or (more likely) am I missing something?

Dr. William J. Larson
[EMAIL PROTECTED]
Institut Monte Rosa
Montreux, Switzerland

===
The Advanced Placement Statistics List
To UNSUBSCRIBE send a message to [EMAIL PROTECTED] containing:
unsubscribe apstat-l email address used to subscribe
Discussion archives are at
http://forum.swarthmore.edu/epigone/apstat-l
Problems with the list or your subscription? mailto:[EMAIL PROTECTED]
===
Re: Looking for text on resampling...
Scott --

Peter Bruce should be able to give us the latest "word".

-- Joe

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, February 06, 2000 12:08 PM
Subject: Looking for text on resampling...

| Our small college library has a collection of basic biostats texts but
| nothing that specifically covers the area of resampling. I am currently
| looking over a 1991 text by Bryan Manly (Randomization and Monte Carlo
| Methods in Biology) - the first two chapters seem quite accessible (to
| someone unfamiliar with the field!)
|
| Could anyone suggest other texts that might cover bootstrapping and
| jackknife techniques - I would favour texts that have a biology bent and
| are written so non-specialists can follow...
|
| Many thanks!
|
| Scott
|
| ===
| This list is open to everyone. Occasionally, people lacking respect
| for other members of the list send messages that are inappropriate
| or unrelated to the list's discussion topics. Please just delete the
| offensive email.
|
| For information concerning the list, please see the following web page:
| http://jse.stat.ncsu.edu/
| ===
Re: When *must* use weighted LS?
John --

If you are interested in PREDICTION then the way YOU use your information is up to YOU. By cross-validation, resampling, etc. you can determine which prediction method seems to be "best" for your situation.

-- Joe

- Original Message -
From: John Hendrickx [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, March 15, 2000 1:22 AM
Subject: Re: When *must* use weighted LS?

| In article 8am7d1$hqj$[EMAIL PROTECTED],
| [EMAIL PROTECTED] says...
|
| I think I made the formulation too wordy in previous
| post.
|
| Let me try this simple question:
|
| When one wishes to do a (multi)linear regression on a set of
| observed data, and one is in the (unusual) position of possessing
| a set of sample standard deviations (of varying degrees of f.)
| at each value of the "explanatory" variable, how does one
| determine whether one ought or ought not to solve the weighted
| least squares problem using those sample standard deviations?
|
| What is the usual decision test for "heteroscedasticity" *before* one
| solves the regression system? What do people do in practise?
|
| Most social scientists don't worry very much about the assumptions of OLS
| regression, noting that OLS estimates are fairly robust and can give
| unbiased estimates even if those assumptions aren't fulfilled. Exceptions
| are multilevel models and time series data, data for which the assumption
| of uncorrelated error terms is violated. But these require special
| programs, not weighted least squares.
|
| There is also some debate on using weights for stratified sampling and/or
| to correct for sampling bias. Weighting leads to correct estimates but
| incorrect standard errors. One solution is to include the design
| variables in the model instead of weighting. Stata and Wesvar are two
| programs that can take weighting into account when calculating standard
| errors of estimates. But a quite common approach is to use weights for
| descriptive statistics, but not in multivariate models.
|
| Weights can also be used for certain dependent variables that will
| violate the assumption of homoscedasticity, e.g. a dichotomous
| dependent. I recently did a weighted least squares analysis for a
| co-worker to replicate an analysis in another paper. The weight was
| groupn*pct*(1-pct), where groupn was the number of cases per group and
| pct was the proportion with a positive response within each group. But
| this basically amounts to a poor approximation of a logit model. Programs
| like GLIM that use iteratively reweighted least squares use pct*(1-pct)
| as the weight when estimating the model, but now pct is the predicted
| probability from the previous iteration.
|
| As for a test for heteroscedasticity, Stata has a "hettest", which
| performs a Cook-Weisberg test and produces a chi-square statistic. Cook
| and Weisberg wrote a book in 1982, "Residuals and Influence in Regression".
| I've never used it though.
|
| Hope this helps,
| John Hendrickx
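For readers who want to see what "solving the weighted least squares problem using those sample standard deviations" amounts to, here is a minimal sketch for a straight line. It assumes the common choice of weights 1/s_i^2, which the thread does not spell out; the data, the standard deviations and the function name are made up for illustration.

```python
# Sketch: weighted least squares for a straight line, weights = 1 / s_i**2.
# Data values and the function name are hypothetical.

def wls_line(x, y, w):
    """Return (intercept, slope) minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.6]
s = [0.1, 0.1, 0.2, 0.4, 0.8]          # hypothetical sample SDs at each x

weights = [1.0 / si ** 2 for si in s]
ols = wls_line(x, y, [1.0] * len(x))   # equal weights reduce to ordinary LS
wls = wls_line(x, y, weights)

print("OLS intercept, slope:", ols)
print("WLS intercept, slope:", wls)
```

Whether the weighted or unweighted fit is "better" for prediction is exactly the kind of question Joe suggests settling by cross-validation rather than by assumption.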
Re: Repeated measures
Hi, Kasper --

The CORRECT model is the one that allows YOU to answer YOUR OWN questions of interest.

If the "packaged" PROCs have been verified to do what YOU want, then that's good. It is sometimes difficult to know what question a "packaged" PROC is attempting to answer. Be careful -- especially if there may be "missing cells".

:-)

-- Joe

- Original Message -
From: Kasper Hornbæk [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, March 09, 2000 1:41 AM
Subject: Q: Repeated measures

| Hi everybody.
| I have a question concerning repeated measures analysis. I am not sure
| whether a linear model with a factor that varies as repeated measures are
| taken (e.g., order or session) is identical to a repeated measures analysis.
| I'll detail the question below.
|
| I have a within-subject study in which subjects used three methods to solve
| six different tasks. The experiment is run in three sessions, each
| consisting of two tasks. Three of the tasks are very different from the
| other three tasks.
|
| For analysing this experiment, I plan to use a model like
| Y[ijkl] := u + subject[i] + task[j] + session[k] + method[l] + e[ijkl],
| possibly adding interactions between task, method and session. Is this a
| repeated measures analysis or equivalent to a repeated measures analysis?
|
| If not, how should I analyse these data using SAS's repeated measures
| option?
|
| Kind regards,
| Kasper Hornbæk
| kash(at)diku.dk
Fw: other uses for Minitab
Hi, Tim --

It's good to hear that some folks think it is useful to fit a least-squares line through the origin. Of course it is even better to be able to "force" a least-squares model to have a wide range of properties (restrictions).

Without any connection to statistics, students should be given the opportunity to use their algebra "savvy" to impose restrictions on math models.

For example, given a model of the form:

Y = a0 + a1*X + a2*X^2 + E

it might be of interest to "restrict" the model to:

-- Pass through the origin
or
-- Pass through X=1 and Y=2
or
-- Slope = 0 at X=5 (for the calculus crowd)
or
many others!

---

Using Algebra, Geometry and Trig, the "least-squares story" can be presented to students WITHOUT CALCULUS. Minimizing "distance" from a point to a line, or plane, or hyper-plane seems to be more appealing than taking partial derivatives. Connecting "perpendicularity" to "orthogonality" seems to work well.

-- Joe

- Original Message -
From: Tim Erickson [EMAIL PROTECTED]
To: Joe Ward [EMAIL PROTECTED]
Sent: Sunday, March 05, 2000 3:28 PM
Subject: Re: other uses for Minitab

| on 00.03.03 10:51 PM, Joe Ward at [EMAIL PROTECTED] wrote:
|
| Ah Bob, you remembered.
|
| I've been "bugging" the calculator makers for many years about including
| the least-squares model of the form:
|
| LinReg(bx), letting the function pass through the origin.
|
| just a note -- Fathom has a "Lock Intercept at Zero" command for its least
| squares regression, which amounts to the same thing.
|
| I think it's also an interesting exercise for a (calculus?) student to
| derive a formula for "b" given an arbitrary set of data and the constraint
| that b must minimize the sum of squares of the residuals. At least it was
| interesting to me!
|
| Tim
|
| Earl Jennings               Phone: (512) 345-0628
| 6917 Thorncliffe Dr.        e-mail address:
| Austin, TX 78731-2955       [EMAIL PROTECTED]
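Two of the restrictions in this thread can be worked through with nothing more than algebra, and the sketch below is one hedged way to do it in Python. For the line through the origin, minimizing the sum of squared residuals gives b = Sum(x*y)/Sum(x*x). For the quadratic forced through X=1, Y=2, the restriction a0 + a1 + a2 = 2 is substituted into the model, leaving Y - 2 = a1*(X - 1) + a2*(X^2 - 1) + E, which is an ordinary two-predictor fit with no constant. The function names and data are made up for illustration.

```python
# Sketch: imposing linear restrictions on a least-squares model by substitution.
# Function names and data are hypothetical.

def slope_through_origin(x, y):
    """Least-squares b for Y = b*X + E (no constant term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def fit_through_point(x, y, x0=1.0, y0=2.0):
    """Least-squares quadratic a0 + a1*X + a2*X^2 forced through (x0, y0)."""
    # Restriction: a0 + a1*x0 + a2*x0^2 = y0, so a0 = y0 - a1*x0 - a2*x0^2.
    # Substituting gives (Y - y0) = a1*(X - x0) + a2*(X^2 - x0^2) + E.
    w = [yi - y0 for yi in y]
    p1 = [xi - x0 for xi in x]
    p2 = [xi ** 2 - x0 ** 2 for xi in x]
    # Normal equations for two predictors and no constant, solved by Cramer's rule.
    s11 = sum(a * a for a in p1)
    s12 = sum(a * b for a, b in zip(p1, p2))
    s22 = sum(b * b for b in p2)
    s1w = sum(a * c for a, c in zip(p1, w))
    s2w = sum(b * c for b, c in zip(p2, w))
    det = s11 * s22 - s12 * s12
    a1 = (s1w * s22 - s12 * s2w) / det
    a2 = (s11 * s2w - s12 * s1w) / det
    a0 = y0 - a1 * x0 - a2 * x0 ** 2
    return a0, a1, a2

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.3, 4.8, 9.1, 15.2, 23.0]

a0, a1, a2 = fit_through_point(x, y)
print("restricted coefficients:", a0, a1, a2)
print("fitted value at X=1:", a0 + a1 + a2)
print("slope through origin:", slope_through_origin(x, y))
```

The "slope = 0 at X=5" restriction can be handled the same way: a1 + 2*a2*5 = 0 gives a1 = -10*a2, which again leaves a model that is linear in the remaining coefficients.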
Re: Howto interpret interactions in an ANOVA
Hi all --

Again -- I'm jumping on the band wagon in support of these messages that advocate -- what I call -- a PREDICTION/REGRESSION/LINEAR MODELS approach.

I was attracted to Lee Wilkinson and SYSTAT many years ago when Lee had a sign at one of his SYSTAT BOOTHS that said:

"Ask me about Cell Means Analysis" (May not be Lee's exact words)

I was so excited to see a software package that required the user to insert the word CONSTANT in the regression model when the user wanted it -- NOT AS THE DEFAULT. When using SAS at Clemson in 1985-86, I had to tell students that they must use the NOINT OPTION until I explained why.

A most misunderstood and troublesome idea is the lack of understanding of the predictor, U, a vector of 1's. If students would -- in the beginning -- insert THEIR OWN U, when needed, then they might have a better understanding of the "efficiency" of having the CONSTANT or INTERCEPT as the DEFAULT. This lack of understanding about the CONSTANT or INTERCEPT is revealed by the many Email messages we see related to "What is RSQ WHEN there is NO CONSTANT or INTERCEPT".

It is interesting that the more "modern" versions of SYSTAT require the user to REMOVE THE CONSTANT when appropriate.

It would be really great if the statistics education folks would advocate the introduction of PREDICTION/REGRESSION/LINEAR MODELS early so that the students would have something useful in their experience and perhaps continue their study of statistics. I'm afraid that many FIRST STATISTICS COURSES have little "selling/marketing" effect on students.

The "Cell-Means Approach" is easy to introduce to high school students, since these students have experiences with AVERAGES, MEANS, GPAs. And the "Missing Cells Problem?" is really not a problem until the students are told that some folks don't know what to do about "Missing Cells".

Enough "preaching to the choir"!!

-- Joe

- Original Message -
From: Gregory C. Mayer [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, February 29, 2000 6:46 AM
Subject: Re: Howto interpret interactions in an ANOVA

| R.R. Sokal & F.J. Rohlf in Biometry (1995, Freeman) emphasize the unity of
| anova, ancova and regression (and in their shorter Introduction to
| Biostatistics, anova and regression). They introduce them in turn,
| however; I agree that a text that began with glm and then took up anova,
| ancova and regression as instances of the general approach would be
| preferable. This is especially so when using Systat, as the model
| statements closely parallel the models, allowing more complex
| models to be grasped and implemented immediately, instead of being treated
| as some new technique.
|
| Gregory C. Mayer
| [EMAIL PROTECTED]
|
| On Mon, 28 Feb 2000, Bob Madden wrote:
|
| I agree. In fact, I have sought in vain for an introductory level statistics
| text that does not treat ANOVA and regression as two totally separate,
| disconnected techniques.
| With disconcerting monotony, they all monkey each other in this respect. I
| think students would be better served by being shown early on that
| regression, ANOVA, and for that matter, ANCOVA, are all special cases of
| the glm.
|
| --Bob Madden
|
| James Friedrich wrote:
|
| Let me add to the speculation regarding why interaction effects are often
| omitted from multiple regression. I think the reality is that people are
| generally trained in one "mode" or the other (ANOVA or Regression) without
| a sense of their connectedness (a point already alluded to in previous
| posts). In an in-press national survey of undergraduate statistical
| instruction for psychology majors, I found that ANOVA dominates, with
| little attention to regression (except "simple"). The specialties of
| those teaching the stats / methods courses tend to be in laboratory -
| experimental areas where ANOVAs are the norm. The bottom line is that I
| don't think budding psychologists, at least, get much training - or good
| training - in MR or GLM perspectives. I also see this in advising /
| consulting I do with biology students. Sadly, I think the heavy ANOVA
| emphasis and minimal attention to regression approaches has the side
| effect of leaving people poorly schooled in measurement issues. My
| experience has been that professionals well-versed in MR / GLM are much
| more in tune with these concerns.
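The cell-means idea Joe mentions can be illustrated with a few lines of Python. This is only a sketch with made-up data for a 2x2 layout with unequal Ns. It relies on one fact: when the predictors are mutually exclusive 0/1 cell-membership vectors and no U (constant) is included, the least-squares coefficients are simply the cell averages. An interaction question is then just a contrast among those cell means.

```python
# Sketch: cell-means model for a 2x2 layout with unequal cell sizes.
# Data and names are hypothetical.
from collections import defaultdict

# (level of factor A, level of factor B, response)
data = [("a1", "b1", 10.0), ("a1", "b1", 12.0),
        ("a1", "b2", 20.0),
        ("a2", "b1", 14.0),
        ("a2", "b2", 25.0), ("a2", "b2", 27.0), ("a2", "b2", 26.0)]

def cell_means(rows):
    """Least-squares coefficients of the cell-indicator model = cell averages."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b, response in rows:
        sums[(a, b)] += response
        counts[(a, b)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

means = cell_means(data)

# Interaction question: is the b1-vs-b2 difference the same at a1 and at a2?
interaction = ((means[("a1", "b1")] - means[("a1", "b2")])
               - (means[("a2", "b1")] - means[("a2", "b2")]))

for cell in sorted(means):
    print(cell, means[cell])
print("interaction contrast:", interaction)
```

Note that the four cell indicators add up to the vector U of 1's, which is why a constant is redundant in this particular model; testing whether the contrast differs from zero would still need an error term, which this sketch leaves out.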
Re: Linear Regression with known intercept (Long Message)
Mark writes -

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, February 12, 2000 4:51 PM
Subject: Linear Regression with known intercept

| Hi,
|
| If I want to find the least squares estimator of the slope of a simple
| linear regression model where my intercept is known, will this
| estimator be the same as if I did not know my intercept (=Sxy/Sxx)?
| How about the variance and the confidence interval of my estimator?
| Will they be bigger or smaller than the estimator for the case where
| both my intercept and slope are unknown?
|
| Thank you for your help.
|
| Mark
|
| Sent via Deja.com http://www.deja.com/

---

Hi, Mark --

Glad you sent this Email. It is a nice and simple example of the use of Prediction/Regression/Linear Models -- which should be one of the important objectives of a FIRST NON-CALCULUS-BASED STATISTICS COURSE.

Consider, first, the Simple Regression Model (Model 1):

Y = a1*U + a2*X + E1

where

Y = a vector containing observations on a dependent or response variable.
U = a predictor (vector) containing all 1's. (THE MOST NEGLECTED AND NON-UNDERSTOOD PREDICTOR OF ALL)
X = another predictor with any elements -- could be BINARY (0,1).
E1 = the Error or Residual vector.
a1 = least-squares regression coefficient of U (this is frequently referred to as the "Y-intercept").
a2 = least-squares regression coefficient of X (this is frequently referred to as the "Slope").

A powerful capability to give students who are comfortable with Algebra is to be able to IMPOSE ANY DESIRED LINEAR RESTRICTIONS ON A LINEAR MODEL OF THE FORM:

Y = a1*X1 + a2*X2 + ... + ap*Xp + E

This capability is useful in many applications BESIDES STATISTICS.

Now, to your neat example:

"If I want to find the least squares estimator of the slope of a simple linear regression model where my intercept is known, ... "

You wish to impose the restriction that

a1 = k (a known value)

Imposing that restriction on Model 1 above gives Model 2:

Y = k*U + a2*X + E2

The only unknown regression coefficient is a2, which I will rename: let b2 = a2, to remind us that the numerical value of the coefficient of X in Model 1 is most likely different from the value in Model 2. Then

Y = k*U + b2*X + E2

Since k*U is known, the least-squares value for b2 is obtained from:

Y - k*U = b2*X + E2

or, letting Y - k*U be designated by a single symbol, W,

W = b2*X + E2

and the least-squares value of b2 for Model 2 (and for any ONE-PREDICTOR model) is:

b2 = (W'X)/(X'X) = Sum(wi*xi)/Sum(xi*xi)

b2 is the slope of the line which is "forced by the restriction" a1 = k. So, in general, it is NOT the same as the unrestricted slope Sxy/Sxx.

Most software now allows one to find the value of b2 by regressing W on X with an option that omits the vector U as a predictor. If you have good software available, the software will produce the standard errors of a1 and a2 by solving Model 1 and the standard error of b2 by solving Model 2.

---

Now, if it is "interesting" to TEST AN HYPOTHESIS THAT

a1 = k

then a statistics student may want to compute, with SSQE1 and SSQE2 the error sums of squares of Model 1 and Model 2:

F = [(SSQE2 - SSQE1)/(2-1)] / [SSQE1/(n-2)]
  = [(SSQE2 - SSQE1)/1] / [SSQE1/(n-2)]

and since F(1,df2) = t^2(df2),

t(df2) = sqrt(F(1,df2))

This IS a "t-test".

And, perhaps, from this value of "t" another statistics student might want to compute the Standard Error of a1, and then compute a Confidence Interval.

The astute student can compute the Standard Error from:

t = Statistic/Standard Error

but since the numerical values of t and the "Statistic" are known we have:

Standard Error = Statistic/t

In this particular case the Statistic is the difference between the estimated and the hypothesized intercept, so

Standard Error = |a1 - k|/t

(which reduces to a1/t when k = 0).

This procedure allows for easy computation of the "Standard Error" of any of the 'weights' (intercept or slope) in a regression model and, in the more general case, any linear combination of the weights in a multiple linear regression model.

Sorry for the length of this message, but I couldn't resist promoting the use of Prediction/Regression/Linear Models for ALL STUDENTS.

--- Joe
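The recipe in this message can be checked numerically. The sketch below uses made-up data and a made-up known intercept k = 0.5; the variable names mirror the symbols in the message (a1, a2, b2, SSQE1, SSQE2) but are otherwise illustrative. It fits both models, forms F from the two error sums of squares, and recovers the standard error of a1 as |a1 - k|/t. As a cross-check it also computes that standard error from the usual textbook formula, which is an addition here and not part of the original message.

```python
# Sketch: simple regression with a known intercept, following Model 1 / Model 2.
# Data and the value of k are hypothetical.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
k = 0.5                      # known (hypothesized) intercept
n = len(x)

# Model 1: Y = a1*U + a2*X + E1 (both coefficients free)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
a2 = sxy / sxx
a1 = ybar - a2 * xbar
ssqe1 = sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y))

# Model 2: W = Y - k*U = b2*X + E2 (restriction a1 = k imposed)
w = [yi - k for yi in y]
b2 = sum(wi * xi for wi, xi in zip(w, x)) / sum(xi * xi for xi in x)
ssqe2 = sum((wi - b2 * xi) ** 2 for wi, xi in zip(w, x))

# F test of the restriction a1 = k, then t and the standard error of a1
F = ((ssqe2 - ssqe1) / (2 - 1)) / (ssqe1 / (n - 2))
t = math.sqrt(F)
se_a1 = abs(a1 - k) / t

# Cross-check with the textbook formula for the intercept's standard error
se_a1_direct = math.sqrt(ssqe1 / (n - 2) * (1.0 / n + xbar ** 2 / sxx))

# Standard errors of the two slope estimates, for Mark's second question
se_a2 = math.sqrt(ssqe1 / (n - 2) / sxx)
se_b2 = math.sqrt(ssqe2 / (n - 1) / sum(xi * xi for xi in x))

print("Model 1: a1 =", a1, " a2 =", a2, " SSQE1 =", ssqe1)
print("Model 2: b2 =", b2, " SSQE2 =", ssqe2)
print("F =", F, " t =", t)
print("SE(a1) from t:", se_a1, " direct:", se_a1_direct)
print("SE(a2) =", se_a2, " SE(b2) =", se_b2)
```

The two standard errors of a1 agree because the F statistic for a single linear restriction is algebraically the square of the corresponding t statistic. Which slope standard error is smaller depends on the data and on how close k is to the truth, so the printout is the place to look rather than a general rule.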
Re: ANN vs. nonlinear regression: forecasting
John --

Sounds very interesting --

If you mean the "classical" least-squares model, there are no assumptions involved in fitting least-squares. It's only the "statistics" that bring in the extra "assumptions".

PREDICTION is the important thing. Compare the PREDICTIVE accuracy/costs/etc. of various approaches. You may wish to include RESAMPLING/BOOTSTRAP/CROSS-VALIDATION in your research.

The proof of the "best" is how well it PREDICTS.

I will be interested in what you learn.

-- Joe

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, February 11, 2000 7:01 AM
Subject: ANN vs. nonlinear regression: forecasting

| I'm working on a study that compares neural networks to classical
| non-linear statistical estimators in forecasting time series. My thesis is
| that the NN would be robust under conditions where the assumptions of
| the classical model are not met, and the NN would be inferior where the
| classical assumptions are satisfied.
|
| What would be a good classical model to compare a neural network to?
| Does anyone know of any papers/sources on this subject?
|
| I sincerely appreciate any help/suggestions.
|
| John Carrier
| [EMAIL PROTECTED]
|
| Sent via Deja.com http://www.deja.com/
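One hedged way to act on "compare the PREDICTIVE accuracy" is leave-one-out cross-validation: fit each candidate on all but one observation, predict the held-out one, and average the squared errors. The sketch below compares two deliberately simple stand-in predictors (the overall mean and a least-squares line) on made-up data. It is not a neural network or a time-series method; for real forecasting work the hold-out scheme would need to respect time order.

```python
# Sketch: leave-one-out cross-validation to compare two prediction methods.
# The two "models" and the data are illustrative stand-ins.

def fit_mean(x, y):
    """Predict the training mean regardless of x."""
    m = sum(y) / len(y)
    return lambda x_new: m

def fit_line(x, y):
    """Ordinary least-squares straight line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    return lambda x_new: intercept + slope * x_new

def loo_mse(fit, x, y):
    """Mean squared prediction error over leave-one-out splits."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(x_train, y_train)
        errors.append((y[i] - predict(x[i])) ** 2)
    return sum(errors) / len(errors)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.9, 16.3]

mse_mean = loo_mse(fit_mean, x, y)
mse_line = loo_mse(fit_line, x, y)
print("LOO MSE, mean-only predictor:", mse_mean)
print("LOO MSE, straight line:      ", mse_line)
```

Any other predictor, a neural network included, can be dropped in as another `fit` function with the same shape and judged on the same held-out errors.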
Re: adjusting marks; W. Edwards Deming
Robert Knodt writes in response to the message at http://www.remarq.com The Internet's Discussion Network (SEE BELOW)

--- Re: adjusting marks; W. Edwards Deming

It would be nice if those sending to the mailing list would clearly identify themselves. It would also be nice if they used an e-mail address so individuals might send them e-mail directly.

Thanks,
Dr. Robert C. Knodt
4949 Samish Way, #31
Bellingham, WA 98226
[EMAIL PROTECTED]

End of Robert Knodt's message
Beginning of Joe Ward's comment --

Good comment, Robert --

Perhaps the unidentified writer is a frustrated product of "Non-mastery" Spelling Education and is intentionally (or unintentionally) showing the results. See BOLD items below.

-- Joe

- End of Joe Ward's comment --

- Original Message -
From: Consultantssuck [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, February 07, 2000 5:12 PM
Subject: Re: adjusting marks; W. Edwards Deming

| Dr. Deming Naive? You, sir, are misguided and unfortunately,
| misinformed of the genius of the master Dr. Shewhart, and his
| disiple and messenger to the latter half of the 20th century,
| Dr. Deming.
|
| Humans want to do a good job. Dr. Deming was pellucid on this
| point. People and school fit nicely into this axiom.
|
| what you fail to understand is the profound knowledge of
| thinking preparing, and continual improvement. Grading is nice,
| succinct, and above all, usually useless in its existing
| design. Does grading permit our student to readdress problem or
| slow areas?
| In many cases grading only shows how well you did,
| based on varying factors-The next test, completely different.
|
| we have all seen studies where the pretty girl is awarded better
| grades for the same caliber of work as others. we have all
| seen reports where teachers are wrong in their suppositions,
| then corrected or challenged by students, ultimately leading
| these educators to hold a grudge for "attitude and behavior"
| when report card time recurs.
|
| Do you want to know why the AFT and the NEA are against teaching
| LOGIC in elementary schools (Logic being the foundation for all
| higher math applications)?
|
| Could it be because some protege will learn to ask the harder
| questions? Possibly Some "smart alec" will not accept our
| educator's "Because I told you it did."
|
| A recent report found Elementary educators, when pressed for
| answers they did not know, simply "winged it." This sophristry
| unfortunately happens when our educators are not versed in the
| sciences, history or math, and they wish to appear (to
| themselves and) to their students, smart.
|
| People want to do a good job. Grading allows teachers to make
| decisions in our children's early years based on mostly the
| faliable educator's emotions toward that one particular budding
| mind. Grading should be benchmarks for ever improvement based
| on practice, practice practice of the fundementals. Then of
| course moving foward with a keen sence of where the student is
| going. Any good music teacher will tell you the ones who
| practice the fundemental scales, dilegently, go on to master the
| difficult pieces.
|
| Read the book OUT OF CRISES again, and again. I assure you, you
| will soon "get it."
|
| * Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
| The fastest and easiest way to search and participate in Usenet - Free!
Re: Looking for text on resampling...
Scott --

Peter Bruce is the contact!!!

-- Joe

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, February 06, 2000 12:08 PM
Subject: Looking for text on resampling...

| Our small college library has a collection of basic biostats texts but
| nothing that specifically covers the area of resampling. I am currently
| looking over a 1991 text by Bryan Manly (Randomization and Monte Carlo
| Methods in Biology) - the first two chapters seem quite accessible (to
| someone unfamiliar with the field!)
|
| Could anyone suggest other texts that might cover bootstrapping and
| jackknife techniques - I would favour texts that have a biology bent and
| are written so non-specialists can follow...
|
| Many thanks!
|
| Scott
Re: teaching statistical methods by rules?
Yep!! As you say: "Why are people so obsessed with T and Z?"

Perhaps it would be even better (easier?) to focus on F, since F(1,df2) = t^2(df2).

(Reminder: when using a t-table, the p-values usually involve ONE TAIL, and when using the F-table, the p-values involve TWO TAILS.)

Example:
The critical value of t for a probability of p = .05 at t(18) is 1.734.
The critical value of F for a probability of p = .10 at F(1,18) is (1.734)^2 = 3.01.

:-)

-- Joe

* Joe Ward                      Health Careers High School
* 167 East Arrowhead Dr         4646 Hamilton Wolfe
* San Antonio, TX 78228-2402    San Antonio, TX 78229
* Phone: 210-433-6575           Phone: 210-617-5400
* Fax: 210-433-2828             Fax: 210-617-5423
* [EMAIL PROTECTED]
* http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, December 19, 1999 4:44 PM
Subject: Re: teaching statistical methods by rules?

| In article [EMAIL PROTECTED],
| [EMAIL PROTECTED] says...
|
| snip
|
| On the other hand, a body of knowledge can be thought of as a set of 'rules'. The important thing is that this set is constructed by the individual, so our aim should not be to teach statistics as a set of rules, but in such a way that each student can develop his or her own set of rules. They won't be the same for all, and they will be different from the teacher's, but they hopefully will work. (If you like, this is a definition of a 'good student' - one who manages to construct a successful set of rules for each subject.)
|
| Either undergraduate students in Australia are much smarter than those living in the United States, or you live on a different planet. The last time I taught an undergraduate introductory statistics class, some students couldn't even do fractions and simple algebra. Can you expect them to develop their own rules?
|
| Why are people so obsessed with T and Z? When the degrees of freedom exceed, say, 30, the difference between T and Z is practically negligible. You can use T or Z in such a case. However, the P-value from Z is easier to compute.
|
| --
| Tjen-Sien Lim
| [EMAIL PROTECTED]
| www.Recursive-Partitioning.com
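The relation Joe describes, F(1,df2) = t^2(df2), together with the one-tail versus two-tail reminder, can be checked numerically. What follows is a minimal sketch and not anything taken from the original posts: it uses only the Python standard library, integrates the textbook t and F densities with Simpson's rule, and the helper names (simpson, t_pdf, f_pdf) and the integration limits are assumptions chosen for illustration. The last computation looks at the T-versus-Z remark by evaluating the t(30) tail area at the z cutoff for p = .05.

```python
import math
from statistics import NormalDist


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom, x > 0."""
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                     - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_num - math.log(x) - log_beta)


t_crit = 1.734          # one-tail critical value quoted for t(18), p = .05
f_crit = t_crit ** 2    # about 3.0068, quoted as 3.01

# Upper-tail areas. The upper limits (an assumption) are large enough that
# the omitted part of each tail is negligible.
p_t_one_tail = simpson(lambda t: t_pdf(t, 18), t_crit, 40.0, 4000)
p_f = simpson(lambda x: f_pdf(x, 1, 18), f_crit, 1600.0, 100000)

# T versus Z at 30 degrees of freedom, evaluated at the one-tail z cutoff.
z_crit = NormalDist().inv_cdf(0.95)
p_t30_at_z = simpson(lambda t: t_pdf(t, 30), z_crit, 40.0, 4000)

print(f"P(t18 > {t_crit}) = {p_t_one_tail:.4f}")
print(f"P(F(1,18) > {f_crit:.4f}) = {p_f:.4f}")
print(f"P(t30 > {z_crit:.3f}) = {p_t30_at_z:.4f}")
```

If the sketch is right, the first two areas come out at roughly .05 and .10, i.e. the F tail area is twice the one-tail t area, and the third is only slightly above .05.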
Re: Prediction Model Question
- Original Message -
From: Burke Johnson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, December 16, 1999 9:13 AM
Subject: Prediction Model Question

| Hi,
|
| A student of mine is getting ready to develop a GLM prediction model that will include a mixture of categorical and quantitative predictor variables. We will probably not include interaction terms in the model (i.e., it will be a main effects only model).
|
| Here's my question: Do you suggest using dummy coding (0,1) or effects coding (1,0,-1) for the categorical variables included in the model?
|
| The reason I'm asking is because dummy coding does not always give the same result for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur recommends using effects coding rather than dummy coding in the factorial case. Do you know if the choice of dummy or effects coding matters for a main effects only model with multiple categorical and quantitatively scaled predictor variables?
|
| Thanks in advance,
| Burke Johnson

Hi, Burke --

First, I use the words BINARY (or INDICATOR) predictors -- and NOT "DUMMY" predictors. In the beginning ALL PREDICTOR INFORMATION IS BINARY! It is unfortunate that the word DUMMY has become popular. Students might get the idea that there is something wrong with using DUMMIES!! I think that the BINARIES are really the most BRILLIANT!!

Now to your concern -- Your last paragraph

"The reason I'm asking is because dummy coding does not always give the same result for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur recommends using effects coding rather than dummy coding in the factorial case. Do you know if the choice of dummy or effects coding matters for a main effects only model with multiple categorical and quantitatively scaled predictor variables?"

is a very good example of the situation that arises in the use of "packaged" algorithms. The user of the "package" may have no idea what questions are being answered by the "package".

I always suggest that researchers create their own models! That is the only SAFE WAY! If a "packaged" procedure is verified to produce the results desired by the researcher then it certainly should be used.

The researcher should:

1. State their research questions in "natural language" -- avoid terms such as "MAIN EFFECTS" and "EFFECTS CODING" since those expressions may mean different things to different people. In some instances the user of those terms may not know what is meant when they utter the statement. Ask someone what they mean if they utter something about MAIN EFFECTS in a 3-factor ANOVA with unequal numbers of observations in the cells.

2. Create an ASSUMED MODEL that allows the researcher to investigate their research questions of interest.

3. Impose restrictions on the parameters of the ASSUMED MODEL that are implied by the research questions of interest. This results in a RESTRICTED MODEL.

4. Compare the Error Sum of Squares between the ASSUMED and RESTRICTED MODELS using an F-test and obtain confidence intervals if appropriate.

I assume there must be a reason for assuming that there is NO INTERACTION among the predictors. Many researchers would test for NO INTERACTION first. Then, if appropriate, switch to the NO INTERACTION MODEL.

I would be interested in seeing the models that your student develops to investigate his/her OWN QUESTIONS OF INTEREST!!

:-)

-- Joe
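One point behind Burke's question can be checked directly. In a main-effects-only model, binary (0,1) and effects (1,0,-1) coding of a factor span the same column space, so the Error Sum of Squares, and therefore the ASSUMED-versus-RESTRICTED F-test of steps 2 to 4, is the same under either coding; only the meaning of the individual coefficients changes. The sketch below is illustrative only: the nine observations are made up, and the helper names (solve, sse, f_stat) are hypothetical and do not come from any package mentioned in the thread.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sse(X, y):
    """Error sum of squares from a least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * v for bj, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


# Made-up data: a 3-level factor, one quantitative predictor, one response.
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 1.0, 3.0, 4.0]
y = [2.1, 2.9, 4.2, 3.8, 4.9, 6.1, 5.0, 7.2, 7.9]
n = len(y)

# ASSUMED MODEL with binary (0,1) coding, level 0 as the reference.
binary = [[1.0, float(g == 1), float(g == 2), xi] for g, xi in zip(group, x)]
# ASSUMED MODEL with effects (1,0,-1) coding, level 2 coded -1 on both columns.
codes = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (-1.0, -1.0)}
effects = [[1.0, *codes[g], xi] for g, xi in zip(group, x)]
# RESTRICTED MODEL: no group differences, so the group columns are dropped.
restricted = [[1.0, xi] for xi in x]

sse_binary = sse(binary, y)
sse_effects = sse(effects, y)
sse_restricted = sse(restricted, y)


def f_stat(sse_assumed):
    """F for the restriction: 2 parameters dropped, n - 4 error df remaining."""
    return ((sse_restricted - sse_assumed) / 2) / (sse_assumed / (n - 4))


print(f"SSE binary  = {sse_binary:.6f}, F = {f_stat(sse_binary):.3f}")
print(f"SSE effects = {sse_effects:.6f}, F = {f_stat(sse_effects):.3f}")
```

The two printed lines should agree up to rounding. This says nothing about models with interaction terms, where the individual coefficients under the two codings answer different questions, which is the situation Joe's step 1 warns about.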
Re: Need to evaluate difference between two R's
- Original Message -
From: Herman Rubin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, November 24, 1999 10:07 AM
Subject: Re: Need to evaluate difference between two R's

| In article [EMAIL PROTECTED],
| Rich Ulrich [EMAIL PROTECTED] wrote:
| On Tue, 23 Nov 1999 04:39:28 GMT, [EMAIL PROTECTED] wrote:
|
| Does any one know how one might test for significant differences between two multiple R's (or R-squares) generated from two sets of data? I need to determine if two R's generated on two separate occasions using the same DV and IV's differ significantly from one another.
|
| Correlations are not very good candidates for comparisons, since it is so easy to do tests that are more precise.
| - to test whether the predictive relations are different, you would test the regressions -- do a Chow test or the equivalent, to see if a different set of regressors is needed for a different sampling.
| - to test whether the variances are different (which is something that would change the correlations), you might test variances directly.
|
| This is correct. In fact, it is generally the case that correlations, except as measures of how well the model fits, do not have any real meaning.
|
| Even the amount of the variance explained can change drastically with a change in design, but the parameters of the model do not change, if normalizations are not done. For example, if one has a "normal" model with correlation coefficient .5, 25% of the variance is explained. Now suppose that the predictor variable is selected to be 2 standard deviations away from the mean, equally likely to be in either direction. Then the correlation becomes .756, and the proportion of the variance explained goes up to 57%. But the prediction model is still the same.
| --
| This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
| Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
| [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558

Herman --

Great comment! Discussions about correlation coefficients arise periodically on various lists. So when the time seems appropriate I resend an old message (see below and the WORD attachment) that might be of interest.

IMHO there is too much time spent on the correlation coefficient, since it is of limited and sometimes misleading value for practical decision-making in the real world. However, there are still some folks who are adjusting correlation coefficients for "restriction of range" in hopes that it might be useful.

-- Joe

-- Forwarded message --
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij [EMAIL PROTECTED]
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation (his posting is appended below), as well as jogging my own memory for a previous posting I had made. A while back I had posted the Anscombe dataset (in the context of an SPSS program), which also clearly shows the benefit of plotting the data: the four situations produce almost identical Pearson r values, but only one actually shows the classic scatterplot; the others show a nonlinear pattern and the influence that a single point has on the calculation of r. What does the value of r tell us here? Aren't the basic statistical concepts to be learned in this situation far more important and most clearly seen through a coordination of the graphical and numerical information?
-Mike Palij/Psychology Dept/New York University

Joe H Ward [EMAIL PROTECTED] writes:

To Mike et al --

There have been several messages related to the Simple Correlation Coefficient. IMHO, when out in the "real world" involving practical decision-making, the correlation coefficient has very limited value and sometimes dangerous consequences. The correlation coefficient may be an important topic in the history of statistics, for learning the problems associated with its use.

Attached below is an item that I submitted a long time ago, and it may be of interest to those following the discussion of "r".

-- Joe
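Two numerical points made earlier in this thread can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not part of any of the original postings. Part 1 redoes Herman Rubin's design example under the assumption that x and y are standardized, so the slope equals the correlation of .5 and the error variance is .75. Part 2 computes Pearson r for the four published Anscombe (1973) datasets that Mike Palij mentions. The function name pearson_r and the variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Part 1: Rubin's example, assuming unit-variance x and y with correlation .5.
rho = 0.5
slope = rho                     # slope of y on x when both have unit variance
error_var = 1.0 - rho ** 2      # .75, unaffected by how x is selected
var_x_design = 2.0 ** 2         # x fixed at -2 or +2 with equal probability
explained = slope ** 2 * var_x_design
r2_design = explained / (explained + error_var)
r_design = math.sqrt(r2_design)
print(f"design r = {r_design:.3f}, variance explained = {r2_design:.1%}")

# Part 2: Anscombe's quartet (sets I to III share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
quartet = [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]
anscombe_r = [pearson_r(xs, ys) for xs, ys in quartet]
for label, r in zip(["I", "II", "III", "IV"], anscombe_r):
    print(f"Anscombe set {label}: r = {r:.3f}")
```

In Part 1 the slope and the error variance are the same before and after the change of design, yet r and r^2 move to about .756 and 57%, as Rubin states. In Part 2 the four r values are nearly identical even though only a plot reveals how different the four datasets are.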