RE: Boston Globe: MCAS results show weakness in teens' grasp of
On Tue, 28 Aug 2001, Dennis Roberts wrote in part:

however ... the flagging of outliers is totally arbitrary ... i see no rationale for saying that if a data point is 1.5 IQRs away from some point ... that there is something significant about that

If the data are normally distributed (or even approximately so, what seems to be called empirically distributed these days), the 3rd quartile + 1.5 IQR locates a point 2.0 std. devs. above the mean; symmetrically, the 1st quartile minus 1.5 IQR gets you 2.0 SDs below the mean. Close enough to the central 95% of the distribution, for the precision of the 1.5. Of course, the antique 5% standard is rather out of fashion nowadays, but this was, I believe, the underlying rationale for Tukey's choice of the region box +/- 1.5 IQR as a rule-of-thumb (or convention) for initial identification of potential outliers.

On the question of whether the whiskers of a box-and-whisker plot should be made to cease at box +/- 1.5 IQR, note that some current undergraduate textbooks distinguish between a quick boxplot, which shows the range but not outliers, and a full boxplot, which uses the box +/- 1.5 IQR rule. (Of course, if there are no outliers -- by that definition -- the two are identical.)

Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Factor analysis - which package is best for Windows?
I have tried it and it is amazing. A bargain ;)

Richard Wright [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

KyPlot runs under Windows, is freeware and gives you several factor analysis algorithms to choose from. http://www.rocketdownload.com/Details/Math/kyplot.htm
Re: Boston Globe: MCAS results show weakness in teens' grasp of
I wrote:

Er, no.

Q1 ~ mu - 2/3 sigma
Q3 ~ mu + 2/3 sigma
1 IQR ~ 4/3 sigma
1.5 IQR ~ 2 sigma
inner fence ~ mu +- 2 2/3 sigma

which is about the 0.5 percentile.

- right so far - and then burbled

The inner fences are selected to give a false positive rate of about 1 in 1000. I suppose that if we take into account the Unwritten Rule of Antique Statistics that all data sets have 30 elements, this *does* give a p-value of (1-e)*30*0.001 = 5% <grin>

which is obviously wrong. The false positive rate is about 1 in 100 and my fanciful 5% calculation is unsalvageable.

-Robert Dawson
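As a check on the arithmetic in this message, here is a minimal Python sketch (not part of the original thread) that locates the inner fences for an exactly normal population using the standard library's `statistics.NormalDist`. The variable names are illustrative, and the sketch uses population quartiles rather than sample quartiles, so it ignores sampling variability.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

q3 = std_normal.inv_cdf(0.75)   # upper quartile, in sigma units
iqr = 2 * q3                    # by symmetry, Q1 = -Q3
upper_fence = q3 + 1.5 * iqr    # upper inner fence, in sigma units

# Chance that a single normal observation falls beyond either inner fence.
one_tail = 1 - std_normal.cdf(upper_fence)
both_tails = 2 * one_tail

print(f"Q3 = {q3:.3f} sigma, IQR = {iqr:.3f} sigma")
print(f"upper inner fence = {upper_fence:.3f} sigma")
print(f"P(beyond either fence) = {both_tails:.4f}")
```

Under these assumptions the fence comes out near 2.7 sigma and the two-sided rate near 0.7%, which agrees with the "about 1 in 100" correction.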
RE: Factor analysis - which package is best for Windows?
Also check out R, a GNU implementation of the S language, most prominently known through its use in S-Plus. R is a fully featured statistical programming environment. In its MVA (Multivariate) package, it includes routines for factor analysis using maximum likelihood estimation with varimax and promax rotations. R is open-source, which means that it is frequently updated and, most importantly, it can be downloaded free of charge. The only downside (to some) is that at this stage of its development R is completely command-prompt driven. However, I find the R language intuitive and easy to learn.

http://www.r-project.org

-Original Message-
From: Aron Landy [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 6:33 AM
To: [EMAIL PROTECTED]
Subject: Re: Factor analysis - which package is best for Windows?

I have tried it and it is amazing. A bargain ;)

Richard Wright [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

KyPlot runs under Windows, is freeware and gives you several factor analysis algorithms to choose from. http://www.rocketdownload.com/Details/Math/kyplot.htm
doxplots
speaking of combining info from a dotplot and a boxplot ... which i want to dub ... DOXPLOT ...

minitab does have a macro file ... called %describe ... that shows the histogram of a distribution and below it, the boxplot ... one example is at

http://roberts.ed.psu.edu/users/droberts/introstat/desc.png

of course, there really is no importance to the BOX .. part of the boxplot ... and if one could indicate along the baseline of the histogram ... or, dotplot ... the same points indicated on the boxplot ... Q1, median, Q3 ... seems like that would do the trick ...

SO, AGAIN, DOES ANYONE KNOW OF A REGULAR GRAPH ROUTINE ... IN ANY PACKAGE ... THAT DOES JUST THAT?? IN one GRAPH ... show both the frequency distribution ... and, the summary Q points along the baseline?

as for flagging extreme values ... which the boxplot above shows an example of ... it becomes rather visually obvious ... in looking at the histogram graph ... independent OF the dot in the boxplot ... out past the upper whisker

doxplots would kill two birds with one graphic stone

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
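The idea of a dotplot with Q1, median and Q3 marked along the baseline can be sketched in a few lines of Python. This is only an illustration in character graphics, not a routine from Minitab or any other package; the function name `doxplot` is borrowed from the post, and the choice of the "inclusive" quartile method is an assumption (packages differ in how they define quartiles).

```python
from collections import Counter
from statistics import median, quantiles


def doxplot(values):
    """Text dotplot of integer data (two or more values) with Q1, median
    and Q3 marked under the axis. Hypothetical helper for illustration."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = hi - lo + 1
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)

    lines = []
    # One row per frequency level, tallest first; one column per integer value.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("*" if counts[v] >= level else " " for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * width)

    # Marker row: Q at the columns nearest Q1 and Q3, M at the median.
    # M is written last, so it wins if two markers share a column.
    markers = [" "] * width
    for label, point in (("Q", q1), ("Q", q3), ("M", med)):
        markers[round(point) - lo] = label
    lines.append("".join(markers).rstrip())

    # Axis labels: smallest value at the left edge, largest at the right.
    pad = max(width - len(str(hi)), len(str(lo)) + 1)
    lines.append(str(lo).ljust(pad) + str(hi))
    return "\n".join(lines)


scores = [10, 11, 11, 12, 12, 12, 13, 13, 14, 20]  # made-up data
print(doxplot(scores))
```

In the made-up data the value 20 sits well to the right of the rest, so it stands out in the dot columns without any separate outlier symbol, which is the point made above.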
RE: Boston Globe: MCAS results show weakness in teens' grasp of
Donald Burrill wrote:

If the data are normally distributed (or even approximately so, what seems to be called empirically distributed these days), the 3rd quartile + 1.5 IQR locates a point 2.0 std. devs. above the mean; symmetrically, the 1st quartile minus 1.5 IQR gets you 2.0 SDs below the mean. Close enough to the central 95% of the distribution, for the precision of the 1.5. Of course, the antique 5% standard is rather out of fashion nowadays, but this was, I believe, the underlying rationale for Tukey's choice of the region box +/- 1.5 IQR as a rule-of-thumb (or convention) for initial identification of potential outliers.

On the question of whether the whiskers of a box-and-whisker plot should be made to cease at box +/- 1.5 IQR, note that some current undergraduate textbooks distinguish between a quick boxplot, which shows the range but not outliers, and a full boxplot, which uses the box +/- 1.5 IQR rule. (Of course, if there are no outliers -- by that definition -- the two are identical.)

Interesting. As noted in earlier posts, the National Council of Teachers of Mathematics and MCAS include only the quick boxplot in their definition of boxplot. The 10th grade MCAS test question 39 shows a quick boxplot, not a Tukey boxplot. I can imagine that it would be difficult to change the NCTM definition of boxplot, but I would hope that they would add a note to their definition saying that their boxplot differs from the Tukey boxplot.

Massachusetts' Dept of Ed mentions the concept of standard deviation for the first time in the 11th - 12th grade math curriculum guidelines, so it can't be included in the curriculum-based and high-stakes 10th grade math test.

Since 45% of MA 10th graders failed last year's 10th grade test, this morning's headlines in Boston's papers reveal that the Gov of MA will announce today that she will make up to $1000 available to any student who fails the test three times. The $1000 can be used for private tutors, textbooks or software. A student has 5 chances to pass the exam, which is required for graduation:

http://www.boston.com/dailyglobe2/242/metro/Swift_seeks_grants_to_MCAS_strugglers+.shtml

Unfortunately, there is only a bit over a month between the late October failure notices on last spring's Math test and the first of 4 retakes in December. I would imagine only students right on the cusp would stand much of a chance of improving their knowledge of content for this first retake. They certainly could learn how to interpret a quick boxplot or a stem-and-leaf diagram (asked on last year's test), but not anything more substantive.

One would hope that the MCAS testers review their statistics and probability questions. Here is a problematic probability question from this spring's 8th grade math test. Because of the inappropriate premise, part c is either very difficult to answer (two different answers, 1/3 or 1/8) or unanswerable if students are allowed to be in the same act.

8th grade math, 2001 MCAS test, Question 12:

12. An eighth grade class will perform the first four acts in the annual talent show. Every student is in exactly one of the four acts. The order in which the acts will be presented is to be decided by drawing so that each act has an equal chance of being drawn.

a. Chantal is a member of the eighth grade class. What is the probability that her act will be presented first?

b. Chantal's act was chosen to be presented first. Make a tree diagram, chart or list showing all the possible orders in which the other three acts could be presented. Use the letters A, B, and C to represent these three acts.

c. Rory, Jesse, and Chantal are all members of the eighth-grade class who will each perform an act. What is the probability that Rory's act will immediately follow Jesse's? Explain how you found your answer.
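To make the ambiguity in part c concrete, here is a small enumeration in Python. It is a sketch under an assumption the question never states: that Chantal, Jesse and Rory are in three different acts (the act labels below are made up). It compares the answer when part b's condition (Chantal's act goes first) is carried over with the answer when it is not, and makes no attempt to reconstruct the 1/8 figure mentioned above.

```python
from fractions import Fraction
from itertools import permutations

# Assumption: one act per named student, plus a fourth act labeled "Other".
ACTS = ("Chantal", "Jesse", "Rory", "Other")


def follows_immediately(order, first, second):
    """True if `second` is performed directly after `first` in `order`."""
    return order.index(second) == order.index(first) + 1


def probability(orders):
    """Share of the given equally likely orders in which Rory directly follows Jesse."""
    hits = sum(follows_immediately(o, "Jesse", "Rory") for o in orders)
    return Fraction(hits, len(orders))


all_orders = list(permutations(ACTS))                          # 24 orders
chantal_first = [o for o in all_orders if o[0] == "Chantal"]   # 6 orders, as in part b

print(probability(chantal_first))  # Chantal's act known to be first: 1/3
print(probability(all_orders))     # no information about the first act: 1/4
```

The two readings give different answers even under the friendliest assumption about who is in which act. If two of the three students share an act, "Rory's act immediately follows Jesse's" may be impossible or ill-defined, which is the objection raised above.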
Bimodal distributions
Does a bimodal distribution necessarily have two modes? This might seem like a silly question, but in my experience many folks apply the term bimodal whenever the PDF has two peaks that are not very close to one another, even if the one peak is much lower than the other. For example, David Howell (Statistical Methods for Psychology, 5th, p. 29) presents Bradley's (1963) reaction time data as an example of a bimodal distribution. The frequency distribution shows a peak at about 10 hundredths of a second (freq about 520), no observations between about 18 and 33 hundredths, and then a second (much lower) peak at about 50 hundredths (freq about 25).

Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm
Re: Bimodal distributions
hi karl ... i think the answer is yes ... if you want it to have 2 modes

the mode is a problematical statistic ... since there is no good definition for it and ... a few frequencies shifting around ... could radically change the mode or modes

in minitab, there is no place where ANY mode is even identified ... i have heard about some software that report modes ... but, how they handle multiple peaks with differing ns ... i have no idea

in the example you cite ... what if there were a spike at 12 hundredths ... with = frequency to 10 ... would you call it bimodal ... or unimodal??? that is ... is there something of significance about the difference between 10 and 12 ... that we would want to separate them out as REALLY different values? ... maybe it is still mono modal

a former student and now an academic vice president .. i know, a demotion! ... coined a new term for when you had two adjacent values ... each with the highest frequency in the distribution ... he said take the median of the two modes ... and call it the ... MODIAN

At 12:54 PM 8/30/01 -0400, Wuensch, Karl L. wrote:

Does a bimodal distribution necessarily have two modes? This might seem like a silly question, but in my experience many folks apply the term bimodal whenever the PDF has two peaks that are not very close to one another, even if the one peak is much lower than the other. For example, David Howell (Statistical Methods for Psychology, 5th, p. 29) presents Bradley's (1963) reaction time data as an example of a bimodal distribution. The frequency distribution shows a peak at about 10 hundredths of a second (freq about 520), no observations between about 18 and 33 hundredths, and then a second (much lower) peak at about 50 hundredths (freq about 25).

Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: Bimodal distributions
Karl Wuensch asks an interesting question, though I would phrase it somewhat more generally. At what point does a bimodal distribution become just a distribution with two peaks? Except for a few quite extreme situations, dealing with mixtures of distributions and the like, it will rarely ever be the case that the two peaks of a distribution are EXACTLY the same height. But if they are extremely similar, no one would ever quibble.

The case that I use from Bradley (1963) has two peaks that are quite clearly different in height--in fact, one might argue that the second peak is so diffuse as not to deserve to be called a peak. And yet it seems to me that calling the distribution bimodal is saying something useful about the distribution. Perhaps someone can suggest a better term.

Dave Howell

At 12:54 PM 8/30/2001 -0400, Wuensch, Karl L. wrote:

Does a bimodal distribution necessarily have two modes? This might seem like a silly question, but in my experience many folks apply the term bimodal whenever the PDF has two peaks that are not very close to one another, even if the one peak is much lower than the other. For example, David Howell (Statistical Methods for Psychology, 5th, p. 29) presents Bradley's (1963) reaction time data as an example of a bimodal distribution. The frequency distribution shows a peak at about 10 hundredths of a second (freq about 520), no observations between about 18 and 33 hundredths, and then a second (much lower) peak at about 50 hundredths (freq about 25).

Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm

David C. Howell          Phone: (802) 656-2670
Dept of Psychology       Fax: (802) 656-8783
University of Vermont    email: [EMAIL PROTECTED]
Burlington, VT 05405
http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html
http://www.uvm.edu/~dhowell/gradstat/index.html
RE: Bimodal distributions
A bimodal distribution is often thought to be a mixture of two other distributions with different modes. If the distributions have different sizes, then it is possible to have two or more humps. I once read somewhere (and now can't remember where) that this may be referred to as bimodal (or multimodal). In the bimodal case, some refer to the higher hump as the major mode and the other as the minor mode.

Paul R. Swank, Ph.D.
Professor
Developmental Pediatrics
UT Houston Health Science Center

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Wuensch, Karl L.
Sent: Thursday, August 30, 2001 11:54 AM
To: edstat (E-mail)
Subject: Bimodal distributions

Does a bimodal distribution necessarily have two modes? This might seem like a silly question, but in my experience many folks apply the term bimodal whenever the PDF has two peaks that are not very close to one another, even if the one peak is much lower than the other. For example, David Howell (Statistical Methods for Psychology, 5th, p. 29) presents Bradley's (1963) reaction time data as an example of a bimodal distribution. The frequency distribution shows a peak at about 10 hundredths of a second (freq about 520), no observations between about 18 and 33 hundredths, and then a second (much lower) peak at about 50 hundredths (freq about 25).

Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm
Re: Bimodal distributions
At 02:04 PM 8/30/01 -0400, David C. Howell wrote:

Karl Wuensch asks an interesting question, though I would phrase it somewhat more generally. At what point does a bimodal distribution become just a distribution with two peaks?

or allow me to rephrase as ... when are there enough frequencies at a location or several rather distinct locations ... to identify it/them as MODAL locations?

the issue here is really not about equality of peaks ... but, when IS it a peak that warrants special mention

for example ... you might have a class of intro stat students ... some of whom have had 3 courses before ... and, most of whom have had no stat classes before ... and you give a final exam the first day of class ... and see a low peak at the high end ... and a big peak down at the low end of the score scale ... now, the heights of the peaks will surely be different but, there is explanatory importance to these two peaks ... that can be explained (primarily) by the amount of pre work that has been done

since i don't think there are any technical definitions of what exactly a bimodal ... or trimodal distribution is ... we have to take any representation OF distributions AS SUCH with a grain of salt ... and of course, insist on actually SEEING the distribution ... as our own personal check

for example ... here are 5 randomly generated patterns of data (n=100 each time) using minitab ... from an integer distribution ... which assumes equal probability across the numbers ... 10 to 20

would anyone want to take a stab in some definitive way ... and characterize the modality of these?

[Five Minitab character dotplots, C1 through C5, each on an axis running from 10.0 to 20.0; C1 through C4 are labeled "Each dot represents up to 2 points". The dot layout did not survive extraction.]

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
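The experiment described in this message (five samples of n=100 from equally likely integers 10 to 20) is easy to rerun. The following Python sketch is a stand-in for the Minitab run, not a reproduction of it: the function name `sample_modes` and the seed are arbitrary choices, and the counts will differ from the original dotplots.

```python
import random
from collections import Counter


def sample_modes(n=100, low=10, high=20, rng=random):
    """Draw n integers uniformly from low..high; return (counts, modal values)."""
    counts = Counter(rng.randint(low, high) for _ in range(n))
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)
    return counts, modes


rng = random.Random(2001)  # arbitrary seed, only so a rerun gives the same samples
for column in range(1, 6):
    counts, modes = sample_modes(rng=rng)
    # Frequencies for the values 10, 11, ..., 20, in order.
    row = " ".join(f"{counts.get(v, 0):2d}" for v in range(10, 21))
    print(f"C{column}: {row}   mode(s): {modes}")
```

Since every value is equally likely in the population, any "mode" reported for a sample here is produced by sampling variation alone, which is the caution being raised about reading peaks off a single plot.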
RE: Bimodal distributions
At 01:22 PM 8/30/01 -0500, Paul R. Swank wrote:

A bimodal distribution is often thought to be a mixture of two other distributions with different modes. If the distributions have different sizes, then it is possible to have two or more humps. I once read somewhere (and now can't remember where) that this may be referred to as bimodal (or multimodal). In the bimodal case, some refer to the higher hump as the major mode and the other as the minor mode.

this is an interesting point but, one we have to be careful about ... in the minitab pulse data set ... c6 is heights of 92 college students ... a mixture of males and females ...

[Minitab character dotplot of Height for all 92 students, axis from 62.5 to 75.0; the dot layout did not survive extraction.]

now, if we were to 'roughly' see the 'peaks' ... around 68/69 ... and 72/73 ... one might say that THIS is because of the gender differences (ie, where the modes or averages BY sex were)... but look at the separate dotplots

Dotplot: Height by Sex

[Minitab character dotplots of Height for Sex 1 and Sex 2 on a common axis from 62.5 to 75.0; the dot layout did not survive extraction.]

but look at the desc. stats ...

Descriptive Statistics: Height by Sex

Variable  Sex   N    Mean    Median  TrMean  StDev
Height    1     57   70.754  71.000  70.784  2.583
          2     35   65.400  65.500  65.395  2.563

using the modes ... as approximations for the averages ... means or medians ... might not be a good idea ... in this case ... we get a 'peak' around 68/69 not because of ONE gender concentrating there ... but, OVERLAPPING between the sexes ... at this approximate location of heights

modes are tricky

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
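One way to explore the warning about reading group averages off the peaks of a pooled distribution is to look at an idealized mixture. The Python sketch below assumes, purely for illustration, that heights within each sex are normally distributed with the group sizes, means and standard deviations reported in the descriptive statistics; the original post makes no such assumption, and the helper `mixture_peaks` is hypothetical.

```python
from statistics import NormalDist


def mixture_peaks(components, lo, hi, steps=2000):
    """Locate local maxima of a normal-mixture density on an evenly spaced grid.

    components: list of (weight, mean, sd) tuples whose weights sum to 1.
    """
    dists = [(w, NormalDist(mu, sd)) for w, mu, sd in components]
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [sum(w * d.pdf(x) for w, d in dists) for x in xs]
    # A grid point is a peak if the density is higher there than at both neighbors.
    return [xs[i] for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1]]


# Group sizes, means and SDs taken from the Height by Sex summary.
n1, n2 = 57, 35
heights = [
    (n1 / (n1 + n2), 70.754, 2.583),
    (n2 / (n1 + n2), 65.400, 2.563),
]
print(mixture_peaks(heights, 55.0, 85.0))
```

Comparing the printed peak location or locations with the two group means (70.754 and 65.400) shows how far a peak of the pooled density can sit from either group's average, at least under the normality assumption.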
outliers (was MCAS)
At 9:41 AM -0400 8/30/01, Dennis Roberts wrote:

all of this is assuming of course, that some extreme value ... by ANY definition ... is bad in some way ... that is, worthy of special attention for fear that it got there by some nefarious method

i am not sure the flagging of extreme values has any particular value ... certainly, to flag and look at these ... makes no more sense to me than examining all the data points ... to make sure that all seem legitimate ... and accounted for ...

Well, yes. You should always stare at all your data (though I guess a lot of people leave this crucial step out).

I thought there were two important reasons to look for outliers. An outlier may be a mistake, therefore not real data, therefore should not be included (as if you had a bunch of people's heights and one person was 60 feet tall -- that's a mistake and shouldn't be included as part of your results). Though this idea has come to be translated in many people's minds (incorrectly) as "It's an outlier, so delete it," which makes no sense. Just because some data is far away from the rest doesn't mean it ain't data.

Also, some tests only work when there are no outliers, so if you have outliers, those tests won't work and you need to do something else. (This, I believe, is the real motivation behind "delete them.")

Or am I being an idiot (again)?

Jill Binker
Fathom Dynamic Statistics Software
KCP Technologies, an affiliate of Key Curriculum Press
1150 65th St
Emeryville, CA 94608
1-800-995-MATH (6284)
[EMAIL PROTECTED]
http://www.keypress.com
http://www.keycollege.com
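The distinction drawn here between flagging a value for inspection and deleting it can be shown in a few lines of Python. This is a sketch only: the data are invented (the 60.0 echoes the 60-foot person in the message), the function name `flag_outliers` is hypothetical, and the "inclusive" quartile method is one of several conventions, so other packages may place the fences slightly differently.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Pair each value with True if it lies outside the box +/- k*IQR fences.

    Nothing is removed; the caller decides what to do with flagged values.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(v, v < lower or v > upper) for v in values]


# Invented heights in feet; the 60.0 stands in for a data-entry slip.
heights_ft = [5.2, 5.5, 5.6, 5.8, 5.9, 6.0, 6.1, 60.0]
for value, flagged in flag_outliers(heights_ft):
    print(value, "<- check this one" if flagged else "")
```

The function returns every observation with a flag attached, so the decision to correct, keep or exclude a value stays with whoever has looked at where it came from.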
Re: Teaching Intro Biostat using Daniel's book
I'm using Daniel's book too. I've used it for the last couple of years, switching from Glantz Primer. The 7th edition still has quite a few errors, but I like it for some of the exercises. Would love to be on your mailing list.

Warren

[EMAIL PROTECTED] (Robert Hamer) wrote in message news:9lrn6u$6bg$[EMAIL PROTECTED]...

I have been teaching an introductory biostatistics course for graduate students in public health for many years. This course is intended for first year graduate students in public health (not Biostatistics students, for whom a higher level course might be more appropriate). I've been using for a while the book by Wayne Daniel, Biostatistics: A Foundation for Analysis in the Health Sciences. I wonder how I might get in touch with others using this book; maybe to share test questions, share methods of teaching, etc. I have at this point quite a file of tests and questions; others might have similar material. Thanks.

Bob Hamer