Re: Statistics Software
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...

> Shana Mueller wrote:
> > I am looking for a statistical software package. I have only used the
> > JMP software (mostly for Design of Experiments) in the past, but am
> > looking for software that will not only help in planning experiments,
> > but also one in which I can input my data from Excel, do regression
> > analysis, and get presentation-worthy graphs. As I recall, JMP did
> > not have a good graphing program.
>
> Look at the Excel plugin
> http://www.winstat.com
> (there is a 30-day free evaluation version)
>
> It is also always worth checking www.LISTSOFT.com for free software.
>
> Sergei

Shana,

Try MedCalc, available at http://www.medcalc.be. A download version is available for evaluation.

=== This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages, because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: Programs for Standard Deviation
In article, [EMAIL PROTECTED] says...

> My daughter has asked me if there are any tools / software programs
> that can resolve standard deviations. While Excel can determine a
> standard deviation of the population, what formula is used for the
> (A) 5th Standard Deviation
> (B) 10th Standard Deviation
> (C) 25th Standard Deviation
> (D) 40th Standard Deviation
> My Intro to Stats and Probability, Stats for Business and Economics,
> and Elements of Probability textbooks have no data on these. If you
> can either point me to a web page, know the formula, or where I can
> locate the software, I would appreciate it {:->
>
> Thanks
> Robert A. Meyer
> [EMAIL PROTECTED]

Robert,

I believe Mr. Lim is correct about the percentiles. There is a statistical program available for scientists etc. called MedCalc; a version is available for evaluation at http://www.medcalc.be. The percentiles are found in the summary statistics. The software is easy to use and understand.

David
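What the original poster calls the "5th standard deviation" etc. are almost certainly percentiles, as the reply notes. No special package is needed for this part; a minimal sketch using only Python's standard library (the data values here are invented for illustration):

```python
# The "5th, 10th, 25th, 40th standard deviation" in the question are
# almost certainly the 5th, 10th, 25th, and 40th percentiles.
import statistics

data = [12, 7, 22, 15, 9, 31, 18, 25, 11, 14, 28, 20]

# quantiles(data, n=100) returns the 99 cut points p1..p99;
# cuts[k-1] is the k-th percentile.
cuts = statistics.quantiles(data, n=100, method="inclusive")
p5, p10, p25, p40 = cuts[4], cuts[9], cuts[24], cuts[39]

print(p5, p10, p25, p40)
```

Any statistics package (MedCalc, Excel's PERCENTILE, etc.) computes the same summary, possibly with a slightly different interpolation rule.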
Statistics Conference Announcement
~~~ BEYOND THE FORMULA IV ~~~

"Introductory Statistics for a New Century: Looking at the Complete
Picture of Curriculum, Teaching Techniques, Technology and Applications"

A Statistics Conference for Mathematics Teachers Teaching Introductory
Statistics

[This conference is for all teachers of Introductory Statistics, from
the first-time to the experienced teacher. There are sessions planned
for all.]

DATE: Thursday, August 3, 2000, 8:30 AM to 4:30 PM and 6:30 PM to 9:00 PM
      Friday, August 4, 2000, 8:30 AM to 3:00 PM

LOCATION: Monroe Community College, 1000 East Henrietta Road,
Rochester, NY 14623

THE AGENDA INCLUDES:
Four addresses by keynote speaker Richard Scheaffer,
Two real-world statistical case studies,
Several sessions encompassing curriculum issues and classroom teaching
strategies,
Several hands-on computer (web and software) and calculator sessions,
Dinner and an after-dinner speaker, Robert Hogg,
Publisher's Book Exhibit,
And much more!!

MAJOR THEMES INCLUDE: Curriculum, Teaching Techniques, Technology,
Applications

INVITED SPEAKERS: Prof. Richard L. Scheaffer (keynote), Prof. Joan
Garfield, Prof. Robert V. Hogg, Prof. Kyle Siegrist. And the list is
growing!!

REGISTRATION FEE: $125, includes 5 meals.

FOR COMPLETE INFORMATION: See our web site (updated 3/24/2000) for
titles, abstracts, schedule, registration form, and hotel information:
http://www.monroecc.edu/depts/math/beyond1.htm

Or contact Bob Johnson by:
E-mail: [EMAIL PROTECTED]
Phone (Dept. Office): 716-292-2930
Fax (Dept. Office): 716-292-3874
Post: Dept. of Mathematics, Monroe Community College, Rochester, NY 14623
Re: Kruskal-Wallis & equal variances
On Fri, 24 Mar 2000 02:15:52 GMT, Gene Gallagher wrote:

< snip, good summary of some issues >

> SO, my question is "What is the current thinking on the robustness of
> the Kruskal-Wallis test for testing groups with very different
> variances?" Is Underwood right in his assessment that the nonparametric
> procedures are little better than the parametric tests for
> heteroscedastic data?

Here is an item that is immune to "current thinking." If one group
dominates the densities at BOTH extremes, then "stochastic dominance"
does not exist; in that case, neither group is robustly "greater" in
score than the other, independent of all assumptions about the scale of
measurement, and there is *no* test which is safe and robust. This is a
logical problem, not a numerical one. That illustrates when and where
rank-ordering does not always give a robust test: there is a potential
non-robustness in the forced metric (or enforced-equal metric) that
goes by the name "non-parametric."

So "unequal variances" is a phrase that does not capture the problem.
If taking the log transform would equalize the variances, then you can
instead safely perform the rank transform and test the ranks. It is
pretty secure to do a t-test on the ranks, by the way, even with Ns
that are pretty small, since the t-test is pretty robust.

> Do people concur with Zar's statement that the
> K-W test isn't much affected by unequal variances? Have simulation
> tests been performed to assess the relative robustness of random
> permutations, K-W, & Student's t (equal and unequal variance versions)?
> Could someone provide a cite to good articles on this issue?

I haven't checked Zar's statement. But the t-test on the raw numbers
isn't "much affected" either, so long as Ns are equal. In *neither*
case are the tests justified if the data "derive from different
distributions," and I hope Zar is not indicating an immunity for K-W.
What do you want to be robust against?

Rank ordering protects against bad scaling when there is an enormous
outlier or two at ONE end of the distribution. That is much more
frequent than outliers at both ends. Rank ordering still gives you a
test when there are outliers at both ends, but that is the test that is
highly doubtful, and logically flawed.

I hope someone provides some good references. The people who publish
books tend to be timid about conclusions. The people who publish new
articles, unfortunately, tend to be wrong, because all the really
*solid*, simple points were made as casual observations 40 or more
years ago.

For instance, it is common for a writer to mention that the t-test or
F-test on two samples is "robust." It is rarer, but more precise, to
say that the one-tailed t-test, for samples with strongly unequal N, is
rather badly vulnerable to skewness in the distributions. If you do a
two-tailed t-test, you will always achieve approximately the proper
"test size" for a 5% test. But with unequal N and skewness, most of the
rejections will be at "the same end." Transforming the data to achieve
the natural (and symmetric) metric is important if you want to achieve
the natural (and symmetric) rejection region.

I have made statements on this before. I am still trying to get it all
complete and concise. I think I will have a bit to add to the above, on
Sunday.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
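The rank-transform idea discussed above (replace the raw scores by their ranks, then run an ordinary t-test) can be sketched in standard-library Python. This is an illustration only, with invented data; scipy's `rankdata` and `ttest_ind` would do the same job:

```python
import statistics

def ranks(xs):
    """Midranks: tied values get the average of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1          # average 1-based position of the run
        for k in range(i, j + 1):
            r[order[k]] = mid
        i = j + 1
    return r

def pooled_t(x, y):
    """Ordinary equal-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) +
           (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (
        (sp2 * (1 / nx + 1 / ny)) ** 0.5)

# Raw scores with one enormous outlier at ONE end of group a:
a = [3, 4, 5, 6, 900]
b = [1, 2, 2, 3, 4]

r = ranks(a + b)
ra, rb = r[:len(a)], r[len(a):]
print(pooled_t(a, b), pooled_t(ra, rb))
```

The outlier swamps the raw-score t statistic but only contributes one rank position to the rank-based test, which is exactly the protection described above.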
Re: log-Reg and colinearity
On Fri, 24 Mar 2000 Llorenç Badiella <[EMAIL PROTECTED]> wrote:

> Is it a problem that there exists colinearity between
> variables when performing a Log-Reg? If so, how can I avoid it?

It can be a problem, or for that matter several problems, depending on
how severe the collinearity is, among other things.

If any predictor is a linear combination of a set of other predictors,
then either the last potential predictor among these cannot enter the
regression (if the software is proceeding one variable at a time), or
the correlation matrix (or the variance-covariance matrix) among the
predictors cannot be inverted and no solution can be obtained (if the
software is proceeding by inversion of the complete matrix of
predictors).

If it is merely that many or all of the predictors are highly
intercorrelated with some or all of the remaining predictors, Rich
Ulrich's response indicates some of the problems you _might_ be
encountering. There are methods of dealing with the problems, but the
choice(s) of method(s) will depend in part on whether the
intercorrelations reflect some characteristic(s) of the universe of
discourse that you wish to retain, or are artifacts introduced by the
way in which (some of) the variables were defined.

-- DFB.
Donald F. Burrill                                  [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,             [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                        603-535-2597
184 Nashua Road, Bedford, NH 03110                 603-471-7128
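The exact-linear-combination case mentioned above is easy to see numerically: when one column of the design matrix is the sum of two others, the cross-product matrix X'X is singular, so the usual solution (which inverts X'X) does not exist. A small standard-library sketch with made-up data:

```python
# When predictor x3 = x1 + x2 exactly, X'X is singular: its determinant
# is zero, so the matrix cannot be inverted and no unique solution exists.

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [a + b for a, b in zip(x1, x2)]   # exact linear combination

X = list(zip(x1, x2, x3))              # 5x3 design matrix (no intercept)

# Form the 3x3 cross-product matrix X'X.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
       for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(XtX))   # ~0: X'X cannot be inverted
```

With merely *near* collinearity the determinant is small rather than zero, which is the "huge standard errors" situation rather than outright failure.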
Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)
On Fri, 24 Mar 2000, Bernard Higgins wrote:

> Hi Bruce

Hello Bernard.

> The point I was making is that when developing hypothesis tests,
> from a theoretical point of view, the sampling distribution of the
> test statistic from which critical values or p-values etc are
> obtained is determined by the null hypothesis. We need a probability
> model to enable us to determine how likely observed patterns are.
> These probability models will often work well in practice even if we
> relax the usual assumptions. When using distribution-free tests as
> an alternative to a parametric test we may need to specify
> restrictions in order that the tests can be considered "equivalent".

Agreed.

> In my view the t-test is fairly robust and will work well in most
> situations where the distribution is not too skewed and constant
> variance is reasonable. Indeed I have no problems in using it for the
> majority of problems. When comparing two independent samples using
> t-tests, lack of normality and constant variance are often not too
> serious if the samples are of similar size (always a good idea in
> planned experiments).

Agreed here too.

> As you say, when samples are fairly large, some say 30+ or even
> less, the sampling distribution of the mean can often be approximated
> by a normal distribution (Central Limit Theorem), and hence an
> (asymptotic) Z-test is frequently used. It would not, I think, be
> strictly correct to call such a statistic t, although from a
> practical point of view there may be little difference. The formal
> definition of the single-sample t-test is derived from the ratio of a
> Standard Normal random variable to the square root of an independent
> Chi-squared random variable divided by its degrees of freedom, and
> does, in theory, require independent observations from a normal
> distribution.

I think we are no longer in complete agreement here.

I am not a mathematician, but for what it's worth, here is my
understanding of t- and z-tests:

  numerator   = (statistic - parameter under H0)
  denominator = SE(statistic)

  test statistic = z if SE(statistic) is based on the population SD
  test statistic = t if SE(statistic) is based on the sample SD

The most common 'statistics' in the numerator are Xbar and
(Xbar1 - Xbar2), but others are certainly possible (e.g., for
large-sample versions of rank-based tests).

An assumption of both tests is that the statistic in the numerator has
a sampling distribution that is normal. This is where the CLT comes
into play: it lays out the conditions under which the sampling
distribution of the statistic is approximately normal, and those
conditions can vary depending on what statistic you're talking about.
But having a normal sampling distribution does not mean that we can or
should use a critical z-value rather than a critical t when the
population variance is unknown (which is what I thought you were
suggesting). As you say, one can substitute critical z for critical t
when n gets larger, because the differences become negligible. But
nowadays most of us are using computer programs that give us more or
less exact p-values anyway, so this is less of an issue than it once
was.

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
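Bruce's distinction (z when the SE uses the population SD, t when it uses the sample SD) can be made concrete in a few lines of standard-library Python; the sample values and the hypothesized parameters below are invented for illustration:

```python
# Same numerator, different denominator: z uses the (assumed known)
# population SD, t uses the SD estimated from the sample itself.
import statistics

sample = [102, 98, 97, 105, 110, 95, 101, 99, 104, 96]
mu0 = 100.0      # hypothesized population mean (H0)
sigma = 5.0      # population SD, if we were lucky enough to know it

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)          # sample SD (divisor n-1)

z = (xbar - mu0) / (sigma / n ** 0.5)
t = (xbar - mu0) / (s / n ** 0.5)

# Critical value for a two-tailed 5% z-test; the critical t for
# n-1 = 9 df is larger (about 2.262), reflecting the extra uncertainty
# from estimating the SD.
z_crit = statistics.NormalDist().inv_cdf(0.975)
print(z, t, z_crit)
```

As n grows, s converges to sigma and the t distribution converges to the normal, which is why the substitution of critical z for critical t becomes harmless.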
Re: successive intervals?
On Fri, 24 Mar 2000 06:05:27 GMT, [EMAIL PROTECTED] wrote:

> I saw in a recent journal article mention of a technique for converting
> ordinal scales to interval scales called *successive intervals*. I have
> searched several references and can find no mention of it. Does anyone
> know of a published description of this method?

I am surprised that a (professional) journal article would let the
author get by with an unreferenced citation.

These on-line search engines are great for this sort of unguided
research. The first thing I tried with Google was

  "successive intervals" statistics

and the first two items seemed to be exactly what you want. The first
led me to: A. L. Edwards and Gonzales, Applied Psychological
Measurement, 1993, 17:21-27; an article about simplified scoring of the
successive-intervals method. The article has a brief review, which
includes similar methods under other names dating back to the 1930s.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: log-Reg and colinearity
On Fri, 24 Mar 2000 15:28:38 GMT, [EMAIL PROTECTED] wrote:

> Is it a problem that there exists colinearity between
> variables when performing a Log-Reg? If so, how can I avoid it?

Sure, it is about the same problem as exists for OLS regression. If you
really want to *avoid* it, you have to rationally select (or
pre-process) your predictor variables to be a nearly orthogonal set.

I don't know if you are worried about the extreme collinearity that
makes standard errors huge, or the lesser amount that makes it
impossible to uniquely assign "effects" to "sources". When it comes to
variables like "race" and "socio-economic background", which are just
about *inextricably* intertwined in the U.S., racialist ignoramuses
will avoid the problem by never noticing that the confounding exists.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Public domain nomenclature
Tony Rossini has taken the time to explain to me the importance of
distinguishing between "freely available", "open source", and "public
domain" software. The first two have a similar meaning, and both are
different from the last. My software is more in the first two
categories. I stand corrected. Thanks Tony, and thanks for the free
ESS, which I use all the time with free Emacs.

-Frank
Fwd: time tests for S-plus
>Date: Fri, 24 Mar 2000 10:00:18 -0800
>To: EDSTAT-L <[EMAIL PROTECTED]>
>From: Jan de Leeuw <[EMAIL PROTECTED]>
>Subject: time tests for S-plus

This is where we stand so far:

  Program       test1   test2   test3   test4   test5
  Iterations      500    1000       1    1000      10

  S-PLUS 3.4     18.3     9.0    38.4    31.0   232.9
  S-PLUS 5.1     75.0   152.6   317.1   544.8   224.1
  R 0.90.1       56.5     5.7    47.0    18.1    27.9
  S-PLUS 2000      13      27      24      90      29

Once again, this should be compared with XLISP-STAT. I only did test5,
because these things are so boring.

> (defun test-5 (n)
    (let ((k 0))
      (dotimes (j n)
        (let ((v (vector 1 2 j j 3)))
          (if (= (aref v 1) 2) (incf k))))))
TEST-5
> (time (test-5 10))
The evaluation took 1.88 seconds; 0.23 seconds in gc.
NIL
> (compile 'test-5)
TEST-5
> (time (test-5 10))
The evaluation took 0.35 seconds; 0.05 seconds in gc.
NIL

Thus the raw lisp version takes 1.88 seconds and the byte-compiled
version 0.35 seconds. By increasing the memory somewhat, and by using a
typed vector for v, we may be able to speed this up another fraction.
For the time being, it seems that XLISP-STAT, on this test, is about
100 times as fast as the fastest versions of S.

Oh, I forgot some important info. This is on a 450MHz G4. On a SPARC
Ultra-1 it's 5.43 seconds and 1.66 byte-compiled. On a 333MHz iBook
it's 2.13 and 0.43 seconds.

-- 
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
US mail: 8142 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: [EMAIL PROTECTED]
http://www.stat.ucla.edu/~deleeuw and http://home1.gte.net/datamine/
No matter where you go, there you are. --- Buckaroo Banzai
http://webdev.stat.ucla.edu/sounds/nomatter.au
Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)
Hi Bruce

The point I was making is that when developing hypothesis tests, from a
theoretical point of view, the sampling distribution of the test
statistic from which critical values or p-values etc are obtained is
determined by the null hypothesis. We need a probability model to
enable us to determine how likely observed patterns are. These
probability models will often work well in practice even if we relax
the usual assumptions. When using distribution-free tests as an
alternative to a parametric test we may need to specify restrictions in
order that the tests can be considered "equivalent".

In my view the t-test is fairly robust and will work well in most
situations where the distribution is not too skewed and constant
variance is reasonable. Indeed I have no problems in using it for the
majority of problems. When comparing two independent samples using
t-tests, lack of normality and constant variance are often not too
serious if the samples are of similar size (always a good idea in
planned experiments).

As you say, when samples are fairly large, some say 30+ or even less,
the sampling distribution of the mean can often be approximated by a
normal distribution (Central Limit Theorem), and hence an (asymptotic)
Z-test is frequently used. It would not, I think, be strictly correct
to call such a statistic t, although from a practical point of view
there may be little difference. The formal definition of the
single-sample t-test is derived from the ratio of a Standard Normal
random variable to the square root of an independent Chi-squared random
variable divided by its degrees of freedom, and does, in theory,
require independent observations from a normal distribution.

Regards - Bernie

> On 24 Mar 2000, Bernard Higgins wrote:
>
> > These are my thoughts:
> >
> > The sampling distribution of a test statistic is determined by the
> > null hypothesis. So analysis of variance is used to test that a
> > number of samples come from an identical Normal distribution
> > against the alternative that the "subpopulations" have different
> > means (but the same variances). The mean and standard deviation of
> > normally distributed random variables are independent of one
> > another.
> >
> > Distribution free (non-parametric) procedures do not require the
> > underlying distribution to be normal. For the majority of these
> -- >8 ---

Bruce replied:

> I think it is overly restrictive to say that the samples must come
> from normally distributed populations under a true null hypothesis.
> Take the simplest parametric test, a single-sample t-test. The
> assumption is that the sampling distribution of X-bar is
> (approximately) normal, not that the population from which you've
> sampled is normal. If the population is normal, then of course the
> sampling distribution of X-bar will be too, for any size sample
> (even n=1). But if your sample size is large enough (e.g., some
> authors suggest around n=300), the sampling distribution of X-bar
> will be close to normal no matter what the population distribution
> looks like. For populations that are not normal, but are reasonably
> symmetrical, the sampling distribution of X-bar will be near enough
> to normal with samples somewhere between these extremes.

---
Bernie Higgins
Division of Mathematics and Statistics
University of Portsmouth
Mercantile House
Hampshire Terrace
Portsmouth PO1 2EG
Tel: 01705 843031
Fax: 01705 843106
Email: [EMAIL PROTECTED]
---
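The claim under discussion (the sampling distribution of X-bar approaches normality even when the population is skewed) is easy to check by simulation. A standard-library Python sketch using an exponential population, which is strongly right-skewed:

```python
# Empirical check of the CLT: the exponential distribution is strongly
# right-skewed, yet means of samples of size 50 are already clustered
# symmetrically around the population mean (1.0 for rate 1), with
# spread close to sigma/sqrt(n) = 1/sqrt(50) ~ 0.141.
import random
import statistics

random.seed(0)

def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(50) for _ in range(2000)]

grand_mean = statistics.mean(means)
spread = statistics.stdev(means)
print(round(grand_mean, 3), round(spread, 3))
```

Plotting a histogram of `means` (or repeating with n = 5 versus n = 300) makes the convergence, and its dependence on the population's skewness, visible.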
Multidimensional Models IRT
I study statistics at the University of Pernambuco, Brazil. I'd like to
obtain information (papers, software) about Multidimensional Models for
Item Response Theory.

Thanks
Ronald
log-Reg and colinearity
Hi all,

Is it a problem that there exists colinearity between variables when
performing a Log-Reg? If so, how can I avoid it?

Thanks in advance,
Llorenç Badiella
[EMAIL PROTECTED]
Re: Help! Survival curve plot (Kaplan-Meier) with absolute numbers -
I stand by what I said. The function is in a public-domain S-PLUS
library. It is not an R library at present. If it clarifies this for
anyone, in the future I'll label this as being in the public-domain
S-PLUS library for S-PLUS users.

-Frank

"A.J. Rossini" wrote:
>
> > "FEH" == Frank E Harrell <[EMAIL PROTECTED]> writes:
>
>     FEH> The survplot function in the public-domain S-Plus library
>
> Frank -
>
> To nit-pick, the library is not public-domain, but a modified version
> of the GPL that you've licensed it under. (I just checked again.)
> So it's sort of "open-source", to use the latest software-licensing
> buzzwords...
>
> If it had been public domain (or if you'd removed the disclaimer), I
> would have at least evaluated it and possibly ported it to R if it
> was useful for me (probably likely), instead of considering it (maybe
> unfairly) a pretty useless piece of software...
>
> best,
> -tony
>
> --
> A.J. Rossini  Research Assistant Professor of Biostatistics
> Biostatistics/Univ. of Washington (Th) Box 357232 206-543-1044 (3286=fax)
> Center for AIDS Research/HMC/UW (M/F) Box 359931 206-731-3647 (3693=fax)
> VTN/SCHARP/FHCRC (Tu/W) Box 358080 206-667-7025 (4812=fax)
> rossini@(biostat.washington.edu|u.washington.edu|hivnet.fhcrc.org)
> http://www.biostat.washington.edu/~rossini

-- 
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
Re: successive intervals?
On Fri, 24 Mar 2000 Wendell <[EMAIL PROTECTED]> wrote:

> I saw in a recent journal article mention of a technique for converting
> ordinal scales to interval scales called *successive intervals*. I have
> searched several references and can find no mention of it. Does anyone
> know of a published description of this method?

Sounds to me like a method of scaling (converting nominal or ordinal
scales to one or more interval scales); in that context, "successive
intervals" refers, I believe, to an ordinal scale in which the several
adjacent ("successive") intervals are not assumed to be equal, but are
assumed to have a constant length on an underlying latent scale. I'd
recommend starting with Professor Nishisato's text on dual scaling;
I've taken the liberty of copying this reply to Nishi in case he wants
to comment. (I know he's offering a short course on dual scaling
sometime soon, but I've forgotten the dates.)

-- DFB.
Donald F. Burrill                                  [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,             [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                        603-535-2597
184 Nashua Road, Bedford, NH 03110                 603-471-7128
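The classical Thurstone-style method of successive intervals (plausibly what the article used, though the posts above don't confirm it) estimates category-boundary positions on a latent normal scale from cumulative category proportions. A rough standard-library sketch of that core step, with invented rating data:

```python
# Core step of a successive-intervals scaling: convert the cumulative
# proportion of responses at or below each category boundary into a
# z-score, giving boundary positions on a latent (interval) scale.
from statistics import NormalDist

# Invented data: counts of ratings 1..5 given to one stimulus.
counts = [5, 15, 40, 30, 10]
n = sum(counts)

cum = []
running = 0
for c in counts[:-1]:            # boundaries lie between categories
    running += c
    cum.append(running / n)      # P(response <= category k)

boundaries = [NormalDist().inv_cdf(p) for p in cum]
print([round(b, 3) for b in boundaries])
```

The full method averages such boundary estimates over many stimuli (and handles 0% / 100% cells specially); this sketch only shows where the "unequal successive intervals" come from.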
6 NJ short courses & seminars
Springtime for Statistics (April-May-June)
Six New Jersey Area announcements:

[1] Logistic Regression Short Course
[2] Clinical Trials Short Course
[3] Multiple Comparison & Exact Inference Short Courses
[4] Bates' Nonlinear Regression Short Course
[5] ICSA Symposium
[6] NJ Chapter, ASA Spring Symposium
[7] Announcement of conscience

===( Announcement #1: Short Course )===

The New Jersey and New York City Metro Chapters present an American
Statistical Association Short Course:

An Introduction to Logistic Regression
Stanley Lemeshow, Ph.D.
FRIDAY, April 7, 2000, 9:00 A.M. to 1:00 P.M.

Course Outline:
* The Logistic Regression Model (Chap. 1 and 2)
* Estimating the Coefficients in the Logistic Model (Chap. 1 and 2)
* Assessing Model Performance (Chap. 5)

Text: Hosmer, D. W., & Lemeshow, S. (1989). Applied Logistic
Regression. New York: Wiley. A handout will be provided. The text is
available from John Wiley Publishers.

Dr. Lemeshow is Director of the Ohio State University Biostatistics
Program and Professor of Biostatistics in the School of Public Health
and Department of Statistics. He has 25 years' experience in research
and teaching in biomedical applications; he is an internationally
recognized statistician for his contributions to the fields of logistic
regression, sample survey methods, and survival analysis. He is a
Fellow of the American Statistical Association and co-author of 4
recent texts in applied statistical methods: Applied Logistic
Regression, Applied Survival Analysis, Sampling of Populations, and
Adequacy of Sample Size.

Location: Montclair State University, Upper Montclair, NJ
          Richardson Hall, RI-106
Time: 9:00 A.M. to 1:00 P.M.
      8:30 A.M. Registration and Continental Breakfast
Registration: $85 Chapter members, $95 Non-members, $50 Students
      Fee includes handout, continental breakfast, and box lunch
Registration Deadline: March 31, 2000
Directions: visit the Montclair web site for directions & public
      transportation: http://www.montclair.edu/welcome/directions.html
Information: Cynthia Scherer, [EMAIL PROTECTED], [212] 733-4085

Registration Form
An Introduction to Logistic Regression
Stanley Lemeshow, Ph.D.
Friday, April 7, 2000

Name: ___
Organization: ___
Business Address: ___
Phone: ___
Email: ___

Registration Deadline: Friday, March 31, 2000
ASA Chapter Member $85    Non-Member $95    Full-Time Students $50
Payment enclosed ___    ($15 additional fee to register on site)

Checks should be made out to: New York Metro ASA Chapter.
Mail this registration form and your check to:
Marcia Levenstein
Pfizer Pharmaceuticals
235 E. 42nd Street MS 205-8-24
New York, New York 10017
Fax: 212-309-4346

===( Announcement #2: Presentation )===

Covance, the Princeton-Trenton and New Jersey Chapters of the American
Statistical Association present:

Dr. Gordon Lan, Ph.D.
"The Use of Conditional Power in Interim Analyses of Clinical Trials."
28 April 2000, 3:00 - 5:00 PM
Covance, Inc., 206 Carnegie Center, Princeton, NJ
Please R.S.V.P. and fax to Covance at (609) 514-0971 by Tuesday,
25 April 2000.

Dr. Lan is a Senior Technical Advisor at Pfizer Central Research,
Groton, Connecticut. His tenure at Pfizer since 1995 follows an
academic career, including the appointments of Professor of Statistics
at George Washington University, and Mathematical Statistician at the
National Heart, Lung and Blood Institute of the National Institutes of
Health.

Directions from the New York - Northern New Jersey area: Take the New
Jersey Turnpike South to Exit 9. Follow the signs for Route 18 North
and immediately watch for signs for Route 1 South. Proceed on Route 1
South for approximately 17 miles. Take the Alexander Road East exit
(toward Princeton Junction) and cross over Route 1 (the Princeton Hyatt
Hotel will be on your right).
Re: Sample size: way tooo big?
On Thu, 23 Mar 2000 16:01:13 -0500, Rich Ulrich <[EMAIL PROTECTED]> wrote:

>Now let me jump on Andy!
>
>On Thu, 23 Mar 2000 17:51:04 GMT, [EMAIL PROTECTED] (Andy Gilpin) wrote:
> < snip, problem; comment >
>>
>> Still, it seems to me that, other things equal, (a) measuring data
>> costs a researcher something, and (b) there are clear diminishing
>> returns in terms of increased power. Consider the following estimated
>> sample sizes for an independent-groups t-test with 2-tailed alpha=.05
>> and a moderate effect size (in Cohen's terms) of d=.5.
>...
>
>Oh, Andy, this is such a naive *scaling* conclusion. How can you
>regard "power" as a metric that ought to be equal-interval?

Wait a minute, Rich! I'll admit I'm often naive, but here I thought I was saying "Don't forget that power is NOT so simply related to N." I do think that my students sometimes fail to appreciate this, however, perhaps because many texts represent increased N as the chief mechanism by which to increase power. I think that a quick examination of tables of critical values of the t distribution will help them see things more accurately. But my impression is that many low-level texts seem to discuss increased N and increased alpha as the best ways to increase power--ignoring factors such as increased reliability in measuring the dependent variable.

>By my way of thinking, extra power gets cheaper and cheaper, as the
>magnitude of N increases. In your table, I am looking at successive
>doublings of N, and Odds and Odds-ratios for power:
>
>The power at .80, in terms of an "Odds",
>is 0.80/0.20 or 4:1, with N=128;
>it is 39:1 when the N is doubled, to N=256;
>it is :1 if N is doubled again, to N=512.
>
>The first doubling corresponds to an Odds ratio, for your increase in
>power, of 10:1, which might seem sizable, but the second doubling
>provides an OR of 250:1. That is *one* way to say that I disagree.

One way to look at it.
But I'd be buying lottery tickets if I could get odds as 'low' as 4:1, and in research I think many researchers would be satisfied with power of .8 (which, like it or not, seems to be emerging as a convention at least in the field of psychology). Maybe not a really defensible choice, but that's a different issue.

>...
>The ground gained by quadrupling the number of cases is always,
>basically for a t-test, the reduction of the width of the Confidence
>Interval by half.
>
>Do you want a smaller CI? Do you need a smaller CI?
>
>"Barely not-overlapping zero" is what you have for the usual rejection
>of the null hypothesis in psychology. That's not too bad with tiny N,
>because it works out neatly: the 5% rejection when d=1.0, implying
>"d>0.0", may be equivalent to a 15% test (say) that "d>0.5". But to
>set up a CI > 0.5 with a large sample is going to assume that there
>is a *huge* power for detecting an effect that is merely " > 0.0".
>
>To put it another way, you can't assume that the only goal is to
>detect an effect as being non-zero. In fact, I think it is pretty
>useless to cite 95% CI's as an "effect" when the test is barely at 5%;
>the range is just LARGE.

Here I think Rich makes an important point, but it supposes the need to shift paradigms in experimentation. If researchers typically conceptualized two-group experiments in terms of attempts to estimate a confidence interval--as they probably should do, but sadly mostly don't--the interval would continue to shrink with additional N. But most actual researchers appear to want to test the null hypothesis that the means are identical, and for that purpose, once we have adequate power to reject that hypothesis most of the time (when it's false), adding more cases is only going to inflate the economic cost of the research without being likely to change the decision.
It's not that really large sample sizes don't improve estimation of error--but that they don't necessarily change the decision regarding the hypothesis most often entertained in practice, viz., equal means.

>--
>Rich Ulrich, [EMAIL PROTECTED]
>http://www.pitt.edu/~wpilib/index.html

Andy Gilpin, Dept. of Psychology       Internet: [EMAIL PROTECTED]
Univ. of Northern Iowa, Cedar Falls,   Phone: (319) 273-6104
IA (US) 50614-0505                     Fax: (319) 273-6188
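The diminishing returns Andy describes, and Rich's odds arithmetic, can be sketched with a few lines of code. This uses the normal approximation to the noncentral t distribution, so the figures differ slightly from exact tabled values; the function name and the equal-group-size convention (N is the combined size of both groups) are my own assumptions, not from the thread.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample_t(n_total, d=0.5, alpha=0.05):
    """Approximate power of a two-sided, independent-groups t test with
    equal group sizes, via the normal approximation to the noncentral t.
    n_total is the combined N of both groups; d is Cohen's d."""
    z = NormalDist()
    n_per_group = n_total / 2
    delta = d * sqrt(n_per_group / 2)        # noncentrality parameter
    z_crit = z.inv_cdf(1 - alpha / 2)
    # P(reject) = P(Z > z_crit - delta) + P(Z < -z_crit - delta)
    return (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)

for n in (128, 256, 512):
    p = power_two_sample_t(n)
    print(f"N={n:4d}  power={p:.4f}  odds={p/(1-p):9.1f}:1")
```

Power climbs from about .8 toward 1 as N doubles, so on the probability scale the gains look tiny, while on the odds scale each doubling multiplies the odds enormously; both readings of the same numbers are on display in the exchange above.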
Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)
On 24 Mar 2000, Bernard Higgins wrote:

> These are my thoughts:
>
> The sampling distribution of a test statistic is determined by the
> null hypothesis. So analysis of variance is used to test that a
> number of samples come from an identical Normal distribution
> against the alternative that the "subpopulations" have different
> means (but the same variances). The mean and standard deviation of
> normally distributed random variables are independent of one another.
>
> Distribution free (non-parametric) procedures do not require the
> underlying distribution to be normal. For the majority of these
--- >8 ---

I think it is overly restrictive to say that the samples must come from normally distributed populations under a true null hypothesis. Take the simplest parametric test, a single sample t-test. The assumption is that the sampling distribution of X-bar is (approximately) normal, not that the population from which you've sampled is normal. If the population is normal, then of course the sampling distribution of X-bar will be too, for any size sample (even n=1). But if your sample size is large enough (e.g., some authors suggest around n=30), the sampling distribution of X-bar will be close to normal no matter what the population distribution looks like. For populations that are not normal, but are reasonably symmetrical, the sampling distribution of X-bar will be near enough to normal with samples somewhere between these extremes.

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
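Weaver's point about the sampling distribution of X-bar is easy to demonstrate by simulation. A rough sketch, using only the standard library; the function name and the choice of an Exponential(1) population (mean 1, strongly skewed) are mine, not from the thread:

```python
import random
from statistics import mean, stdev

random.seed(1)

def xbar_sample(n, reps=4000):
    """Simulate `reps` realizations of X-bar, each the mean of a sample
    of size n drawn from an Exponential(1) population (decidedly
    non-normal, with population mean 1 and SD 1)."""
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# As n grows, the distribution of X-bar concentrates around the
# population mean with standard error sigma/sqrt(n), whatever the
# shape of the parent population.
for n in (5, 30, 300):
    xs = xbar_sample(n)
    print(f"n={n:3d}  mean of X-bar={mean(xs):.3f}  "
          f"SD of X-bar={stdev(xs):.3f}  (theory: {1/n**0.5:.3f})")
```

A histogram of the simulated means (not shown) makes the point even more vividly: at n=5 the distribution of X-bar is visibly skewed, while by n=30 it already looks close to normal.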
Re: over and above
>There is a conflict between practice and experience in that a goal of
>practice is to reach the point where you carry out the procedure
>WITHOUT THINKING ABOUT THE UNDERLYING CONCEPTS OR PRINCIPLES. It
>becomes automated and brainless, a skill of the fingers (or toes). For
>that reason, I think practice should be dispensed only with a doctor's
>prescription, AFTER one has shown mastery of the concepts. In
>practice our education system tends to work the other way around --
>try to get everybody to go through the motions and hope a few will see
>the reasons.

i would point out however that the notion that we apply principles or concepts without thinking is precisely what we really want ... for example, a student comes in with some statistical question ... or idea about a research project ... and, WITHOUT sitting there 'thinking', you as the expert have ready made questions ... suggestions ... paths or leads for the student to follow ... THIS is what we do BECAUSE we have practiced these 'ideas' over and over again. we don't 'think' when we now apply what we have learned ...

think about grading papers ... while students struggle with DOING the paper or project ... the GOOD faculty member has no such problem examining and finding flaws and good points ... it comes automatically BECAUSE we have done it a 1000 times ... this is not bad ... it is good. for, if we did not have that 'developed' level of skill ... we would take FORever to grade papers ... we would have to RElearn on the spot ... each and every aspect of what we are expecting from paper and project work ...
Re: Calculator policy
Muriel Strand wrote:

> i would like to (approximately) echo the other responses on this topic. last
> fall i was almost trapped in a very poorly taught econometrics class and the
> clincher was when i showed up to take the first exam and found out it was to be
> closed book - it never even occurred to me to ask. i cannot remember the last
> time i had a test that was closed book/notes. this particular class focused on
> memorizing definitions and using software like a black box, NOT on
> understanding stats.
>
> when engineers take the state licensing exam, they can bring in a suitcase full
> of books if they care to. i probably brought about 10, AND i knew what
> formulae to use in what situations and where to find them. i'm not aware that
> this practice has caused any dangers to the citizenry, and presumably the state
> licensing board doesn't think so either.

For advanced courses, I agree 100%. For students at lower levels, my experience has been that many have not got the maturity to do well in a completely open-book exam. I have tried this on a couple of occasions in first-year courses and the results were so bad that I swore off open-book tests at that level. Many students were timed out on what should have been a rather trivial test, having spent the hour leafing through their book. I do permit a fairly ample sheet of notes (colleagues in multi-section courses permitting), and have had no similar problem. We have to cut tyros some slack here.

In my experience, mediocre students seem to do *best* in closed-book exams in which significant credit is often given for memorizing facts, and - provided the exam is designed for the rules and does not give credit for writing down trivia - *worst* in open-book exams. (What some would *like* is an exam designed for closed-book conditions, written open-book. In their dreams...)

By the time we get to serious stuff like licensing exams, we need exams designed for grownups - and that means open-book.
-Robert Dawson
Re: over and above
Herman Rubin wrote:
>
> snip
>
> This might possibly be the case for the weak students, but
> not for the strong ones. It is the concepts which are the
> most important part, and concepts need little, if any,
> practice

and Muriel Strand responded:

> i'm not sure the statement below is true for this strong student. when it comes
> to applying the concepts, you do need to know them inside and out and much of
> that intuition (for me) came from reams of homework problems.

I'm with Muriel on this one. Even when you *think* you understand a concept, trying to *use* it is a (the?) acid test of whether you do. Even if you really do understand it as soon as you read it, there is always the question - if nothing else - "how would this work for me in practice?" that is best answered by rolling up the sleeves & getting your fingers dirty right up to the elbows. The strongest students have the most to gain by this, as they will develop a far deeper understanding.

-Robert Dawson
Re: Calculator policy
> in addition, since good calculators allow storage ...

The problem arises when (eg) a few calculators can construct boxplots or do t tests and the rest cannot. "Storage" is a convenient surrogate for "all-singing, all-dancing, able to mix a martini and talk football".

> and we would
> encourage students to have this kind ... then when it comes to a test ...
> you either have to SUPPLY any one who wants one with one ... OR, you are
> putting at some disadvantage those with real good ones ... that they can't
> bring ... compared to students who have simpler ones ... and CAN bring them

Naaah. They can buy a "test calculator" for $10. Or borrow one.

> if you are that worried about what they can or can't do with the calculator
> ... don't allow ANY calculator ... other than their HEAD ... to be used.

That *is* the best policy in many cases; my preference is that, on tests, mean and standard deviation should be computed once or twice longhand for tiny [n=3..5] datasets, and given for all inference problems. In real life a computer should always be used.

Granted, some handheld machines are essentially computers now; if the prices come down enough on these they may be the solution to the current "teach with a computer, examine without one" dilemma. We already have handhelds with cut-down spreadsheets and word processors; a handheld with cut-down MINITAB or SPSS [or a new product designed around the limitations of the handheld of 2005] might not be far off. I could get to like a product like that.

-Robert Dawson
Re: Kruskal-Wallis & equal variances
Hi Gene

> I'd just finished telling my class that when the assumption of
> homogeneity of variances is violated, use the Kruskal-Wallis test
> instead of the parametric equivalent.
>
> One student pointed out to me afterwards that Underwood (1997, p. 131,
> Experiments in Ecology) states that the K-W test also assumes equal
> variances, citing Hollander & Wolfe. Underwood goes on to state that it
> is silly for ecologists to abandon the t test for a nonparametric test
> having the same assumption.
>
> Indeed, Hollander & Wolfe (1999, Nonparametric statistical methods, p.
> 195) comment on the K-W test: "Assumption A3 requires that the k
> underlying distributions belong to the same general family (F) and that
> they do not differ in scale parameters (variability)". On p. 198,
> Hollander & Wolfe cite a modified K-W statistic proposed by Rust and
> Fligner (1984) as a possible solution for samples with unequal
> variances.

These are my thoughts:

The sampling distribution of a test statistic is determined by the null hypothesis. So analysis of variance is used to test that a number of samples come from an identical Normal distribution against the alternative that the "subpopulations" have different means (but the same variances). The mean and standard deviation of normally distributed random variables are independent of one another.

Distribution free (non-parametric) procedures do not require the underlying distribution to be normal. For the majority of these distributions the mean and variance will be functions of the same parameters, and so changes in the mean will be accompanied by changes in the variance (e.g. Exponential, Poisson etc). If we assume that, under Ho, the underlying distributions are identical, then the population mean, median and variance etc will be the same. Thus the non-parametric test could be stated as equivalent to a test of the means, under these conditions.
If the null hypothesis is false we would expect to see differences in means, medians and hence variances etc. In practice I think distribution free tests are used to compare two (or more) medians under more relaxed conditions, that is, we do not specify (under Ho) that the underlying distributions are identical but merely that they have the same median value(s).

Consider the following scenario involving two independent samples:

Sample A: 24 36 67 120 149
Sample B: 3 11 14 28 47

Apply a t-test to the raw data and then following a logarithmic transformation. What happens? Apply a Mann-Whitney test under the same two conditions. What happens?

A transformation may be found which produces samples (perhaps not normally distributed) with roughly equal variances, so that the t-test might be argued to be a valid procedure. However, the outcome of the non-parametric test has not changed; the test statistic and p-values are identical!

I think that the conditions under which distribution-free or non-parametric tests can be used are not always clearly stated in text books and indeed may be misleading or confusing to students (and teachers).

Regards
Bernie

---
Bernie Higgins
Division of Mathematics and Statistics
University of Portsmouth
Mercantile House
Hampshire Terrace
Portsmouth PO1 2EG
Tel: 01705 843031
Fax: 01705 843106
Email: [EMAIL PROTECTED]
---
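Bernie's scenario is quick to verify in code. A minimal sketch using only the standard library; the function names are mine, W is the Wilcoxon rank-sum statistic underlying the Mann-Whitney test, and these data have no ties, so ranking is straightforward:

```python
from math import log, sqrt
from statistics import mean, stdev

A = [24, 36, 67, 120, 149]
B = [3, 11, 14, 28, 47]

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled
    sample (valid here because there are no tied values)."""
    pooled = sorted(x + y)
    return sum(pooled.index(v) + 1 for v in x)

def t_stat(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

logA = [log(v) for v in A]
logB = [log(v) for v in B]

# The t statistic changes under the log transformation; the rank-sum
# statistic does not, because log is monotone and preserves the ranks.
print("t raw:", round(t_stat(A, B), 3), "  t log:", round(t_stat(logA, logB), 3))
print("W raw:", rank_sum(A, B), "  W log:", rank_sum(logA, logB))
```

Any strictly increasing transformation (log, square root, reciprocal of negatives, ...) leaves the ranks, and hence the Mann-Whitney test statistic and p-value, unchanged, which is exactly the behaviour Bernie's exercise is designed to reveal.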
Bruno de Finetti
Does anybody know where to find basic bibliographical info about Bruno de Finetti on the web?

Thank you very much
Renzo