The best non-parametric book I know of is Marascuilo and McSweeney. It is out of print, so if you find a copy, copy it. It is a classic.
Pamela Auburn, PhD
2041 Branard
Houston TX 77098

>From: [EMAIL PROTECTED] (edstat-digest)
>Reply-To: [EMAIL PROTECTED]
>To: [EMAIL PROTECTED]
>Subject: edstat-digest V2000 #545
>Date: Fri, 2 Nov 2001 14:09:26 -0500 (EST)
>
>edstat-digest          Friday, November 2 2001          Volume 2000 : Number 545
>
>----------------------------------------------------------------------
>
>Date: Thu, 1 Nov 2001 17:00:31 -0000
>From: "Chia C Chong" <[EMAIL PROTECTED]>
>Subject: Good book about non-parametric statistical hypothesis test
>
>Does anyone know any good reference book about non-parametric statistical
>hypothesis tests?
>
>Thanks....
>
>CCC
>
>=================================================================
>Instructions for joining and leaving this list and remarks about
>the problem of INAPPROPRIATE MESSAGES are available at
>  http://jse.stat.ncsu.edu/
>=================================================================
>
>------------------------------
>
>Date: Thu, 1 Nov 2001 12:24:29 -0500
>From: "Andrew E. Schulman" <[EMAIL PROTECTED]>
>Subject: Re: inducing rank correlations
>
>> Now, let's say I specify a target correlation matrix as follows:
>>
>>        A   B   C
>>    A   1
>>    B   1   1
>>    C   1  -1   1
>>
>> The problem with the above matrix is that we want large values of 'A' to
>> be paired with large values of 'B', and also large values of 'A' to
>> be paired with large values of 'C'.
>> BUT, we specify a (-1) correlation between B and C, which means we
>> want large values of 'B' to be paired with small values of 'C'.
>> This might pose a problem because of the earlier specified
>> correlations between A,B and A,C.
>>
>> Is there any way of checking the validity of the target correlation
>> matrix?
>
>The problems you describe with conflicting values of A, B, C under this
>correlation matrix arise because the matrix is not positive definite.
>Therefore, it is not a possible correlation or covariance matrix.
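The failure of (semi)definiteness asserted above is easy to check numerically. A minimal sketch in Python with NumPy (library choice mine; the thread does not specify software):

```python
import numpy as np

# The target correlation matrix from the post, transcribed as given.
M = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0]])

# A symmetric matrix is a valid correlation matrix only if it is
# positive semidefinite, i.e. all eigenvalues are >= 0.
eigenvalues = np.linalg.eigvalsh(M)      # ascending order
print(np.round(eigenvalues, 6))          # one eigenvalue is negative

# Equivalently, exhibit a linear combination with "negative variance":
a = np.array([-1.0, 1.0, 1.0])           # the combination -A + B + C
print(a @ M @ a)                         # -3.0, impossible for a real variance
```

The quadratic form a'Ma is exactly the variance the matrix would assign to -A+B+C, so a negative value is a direct certificate that no such random variables exist.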
>
>Positive semidefiniteness of a matrix M is the property that a'*M*a >= 0 for
>all (real) vectors a. Since the variance of a linear combination a'*X of
>a random vector X is a'*Var(X)*a, a positive semidefinite covariance
>matrix means that the variance of any linear combination of the
>components of X must be nonnegative. This is obviously a necessary
>condition for M to be a covariance matrix. It can also be shown
>to be sufficient -- if M is positive semidefinite, you can construct random
>variables A, B, C with covariance matrix M.
>
>M is positive semidefinite iff all of its eigenvalues are non-negative.
>The eigenvalues of your matrix are 2, 2, and -1. So it's not positive
>semidefinite, and can't be a covariance (or correlation) matrix. Here's
>the proof: using that correlation matrix, compute the variance of -A+B+C.
>It's negative.
>
>In the particular case of rank correlations, I'm sure there are other
>conditions too, but I don't know what they are right now.
>
>A.
>
>--
>To reply by e-mail, change "deadspam" to "home"
>
>------------------------------
>
>Date: Thu, 01 Nov 2001 18:30:21 GMT
>From: [EMAIL PROTECTED] (Michael Dewey)
>Subject: Re: Good book about non-parametric statistical hypothesis test
>
>On Thu, 1 Nov 2001 17:00:31 -0000, "Chia C Chong"
><[EMAIL PROTECTED]> wrote:
>
>:Does anyone know any good reference book about non-parametric statistical
>:hypothesis tests?
>:
>:Thanks....
>:
>:CCC
>
>Try any one of
>@BOOK{leach79,
>  author = {Leach, C},
>  year = 1979,
>  title = {Introduction to statistics.
{A} non-parametric approach for
>    the social sciences},
>  publisher = {Wiley},
>  address = {Chichester},
>  keywords = {non-parametric}
>}
>@BOOK{sprent93,
>  author = {Sprent, P},
>  year = 1993,
>  title = {Applied nonparametric statistical methods},
>  edition = {2nd},
>  publisher = {Chapman and Hall},
>  address = {London},
>  keywords = {non-parametric}
>}
>@BOOK{siegel56,
>  author = {Siegel, S},
>  year = 1956,
>  title = {Nonparametric statistics for the behavioral sciences},
>  publisher = {McGraw-Hill},
>  address = {New York},
>  keywords = {non-parametric}
>}
>
>--
>Michael Dewey
>http://www.aghmed.fsnet.co.uk/
>
>------------------------------
>
>Date: Thu, 1 Nov 2001 17:07:52 -0500
>From: "Jonsey" <[EMAIL PROTECTED]>
>Subject: Re: Good book about non-parametric statistical hypothesis test
>
>Try "Practical Nonparametric Statistics" by W.J. Conover
>
>"Chia C Chong" <[EMAIL PROTECTED]> wrote in message
>news:9rrv0e$4hk$[EMAIL PROTECTED]...
>> Does anyone know any good reference book about non-parametric statistical
>> hypothesis tests?
>>
>> Thanks....
>>
>> CCC
>
>------------------------------
>
>Date: Thu, 01 Nov 2001 16:58:46 -0500
>From: Rich Ulrich <[EMAIL PROTECTED]>
>Subject: Re: Testing for joint probability between 2 variables
>
>On Tue, 30 Oct 2001 21:10:02 -0000, "Chia C Chong"
><[EMAIL PROTECTED]> wrote:
>
>[ ...
]
>
>> The observations were numbers. To be specific, the 2 variables are DELAY
>> and ANGLE. So, basically I am looking into some raw measurement data
>> captured in the real environment, and after post-processing these data, I
>> will have information in these two domains.
>>
>> I do not know whether they are linearly correlated or something else but, by
>> physical mechanisms, there should be some kind of correlation between them.
>> They are observed over the TIME domain.
>
>I don't think it has been answered yet, whether they are
>correlated because they are autocorrelated in a trivial way.
>What does it mean here -- or does it happen to signify
>nothing -- that observation is "over the TIME domain"?
>
>That is, you have a real problem yet to be faced, if these are
>measured as "cumulative delay" and "cumulative angle".
>
>--
>Rich Ulrich, [EMAIL PROTECTED]
>http://www.pitt.edu/~wpilib/index.html
>
>------------------------------
>
>Date: Thu, 1 Nov 2001 22:28:18 -0000
>From: "Chia C Chong" <[EMAIL PROTECTED]>
>Subject: Re: Testing for joint probability between 2 variables
>
>"Rich Ulrich" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]...
>> On Tue, 30 Oct 2001 21:10:02 -0000, "Chia C Chong"
>> <[EMAIL PROTECTED]> wrote:
>>
>> [ ... ]
>>
>>> The observations were numbers. To be specific, the 2 variables are DELAY
>>> and ANGLE. So, basically I am looking into some raw measurement data
>>> captured in the real environment, and after post-processing these data, I
>>> will have information in these two domains.
>>>
>>> I do not know whether they are linearly correlated or something else but, by
>>> physical mechanisms, there should be some kind of correlation between them.
>>> They are observed over the TIME domain.
>>
>> I don't think it has been answered yet, whether they are
>> correlated because they are autocorrelated in a trivial way.
>> What does it mean here -- or does it happen to signify
>> nothing -- that observation is "over the TIME domain"?
>>
>> That is, you have a real problem yet to be faced, if these are
>> measured as "cumulative delay" and "cumulative angle".
>>
>> --
>> Rich Ulrich, [EMAIL PROTECTED]
>> http://www.pitt.edu/~wpilib/index.html
>
>In fact, what I was trying to say was, over the 5 seconds (TIME) domain, I
>will measure 2 random variables, i.e. DELAY and ANGLE. So, I would like to
>test whether, during the 5 s, those angles and delays of the signal I received
>are correlated or not.
>
>By the way, what do you mean by "cumulative delay" and "cumulative angle"?
>
>Thanks..
>
>CCC
>
>------------------------------
>
>Date: Thu, 01 Nov 2001 18:15:50 -0500
>From: dennis roberts <[EMAIL PROTECTED]>
>Subject: p value
>
>most software will compute p values (say for a typical two sample t test of
>means) by taking the obtained t test statistic ... making it both + and -
>... finding the two end tail areas in the relevant t distribution ... and
>report that as p
>
>for example ...
>what if we have output like:
>
>        N   Mean  StDev  SE Mean
>exp    20  30.80   5.20      1.2
>cont   20  27.84   3.95     0.88
>
>Difference = mu exp - mu cont
>Estimate for difference: 2.95
>95% CI for difference: (-0.01, 5.92)
>T-Test of difference = 0 (vs not =): T-Value = 2.02  P-Value = 0.051  DF = 35
>
>for 35 df ... minitab finds the areas beyond -2.02 and +2.02 ... adds them
>together ... and this value in the present case is .051
>
>now, traditionally, we would retain the null with this p value ... and, we
>generally say that the p value means ... this is the probability of
>obtaining a result (like we got) IF the null were true
>
>but, the result WE got was finding a mean difference in FAVOR of the exp
>group ...
>
>however, the p value does NOT mean that the probability of finding a
>difference IN FAVOR of the exp group ... if the null were true ... is .051
>... right? since the p value has been calculated based on BOTH ends of the
>t distribution ... it includes both extremes where the exp is better than
>the control ... AND where the cont is better than the exp
>
>thus, would it be fair to say that ... it is NOT correct to say that the p
>value (as traditionally calculated) represents the probability of finding a
>result LIKE WE FOUND ... if the null were true? that p would be 1/2 of
>what is calculated
>
>this brings up another point ... in the above case ... typically we would
>retain the null ... but, the p of finding the result LIKE WE DID ... if the
>null were true ... is only 1/2 of .051 ... less than the alpha of .05 that
>we have used
>
>thus ... what alpha are we really using when we do this?
>
>this is just a query about my continuing concern of what useful information
>p values give us ... and, if the p value provides NO (given the results we
>see) information as to the direction of the effect ... then, again ... all
>it suggests to us (as p gets smaller) is that the null is more likely not
>to be true ...
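The tail-area arithmetic described above is easy to reproduce. A minimal sketch in Python with SciPy (software choice mine; the post used Minitab):

```python
from scipy import stats

t_value, df = 2.02, 35

# Two-tailed p: the areas beyond -t and +t, added together.
two_tailed = 2 * stats.t.sf(t_value, df)

# One-tailed p: the tail area in the observed direction only,
# i.e. exactly half the two-tailed value.
one_tailed = stats.t.sf(t_value, df)

print(round(two_tailed, 3))   # ~0.051, matching the quoted output
print(one_tailed)             # half of two_tailed
```

This makes the post's point concrete: the reported 0.051 counts both directions, while the probability of a difference as extreme as observed *in favor of the exp group* under the null is half that.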
>
>given that it might not be true in either direction from the null ... how
>is this really helping us when we are interested in the "treatment" effect?
>
>[given that we have the direction of the results AND the p value ...
>nothing else]
>
>==============================================================
>dennis roberts, penn state university
>educational psychology, 8148632401
>http://roberts.ed.psu.edu/users/droberts/drober~1.htm
>
>------------------------------
>
>Date: Thu, 1 Nov 2001 23:51:04 -0000
>From: "Chia C Chong" <[EMAIL PROTECTED]>
>Subject: Can I Use Wilcoxon Rank Sum Test for Correlated & Clustered Data??
>
>I am a beginner in statistical analysis and hypothesis testing. I have 2
>variables (A and B) from an experiment that was observed for a certain
>period of time. I need to form a statistical model for these two
>variables. As an initial step, I plotted the histograms of A & B separately
>to see how the data were distributed. However, it seems that both A & B
>can't be easily described by a simple statistical distribution like
>Gaussian, uniform, etc. via visualisation. Hence, I proceeded to plot the
>Quantile-Quantile plot (Q-Q plot), trying to fit both A and B with
>some theoretical distributions (all distributions available in Matlab!!).
>Again, none of the distributions seems to describe them completely. Then I
>tried to perform the Wilcoxon Rank Sum test. From the data, it seems
>that A & B might be correlated in some sense.
>
>My question is, can I rely purely on the Wilcoxon Rank Sum Test to find
>the parameters of the distributions that can describe A & B?? How do I
>perform a test to see whether A & B are really correlated??
What if A and/or B are
>overlays of two or more distributions?? Can this test tell me?? What makes
>things more tricky is that clustering was also observed in both A & B.
>
>I really hope to get an idea of how to start the statistical analysis for
>this kind of problem...
>
>Thanks for the time...
>
>Cheers,
>CCC
>
>------------------------------
>
>Date: 1 Nov 2001 21:05:09 -0800
>From: [EMAIL PROTECTED] (Glen)
>Subject: Re: Good book about non-parametric statistical hypothesis test
>
>"Chia C Chong" <[EMAIL PROTECTED]> wrote in message
>news:<9rrv0e$4hk$[EMAIL PROTECTED]>...
>> Does anyone know any good reference book about non-parametric statistical
>> hypothesis tests?
>>
>> Thanks....
>>
>> CCC
>
>Read more than one. Here are some that I got
>some value from, though I do have arguments
>with all of them in places. Some are getting
>very old. There's a fairly current Conover, though,
>so you should at least be able to find it.
>
>- Distribution-Free Tests, H.R. Neave and P.L. Worthington
>
>I quite like Neave and Worthington's discussion
>of hypothesis testing, but their book then tends
>to go a bit heavy on the recipes at times.
>
>- Practical Nonparametric Statistics, W. J. Conover
>
>Conover's book is a good all-round book, but it's
>not my favourite for a variety of reasons.
>
>- Nonparametric Statistics for the Behavioral Sciences,
>  Sidney Siegel, N. John Castellan
>
>- Nonparametric and Distribution-Free Methods for the Social Sciences,
>  Marascuilo and McSweeney
>
>- Distribution-free statistical tests, J.V. Bradley
>  Now very old, but some parts of his discussion I haven't seen
>  elsewhere
>
>- Nonparametrics: statistical methods based on ranks,
>  E. L.
Lehmann
>
>------------------------------
>
>Date: 1 Nov 2001 21:28:21 -0800
>From: [EMAIL PROTECTED] (Glen)
>Subject: Re: Can I Use Wilcoxon Rank Sum Test for Correlated & Clustered Data??
>
>Are all the questions you post related to the same problem?
>
>Why not let us in on what you're actually doing, so we have more
>of a clue how to answer your questions?
>
>Glen
>
>------------------------------
>
>Date: 1 Nov 2001 21:20:59 -0800
>From: [EMAIL PROTECTED] (Glen)
>Subject: Re: Can I Use Wilcoxon Rank Sum Test for Correlated & Clustered Data??
>
>"Chia C Chong" <[EMAIL PROTECTED]> wrote in message
>news:<9rsn26$98h$[EMAIL PROTECTED]>...
>> I am a beginner in statistical analysis and hypothesis testing. I have 2
>> variables (A and B) from an experiment that was observed for a certain
>> period of time. I need to form a statistical model for these two
>> variables. As an initial step, I plotted the histograms of A & B separately
>> to see how the data were distributed. However, it seems that both A & B
>> can't be easily described by a simple statistical distribution like
>> Gaussian, uniform, etc. via visualisation. Hence, I proceeded to plot the
>> Quantile-Quantile plot (Q-Q plot), trying to fit both A and B with
>> some theoretical distributions (all distributions available in Matlab!!).
>> Again, none of the distributions seems to describe them completely.
>> Then I tried to perform the Wilcoxon Rank Sum test.
>
>WHY? What is it you're trying to find out?
>
>> From the data, it seems
>> that A & B might be correlated in some sense.
>
>Can you be more specific? Are the variables observed together, and
>related so that A(i) is correlated with B(i)?
>
>In that case, use a procedure that deals with the pairing, rather
>than tossing them at a technique that relies on their independence.
>
>Are A and B serially correlated with themselves?
>
>Are they cross-correlated at some lag?
>
>Please be clearer.
>
>> My question is, can I rely purely on the Wilcoxon Rank Sum Test to find
>> the parameters of the distributions that can describe A & B??
>
>Even if A and B satisfied all the assumptions for the test,
>IT WILL NOT TELL YOU "the parameters of the distributions that
>can describe A & B".
>
>Again, what are you trying to achieve?
>
>> How do I perform a
>> test to see whether A & B are really correlated?? What if A and/or B are
>> overlays of two or more distributions?? Can this test tell me?? What makes
>> things more tricky is that clustering was also observed in both A & B.
>>
>> I really hope to get an idea of how to start the statistical analysis
>> for this kind of problem...
>
>Don't start with some ill-chosen procedure and then try to commit
>acts of mayhem on your data until it will fit in the box. Start with
>the questions you're trying to find out about, along with what you
>know about the situation and believe about the data.
>
>So answer these questions:
>- "What do I know?"
>  (write a list... e.g. i) data are pairs observed over time, ii)... )
>- "What do I believe or expect before I start?"
>  (e.g. i) data pairs will be correlated, ii) likely serial correlation,
>  iii)...)
>- "What do I want to know?"
>
>*THEN* worry about how to do it (what procedure to use).
>
>The methodology should not be the starting point!
>
>Glen
>
>------------------------------
>
>Date: Fri, 2 Nov 2001 01:23:26 -0500 (EST)
>From: Donald Burrill <[EMAIL PROTECTED]>
>Subject: Re: Can I Use Wilcoxon Rank Sum Test for Correlated & Clustered Data??
>
>On Thu, 1 Nov 2001, Chia C Chong wrote:
>
>> I am a beginner in statistical analysis and hypothesis testing. I have 2
>> variables (A and B) from an experiment that was observed for a certain
>> period of time. I need to form a statistical model for these
>> two variables.
>
>Seems to me you're asking in the wrong place. The _model_ cannot be
>determined statistically, nor (in general) by statisticians. It arises
>from the investigator's knowledge of the substantive area in which the
>experiment was carried out, and of the reasons why the experiment was
>designed & conducted in the first place. Given a model, or, better, a
>series of more or less complex models, a statistician can help you decide
>among them, and can help you arrive at numerical values for (at least
>some of) the parameters of the models.
>
>> As an initial step, I plotted the histograms of A & B separately to
>> see how the data were distributed.
>
>How would you (or the investigator) expect them to be distributed?
>In particular, why would you think they might follow any of the usual
>theoretical distributions? (In other words, what's the theory behind
>your expectations -- or your lack of expectations?)
>
>> However, it seems that both A & B can't be easily described by a simple
>> statistical distribution like Gaussian, uniform, etc. via visualisation.
>> Hence, I proceeded to plot the Quantile-Quantile plot (Q-Q plot)
>
>What did you think this would tell you?
>
>> and tried to fit both A and B with some theoretical distributions
>> (all distributions available in Matlab!!). Again, none of the
>> distributions seems to describe them completely. Then I tried to
>> perform the Wilcoxon Rank Sum test.
>
>What hypothesis were you testing, and why was the Wilcoxon test relevant
>to it?
>
>> From the data, it seems that A & B might be correlated in some sense.
>
>You have not described a scatterplot of A vs. B (or B vs. A, whichever
>pleases you). Why not?
>
>> My question is, can I rely purely on the Wilcoxon Rank Sum Test to
>> find the parameters of the distributions that can describe A & B??
>
>Since the Wilcoxon is allegedly a distribution-free test, I'm quite
>bemused by the idea that it might help one _find_ parameters...
>
>> How do I perform a test to see whether A & B are really correlated??
>
>Practically all pairs of variables are correlated, to one degree or
>another. What will it signify to you if A and B are (or are not)
>"really" correlated (whatever "really" is intended to mean)?
>
>> What if A and/or B are overlays of two or more distributions??
>
>Hmm. By "overlay", do you mean "mixture", perhaps?
>
>> Can this test tell me?? What makes things more tricky is that clustering
>> was also observed in both A & B.
>
>At the same times, or in the same places?
>
>> I really hope to get an idea of how to start the statistical analysis
>> for this kind of problem...
>
>I'm sorry, but I don't yet perceive precisely what the problem is that
>the data were intended (or designed?) to address.
> -- DFB.
> ------------------------------------------------------------------------
> Donald F.
Burrill                                  [EMAIL PROTECTED]
> 184 Nashua Road, Bedford, NH 03110                          603-471-7128
>
>------------------------------
>
>Date: Fri, 02 Nov 2001 06:49:21 +0000
>From: John Kane <[EMAIL PROTECTED]>
>Subject: Re: They look different; are they really?
>
>Gus Gassmann wrote:
>
>> Stan Brown wrote:
>>
>>> Another instructor and I gave the same exam to our sections of a
>>> course. Here's a summary of the results:
>>>
>>> Section A: n=20, mean=56.1, median=52.5, standard dev=20.1
>>> Section B: n=23, mean=73.0, median=70.0, standard dev=21.6
>>>
>>> Now, they certainly _look_ different. (If it's of any value I can
>>> post the 20+23 raw data.) If I treat them as samples of two
>>> populations -- which I'm not at all sure is valid -- I can compute
>>> 90% confidence intervals as follows:
>>>
>>> Class A: 48.3 < mu < 63.8
>>> Class B: 65.4 < mu < 80.9
>>>
>>> As I say, I have major qualms about whether this computation means
>>> anything. So let me pose my question: given the two sets of results
>>> shown earlier, _is_ there a valid statistical method to say whether
>>> one class really is learning the subject better than the other, and
>>> by how much?
>>
>> Before you jump out of a window, you should ask yourself if there
>> is any reason to suspect that the samples should be homogeneous
>> (assuming equal learning). Remember that the students are often
>> self-selected into the sections, and the reasons for selecting one
>> section over the other may well be correlated with learning styles
>> and/or scholastic achievements.
>
>Speaking as someone who does a lot of psychometrics, is there any reason
>to believe you have a reliable test?
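The 90% intervals quoted above can be approximately reproduced from the summary statistics alone. A sketch in Python with SciPy (software choice mine; small differences from the posted endpoints are presumably rounding or a slightly different method on the original poster's side):

```python
from math import sqrt
from scipy import stats

def ci_from_summary(mean, sd, n, confidence=0.90):
    """Two-sided t confidence interval for a mean, from summary stats only."""
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin

print(ci_from_summary(56.1, 20.1, 20))   # Section A: roughly (48.3, 63.9)
print(ci_from_summary(73.0, 21.6, 23))   # Section B: roughly (65.3, 80.7)
```

As the repliers note, non-overlapping intervals are suggestive but not a substitute for a proper two-sample comparison, and neither addresses self-selection into sections.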
>
>Reliable in the technical psychometric sense, that is? That is the first
>and most important question. We will ignore the question of validity :)
>
>Are you and your associate using the same test? You say so, but is there
>any chance of minor modifications? Even in the instructions? Sorry to
>be so picky, but it can be important.
>
>Are you sure that you and the other instructor are teaching the same
>things (especially as to what will be on the exam)? Yes, students do form
>exam strategies.
>
> ------------------
>John Kane
>The Rideau Lakes, Ontario Canada
>
>------------------------------
>
>Date: Fri, 02 Nov 2001 07:31:07 +0000
>From: John Kane <[EMAIL PROTECTED]>
>Subject: Re: They look different; are they really?
>
>Stan Brown wrote:
>
>> Jill Binker <[EMAIL PROTECTED]> wrote in sci.stat.edu:
>>> Even assuming the test yields a good measure of how well the students
>>> know the material (which should be investigated, rather than assumed),
>>> it isn't telling you whether students have learned more from the class
>>> itself, unless you assume all students started from the same place.
>>
>> Good point! I was unconsciously making that very assumption, and I
>> thank you for reminding me that it _is_ an assumption.
>
>I did assume that in my earlier post. Stupid! Albeit in the context of my
>old uni, understandable. Just shows one cannot take anything for granted.
>
>> I had already decided to lead off with an assessment test the first
>> day of class next time, for the students' benefit.
>
>Err, see below. Should anyone do this to me he/she might be in trouble.
>
>> (If they should
>> be in a more or less advanced class, the sooner they know it the
>> better for them.) But as you point out, that will benefit me too.
>> The other instructor has developed a pre-assessment test over the
>> past couple of years, and has offered to let me use it too, so we'll
>> be able to establish comparable baselines.
>
>Can I suggest that this may or may not be a good idea? I once did some
>data analysis on a test for chemistry students. The unfortunate finding
>was that the Chemistry Profs who had constructed the test did not
>understand what the best predictors of success were. Not published, as
>far as I know.
>
>If you want a good test you need a good psychometrician. His/her stats
>skills are probably indifferent (such as mine are), but what we do know is
>how to measure people (en masse, that is). And given the right people we
>can analyze what a student (worker) must do. It is often different from
>the ideal. Job analysis is important, even for students.
>
>Give a call to the local Psych Dept. They always have a few grad students
>wanting money and hopefully a usable database. Ask for an Industrial or
>I/O grad.
>
>A home-grown test without norms, reliability, validity stats, etc.? I can
>see lawyers (and myself, if called as a witness, although I really don't
>have the qualifications) just salivating.
>
>>> As I gather is common in this field, the problem isn't statistics per
>>> se, but framing questions that can be answered by the kind of data you
>>> can get.
>
>Err, see above for the problem :)
>
> ------------------
>John Kane
>The Rideau Lakes, Ontario Canada
>
>------------------------------
>
>Date: Fri, 02 Nov 2001 09:21:25 -0400
>From: "Robert J. MacG. Dawson" <[EMAIL PROTECTED]>
>Subject: Re: Can I Use Wilcoxon Rank Sum Test for Correlated & Clustered Data??
>
>Chia C Chong wrote:
>>
>> I am a beginner in statistical analysis and hypothesis testing. I have 2
>> variables (A and B) from an experiment that was observed for a certain
>> period of time. I need to form a statistical model for these two
>> variables. As an initial step, I plotted the histograms of A & B separately
>> to see how the data were distributed. However, it seems that both A & B
>> can't be easily described by a simple statistical distribution like
>> Gaussian, uniform, etc. via visualisation. Hence, I proceeded to plot the
>> Quantile-Quantile plot (Q-Q plot), trying to fit both A and B with
>> some theoretical distributions (all distributions available in Matlab!!).
>> Again, none of the distributions seems to describe them completely. Then I
>> tried to perform the Wilcoxon Rank Sum test. From the data, it seems
>> that A & B might be correlated in some sense.
>
>If the data are (positively) correlated, do not use the
>Wilcoxon-Mann-Whitney rank sum test; use the sign test on the
>differences, which will usually be much more powerful in the presence of
>significant correlation.
>
>If the two populations differ (roughly) only by translation, the
>differences may well be (roughly) symmetrically distributed.
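The sign-test-on-differences suggestion can be sketched in Python with SciPy (software choice mine; the data below are made up purely for illustration, since the thread gives no numbers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical paired observations: B is correlated with A and shifted up.
a = rng.normal(10.0, 2.0, size=30)
b = a + rng.normal(0.5, 1.0, size=30)

d = b - a                       # analyse the paired differences

# Sign test: under H0 (median difference = 0) the count of positive
# differences is Binomial(n, 1/2); exact zero differences are discarded.
n_pos = int(np.sum(d > 0))
n = int(np.sum(d != 0))
print(stats.binomtest(n_pos, n, 0.5).pvalue)
```

Working on the differences keeps the pairing intact, which is exactly what the rank-sum test (built for two independent samples) throws away.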
>Then you may get more power yet by using the signed ranks test on the
>differences (confusingly, this is also named for Wilcoxon).
>
>IN MINITAB: (data in C1, C2)
>
>let C3 = C1-C2
>wtest c3
>
>------------------------------
>
>Date: Fri, 2 Nov 2001 14:04:22 +0100
>From: "StatSoft Benelux" <[EMAIL PROTECTED]>
>Subject: Free Electronic Statistics Textbook
>
>StatSoft's free Electronic Statistics Textbook offers training in the
>understanding and application of statistics.
>
>View the Textbook on www.statsoft.nl/textbook/stathome.html or download it
>for free from: www.statsoft.nl/download.html#textbook.
>
>The material was developed at the StatSoft R&D department based on many
>years of teaching undergraduate and graduate statistics courses and covers
>a wide variety of applications, including laboratory research (biomedical,
>agricultural, etc.), business statistics and forecasting, social science
>statistics and survey research, data mining, engineering and quality
>control applications, and many others.
>
>More information about StatSoft's products is available on www.statsoft.nl.
>
>--
>______________________________
>StatSoft Benelux
>P.O.
Box 6082
>9702 HB Groningen
>The Netherlands
>Phone +31-(0)50-526 7310
>Fax +31-(0)50-527 7665
>E-mail [EMAIL PROTECTED]
>Web www.statsoft.nl
>______________________________
>
>------------------------------
>
>Date: Fri, 02 Nov 2001 07:42:35 +0000
>From: John Kane <[EMAIL PROTECTED]>
>Subject: Re: They look different; are they really?
>
>Jon Miller wrote:
>
>> Stan Brown wrote:
>>
>>> You assume that it was my section that performed worse! (That's true,
>>> but I carefully avoided saying so.)
>>>
>>> Section A (mine) meets at 8 am, Section B at 2 pm. Not only does the
>>> time of day quite possibly have an effect, but since most people prefer
>>> not to have 8 am classes, we can infer that it's likely many of the
>>> students in Section A waited until relatively late to register, which
>>> in turn suggests they were less highly motivated for the class.
>
>I am not sure this is true. It is an empirical hypothesis, but not to be
>accepted as gospel.
>
>>> The dean has suggested the same self-selection hypothesis you mention.
>>> Another possible explanation, which I was unaware of when I posted, is
>>> that the instructor for section B held a review session for the half
>>> hour just before the exam.
>
>Well, there goes the hypothesis.
>
>> Which immediately leads also to the question of how much of the class
>> was teaching to the exam and how much was teaching the subject matter.
>
>Never been in an Ontario Gr 13 class? Most of the year was teaching to the
>exam, not the subject matter.
> > However, I'm willing to suggest (without any evidence about _this
> > specific case_) that you gave the students too much freedom.
>
>I did not think that slavery was the purpose of education.
>
> > You assumed that they were adults, and didn't set up your lessons to
> > force them to learn. I am amazed by the number of students who think
> > the purpose of school is to avoid learning anything.
>
> > > So no, I'm not jumping out of any windows. (I did hand out a lot of
> > > referrals to the tutoring center.) Mostly I was curious about whether
> > > the apparent difference was a real one (as Jerry Dallal has confirmed
> > > it is). But as you suggest, we may have two different populations here.
>
> > This is a huge difference in test scores. But you know your students.
> > Do their test scores adequately reflect their knowledge? (This is
> > probably a better question to ask than whether the test scores are
> > significantly different.)
>
>This, within reason, is very true. Test scores are useful, but don't always
>believe them.
>
> > Now, looking at your individual students, can you explain why
> > they do or do not know the material? My guess is that some are
> > unmotivated (can we still say lazy?), some have inadequate background,
> > some have . . .
> >
> > I have always made it clear to my students that the grading scale is a
> > guide and a guarantee for them: if they get 90%, they get an A. But I
> > reserve the right to lower the scale so that, in theory at least, if I
> > believe a 30% student is really an A student, then 30% becomes an A.
> > After all, isn't that what "professional judgment" means: not slavishly
> > following an arithmetic rule?
>
>No, that is dishonest. If the student does not show his/her capability,
>then he/she does not get the mark.
>
>Anything else is fraud.
>
>- --
> ------------------
>John Kane
>The Rideau Lakes, Ontario Canada
>
>------------------------------
>
>Date: 2 Nov 2001 06:55:58 -0800
>From: [EMAIL PROTECTED] (Chris R)
>Subject: Re: p value
>
>[EMAIL PROTECTED] (dennis roberts) wrote
>
> > most software will compute p values (say for a typical two sample t test
> > of means) by taking the obtained t test statistic ... making it both +
> > and - ... finding the two end tail areas in the relevant t distribution
> > ... and report that as p
> >
> > for example ... what if we have output like:
> >
> >         N    Mean   StDev   SE Mean
> > exp    20   30.80    5.20      1.2
> > cont   20   27.84    3.95      0.88
> >
> > Difference = mu exp - mu cont
> > Estimate for difference: 2.95
> > 95% CI for difference: (-0.01, 5.92)
> > T-Test of difference = 0 (vs not =): T-Value = 2.02  P-Value = 0.051  DF = 35
> >
> > for 35 df ... minitab finds the areas beyond -2.02 and +2.02 ... adds
> > them together ... and this value in the present case is .051
> >
> > now, traditionally, we would retain the null with this p value ... and,
> > we generally say that the p value means ... this is the probability of
> > obtaining a result (like we got) IF the null were true
> >
> > but, the result WE got was finding a mean difference in FAVOR of the
> > exp group ...
> >
> > however, the p value does NOT mean that the probability of finding a
> > difference IN FAVOR of the exp group ... if the null were true ... is
> > .051 ... right? since the p value has been calculated based on BOTH
> > ends of the t distribution ... it includes both extremes where the exp
> > is better than the control ...
AND where the cont is better than the exp ...
> >
> > thus, would it be fair to say that ... it is NOT correct to say that
> > the p value (as traditionally calculated) represents the probability of
> > finding a result LIKE WE FOUND ... if the null were true? that p would
> > be 1/2 of what is calculated
> >
> > this brings up another point ... in the above case ... typically we
> > would retain the null ... but, the p of finding the result LIKE WE DID
> > ... if the null were true ... is only 1/2 of .051 ... less than the
> > alpha of .05 that we have used
> >
> > thus ... what alpha are we really using when we do this?
> >
> > this is just a query about my continuing concern of what useful
> > information p values give us ... and, if the p value provides NO (given
> > the results we see) information as to the direction of the effect ...
> > then, again ... all it suggests to us (as p gets smaller) is that the
> > null is more likely not to be true ...
> >
> > given that it might not be true in either direction from the null ...
> > how is this really helping us when we are interested in the "treatment"
> > effect?
> >
> > [given that we have the direction of the results AND the p value ...
> > nothing else]
>
>I fail to see the problem.
>If the researcher has a priori expectations about the *direction* of the
>effect, he should use a one-sided significance test.
>That's what they are for, aren't they?
>
>Chris
>
>------------------------------
>
>Date: Fri, 2 Nov 2001 08:04:51 -0800 (PST)
>From: Alfred Barron <[EMAIL PROTECTED]>
>Subject: Conference: Deming Applied Statistics, NJ, Dec 10-13
>
> ANNOUNCING...
>
> The 57th Annual Deming Conference
> on Applied Statistics
> Atlantic City, New Jersey
> December 10-13, 2001
>
> For details, registration costs, etc. see
>
> http://nimbus.ocis.temple.edu/~kghosh/deming01/
>
> The Program will include:
>==================================================
> • Regression Modeling Strategies
>   Professor Frank E. Harrell Jr.
>   University of Virginia
>
> • Modeling Variance and Covariance Structure
>   in Mixed Linear Models
>   Professor Ramon C. Littell
>   University of Florida
>
>1:00-4:00
> • Bayesian Computation and its Application
>   to Non-linear Classification and Regression
>   Professor Bani K. Mallick
>   Texas A&M University
>
> • Analysis of Covariance: Repeated Measures
>   and Some Other Interesting Applications
>   Professor George A. Milliken
>
> • Statistical Methods for Clinical Trials
>   Mark X. Norleans, M.D., Ph.D.
>   The National Cancer Institute
>
> • Experiments: Planning, Analysis and Parameter
>   Design Optimization
>   Professor Jeff Wu
>   University of Michigan
>
>1:00-4:00
> • Sequential Clinical Trials: Design,
>   Monitoring & Analysis
>   Vlad Dragalin, PhD
>   GlaxoSmithKline
>
> • Multiple Comparisons for Making Decisions
>   Professor Jason C. Hsu
>   Ohio State University
>
> • Simultaneous Monitoring and Adjustment
>   Professor J. Stuart Hunter
>   Princeton University
>
> • Applied Logistic Regression
>   Professor Stanley A. Lemeshow
>   Ohio State University
>
>1:00-4:00
> • Permutation Methods: A Distance
>   Function Approach
>   Professor Paul W. Mielke, Jr.
>   Colorado State University
>
> • Approaches to the Analysis of Microarray Data
>   and Related Issues
>   Profs Elisabetta Manduchi and Warren Ewens
>   University of Pennsylvania
>
> • Experimental Design and the Statistical Analysis
>   of Spotted Microarrays
>   Professor Kathleen Kerr
>   University of Washington
>
> • Challenges Posed by the Human Genome Project
>   Professor Warren Gish
>   Washington University in St.
Louis
>
> • Measurement Error in Nonlinear Models
>   Professor David Ruppert
>   Cornell University
>
>------------------------------
>
>Date: Fri, 02 Nov 2001 12:27:45 -0400
>From: "Robert J. MacG. Dawson" <[EMAIL PROTECTED]>
>Subject: Re: p value
>
>Chris R wrote:
>
> > If the researcher has a priori expectations about the *direction* of
> > the effect, he should use a one-sided significance test.
> > That's what they are for, aren't they?
>
> I think it depends on what you mean by "expectations". If an effect in
>one direction can be absolutely ruled out _a_priori_ it certainly makes
>sense. That's why the standard chi-square test is one-tailed.
>
> However, if the "expectation" is merely (say) a 75% subjective
>probability, it would be most irresponsible to say "In the interest of a
>slightly lower p-value I am prepared to take a 25% chance of throwing
>away a valid result and never publishing it, even though I know it's
>there and a two-tailed test would show it."
>
> If the researcher does a one-tailed test *without* fully and
>unreservedly accepting this Faustian bargain, and is prepared to renege
>if the effect in the less-expected direction turns up, then [s]he has
>taken the first step on the road that leads to cheating at Solitaire.
>That extra power has been paid for with counterfeit coin.
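[An editorial aside, not part of the original posts: the "1/2 of .051" arithmetic discussed in this thread follows directly from how the two-sided p-value is computed, and is easy to check numerically. A minimal sketch using the t = 2.02, df = 35 output quoted above, assuming Python with scipy:]

```python
from scipy import stats

# Summary statistics quoted earlier in the thread: t = 2.02 on 35 df.
t_value, df = 2.02, 35

# Two-sided p: the tail areas beyond -t and +t, added together.
p_two_sided = 2 * stats.t.sf(t_value, df)

# One-sided p for the observed direction: the single upper-tail area,
# exactly half the two-sided value.
p_one_sided = stats.t.sf(t_value, df)

print(f"two-sided p = {p_two_sided:.3f}")  # close to the 0.051 Minitab reported
print(f"one-sided p = {p_one_sided:.3f}")
```

[The doubling is exact by symmetry of the t distribution, which is why the one-sided p dips below .05 while the two-sided p does not.]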
>
> Worse, if [s]he does a one-tailed test based on hopes, rather than
>expectations, with the doing-away-with of an effect of an embarrassing
>direction not merely a regrettable side-effect of the choice of test but
>a bonus, then [s]he has fallen into grievous sin indeed. From here it is
>but a short step to inventing data, painting "skin grafts" onto white
>rats with an overhead marker, or suppressing a research paper at the
>request of an industrial sponsor.
>
> Acceptance sampling, quality control, etc. are a whole different
>ballgame; they are based on a well-defined risk/benefit tradeoff, not on
>trying to show off low p-values to impress the viewers ("look what a
>risky thing _I_ did... "). But one shouldn't be fooled by the formal
>similarity between these procedures and the analysis of research data.
>
> -Robert Dawson
>
>------------------------------
>
>Date: Fri, 02 Nov 2001 11:02:48 -0500
>From: Rich Ulrich <[EMAIL PROTECTED]>
>Subject: Re: Testing for joint probability between 2 variables
>
>On Thu, 1 Nov 2001 22:28:18 -0000, "Chia C Chong"
><[EMAIL PROTECTED]> wrote:
>
>[ ... ]
>
> > In fact, what I was trying to say was, over the 5 seconds (TIME)
> > domain, I will measure 2 random variables, i.e. DELAY and ANGLE. So, I
> > would like to test whether, during the 5 s, those angles and delays of
> > the signal I received are correlated or not.
> >
> > By the way, what do you mean by "cumulative delay" and "cumulative
> > angle"?
>
>If you are observing someone moving in front of you, and the time
>for each data point is reported as the duration from the start,
>then you are looking at "cumulative time".
>
>That raises special concerns for statistical models and tests.
>In particular, none of the statistical tests will be usable in
>their simple forms if basic scores are cumulative. Re-scoring
>as differences *sometimes* will provide sufficient correction.
>
>If the person is moving slowly enough that the "angle" is
>affected or determined by the angle at the previous recorded
>measurement, then the angle is "cumulative", if we use that
>word loosely. You might search for references about serial
>correlation, or auto-correlation. "Serial correlation" has the
>same effect of disallowing the simple version of statistical tests.
>
>- --
>Rich Ulrich, [EMAIL PROTECTED]
>http://www.pitt.edu/~wpilib/index.html
>
>------------------------------
>
>Date: Sat, 3 Nov 2001 06:31:58 -0000
>From: "Leon Heller" <[EMAIL PROTECTED]>
>Subject: Re: Can I Use Wilcoxon Rank Sum Test for Correlated & Clustered
>Data??
>
>The Spearman test is a distribution-free test for correlation, not the
>Wilcoxon.
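[Editorial sketch, not from the original posts: Spearman's test is simply Pearson correlation computed on the ranks of the data, which is what makes it distribution-free. Illustrated below with made-up delay/angle data in the spirit of the thread above, assuming Python with numpy and scipy:]

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical paired measurements, loosely modeled on the delay/angle
# discussion above: a monotone relationship plus noise.
delay = rng.uniform(0.0, 5.0, size=50)
angle = 2.0 * delay + rng.normal(0.0, 1.0, size=50)

# Spearman's rho and its p-value.
rho, p = stats.spearmanr(delay, angle)

# Equivalent by hand: rank both variables, then apply Pearson to the ranks.
rho_by_hand, _ = stats.pearsonr(stats.rankdata(delay), stats.rankdata(angle))

print(abs(rho - rho_by_hand) < 1e-9)  # the two computations agree
print(rho > 0.8 and p < 0.001)        # strong monotone association detected
```

[Note that, as Rich Ulrich warns above, the p-value from this test is only trustworthy when the pairs are independent; serially correlated or cumulative scores break the simple version of the test.]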
>
>Leon
>- --
>Leon Heller, G1HSM [EMAIL PROTECTED]
>http://www.geocities.com/leon_heller
>Low-cost Altera Flex design kit: http://www.leonheller.com
>
>------------------------------
>
>End of edstat-digest V2000 #545
>*******************************