RE: Analysis of covariance
On 27 Sep 2001, Paul R. Swank wrote:

> Some years ago I did a simulation on the pretest-posttest control group design, looking at three methods of analysis: ANCOVA, repeated measures ANOVA, and treatment by block factorial ANOVA (blocking on the pretest using a median split). I found that with typical sample sizes, the repeated measures ANOVA was a bit more powerful than the ANCOVA procedure when the correlation between pretest and posttest was fairly high (say .90). As noted below, this is because the ANCOVA and ANOVA methods are approaching the same solution, but ANCOVA loses a degree of freedom estimating the regression parameter while the ANOVA does not. Of course this effect diminishes as the sample size gets larger, because the loss of one df matters less. On the other hand, the treatment by block design tends to have a bit more power when the correlation between pretest and posttest is low (< .30). I tried to publish the results at the time, but aimed a bit too high and received such a scathing review ("what kind of idiot would do this kind of study?") that I shoved it in a drawer, and it has never seen the light of day since. I did the study because it seemed at the time that everyone was using this design but was unsure of the analysis, and I thought a demonstration would be helpful. So, to make a long story even longer, the ANCOVA seems to be most powerful in those circumstances one is likely to run into, but it does have somewhat rigid assumptions about homogeneity of regression slopes. Of course the repeated measures ANOVA indirectly makes the same assumption, but at such high correlations this is really a homogeneity of variance issue as well.
>
> The second thought is for you reviewers out there trying to soothe your own egos by dumping on someone else's. Remember, the researcher you squelch today might be turned off to research and fail to solve a meaty problem tomorrow.
>
> Paul R. Swank, Ph.D.
> Professor, Developmental Pediatrics
> UT Houston Health Science Center

Paul's post reminded me of something I read in Keppel's Design and Analysis. Here's an excerpt from my notes on ANCOVA. (A rough simulation along the same lines as Paul's is sketched at the end of this post.)

Keppel (1982, p. 512) says:

  If the choice is between blocking and the analysis of covariance, Feldt (1958) has shown that blocking is more precise when the correlation between the covariate and the dependent variable is less than .4, while the analysis of covariance is more precise with correlations greater than .6. Since we rarely obtain correlations of this latter magnitude in the behavioral sciences, we will not find a unique advantage in the analysis of covariance in most research applications.

Keppel (1982, p. 513) also prefers the Treatments X Blocks design to ANCOVA on the grounds that the underlying assumptions are less stringent:

  Both within-subjects designs and analyses of covariance require a number of specialized statistical assumptions. With the former, homogeneity of between-treatment differences and the absence of differential carryover effects are assumptions that are critical for an unambiguous interpretation of the results of an experiment. With the latter, the most stringent is the assumption of homogeneous within-group regression coefficients. Both the analysis of covariance and the analysis of within-subjects designs are sensitive only to the linear relationship between X and Y, in the first case, and between pairs of treatment conditions in the second case. In contrast, the Treatments X Blocks design is sensitive to any type of relationship between treatments and blocks--not just linear.
As Winer puts it, the Treatments X Blocks design is a "function-free regression scheme" (1971, p. 754). This is a major advantage: because the Treatments X Blocks design imposes so few restrictions, it is to be preferred for its relative freedom from statistical assumptions about the data.

-- Bruce Weaver
E-mail: [EMAIL PROTECTED]
Homepage: http://www.angelfire.com/wv/bwhomedir/
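For readers who want to try this themselves, here is a minimal Monte Carlo sketch of the pretest-posttest power comparison described above. It is NOT Paul's original simulation: the sample size, correlation, effect size, and trial count are all invented for illustration, the repeated measures analysis is reduced to its two-occasion equivalent (an independent-groups t-test on gain scores), and blocking is omitted for brevity.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_trial(n=20, rho=0.9, effect=0.5):
    # Pre/post scores are bivariate normal with correlation rho;
    # the treatment adds `effect` to the posttest only.
    cov = [[1.0, rho], [rho, 1.0]]
    ctl = rng.multivariate_normal([0.0, 0.0], cov, n)
    trt = rng.multivariate_normal([0.0, 0.0], cov, n)
    trt[:, 1] += effect
    pre  = np.r_[ctl[:, 0], trt[:, 0]]
    post = np.r_[ctl[:, 1], trt[:, 1]]
    grp  = np.r_[np.zeros(n), np.ones(n)]

    # (1) ANCOVA: regress post on group + pre; t-test on the group slope.
    X = np.column_stack([np.ones(2 * n), grp, pre])
    beta, *_ = np.linalg.lstsq(X, post, rcond=None)
    resid = post - X @ beta
    df = 2 * n - 3
    se = np.sqrt((resid @ resid / df) * np.linalg.inv(X.T @ X)[1, 1])
    p_ancova = 2 * stats.t.sf(abs(beta[1]) / se, df)

    # (2) With only two occasions, the group x time F from the repeated
    #     measures ANOVA equals the squared t from a t-test on the gains.
    gain = post - pre
    p_rm = stats.ttest_ind(gain[grp == 1], gain[grp == 0]).pvalue

    return p_ancova < 0.05, p_rm < 0.05

hits = np.array([one_trial() for _ in range(2000)])
print("power -- ANCOVA: %.3f, RM ANOVA (gain scores): %.3f" % tuple(hits.mean(0)))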
Introducing inference using the binomial (was: Student's t vs. z)
On 19 Apr 2001, Paul Swank wrote:

> I agree. I normally start inference by using the binomial and then the normal approximation to the binomial for large n. It might be best to begin all graduate students with nonparametric statistics followed by linear models. Then we could get them to where they can do something interesting without taking four courses.
>
> At 01:28 PM 4/19/01 -0500, you wrote:
>
>> Why not introduce hypothesis testing in a binomial setting, where there are no nuisance parameters and p-values, power, alpha, beta, ... may be obtained easily and exactly from the binomial distribution? Jon Cryer

I concur with Jon and Paul. (I'll refrain from making a crack about Ringo.)

When I was an undergrad, the approach was z-test, t-test, ANOVA, simple linear regression, and if there was time, a bit on tests for categorical data (chi-squares) and rank-based tests. I got great marks, but came away with very little understanding of the logic of hypothesis testing. The stats class in 1st-year grad school (psychology again) was different, and it was there that I first started to feel like I was achieving some understanding. The first major chunk of the course was all about simple rules of probability, and how we could use them to generate discrete distributions, like the binomial. Then, with a good understanding of where the numbers came from, and with some understanding of conditional probability etc., we went on to hypothesis testing in that context.

One thing I found particularly beneficial was that we started with the case where the sampling distribution could be specified under both the null and alternative hypotheses. This allowed us to calculate the likelihood ratio, and to use a decision rule that minimized the overall probability of error. We could also talk about alpha, beta, and power in this simple context. (A small worked example of this fully specified binomial case appears at the end of this post.) Then we moved on to the more common case where the distribution cannot be specified under the alternative hypothesis, and came up with a different decision rule--i.e., one that controlled the level of alpha.

The other thing I found useful was that all of this was done without reference to any of the standard statistical tests--although we found out that the sign test was the same thing when we did get to our first test with a proper name. We followed that with the Wilcoxon signed ranks test and the Mann-Whitney U before ever getting to z- and t-tests. By the time we got to these, we already had a good understanding of the logic: calculate a statistic, and see where it lies in its sampling distribution under a true null hypothesis.

An undergrad text that takes a similar approach (in terms of order of topics) is Understanding Statistics in the Behavioral Sciences, by Robert R. Pagano. Not only is the ordering of topics good, but the explanations are generally quite clear. I would certainly use Pagano's book again (and supplement certain sections with my own notes) for a psych-stats class.

-- Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED])
Homepage: http://www.angelfire.com/wv/bwhomedir/
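Here is a minimal Python sketch of that fully specified case: a null of p = .5 against a specific alternative of p = .75, with n = 20 trials. The numbers are invented for illustration; the point is that alpha, beta, power, and the likelihood ratio all come exactly from the binomial distribution, with no nuisance parameters.

from scipy.stats import binom

n, p0, p1 = 20, 0.5, 0.75

# One-tailed decision rule: reject H0 if X >= c, where c is the
# smallest cutoff whose exact alpha does not exceed .05.
c = min(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= 0.05)

alpha = binom.sf(c - 1, n, p0)   # P(X >= c | p = .50), exact
power = binom.sf(c - 1, n, p1)   # P(X >= c | p = .75), exact
beta  = 1.0 - power

# With both hypotheses fully specified, the likelihood ratio for an
# observed count x is also available in closed form.
x = 15
lr = binom.pmf(x, n, p1) / binom.pmf(x, n, p0)

print(f"reject if X >= {c}: alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
print(f"likelihood ratio at x = {x}: {lr:.1f}")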
Re: The meaning of the p value
On 30 Jan 2001, Will Hopkins wrote:

--- 8< ---

> I haven't followed this thread closely, but I would like to state the only valid and useful interpretation of the p value that I know. If you observe a positive effect, then p/2 is the probability that the true value of the effect is negative. Equivalently, 1-p/2 is the probability that the true value is positive. The probability that the null hypothesis is true is exactly 0. The probability that it is false is exactly 1.

Suppose you were conducting a test with someone who claimed to have ESP, such that they were able to predict accurately which card would be turned up next from a well-shuffled deck of cards. The null hypothesis, I think, would be that the person does not have ESP. Is this null false? (A concrete version of this test appears at the end of this post.)

And what about when one has a one-tailed alternative hypothesis, e.g., mu > 100? In this case, the null covers a whole range of values (mu <= 100). Is this null false? In such a case, one still uses the point null (mu = 100) for testing, because it is the most extreme case. If you can reject the point null of mu = 100, you will certainly be able to reject the null if mu is actually some value less than 100. But the point is, the null can be true. With a two-tailed alternative, the point null may not be true, but as one of the regulars in these newsgroups often points out, we don't know the direction of the difference. So again, it makes sense to use the point null for testing purposes.

> Estimation is the name of the game. Hypothesis testing belongs in another century--the 20th. Unless, that is, you base hypotheses not on the null effect but on trivial effects...

Bob Frick has a paper with some interesting comments on this in the context of experimental psychology. In that context, he argues, models that make "ordinal" predictions are more useful than ones that try to estimate effect sizes, and certainly more generalizable. (An ordinal prediction is something like: performance will be impaired in condition B relative to condition A. Impairment might be indicated by slower responding and more errors, for example.)

A lot of cognitive psychologists use reaction time as their primary DV. But note that they are NOT primarily interested in explaining all (or as much as they can) of the variation in reaction time. RT is just a tool they use to make inferences about some underlying construct that really interests them. Usually, they are trying to test some theory which leads them to expect slower responding in one condition relative to another--such as slower responding when distractors are present compared to when only a target item appears. The difference between these conditions almost certainly will explain next to none of the overall variation in RT, so eta-squared and omega-squared measures will not be very impressive looking. But that's fine, because the whole point is to test the ordinal prediction of the theory--not to explain all of the variation in RT. If one were able to measure the underlying construct directly, THEN it might make some sense to try estimating parameters. But with indirect measurements like RT, I think Frick's recommended approach is a better one. There's my two cents.

-- Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED])
Homepage: http://www.angelfire.com/wv/bwhomedir/
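To make the ESP example concrete, here is a minimal one-tailed version in Python. All the numbers are invented (100 guesses at a freshly shuffled 52-card deck, 6 hits); the point is simply that this null -- chance-level guessing, p = 1/52 -- can literally be true.

from scipy.stats import binom

n, hits, p_chance = 100, 6, 1 / 52

# One-tailed p-value: how often pure chance produces 6 or more hits.
p_value = binom.sf(hits - 1, n, p_chance)
print(f"P(X >= {hits} | no ESP) = {p_value:.4f}")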
Re: Odd description of LSD approach to multiple comparisons
On 18 Oct 2000, Karl L. Wuensch wrote:

> I suggest that we not use the phrase "LSD" to describe the "protected t test," or "Fisher's procedure" (the procedure that requires having first obtained a significant omnibus ANOVA effect). After all, one can compute a "least significant difference" (between means to be "significant" at an adjusted criterion of significance) for any of the paranoid alpha-adjustment procedures: Fisher's, Bonferroni, Tukey a or b, Newman-Keuls, REGWQ, etc.

You are absolutely right, Karl. (A small sketch of that point appears at the end of this post.) But we can't revise all of the textbooks that are already out there. When our students pull books off the shelf in the library, they are going to find references to the "LSD" method of multiple comparisons. And MOST of the time, this will be referring to Fisher's protected t. The Kleinbaum et al. book is the first I've seen where it does not.

Cheers,
Bruce
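For instance, here is a minimal Python sketch of Karl's point that a "least significant difference" can be computed under any alpha-adjustment rule. The MSE, cell size, and number of groups are made-up values; the equal-n formula is LSD = t(1 - a/2, df_error) * sqrt(2*MSE/n).

from math import sqrt
from scipy.stats import t

k, n, mse, df_error, alpha = 4, 10, 2.5, 36, 0.05
n_pairs = k * (k - 1) // 2   # 6 pairwise comparisons among 4 means

def lsd(a):
    # Smallest difference between two means (equal n) that reaches
    # significance at level a.
    return t.ppf(1 - a / 2, df_error) * sqrt(2 * mse / n)

print("Fisher LSD    :", round(lsd(alpha), 3))
print("Bonferroni LSD:", round(lsd(alpha / n_pairs), 3))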
Re: I need help!!! SPSS and Panel Data
On Sun, 2 Jul 2000 [EMAIL PROTECTED] wrote:

> Help! I'm a Norwegian student who can't figure out how to work SPSS 9.0 properly for running a multiple regression on panel data (longitudinal data or cross-sectional time-series data). My data set consists of financial data from about 300 Norwegian municipalities. For each municipality I have observations for 7 fiscal years. My problem is that I don't know how to "tell" SPSS that the cases are grouped 7 by 7, i.e., that they are panel data. Can somebody please help me! Ketil Pedersen

Hi Ketil,

I'm not familiar with time series terminology, but if I followed you, you have a data file that looks something like this:

  MUNICIP  YEAR  Y
     1      1    .
     1      2    .
     1      3    .
    etc
     1      7    .
     2      1    .
     2      2    .
     2      3    .
    etc
     2      7    .
     3      1    .
     3      2    .
    etc
     3      7    .
    etc

I think you may have one or more "between-groups" variables too, but I wasn't sure about this. Anyway, if this is more or less accurate, then I think you would find it easier to use UNIANOVA rather than REGRESSION. In the pulldown menus, you find it under GLM--Univariate, I think. Here's an example of some syntax for the data shown above, with SIZE included as a between-municipalities variable:

UNIANOVA y BY municip year size
  /RANDOM = municip
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(year)
  /EMMEANS = TABLES(size)
  /EMMEANS = TABLES(year*size)
  /CRITERIA = ALPHA(.05)
  /print = etasq
  /plot = resid
  /DESIGN = size municip(size) year year*size .

Note that municip is a random factor here (i.e., it is treated the same way subjects are usually treated). And the notation "municip(size)" indicates that municip is nested in the size groups. The output from this syntax will give you an F-test for size with municip(size) as the error term; and for the year and year*size F-tests, the error term (called "residual") will be year*municip(size), because that's all that is left over.

You can get the same F-tests using REGRESSION, but not as easily. For one thing, you have to compute your own dummy variables for MUNICIP and YEAR; and if you have a mixed design (between- and within-municipalities variables), you pretty much have to do two separate analyses, as far as I can tell.

Hope this helps.

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: Repeated Measures ANOVA
On Tue, 13 Jun 2000 [EMAIL PROTECTED] wrote:

> Hi. I have conducted an experiment with 4 within-subject variables: 1) Colour, 2) Shape, 3) Pattern, 4) Movement. Each of these 4 factors has 2 levels, so each subject would be exposed to 16 conditions in total. However, I have made each subject do 10 replications per condition, and I have 10 subjects, so I have a total of 1600 data points. I have tried using SPSS repeated measures in GLM to analyse my data, but I don't know how to include my replications. SPSS requires that I select 16 columns of dependent variables, each representing a combination of my factors. However, I am only allowed one row per subject, so how do I input the 10 replications that each subject performed for each combination? Thanks! Alfred

Hi Alfred,

You might be better off using UNIANOVA for this analysis instead of GLM. For example, here's the GLM syntax for a mixed design (A and B as between-subjects variables; C and D within-subjects):

GLM c1d1 c1d2 c2d1 c2d2 c3d1 c3d2 BY a b
  /WSFACTOR = c 3 Polynomial d 2 Polynomial
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = c d c*d
  /DESIGN = a b a*b .

This analysis required the 6 repeated measures (3*2) to be strung out across one row for each subject. But I was able to produce exactly the same results using 6 rows per subject (one for each of the c*d combinations) and the following syntax:

UNIANOVA y BY subj a b c d
  /RANDOM = subj
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a b a*b subj(a*b)
            c c*a c*b c*a*b c*subj(a*b)
            d d*a d*b d*a*b d*subj(a*b)
            c*d c*d*a c*d*b c*d*a*b c*d*subj(a*b) .

Note that SUBJ is now listed explicitly as one of the variables. And you must explicitly list each of the error terms for within-subjects effects; if you do not list these error terms, a pooled error term is used for tests of the within-subjects effects. Finally, note as well that SUBJ appears on the /RANDOM line, and the nesting of subjects within a*b cells is indicated as subj(a*b).

I haven't tried this with a completely within-subjects design. But if you let y = DV, a = colour, b = shape, c = pattern, d = movement, and e = repetition (as suggested by Donald Burrill), your syntax should look something like this, I think:

UNIANOVA y BY subj a b c d e
  /RANDOM = subj e
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /EMMEANS = TABLES(e)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a a*subj b b*subj c c*subj d d*subj e e*subj
            a*b a*b*subj a*c a*c*subj etc...
            a*b*c*d*e a*b*c*d*e*subj .

Your data file would have 2*2*2*2*10 = 160 rows per subject, with variables that code for a through e and another for the DV. (A hypothetical reshaping sketch appears at the end of this post.)

Hope this helps.

Cheers, Bruce
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
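If your 1600 observations currently sit in the wide one-row-per-subject layout, something like this Python/pandas sketch would produce the long file that UNIANOVA needs. Everything here is hypothetical -- the tiny demo frame and the "c1_s1_p1_m1_r01"-style column labels are invented -- so adjust the pattern to match your own variable coding.

import pandas as pd

# Hypothetical wide layout: one row per subject, one column per
# condition-by-replication cell, named like c1_s2_p1_m2_r07
# (colour, shape, pattern, movement, replication).
wide = pd.DataFrame({
    "subj": [1, 2],
    "c1_s1_p1_m1_r01": [0.52, 0.61],
    "c1_s1_p1_m1_r02": [0.49, 0.58],
    "c2_s1_p2_m1_r01": [0.55, 0.64],
})  # ... the real file would have 160 such DV columns

long = wide.melt(id_vars="subj", var_name="cell", value_name="y")
codes = long["cell"].str.extract(r"c(\d)_s(\d)_p(\d)_m(\d)_r(\d+)")
long[["colour", "shape", "pattern", "movement", "rep"]] = codes.astype(int)
print(long.drop(columns="cell"))   # one row per subj-cell-replication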
Re: SPSS GLM - between * within factor interactions
On Tue, 9 May 2000, Johannes Hartig wrote:

> I have tried modifying the syntax, but I'm not getting any further. The within- and between-subject effects are defined separately in /WSDESIGN and /DESIGN, and mixing them only gives me cryptic error messages. Could it be possible to customize within * between interactions with /LMATRIX or /KMATRIX? I have already checked the syntax guide, but no success so far :( Thanks for any advice, Johannes

How about generating your own dummy variables for the various main effects and interactions of interest (including dummy variables for subject), and using REGRESSION instead of GLM repeated measures? You can use the /TEST subcommand to compare the full model to various reduced models to produce tests for the main effects and interactions of interest. (A bare-bones sketch of this model-comparison idea appears at the end of this post.)

For a between-within design, subject will be nested in the between-subjects variables, so I think you'll have to enter those between-subjects variables on one step, and the dummy variables for subject on the next step. (If you enter the dummy variables for subject first, you won't be able to enter the between-Ss variables, because they'll provide no further information. It would be like entering codes for city, and then trying to enter codes for country: once you know the city, you already know the country.)

Good luck.
Bruce
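Not SPSS, but to show the logic: a minimal numpy sketch of testing a between x within interaction by comparing full and reduced dummy-variable models. The design and data are invented (2 groups, 3 subjects each, 2 conditions), and the subject dummies absorb the group effect, just as described above.

import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)

n_subj, n_cond = 6, 2
subj = np.repeat(np.arange(n_subj), n_cond)
group = (subj >= 3).astype(float)      # between-subjects code
cond = np.tile([0.0, 1.0], n_subj)     # within-subjects code
y = (0.5 * group + 0.8 * cond + 0.6 * group * cond
     + rng.normal(0, 1, n_subj * n_cond))

def rss(X):
    # Residual sum of squares from a least-squares fit of y on X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

subj_dummies = (subj[:, None] == np.arange(n_subj)).astype(float)

# Full model: subject dummies (these absorb the intercept and group),
# condition, and the group x condition interaction.
full = np.column_stack([subj_dummies, cond, group * cond])
reduced = np.column_stack([subj_dummies, cond])   # interaction dropped

df_err = len(y) - np.linalg.matrix_rank(full)
F = (rss(reduced) - rss(full)) / (rss(full) / df_err)
print(f"group x condition: F(1, {df_err}) = {F:.3f}, p = {f.sf(F, 1, df_err):.4f}")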
Re: SPSS GLM - between * within factor interactions
On Mon, 8 May 2000, Johannes Hartig wrote:

>> Click on the Model box in the pull-down menu. The default model is the full-factorial, but you can opt for other custom models with only the effects you are interested in.
>
> Thanks for your answer, but - I can't! - or am I missing something obvious? I can only customize within- and between-factor effects separately, _not_ interactions between both. WHY? Johannes

Sorry Johannes, I didn't know that. I wonder if this is a peculiarity of using the GUI. Have you tried pasting the syntax, and then modifying it to include only the interactions of interest? It probably won't work that way either, but it's worth a try.

Bruce
Re: hyp testing
On 15 Apr 2000, Donald F. Burrill wrote:

>>> (2) My second objection is that if the positive-discrete probability is retained for the value "0" (or whatever value the former "no" is held to represent), the distribution of the observed quantity cannot be one of the standard distributions. (In particular, it is not normal.) One then has no basis for asserting the probability of error in rejecting the null hypothesis (at least, not by invoking the standard distributions, as computers do, or the standard tables, as humans do when they aren't relying on computers). Presumably one could derive the sampling distribution in enough detail to handle simple problems, but that still looks like a lot more work than one can imagine most investigators -- psychologists, say -- cheerfully undertaking.
>>
>> This would not be a problem if the alternative was one-tailed, would it?
>
> Sorry, Bruce, I do not see your point. How does 1-tailed vs. 2-tailed make a difference in whatever the underlying probability distribution is?

Donald,

It was clear at the time, but now I'm not sure I can see my point either! I think what I was driving at was the idea that a point null hypothesis is often false a priori. But if you have a one-tailed alternative, then you don't have a point null, because the null encompasses a whole range of values. For example, if your alternative is that a treatment improves performance, then the null states that performance remains the same or worsens as a result of the treatment. It seems that this kind of null hypothesis certainly can be true. And I think it is perfectly legitimate to use the appropriate continuous distribution (e.g., the t-distribution) in carrying out the test. Or am I missing something?

Cheers, Bruce
Re: Nonpar Repeated Measures
On Thu, 13 Apr 2000, Rich Ulrich wrote:

> On Thu, 13 Apr 2000 11:53:05 GMT, Chuck Cleland [EMAIL PROTECTED] wrote:
>
>> I have an ordinal response variable measured at four different times, as well as a 3-level between-subjects factor. I looked at the time main effect with the Friedman two-way analysis of variance by ranks. That effect was statistically significant and was followed up by single-df comparisons of time one with each of the three other time points (Siegel and Castellan, 1988, pp. 181-183). I would like to bring in the between-subjects factor now, as I expect an interaction between this factor and the time effect. Could anyone suggest ways of doing this with the ordinal (0 to 3) response variable? I have already looked at the simple main effect of time within each group with the Friedman test, but I would like to test the interaction.
>
> An "ordinal (0 to 3) response variable" has to give you a WHOLE lot of ties. (As I have posted before,) for simple analyses, forcing the rank-transformation is more likely to do harm than good when you start with just a few ordinal categories. Using the scores of 0-3, or using some other rational scoring, you can probably be quite safe in doing the two-way ANOVA -- safer, I suspect, than anything you can do with ranking as the first step.

Good point, Rich. I didn't think about ties. If the ordinal data were generated by having people rank-order objects, you could avoid ties completely by simply disallowing tied ranks. But in the situation Chuck described (time as the repeated measure), there may well be a LOT of ties, as you say.

Cheers, Bruce
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On 11 Apr 2000, Donald F. Burrill wrote:

> On Mon, 10 Apr 2000, Bruce Weaver wrote in part, quoting Bob Frick:
>
> --- 8< --- start quote
>
>> To put this argument another way, suppose the question is whether one variable influences another. This is a discrete probability space with only two answers: yes or no. Therefore, it is natural that both answers receive a nonzero probability.
>
> It may be (or seem) "natural"; that doesn't mean that it's so, especially in view of the subsequent refinement:
>
>> Now suppose the question is changed into one concerning the size of the effect. This creates a continuous probability space, with the possible answer being any of an infinite number of real numbers, and each one of these real numbers receiving an essentially zero probability. A natural tendency is to include 0 in this continuous probability space and assign it an essentially zero probability. However, the "no" answer, which corresponds to a size of zero, does not change probability just because the question is phrased differently. Therefore, it still has its nonzero probability; only the nonzero probability of the "yes" answer is spread over the real numbers.
>
> end quote
>
> To this I have two objections: (1) It is not clear that the "no" answer "does not change probability ...", as Bob puts it. If the question is one that makes sense in a continuous probability space, it is entirely possible (and indeed more usual than not, one would expect) that constraining it to a two-value discrete situation ("yes" vs. "no") may have entailed condensing a range of what one might call "small" values onto the answer "no". That is, the question may already, and perhaps unconsciously, have been "coarsened" to permit the discrete expression of the question with which Bob started.

I see your point. But one of the examples Frick gives concerns the existence of ESP. In the discrete space, it does or does not exist. For this particular example, I think one could justify using a 1-tailed test when moving to the continuous space; and so the null hypothesis would encompass "less than or equal to 0", and the alternative "greater than 0". It seems to me that with a one-tailed alternative like this, the null hypothesis can certainly be true.

> (2) My second objection is that if the positive-discrete probability is retained for the value "0" (or whatever value the former "no" is held to represent), the distribution of the observed quantity cannot be one of the standard distributions. (In particular, it is not normal.) One then has no basis for asserting the probability of error in rejecting the null hypothesis (at least, not by invoking the standard distributions, as computers do, or the standard tables, as humans do when they aren't relying on computers). Presumably one could derive the sampling distribution in enough detail to handle simple problems, but that still looks like a lot more work than one can imagine most investigators -- psychologists, say -- cheerfully undertaking.

This would not be a problem if the alternative was one-tailed, would it?

Cheers, Bruce
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On 7 Apr 2000, dennis roberts wrote:

> i was not suggesting taking away from our arsenal of tricks ... but, since i was one of those old guys too ... i am wondering if we were mostly led astray ...? the more i work with statistical methods, the less i see any meaningful (at the level of dominance that we see it) applications of hypothesis testing ... here is a typical problem ... and we teach students this!
>
> 1. we design a new treatment
> 2. we do an experiment
> 3. our null hypothesis is that both 'methods', new and old, produce the same results
> 4. we WANT to reject the null (especially if OUR method is better!)
> 5. we DO a two sample t test (our t was 2.98 with 60 df) and reject the null ... and in our favor!
> 6. what has this told us? if this is ALL you do ... what it has told you AT BEST is that ... the methods probably are not the same ... but, is that the question of interest to us? no ... the real question is: how much difference is there in the two methods?

--- 8< ---

In one of his papers, Bob Frick argues very persuasively that very often (in experimental psychology, at least), this is NOT the real question at all. I think that is especially the case when you are testing theories. Suppose, for example, that my theory of selective attention posits that inhibition of the internal representations of distracting items is an important mechanism of selection. This idea has been tested in so-called "negative priming" experiments. (Negative priming refers to the fact that subjects respond more slowly to an item that was previously ignored, or is semantically related to a previously ignored item, than they do to a novel item.) Negative priming is measured as a response time difference between 2 conditions in an experiment. The difference is typically between about 20 and 40 milliseconds.

I think the important thing to remember about this is that the researcher is not trying to account for variability in response time per se, even though response time is the dependent variable: he or she is just using response time to indirectly measure the object of real interest. If one were trying to account for overall variability in response time, the conditions of this experiment would almost certainly not make the list of important variables. The researcher KNOWS that a lot of other things affect response time, and some of them a LOT more than his experimental conditions do. However, because one is interested in testing a theory of selective attention, this small difference between conditions is VERY important, provided it is statistically significant (and there is sufficient power); and measures of effect size are not all that relevant.

Just my 2 cents.
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: Combining 2x2 tables
On Thu, 30 Mar 2000, JohnPeters wrote:

> Hi, I was wondering if someone could help me. I am interested in combining 2x2 tables from multiple studies. The test used is McNemar's chi-squared test. I have the raw data from each of these studies. What is the proper correction that should be used when combining the results? Thanks!!!

Meta-analysis is a common way to combine information from 2x2 tables, but I'm not sure how you would do this with McNemar's chi-square as your measure of "effect size" for each table. It might be possible if you are willing to use something else. It's Friday afternoon, and this is off the top of my head, but here goes anyway. I wonder if you could write the tables this way:

                Change
               Yes   No
  Before  -     a     b
          +     c     d

  Cell a: change from - to +
  Cell b: no change, - before and after
  Cell c: change from + to -
  Cell d: no change, + before and after

Suppose we're talking about change in opinion after hearing a political speech. The odds ratio for this table would give you the odds of changing from a negative to a positive opinion over the odds of changing from positive to negative. If you're the speaker, you're hoping for an odds ratio greater than 1 (i.e., greater change in those who were negative before the speech). If the amount of change is similar in both groups, the odds ratio will be about 1.

If this is a legitimate way to analyze the data for one such table, and I can't see why not, then you could pool the tables meta-analytically with ln(OR) as your measure of effect size. (A bare-bones version of the pooling calculation is sketched at the end of this post.) Here's a paper that describes how to go about it:

  Fleiss, JL. (1993). The statistical basis of meta-analysis. Statistical Methods in Medical Research, 2, 121-145.

There are also free programs available for performing this kind of meta-analysis. I have links to some in the statistics section of my homepage.

Hope this helps.
Bruce
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
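A minimal Python sketch of the Fleiss-style pooling described above: inverse-variance weighting of log odds ratios. The 2x2 counts are invented, and the cells follow the a/b/c/d layout in the table above (so each study's OR is ad/bc).

import math

# (a, b, c, d) counts for each hypothetical study
studies = [(12, 30, 5, 40), (20, 55, 9, 60), (7, 25, 6, 28)]

num = den = 0.0
for a, b, c, d in studies:
    log_or = math.log((a * d) / (b * c))
    var = 1/a + 1/b + 1/c + 1/d   # large-sample variance of ln(OR)
    w = 1.0 / var                 # inverse-variance weight
    num += w * log_or
    den += w

pooled = num / den
se = math.sqrt(1.0 / den)
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled OR = {math.exp(pooled):.2f}, "
      f"95% CI ({math.exp(lo):.2f}, {math.exp(hi):.2f})")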
Re: Normality parametric tests (WAS: Kruskal-Wallis equal va
On Fri, 24 Mar 2000, Bernard Higgins wrote:

> Hi Bruce

Hello Bernard.

> The point I was making is that when developing hypothesis tests, from a theoretical point of view, the sampling distribution of the test statistic from which critical values or p-values etc. are obtained is determined by the null hypothesis. We need a probability model to enable us to determine how likely observed patterns are. These probability models will often work well in practice even if we relax the usual assumptions. When using distribution-free tests as an alternative to a parametric test, we may need to specify restrictions in order that the tests can be considered "equivalent".

Agreed.

> In my view the t-test is fairly robust and will work well in most situations where the distribution is not too skewed and constant variance is reasonable. Indeed I have no problems in using it for the majority of problems. When comparing two independent samples using t-tests, lack of normality and non-constant variance are often not too serious if the samples are of similar size -- always a good idea in planned experiments.

Agreed here too.

> As you say, when samples are fairly large, some say 30+ or even less, the sampling distribution of the mean can often be approximated by a normal distribution (Central Limit Theorem), and hence an (asymptotic) Z-test is frequently used. It would not, I think, be strictly correct to call such a statistic t, although from a practical point of view there may be little difference. The formal definition of the single-sample t statistic is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom, and it does, in theory, require independent observations from a normal distribution.

I think we are no longer in complete agreement here. I am not a mathematician, but for what it's worth, here is my understanding of t- and z-tests:

  test statistic = (statistic - parameter under H0) / SE(statistic)

  The test statistic is z if SE(statistic) is based on the population SD,
  and t if SE(statistic) is based on the sample SD.

The most common statistics in the numerator are Xbar and (Xbar1 - Xbar2), but others are certainly possible (e.g., for large-sample versions of rank-based tests). An assumption of both tests is that the statistic in the numerator has a sampling distribution that is normal. This is where the CLT comes into play: it lays out the conditions under which the sampling distribution of the statistic is approximately normal--and those conditions can vary depending on what statistic you're talking about. But having a normal sampling distribution does not mean that we can or should use a critical z-value rather than a critical t when the population variance is unknown (which is what I thought you were suggesting). As you say, one can substitute critical z for critical t as n gets larger, because the differences become negligible (see the short numerical footnote at the end of this post). But nowadays most of us are using computer programs that give us more or less exact p-values anyway, so this is less of an issue than it once was.

Cheers, Bruce
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
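A short numerical footnote to the last point, in Python; the df values are arbitrary.

from scipy.stats import norm, t

z_crit = norm.ppf(0.975)   # two-tailed critical value, alpha = .05
for df in (10, 30, 100, 1000):
    # The critical t approaches the critical z as df grows.
    print(f"df = {df:4d}: critical t = {t.ppf(0.975, df):.4f} (z = {z_crit:.4f})")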
Re: Multiple Comparison Correction in Multiple Regression
On Fri, 17 Mar 2000, Rich Ulrich wrote:

--- 8< ---

>> 2) When performing a multiple linear regression, we have performed partial F-tests with the sequential SS (Type I SS) to examine whether a particular variable "should be added" to a simpler model. If a series of these tests are used to find a parsimonious model that still fits, should we correct for multiple comparisons?
>
> "Stepwise inclusion" is usually a bad idea. See the comments in my stats-FAQ, and their references. (If you are worried about correcting for multiple tests, then you probably *shouldn't* add the variable, because it is probably capitalizing on chance.)

Rich,

Is there not an important distinction to be made between the following situations:

1. A computer algorithm determines (based on the magnitude of partial or semi-partial correlations) the order in which variables are entered or removed, and which ones end up in the final model.

2. The investigator determines a priori the order in which variables are to be entered or removed.

Some of my textbooks refer to situation 1 as "stepwise" regression and situation 2 as "hierarchical" regression. One is less likely to capitalize on chance with hierarchical regression, I think, especially if the decisions about order are theoretically motivated and the number of variables is not too large.

Here's another observation that is relevant to this thread, I think. When one performs a 2-factor ANOVA, there are 3 independent F-tests: one for each main effect, and one for the interaction. One can arrive at these same F-tests using the same regression model comparison approach described above--e.g., compare the FULL regression model to one without the AxB interaction to get the F for the interaction term (a minimal sketch follows at the end of this post). I don't think I have EVER seen anyone correct for multiple comparisons in this case.

Cheers, Bruce
-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
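Here is a minimal Python sketch of that model-comparison view: the interaction F in a balanced 2x2 ANOVA obtained by comparing the full regression model to one without the AxB product term. The data and effect sizes are invented.

import numpy as np
from scipy.stats import f

rng = np.random.default_rng(42)
n = 10                                     # per cell, balanced 2x2
a = np.repeat([-1.0, -1.0, 1.0, 1.0], n)   # effect-coded factor A
b = np.repeat([-1.0, 1.0, -1.0, 1.0], n)   # effect-coded factor B
y = 0.4 * a + 0.3 * b + 0.5 * a * b + rng.normal(0, 1, 4 * n)

def rss(X):
    # Residual sum of squares from a least-squares fit of y on X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones_like(y)
full    = np.column_stack([ones, a, b, a * b])
reduced = np.column_stack([ones, a, b])    # AxB term dropped

df_e = len(y) - 4                          # error df for the full model
F = (rss(reduced) - rss(full)) / (rss(full) / df_e)
print(f"interaction F(1, {df_e}) = {F:.3f}, p = {f.sf(F, 1, df_e):.4f}")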
Re: ANOVA causal direction
On 10 Feb 2000, Richard M. Barton wrote:

> --- Alex Yu wrote:
> A statistical procedure alone cannot determine casual relationships.
> ---
>
> Correct. A lot depends on eye contact.
>
> rb

And also, at least 2 statistical procedures are required...