RE: Analysis of covariance
On 27 Sep 2001, Paul R. Swank wrote:

> Some years ago I did a simulation on the pretest-posttest control group design, looking at three methods of analysis: ANCOVA, repeated measures ANOVA, and treatment by block factorial ANOVA (blocking on the pretest using a median split). I found that with typical sample sizes, the repeated measures ANOVA was a bit more powerful than the ANCOVA procedure when the correlation between pretest and posttest was fairly high (say .90). As noted below, this is because the ANCOVA and ANOVA methods are approaching the same solution, but ANCOVA loses a degree of freedom estimating the regression parameter while the ANOVA doesn't. Of course, this effect diminishes as the sample size gets larger, because the loss of one df matters less. On the other hand, the treatment by block design tends to have a bit more power when the correlation between pretest and posttest is low (< .30).
>
> I tried to publish the results at the time but aimed a bit too high and received such a scathing review (what kind of idiot would do this kind of study?) that I shoved it in a drawer and it has never seen the light of day since. I did the study because it seemed at the time that everyone was using this design but was unsure of the analysis, and I thought a demonstration would be helpful. SO, to make a long story even longer, the ANCOVA seems to be most powerful in those circumstances one is likely to run into, but it does have somewhat rigid assumptions about homogeneity of regression slopes. Of course, the repeated measures ANOVA indirectly makes the same assumption, but at such high correlations this is really a homogeneity of variance issue as well.
>
> The second thought is for you reviewers out there trying to soothe your own egos by dumping on someone else's. Remember, the researcher you squelch today might be turned off to research and fail to solve a meaty problem tomorrow.
>
> Paul R. Swank, Ph.D.
> Professor
> Developmental Pediatrics
> UT Houston Health Science Center

Paul's post reminded me of something I read in Keppel's Design and Analysis. Here's an excerpt from my notes on ANCOVA.

Keppel (1982, p. 512) says:

   If the choice is between blocking and the analysis of covariance, Feldt (1958) has shown that blocking is more precise when the correlation between the covariate and the dependent variable is less than .4, while the analysis of covariance is more precise with correlations greater than .6. Since we rarely obtain correlations of this latter magnitude in the behavioral sciences, we will not find a unique advantage in the analysis of covariance in most research applications.

Keppel (1982, p. 513) also prefers the Treatments X Blocks design to ANCOVA on the grounds that the underlying assumptions are less stringent:

   Both within-subjects designs and analyses of covariance require a number of specialized statistical assumptions. With the former, homogeneity of between-treatment differences and the absence of differential carryover effects are assumptions that are critical for an unambiguous interpretation of the results of an experiment. With the latter, the most stringent is the assumption of homogeneous within-group regression coefficients. Both the analysis of covariance and the analysis of within-subjects designs are sensitive only to the linear relationship between X and Y, in the first case, and between pairs of treatment conditions in the second case. In contrast, the Treatments X Blocks design is sensitive to any type of relationship between treatments and blocks--not just linear. As Winer puts it, the Treatments X Blocks design "is a function-free regression scheme" (1971, p. 754). This is a major advantage of the Treatments X Blocks design. In short, the Treatments X Blocks design does not have restrictive assumptions and, for this reason, is to be preferred for its relative freedom from statistical assumptions underlying the data analysis.
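Paul's simulation is easy to replay in miniature. The sketch below is my own stand-alone Monte Carlo (plain Python; all parameters are illustrative, not Paul's), comparing the change-score (repeated measures) analysis with an ANCOVA-style analysis as the pretest-posttest correlation varies. It uses large-sample normal approximations throughout, so it shows the residual-variance side of the story but deliberately ignores the one-df cost of estimating the slope that Paul describes:

```python
import math
import random

def norm_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def simulate_power(rho=0.7, effect=0.5, n=20, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo power for two analyses of a pretest-posttest control-group
    design: (a) a two-sample test on change scores (the repeated-measures
    analysis) and (b) an ANCOVA-style test of the covariate-adjusted group
    difference.  Normal approximations are used, so the one-df cost of
    estimating the ANCOVA slope is NOT captured here."""
    rng = random.Random(seed)
    rej_change = rej_ancova = 0
    for _ in range(reps):
        # pretest ~ N(0,1); posttest correlated rho with pretest, plus effect
        x1 = [rng.gauss(0, 1) for _ in range(n)]    # treatment group
        x0 = [rng.gauss(0, 1) for _ in range(n)]    # control group
        noise = math.sqrt(1.0 - rho ** 2)
        y1 = [rho * v + noise * rng.gauss(0, 1) + effect for v in x1]
        y0 = [rho * v + noise * rng.gauss(0, 1) for v in x0]
        # (a) two-sample z test on change scores d = post - pre
        d1 = [b - a for a, b in zip(x1, y1)]
        d0 = [b - a for a, b in zip(x0, y0)]
        m1, m0 = sum(d1) / n, sum(d0) / n
        pooled = (sum((v - m1) ** 2 for v in d1)
                  + sum((v - m0) ** 2 for v in d0)) / (2 * n - 2)
        z = (m1 - m0) / math.sqrt(pooled * 2.0 / n)
        if 2.0 * norm_sf(abs(z)) < alpha:
            rej_change += 1
        # (b) ANCOVA: pooled within-group slope, then adjusted mean difference
        mx1, mx0 = sum(x1) / n, sum(x0) / n
        my1, my0 = sum(y1) / n, sum(y0) / n
        sxx = (sum((v - mx1) ** 2 for v in x1)
               + sum((v - mx0) ** 2 for v in x0))
        sxy = (sum((a - mx1) * (b - my1) for a, b in zip(x1, y1))
               + sum((a - mx0) * (b - my0) for a, b in zip(x0, y0)))
        slope = sxy / sxx
        resid = ([b - my1 - slope * (a - mx1) for a, b in zip(x1, y1)]
                 + [b - my0 - slope * (a - mx0) for a, b in zip(x0, y0)])
        mse = sum(r ** 2 for r in resid) / (2 * n - 3)
        adj = (my1 - my0) - slope * (mx1 - mx0)
        se = math.sqrt(mse * (2.0 / n + (mx1 - mx0) ** 2 / sxx))
        if 2.0 * norm_sf(abs(adj / se)) < alpha:
            rej_ancova += 1
    return rej_change / reps, rej_ancova / reps

if __name__ == "__main__":
    for r in (0.3, 0.7, 0.9):
        print(r, simulate_power(rho=r))
```

At correlations around .9 the two analyses become nearly identical, which is exactly where, in small samples, the lost degree of freedom would tip the balance slightly toward the repeated measures ANOVA, as Paul found.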
-- Bruce Weaver E-mail: [EMAIL PROTECTED] Homepage: http://www.angelfire.com/wv/bwhomedir/ = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: The meaning of the p value
On 30 Jan 2001, Will Hopkins wrote:
-- >8 ---
> I haven't followed this thread closely, but I would like to state the only valid and useful interpretation of the p value that I know. If you observe a positive effect, then p/2 is the probability that the true value of the effect is negative. Equivalently, 1-p/2 is the probability that the true value is positive.
>
> The probability that the null hypothesis is true is exactly 0. The probability that it is false is exactly 1.

Suppose you were conducting a test with someone who claimed to have ESP, such that they were able to predict accurately which card would be turned up next from a well-shuffled deck of cards. The null hypothesis, I think, would be that the person does not have ESP. Is this null false?

And what about when one has a one-tailed alternative hypothesis, e.g., mu > 100? In this case, the null covers a whole range of values (mu < or = 100). Is this null false? In such a case, one still uses the point null (mu = 100) for testing, because it is the most extreme case. If you can reject the point null of mu = 100, you will certainly be able to reject the null if mu is actually some value less than 100. But the point is, the null can be true. With a two-tailed alternative, the point null may not be true, but as one of the regulars in these newsgroups often points out, we don't know the direction of the difference. So again, it makes sense to use the point null for testing purposes.

> Estimation is the name of the game. Hypothesis testing belongs in another century--the 20th. Unless, that is, you base hypotheses not on the null effect but on trivial effects...

Bob Frick has a paper with some interesting comments on this in the context of experimental psychology. In that context, he argues, models that make "ordinal" predictions are more useful than ones that try to estimate effect sizes, and certainly more generalizable.
(An ordinal prediction is something like: performance will be impaired in condition B relative to condition A. Impairment might be indicated by slower responding and more errors, for example.)

A lot of cognitive psychologists use reaction time as their primary DV. But note that they are NOT primarily interested in explaining all (or as much as they can) of the variation in reaction time. RT is just a tool they use to make inferences about some underlying construct that really interests them. Usually, they are trying to test some theory which leads them to expect slower responding in one condition relative to another--such as slower responding when distractors are present compared to when only a target item appears. The difference between these conditions almost certainly will explain next to none of the overall variation in RT, so eta-squared and omega-squared measures will not be very impressive looking. But that's fine, because the whole point is to test the ordinal prediction of the theory--not to explain all of the variation in RT. If one were able to measure the underlying construct directly, THEN it might make some sense to try estimating parameters. But with indirect measurements like RT, I think Frick's recommended approach is a better one. There's my two cents.

-- Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED])
Homepage: http://www.angelfire.com/wv/bwhomedir/
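As an aside, the ESP example earlier in this post can be made concrete with an exact binomial test. The numbers below are purely illustrative (100 guesses, each against a freshly reshuffled 52-card deck, so a guesser with no ESP is correct with probability 1/52 per trial):

```python
from math import comb

def binom_upper_p(k, n, p):
    """Exact one-tailed p value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Illustrative card-guessing experiment: expected hits by chance ~ 1.9
n_trials, chance = 100, 1.0 / 52.0
for hits in (0, 2, 4, 6, 8):
    print(hits, round(binom_upper_p(hits, n_trials, chance), 4))
```

Here the null (no ESP, p = 1/52) is a point null that could perfectly well be true, which is the situation I had in mind above.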
Normality assumption for ANOVA (was: Effect statistics for non-normality)
Does the CLT no longer apply because I've added a 3rd population? I think not. Given large enough samples (and similarly shaped populations with more or less equal variances), the F-statistic I calculate can still be referred to the appropriate F-distribution, I should think.

By the way, other good examples are the large-sample z-test versions of various non-parametric tests (e.g., Mann-Whitney U). The important thing for those tests is that the sampling distribution of the statistic (e.g., the sampling distribution of U) is normal when the numbers are large enough. I don't recall ever seeing anyone claim that the underlying raw-score populations had to be normal.

Oops! This rant ended up being a bit longer than I anticipated. Looking forward to the comments of others. Cheers,

-- Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED])
Homepage: http://www.angelfire.com/wv/bwhomedir/
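The point about the sampling distribution, not the raw scores, being what matters is easy to check by simulation. This is a stand-alone sketch (plain Python; the exponential population and sample sizes are my own illustrative choices), counting how often the standardized sample mean from a deliberately skewed population lands beyond +/- 1.96:

```python
import math
import random

def tail_rate(n, reps=5000, seed=2):
    """Fraction of standardized sample means falling beyond +/-1.96 when
    sampling n observations from a skewed exponential population with
    true mean 1 and true SD 1.  If the CLT has kicked in, this should
    be close to the nominal 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - 1.0) * math.sqrt(n)   # standardize with the true moments
        if abs(z) > 1.96:
            hits += 1
    return hits / reps

print(tail_rate(5), tail_rate(100))
```

With n = 100 the rejection rate sits near .05 even though the population is far from normal; with very small n it tends to drift away from the nominal level.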
Re: Odd description of LSD approach to multiple comparisons
On 18 Oct 2000, Karl L. Wuensch wrote:

> I suggest that we not use the phrase "LSD" to describe the "protected t test," or "Fisher's procedure" (the procedure that requires having first obtained a significant omnibus ANOVA effect). After all, one can compute a "least significant difference" (between means to be "significant" at an adjusted criterion of significance) for any of the paranoid alpha-adjustment procedures: Fisher's, Bonferroni, Tukey a or b, Newman-Keuls, REGWQ, etc.

You are absolutely right, Karl. But we can't revise all of the textbooks that are already out there. When our students pull books off the shelf in the library, they are going to find references to the "LSD" method of multiple comparisons. And MOST of the time, this will be referring to Fisher's protected t. The Kleinbaum et al. book is the first I've seen where it does not. Cheers, Bruce
Re: Proper way to correct for multiple comparisons
On Fri, 11 Aug 2000, jazz wrote:

> Hi, I'm not feeling confident about my method here, and would appreciate it if somebody lets me know if I'm wrong, thanks.
>
> I'm doing a 2x2 ANOVA (type: logic, math)(difficulty: hard, easy). The hypothesis is that harder logic will produce a larger DV than easy logic, but this will not occur in math problems (which constitute a control). I found a type X difficulty interaction (p < .05). Now, I do a post-test comparing hard logic to easy logic and find an effect at .025 (p < .025). I do a similar post-test for hard and easy math and p > .025, so the hard math doesn't produce a significantly larger DV than easy math.
>
> My reasoning is: I planned the two post-ANOVA comparisons, so I divide my alpha .05 by two, to get the .025.
>
> Thank you for any advice.
>
> Jim

Some authors would call your contrasts of easy and hard for logic and math the "simple main effects" of difficulty. Given that the interaction is significant, and that these contrasts are planned, I think most folks would be happy sticking with alpha = .05.

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
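For what it's worth, the alpha-splitting Jim describes is simply the Bonferroni adjustment; if one did decide to adjust, Holm's step-down version controls the familywise error rate just as well while never being less powerful. A small sketch (plain Python; the p values are illustrative stand-ins, not Jim's actual results):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i < alpha/m -- the 'divide alpha by two' rule."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down variant: compares the smallest p to alpha/m,
    the next to alpha/(m-1), and so on, stopping at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if pvals[i] < alpha / (m - step):
            reject[i] = True
        else:
            break  # all remaining (larger) p values also fail
    return reject

# Two planned simple-effect comparisons (logic, math), illustrative p values:
print(bonferroni([0.02, 0.30]))
print(holm([0.02, 0.30]))
```

With p values of .02 and .03, Bonferroni rejects only the first while Holm rejects both, which is the sense in which Holm is uniformly at least as powerful.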
Re: I need help!!! SPSS and Panel Data
On Sun, 2 Jul 2000 [EMAIL PROTECTED] wrote:

> Help! I'm a Norwegian student who can't figure out how to work SPSS 9.0 properly for running a multiple regression on panel data (longitudinal data or cross-sectional time-series data). My data set consists of financial data from about 300 Norw. municipalities. For each municipality I have observations for 7 fiscal years. My problem is that I don't know how to "tell" SPSS that the cases are grouped 7 by 7, i.e. that they are panel data. Can somebody please help me!
>
> Ketil Pedersen

Hi Ketil, I'm not familiar with time series terminology, but if I followed you, you have a data file that looks something like this:

   MUNICIP  YEAR  Y
      1      1
      1      2
      1      3
     etc
      1      7
      2      1
      2      2
      2      3
     etc
      2      7
      3      1
      3      2
     etc
      3      7
     etc

I think you may have one or more "between-groups" variables too, but wasn't sure about this. Anyway, if this is more or less accurate, then I think you would find it easier to use UNIANOVA rather than REGRESSION. In the pulldown menus, you find it under GLM --> Univariate, I think. Here's an example of some syntax for the data shown above, with SIZE included as a between-municipalities variable:

UNIANOVA y BY municip year size
  /RANDOM = municip
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(year)
  /EMMEANS = TABLES(size)
  /EMMEANS = TABLES(year*size)
  /CRITERIA = ALPHA(.05)
  /PRINT = ETASQ
  /PLOT = RESID
  /DESIGN = size municip(size) year year*size .

Note that municip is a random factor here (i.e., it is treated the same way Subjects are usually treated). And the notation "municip(size)" indicates that municip is nested in the size groups. The output from this syntax will give you an F-test for size with municip(size) as the error term; and for the year and year*size F-tests, the error term (called "residual") will be year*municip(size), because that's all that is left over. You can get the same F-tests using REGRESSION, but not as easily.
For one thing, you have to compute your own dummy variables for MUNICIP and YEAR; and if you have a mixed design (between- and within-municipalities variables), you pretty much have to do two separate analyses, as far as I can tell. Hope this helps. -- Bruce Weaver [EMAIL PROTECTED] http://www.angelfire.com/wv/bwhomedir/ === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: Repeated Measures ANOVA
On Tue, 13 Jun 2000 [EMAIL PROTECTED] wrote:

> Hi. I have conducted an experiment with 4 within-subject variables:
> 1) Colour
> 2) Shape
> 3) Pattern
> 4) Movement
>
> Each of these 4 factors has 2 levels, so each subject would be exposed to 16 conditions in total. However, I have made each subject do 10 replications per condition and I have 10 subjects, so I have a total of 1600 data points.
>
> I have tried using SPSS repeated measures in GLM to analyse my data but I don't know how to include my replications. SPSS requires that I select 16 columns of dependent variables, each representing a combination of my factors. However, I am only allowed one row per subject, so how do I input the 10 replications that each subject performed for each combination?
>
> Thanks!
>
> Alfred

Hi Alfred, You might be better off using UNIANOVA for this analysis instead of GLM. For example, here's the GLM syntax for a mixed design (A and B as between-subjects variables; C and D within-subjects):

GLM c1d1 c1d2 c2d1 c2d2 c3d1 c3d2 BY a b
  /WSFACTOR = c 3 Polynomial d 2 Polynomial
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = c d c*d
  /DESIGN = a b a*b .

This analysis required the 6 repeated measures (3*2) to be strung out across one row for each subject. But I was able to produce exactly the same results using 6 rows per subject (one for each of the c*d combinations) and the following syntax:

UNIANOVA y BY subj a b c d
  /RANDOM = subj
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a b a*b subj(a*b)
            c c*a c*b c*a*b c*subj(a*b)
            d d*a d*b d*a*b d*subj(a*b)
            c*d c*d*a c*d*b c*d*a*b c*d*subj(a*b) .

Note that SUBJ is now listed explicitly as one of the variables. And you must explicitly list each of the error terms for within-subjects effects.
If you do not list these error terms, a pooled error term is used for tests of the within-subjects effects. Finally, note as well that SUBJ appears on the /RANDOM line; and the nesting of subjects within a*b cells is indicated as subj(a*b).

I haven't tried this with a completely within-subjects design. But if you let y = DV, a = colour, b = shape, c = pattern, d = movement, and e = repetition (as suggested by Donald Burrill), your syntax should look something like this, I think:

UNIANOVA y BY subj a b c d e
  /RANDOM = subj e
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /EMMEANS = TABLES(e)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a a*subj b b*subj c c*subj d d*subj e e*subj
            a*b a*b*subj a*c a*c*subj etc...
            a*b*c*d*e a*b*c*d*e*subj .

Your data file would have 2*2*2*2*10 = 160 rows per subject, with variables that code for a-e and another for the DV. Hope this helps. Cheers, Bruce

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
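Restructuring Alfred's data into that long format (one row per observation, 160 rows per subject) can also be scripted outside SPSS. This is a hypothetical helper (plain Python); the assumed input layout, with replication varying fastest within movement, then pattern, shape, and colour, is my own convention, so adjust the ordering to match the actual file:

```python
from itertools import product

def wide_to_long(wide):
    """wide: {subject_id: [160 scores]} with scores ordered by
    (colour, shape, pattern, movement, replication), rightmost varying
    fastest -- an assumed layout.  Returns long-format rows
    (subj, colour, shape, pattern, movement, rep, y), one per observation."""
    levels = list(product((1, 2), (1, 2), (1, 2), (1, 2), range(1, 11)))
    long_rows = []
    for subj, scores in wide.items():
        assert len(scores) == len(levels) == 160
        for (col, sh, pat, mov, rep), y in zip(levels, scores):
            long_rows.append((subj, col, sh, pat, mov, rep, y))
    return long_rows

# toy check: one subject with dummy scores 0..159
rows = wide_to_long({1: list(range(160))})
```

Each output row carries the factor codes plus the DV, which is exactly the shape the UNIANOVA syntax above expects.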
Re: SPSS GLM - between * within factor interactions
On Tue, 9 May 2000, Johannes Hartig wrote:

> I have tried modifying the syntax, but I'm not getting any further. The within- and between-subject effects are defined separately in /WSDESIGN and /DESIGN, and mixing them only gives me cryptic error messages. Could it be possible to customize within * between interactions with /LMATRIX or /KMATRIX? I am checking already the syntax guide, but no success so far :(
> Thanks for any advice,
> Johannes

How about generating your own dummy variables for the various main effects and interactions of interest (including dummy variables for subject), and using REGRESSION instead of GLM repeated measures? You can use the /TEST subcommand to compare the full model to various reduced models to produce tests for the main effects and interactions of interest.

For a between-within design, subject will be nested in the between-subjects variables, so I think you'll have to enter those between-subjects variables on one step, and the dummy variables for subject on the next step. (If you enter the dummy variables for subject first, you won't be able to enter the between-Ss variables, because they'll provide no further information. It would be like entering codes for City, and then trying to enter codes for Country: once you know the city, you already know the country.) Good luck. Bruce
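Generating the dummy variables themselves is mechanical. A minimal sketch (plain Python; the function, the toy layout, and all names are mine, purely for illustration):

```python
def dummy_codes(values, baseline=None):
    """Return one 0/1 indicator column per non-baseline level, so a
    categorical variable with k levels yields k-1 columns (the baseline
    level is coded all zeros)."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[-1]
    keep = [lv for lv in levels if lv != baseline]
    return [[1 if v == lv else 0 for lv in keep] for v in values]

# Toy between-within layout: 4 subjects nested in 2 groups, 2 trials each.
group   = [1, 1, 1, 1, 2, 2, 2, 2]
subject = [1, 1, 2, 2, 3, 3, 4, 4]

g_cols = dummy_codes(group)     # 1 column for the between-Ss factor
s_cols = dummy_codes(subject)   # 3 columns for subjects
```

Because subject determines group (the City/Country point above), one of the subject columns is collinear with the group column once group has been entered; the regression software should flag or drop the redundant column, which is why the between-Ss variables go in on the first step.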
Re: SPSS GLM - between * within factor interactions
On Mon, 8 May 2000, Johannes Hartig wrote:

> > Click on the Model box in the pull-down menu. The default model is the full-factorial, but you can opt for other custom models with only the effects you are interested in.
>
> Thanks for your answer, but - I can't! - or am I missing something obvious? I only can customize within- and between-factor effects separately, _not_ interactions between both. WHY?
>
> Johannes

Sorry Johannes, I didn't know that. I wonder if this is a peculiarity of using the GUI. Have you tried pasting the syntax, and then modifying it to include only the interactions of interest? It probably won't work that way either, but it's worth a try. Bruce
Re: hyp testing
On 15 Apr 2000, Donald F. Burrill wrote:

> > > (2) My second objection is that if the positive-discrete probability is retained for the value "0" (or whatever value the former "no" is held to represent), the distribution of the observed quantity cannot be one of the standard distributions. (In particular, it is not normal.) One then has no basis for asserting the probability of error in rejecting the null hypothesis (at least, not by invoking the standard distributions, as computers do, or the standard tables, as humans do when they aren't relying on computers). Presumably one could derive the sampling distribution in enough detail to handle simple problems, but that still looks like a lot more work than one can imagine most investigators -- psychologists, say -- cheerfully undertaking.
> >
> > This would not be a problem if the alternative was one-tailed, would it?
>
> Sorry, Bruce, I do not see your point. How does 1-tailed vs. 2-tailed make a difference in whatever the underlying probability distribution is?

Donald, It was clear at the time, but now I'm not sure if I can see my point either! I think what I was driving at was the idea that a point null hypothesis is often false a priori. But if you have a one-tailed alternative, then you don't have a point null, because the null encompasses a whole range of values. For example, if your alternative is that a treatment improves performance, then the null states that performance remains the same or worsens as a result of the treatment. It seems that this kind of null hypothesis certainly can be true. And I think it is perfectly legitimate to use the appropriate continuous distribution (e.g., t-distribution) in carrying out a test. Or am I missing something? Cheers, Bruce
Re: Nonpar Repeated Measures
On Thu, 13 Apr 2000, Rich Ulrich wrote:

> On Thu, 13 Apr 2000 11:53:05 GMT, Chuck Cleland <[EMAIL PROTECTED]> wrote:
>
> > I have an ordinal response variable measured at four different times as well as a 3-level between-subjects factor. I looked at the time main effect with the Friedman Two-Way Analysis of Variance by Ranks. That effect was statistically significant and was followed up by single-df comparisons of time one with each of the three other time points (Siegel and Castellan, 1988, pp. 181-183). I would like to bring in the between-subjects factor now, as I expect an interaction between this factor and the time effect. Could anyone suggest ways of doing this with the ordinal (0 to 3) response variable? I have already looked at the simple main effect of time within each group with the Friedman test, but I would like to test the interaction.
>
> An "ordinal (0 to 3) response variable" has to give you a WHOLE lot of ties. (As I have posted before,) for simple analyses, forcing the rank transformation is more likely to do harm than good when you start with just a few ordinal categories. Using the scores of 0-3, or using some other rational scoring, you can probably be quite safe in doing the two-way ANOVA -- safer, I suspect, than anything you can do with ranking as the first step.

Good point Rich. I didn't think about ties. If the ordinal data are generated by having people rank order objects, you could completely avoid ties by simply disallowing tied ranks. But in the situation Chuck described (time as the repeated measure), there may well be a LOT of ties, as you say. Cheers, Bruce

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
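Rich's point about ties is easy to see if you write out the midranks for a 0-3 variable. A small sketch (plain Python) of the usual tie-handling convention, where tied observations share the mean of the ranks they occupy:

```python
def midranks(xs):
    """Rank the data, assigning tied observations the mean of the ranks
    they jointly occupy (the convention used by rank-based tests)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# With only four possible values, nearly every rank is a shared midrank:
print(midranks([0, 0, 1, 2, 2, 3]))   # -> [1.5, 1.5, 3.0, 4.5, 4.5, 6.0]
```

With a realistic sample and only four response categories, almost all of the rank information collapses into a handful of midrank values, which is why ranking first can do more harm than good here.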
Re: Nonpar Repeated Measures
On Thu, 13 Apr 2000, Chuck Cleland wrote:

> Hello: I have an ordinal response variable measured at four different times as well as a 3-level between-subjects factor. I looked at the time main effect with the Friedman Two-Way Analysis of Variance by Ranks. That effect was statistically significant and was followed up by single-df comparisons of time one with each of the three other time points (Siegel and Castellan, 1988, pp. 181-183). I would like to bring in the between-subjects factor now, as I expect an interaction between this factor and the time effect. Could anyone suggest ways of doing this with the ordinal (0 to 3) response variable? I have already looked at the simple main effect of time within each group with the Friedman test, but I would like to test the interaction.
>
> thanks,
>
> Chuck

Chuck, There is a thread from a year or 2 ago on this topic. Search for "Nonparametric test for mixed model" at www.deja.com/usenet. Cheers, Bruce

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On 12 Apr 2000, Herman Rubin wrote:

> > I have often wondered if an integrated course/course sequence might not be better.
>
> A course sequence of a rather different kind is definitely in order. It would be at least three courses.
>
> The first course would be a general probability-only course, with the emphasis on understanding probability, not on carrying out computations. This has nothing to do with the discipline of the individual student, although the level should be such that it uses as much mathematics as the student is going to know. One might, at this stage, introduce the ideas of statistical decision making, but most will need a full course in probability first to understand probability well enough to use it in any sensible manner. If probability is presented as merely the limit of relative frequency, this might be quite difficult.
>
> The second course should be a course in probability modeling in the student's department of application. The construction of probability models, the making of assumptions, and the meaning of those assumptions is almost totally absent in those using statistics today. There should be strong warnings about the dangers of those assumptions being false, and that in practice these assumptions might not be quite true.
>
> Only after this can one reasonably deal with the uncertainties of inference.

Dr. Rubin, Are there any textbooks that you would deem suitable for the 3 courses you describe above?

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On 11 Apr 2000, Donald F. Burrill wrote:

> On Mon, 10 Apr 2000, Bruce Weaver wrote in part, quoting Bob Frick:
> -- >8 ---
> > To put this argument another way, suppose the question is whether one variable influences another. This is a discrete probability space with only two answers: yes or no. Therefore, it is natural that both answers receive a nonzero probability.
>
> It may be (or seem) "natural"; that doesn't mean that it's so, especially in view of the subsequent refinement:
>
> > Now suppose the question is changed into one concerning the size of the effect. This creates a continuous probability space, with the possible answer being any of an infinite number of real numbers and each one of these real numbers receiving an essentially zero probability. A natural tendency is to include 0 in this continuous probability space and assign it an essentially zero probability. However, the "no" answer, which corresponds to a size of zero, does not change probability just because the question is phrased differently. Therefore, it still has its nonzero probability; only the nonzero probability of the "yes" answer is spread over the real numbers.
>
> To this I have two objections: (1) It is not clear that the "no" answer "does not change probability ...", as Bob puts it. If the question is one that makes sense in a continuous probability space, it is entirely possible (and indeed more usual than not, one would expect) that constraining it to a two-value discrete situation ("yes" vs. "no") may have entailed condensing a range of what one might call "small" values onto the answer "no". That is, the question may already, and perhaps unconsciously, have been "coarsened" to permit the discrete expression of the question with which Bob started.

I see your point. But one of the examples Frick gives concerns the existence of ESP. In the discrete space, it does or does not exist.
For this particular example, I think one could justify using a 1-tailed test when moving to the continuous space; and so the null hypothesis would encompass "less than or equal to 0", and the alternative "greater than 0". It seems to me that with a one-tailed alternative like this, the null hypothesis can certainly be true.

> (2) My second objection is that if the positive-discrete probability is retained for the value "0" (or whatever value the former "no" is held to represent), the distribution of the observed quantity cannot be one of the standard distributions. (In particular, it is not normal.) One then has no basis for asserting the probability of error in rejecting the null hypothesis (at least, not by invoking the standard distributions, as computers do, or the standard tables, as humans do when they aren't relying on computers). Presumably one could derive the sampling distribution in enough detail to handle simple problems, but that still looks like a lot more work than one can imagine most investigators -- psychologists, say -- cheerfully undertaking.

This would not be a problem if the alternative was one-tailed, would it? Cheers, Bruce

-- Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On Mon, 10 Apr 2000, Rich Ulrich wrote:
-- >8 ---
> > the term 'null' means a hypothesis that is the straw dog case ... for which we are hoping that sample data will allow us to NULLIFY ...
>
> - this seemed okay in the first sentence. However, I think that "straw dog case" is what I would call "straw man argument" and that is *not* the quality of argument of the null. The point-null is always false, but we state the null so that it is "reasonable" to accept it, or to require data in order to reject it.
-- >8 ---

Rich, I do not agree that the point-null is always false. But I guess it depends on how you define "point-null". Bob Frick has some very interesting things to say about all of this. For example, the following is taken from his 1995 Memory & Cognition paper (Vol 23, pp. 132-138), "Accepting the null hypothesis":

   To put this argument another way, suppose the question is whether one variable influences another. This is a discrete probability space with only two answers: yes or no. Therefore, it is natural that both answers receive a nonzero probability. Now suppose the question is changed into one concerning the size of the effect. This creates a continuous probability space, with the possible answer being any of an infinite number of real numbers and each one of these real numbers receiving an essentially zero probability. A natural tendency is to include 0 in this continuous probability space and assign it an essentially zero probability. However, the "no" answer, which corresponds to a size of zero, does not change probability just because the question is phrased differently. Therefore, it still has its nonzero probability; only the nonzero probability of the "yes" answer is spread over the real numbers.

Frick's 1996 paper in Psychological Methods (Vol 1, pp. 379-390), "The appropriate use of null hypothesis testing", is also very interesting and topical.
From the abstract of that paper: "This article explores when and why [null hypothesis testing] is appropriate. Null hypothesis testing is insufficient when the size of effect is important, but is ideal for testing ordinal claims relating the order of conditions, which are common in psychology."

Cheers, Bruce

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On 7 Apr 2000, dennis roberts wrote:

> i was not suggesting taking away from our arsenal of tricks ... but, since i was one of those old guys too ... i am wondering if we were mostly led astray ...?
>
> the more i work with statistical methods, the less i see any meaningful (at the level of dominance that we see it) applications of hypothesis testing ...
>
> here is a typical problem ... and we teach students this!
>
> 1. we design a new treatment
> 2. we do an experiment
> 3. our null hypothesis is that both 'methods', new and old, produce the same results
> 4. we WANT to reject the null (especially if OUR method is better!)
> 5. we DO a two sample t test (our t was 2.98 with 60 df) and reject the null ... and in our favor!
> 6. what has this told us?
>
> if this is ALL you do ... what it has told you AT BEST is that ... the methods probably are not the same ... but, is that the question of interest to us?
>
> no ... the real question is: how much difference is there in the two methods?
-- >8 ---

In one of his papers, Bob Frick has argued very persuasively that very often (in experimental psychology, at least), this is NOT the real question at all. I think that is especially the case when you are testing theories.

Suppose, for example, that my theory of selective attention posits that inhibition of the internal representations of distracting items is an important mechanism of selection. This idea has been tested in so-called "negative priming" experiments. (Negative priming refers to the fact that subjects respond more slowly to an item that was previously ignored, or is semantically related to a previously ignored item, than they do to a novel item.) Negative priming is measured as a response time difference between 2 conditions in an experiment. The difference is typically between about 20 and 40 milliseconds.
I think the important thing to remember about this is that the researcher is not trying to account for variability in response time per se, even though response time is the dependent variable: He or she is just using response time to indirectly measure the object of real interest. If one were trying to account for overall variability in response time, the conditions of this experiment would almost certainly not make the list of important variables. The researcher KNOWS that a lot of other things affect response time, and some of them a LOT more than his or her experimental conditions do.

However, because one is interested in testing a theory of selective attention, this small difference between conditions is VERY important, provided it is statistically significant (and there is sufficient power); and measures of effect size are not all that relevant.

Just my 2 cents.

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
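For what it's worth, the point about small-but-significant differences is easy to demonstrate with fake numbers. The sketch below is purely illustrative: the 30 ms effect, the 50 ms between-subject SD, and n = 24 are all invented, and the test is an ordinary paired t on the per-subject difference scores.

```python
import math
import random
import statistics

# Hypothetical negative-priming data: per-subject RT differences
# (ignored-repetition condition minus control), in milliseconds.
# A ~30 ms effect is tiny next to ~600 ms baseline RTs, yet it can
# still be detected reliably with a within-subject design.
rng = random.Random(42)
n = 24
diffs = [rng.gauss(30, 50) for _ in range(n)]

mean_d = statistics.fmean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_d / se_d          # paired t with n - 1 = 23 df
print(round(mean_d, 1), round(t_stat, 2))
```

The point of the exercise is that the size of t depends on the difference scores alone; the huge baseline variability in raw response times never enters the calculation.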
Re: Combining 2x2 tables
On Thu, 30 Mar 2000, JohnPeters wrote:

> Hi,
> I was wondering if someone could help me. I am interested in combining 2x2 tables from multiple studies. The test used is the McNemar's chi-sq. I have the raw data from each of these studies. What is the proper correction that should be used when combining the results.
> Thanks!!!

Meta-analysis is a common way to combine information from 2x2 tables, but I'm not sure how you would do this with McNemar's chi-square as your measure of "effect size" for each table. It might be possible if you are willing to use something else. It's Friday afternoon, and this is off the top of my head, but here goes anyway. I wonder if you could write the tables this way:

                  Change
                 Yes   No
   Before   -     a     b
            +     c     d

   Cell a: change from - to +
   Cell b: no change, - before and after
   Cell c: change from + to -
   Cell d: no change, + before and after

Suppose we're talking about change in opinion after hearing a political speech. The odds ratio for this table would give you the odds of changing from a negative to a positive opinion over the odds of changing from positive to negative. If you're the speaker, you're hoping for an odds ratio greater than 1 (i.e., greater change in those who were negative before the speech). If the amount of change is similar in both groups, the odds ratio will be about 1.

If this is a legitimate way to analyze the data for one such table, and I can't see why not, then you could pool the tables meta-analytically with ln(OR) as your measure of effect size. Here's a paper that describes how to go about it:

   Fleiss, JL. (1993). The statistical basis of meta-analysis. Statistical Methods in Medical Research, 2, 121-145.

There are also free programs available for performing this kind of meta-analysis. I have links to some in the statistics section of my homepage. Hope this helps.

Bruce

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
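The fixed-effect pooling of ln(OR) values amounts to an inverse-variance weighted average. A rough sketch follows; the function name and all cell counts are invented, and the (a, b, c, d) labels follow the cell definitions in the message above.

```python
import math

def pooled_log_odds_ratio(tables):
    """Fixed-effect (inverse-variance) pooling of log odds ratios.

    Each table is (a, b, c, d): a = changed - to +, b = stayed -,
    c = changed + to -, d = stayed +.  OR = (a/b) / (c/d) = ad / bc.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        # 0.5 continuity correction guards against zero cells
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # approx. variance of ln(OR)
        w = 1 / var                           # inverse-variance weight
        num += w * log_or
        den += w
    pooled = num / den
    se = math.sqrt(1 / den)                   # SE of the pooled ln(OR)
    return pooled, se

# Three hypothetical studies, each summarized as (a, b, c, d)
tables = [(30, 70, 10, 90), (25, 60, 12, 80), (40, 90, 15, 100)]
log_or, se = pooled_log_odds_ratio(tables)
print(math.exp(log_or), se)   # pooled OR and SE of ln(OR)
```

A pooled z-test or confidence interval then follows directly from the pooled ln(OR) and its SE, which is the approach Fleiss (1993) lays out.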
Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)
On Fri, 24 Mar 2000, Bernard Higgins wrote:

> Hi Bruce

Hello Bernard.

> The point I was making is that when developing hypothesis tests, from a theoretical point of view, the sampling distribution of the test statistic from which critical values or p-values etc are obtained, is determined by the null hypothesis. We need a probability model to enable us to determine how likely observed patterns are. These probability models will often work well in practice even if we relax the usual assumptions. When using distribution-free tests as an alternative to a parametric test we may need to specify restrictions in order that the tests can be considered "equivalent".

Agreed.

> In my view the t-test is fairly robust and will work well in most situations where the distribution is not too skewed, and constant variance is reasonable. Indeed I have no problems in using it for the majority of problems. When comparing two independent samples using t-tests, lack of normality and constant variance are often not too serious if the samples are of similar size, always a good idea in planned experiments.

Agreed here too.

> As you say, when samples are fairly large, some say 30+ or even less, the sampling distribution of the mean can often be approximated by a normal distribution (Central Limit Theorem) and hence the use of an (asymptotic) Z-test is frequently used. It would not, I think, be strictly correct to call such a statistic t, although from a practical point of view there may be little difference. The formal definition of the single sample t-test is derived from the ratio of a Standard Normal random variable to a Chi-squared random variable and does, in theory, require independent observations from a normal distribution.

I think we are no longer in complete agreement here.
I am not a mathematician, but for what it's worth, here is my understanding of t- and z-tests:

   numerator   = (statistic - parameter|H0)
   denominator = SE(statistic)

   test statistic = z if SE(statistic) is based on the population SD
   test statistic = t if SE(statistic) is based on the sample SD

The most common 'statistics' in the numerator are Xbar and (Xbar1 - Xbar2); but others are certainly possible (e.g., for large-sample versions of rank-based tests). An assumption of both tests is that the statistic in the numerator has a sampling distribution that is normal. This is where the CLT comes into play: It lays out the conditions under which the sampling distribution of the statistic is approximately normal--and those conditions can vary depending on what statistic you're talking about.

But having a normal sampling distribution does not mean that we can or should use a critical z-value rather than a critical t when the population variance is unknown (which is what I thought you were suggesting). As you say, one can substitute critical z for critical t when n gets larger, because the differences become negligible. But nowadays, most of us are using computer programs that give us more or less exact p-values anyway, so this is less of an issue than it once was.

Cheers, Bruce

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
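To make the distinction concrete, here is a small sketch (the function name and sample data are made up): the numerator is identical for both tests, and only the source of the standard error in the denominator decides whether the statistic is called z or t.

```python
import math

def one_sample_stat(sample, mu0, sigma=None):
    """Return ("z", z) if the population SD is known, else ("t", t).

    Numerator: (statistic - parameter|H0), here (xbar - mu0).
    Denominator: SE based on the population SD (z) or sample SD (t).
    """
    n = len(sample)
    xbar = sum(sample) / n
    if sigma is not None:                     # population SD known -> z
        return ("z", (xbar - mu0) / (sigma / math.sqrt(n)))
    # sample SD -> t with n - 1 degrees of freedom
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return ("t", (xbar - mu0) / (s / math.sqrt(n)))

data = [2, 4, 4, 4, 5, 5, 7, 9]               # made-up sample
print(one_sample_stat(data, 4))               # t: population SD unknown
print(one_sample_stat(data, 4, sigma=2))      # z: population SD assumed known
```

The same observed mean difference gives slightly different statistics (and would be referred to different distributions), purely because the denominators are estimated differently.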
Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)
On 24 Mar 2000, Bernard Higgins wrote:

> These are my thoughts:
>
> The sampling distribution of a test statistic is determined by the null hypothesis. So analysis of variance is used to test that a number of samples come from an identical Normal distribution against the alternative that the "subpopulations" have different means (but the same variances). The mean and standard deviation of normally distributed random variables are independent of one another.
>
> Distribution free (non-parametric) procedures do not require the underlying distribution to be normal. For the majority of these
-- >8 ---

I think it is overly restrictive to say that the samples must come from normally distributed populations under a true null hypothesis. Take the simplest parametric test, a single sample t-test. The assumption is that the sampling distribution of X-bar is (approximately) normal, not that the population from which you've sampled is normal. If the population is normal, then of course the sampling distribution of X-bar will be too, for any size sample (even n=1). But if your sample size is large enough (for badly skewed populations, some authors suggest around n=300), the sampling distribution of X-bar will be close to normal no matter what the population distribution looks like. For populations that are not normal, but are reasonably symmetrical, the sampling distribution of X-bar will be near enough to normal with sample sizes somewhere between these extremes.

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
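A quick simulation makes this point concrete. The sketch below (the function name, repetition count, and the choice of an exponential population are mine, purely for illustration) estimates the skewness of the sampling distribution of X-bar at several sample sizes.

```python
import random
import statistics

def skew_of_sample_means(draw, n, reps=2000, seed=1):
    """Estimate the skewness of the sampling distribution of X-bar
    for samples of size n drawn from the population that draw() samples."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    m, s = statistics.fmean(means), statistics.stdev(means)
    return statistics.fmean(((x - m) / s) ** 3 for x in means)

# An exponential population is strongly right-skewed (skewness = 2);
# for means of n independent draws the skewness shrinks as 2 / sqrt(n),
# so the sampling distribution of X-bar approaches normality.
for n in (2, 30, 300):
    print(n, round(skew_of_sample_means(lambda r: r.expovariate(1.0), n), 3))
```

With a population this badly skewed, the skewness of X-bar is still noticeable at n = 30 but has largely washed out by n = 300, which is consistent with the sample-size guidance in the message.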
Re: Multiple Comparison Correction in Multiple Regression
On Fri, 17 Mar 2000, Rich Ulrich wrote:
-- >8 ---
> > 2) When performing a multiple linear regression we have performed partial f-tests with the sequential SS (Type I SS) to examine if a particular variable "should be added" to a simpler model. If a series of these tests are used to find a parsimonious model that still fits should we correct for multiple comparisons?
>
> "Stepwise inclusion" is usually a bad idea. See the comments in my stats-FAQ, and their references. (If you are worried about correcting for multiple tests, then you probably *shouldn't* add the variable because it is probably capitalizing on chance.)

Rich,

Is there not an important distinction to be made between the following situations:

1. A computer algorithm determines (based on the magnitude of partial or semi-partial correlations) the order in which variables are entered or removed, and which ones end up in the final model.

2. The investigator determines a priori the order in which variables are to be entered or removed.

Some of my textbooks refer to situation 1 as "stepwise" regression and situation 2 as "hierarchical" regression. One is less likely to capitalize on chance with hierarchical regression, I think, especially if the decisions about order are theoretically motivated, and the number of variables is not too large.

Here's another observation that is relevant to this thread, I think. When one performs a 2-factor ANOVA, there are 3 independent F-tests: one for each main effect, and one for the interaction. One can arrive at these same F-tests using the same regression model comparison approach that is described above (e.g., compare the FULL regression model to one without the AxB interaction to get F for the interaction term). I don't think I have EVER seen anyone correct for multiple comparisons in this case.

Cheers, Bruce

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
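The model-comparison route to the interaction F is easy to sketch for a balanced 2x2 design, where the least-squares fit of the additive (no-interaction) model has a closed form: row mean + column mean - grand mean. All data and names below are invented for illustration.

```python
import statistics

def interaction_F(cells):
    """F for the A x B interaction in a BALANCED 2x2 design, obtained by
    comparing the full (cell-means) model with the additive model.
    cells[(i, j)] is the list of scores in cell (i, j), i, j in {0, 1}."""
    scores = [y for ys in cells.values() for y in ys]
    grand = statistics.fmean(scores)
    row_mean = {i: statistics.fmean([y for (a, _), ys in cells.items()
                                     if a == i for y in ys]) for i in (0, 1)}
    col_mean = {j: statistics.fmean([y for (_, b), ys in cells.items()
                                     if b == j for y in ys]) for j in (0, 1)}
    cell_mean = {k: statistics.fmean(v) for k, v in cells.items()}

    # Residual SS under each model.  In a balanced design the least-squares
    # additive prediction for cell (i, j) is row mean + column mean - grand.
    sse_full = sum((y - cell_mean[k]) ** 2 for k, ys in cells.items() for y in ys)
    additive = {(i, j): row_mean[i] + col_mean[j] - grand
                for i in (0, 1) for j in (0, 1)}
    sse_reduced = sum((y - additive[k]) ** 2 for k, ys in cells.items() for y in ys)

    df_error = len(scores) - 4        # 4 cell means estimated in the full model
    # (SS dropped by removing the 1-df interaction term) / MSE of full model
    return (sse_reduced - sse_full) / (sse_full / df_error)

# Made-up balanced data with a built-in interaction
cells = {(0, 0): [1, 2, 3], (0, 1): [2, 3, 4],
         (1, 0): [3, 4, 5], (1, 1): [8, 9, 10]}
print(interaction_F(cells))   # F(1, 8) for the interaction; 12.0 with these data
```

The same F would come out of a standard two-way ANOVA table for these data, which is the point: the "model comparison" and "ANOVA" framings are the same test.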
Re: ANOVA causal direction
On 10 Feb 2000, Richard M. Barton wrote:

> --- Alex Yu wrote:
> > A statistical procedure alone cannot determine casual relationships.
> ---
>
> Correct. A lot depends on eye contact.
>
> rb

And also, at least 2 statistical procedures are required...