Estimating priors for Bayesian analysis
I've gone to a lot of trouble to add a Bayesian adjustment to a spreadsheet for estimating confidence limits of an individual's true score when the subject is assessed with a noisy test. I specify the prior belief simply by stating a best guess of the true score and its x% likely limits, assuming normality. I now realize that the adjustment is sensitive to the value of x, but how does a person know what x is for a given belief? For example, I might believe that the individual's true score is 70 units and that the likely range is +/- 10 units. So what does "likely" describe? 90%, 95%, 99%...? Do Bayesians have any validated way to work that out? If they don't, then the whole Bayesian edifice might just come crashing down.

I put this to a Bayesian who has been helping me, but I have received no reply since I sent the message, so I suspect the worst.

Will
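PS For concreteness, here is the kind of adjustment I mean, sketched in Python with a normal-normal model. The observed score of 85 and test error of 8 are made up for the example; note how the answer shifts with x:

from scipy.stats import norm

def posterior(prior_mean, prior_half_range, x, observed, test_error):
    """Posterior mean and SD of the true score, for a normal prior whose
    central x% region is prior_mean +/- prior_half_range and a normal
    likelihood centred on the observed score with SD = test_error."""
    z = norm.ppf(0.5 + x / 2)            # e.g. 1.96 when x = 0.95
    prior_sd = prior_half_range / z      # this is the step that depends on x
    w = test_error**-2 / (test_error**-2 + prior_sd**-2)  # weight on the data
    post_mean = w * observed + (1 - w) * prior_mean
    post_sd = (test_error**-2 + prior_sd**-2) ** -0.5
    return post_mean, post_sd

# Prior: best guess 70 units, likely range +/- 10 units.
for x in (0.90, 0.95, 0.99):
    m, s = posterior(70, 10, x, observed=85, test_error=8)
    print(f"x = {x:.2f}: posterior {m:.1f} +/- {1.96 * s:.1f} (95% limits)")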
Re: Student's t vs. z tests
I've joined this one at the fag end. I'm with Dennis Roberts. The way I would put it is this: the PRINCIPLE of a sampling distribution is actually incredibly simple: keep repeating the study, and this is the sort of spread you get for the statistic you're interested in.

What lets me keep it that simple is that I stay well away from test statistics when I teach stats to biomedical researchers. I deal only with effect (outcome) statistics. I even forbid my students and colleagues to put the values of test statistics in their papers. Test statistics are clutter. The actual mathematical form of any given sampling distribution is incredibly complex, but only the really gifted students who want to make careers out of statistical research need to come to terms with that. The rest of us just plug numbers into a stats package or spreadsheet.

I'm not sure what would be a good sequence for teaching the mathematical forms. Binomial --> normal --> t is probably as good as any.

Will
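PS For anyone teaching this, the principle really is just a few lines of simulation. A sketch in Python, where the "study" is estimating a mean (all numbers arbitrary):

import numpy as np

rng = np.random.default_rng(1)
n, true_mean, sd, studies = 20, 5.0, 10.0, 10000

# Each row is one repeat of the study; each sample mean is one realization
# of the statistic of interest.
sample_means = rng.normal(true_mean, sd, size=(studies, n)).mean(axis=1)

print(f"spread of the mean over repeated studies: {sample_means.std():.2f}")
print(f"theoretical standard error sd/sqrt(n):    {sd / np.sqrt(n):.2f}")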
Re: Reverse of Fisher's r to z
It's elementary algebra, Cherilyn. BTW, it's z = 0.5*ln((1+r)/(1-r)), not sqrt. So r = (e^(2z) - 1)/(e^(2z) + 1).

Will
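PS In code, for anyone who wants to check. The pair of functions is just the algebra above; note that the back-transform is simply tanh:

import numpy as np

def fisher_z(r):
    return 0.5 * np.log((1 + r) / (1 - r))            # = np.arctanh(r)

def inverse_fisher(z):
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)  # = np.tanh(z)

r = 0.6
assert np.isclose(inverse_fisher(fisher_z(r)), r)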
More Re: Error term in repeated-measures ANOVA
At 10:19 AM -0500 3/4/01, jim clark on edstat-l wrote:
>By default in SPSS, the error term used to test the significance
>of each contrast is specific to the particular contrast. So with
>a two-group comparison, it amounts to a paired-difference t-test...
>Whether other packages adopt the same approach or default to
>other error terms, I do not know.

Thanks Jim. I should have made it clear why I asked this question. It's all to do with precision of the estimate (or the p value thereof). If the error term were based on all levels of the repeated-measures factor, it would obviously have more degrees of freedom than the error term from just the t test for the two levels in question. With a large sample size that makes no difference to the precision of the estimate (or p value) of the contrast of interest, but with a small sample size it does.

The person I am helping has only 5 subjects, but he has 4 levels. So the t value for the contrast of two levels would have 4 degrees of freedom (paired t test, because there is no control group), but if the error term were based on all four levels, there would be 12 degrees of freedom. The confidence limits with 4 degrees of freedom are wider than those with 12 degrees of freedom by a factor of 2.78/2.18 = 1.27. That's a big difference. For p-value people, it could be the difference between p=0.04 and p=0.20, for example.

So it looks like all the hoohaa about Greenhouse-Geisser corrections for sphericity is just for the overall significance of the repeated-measures factor. The approach is therefore based on the old-fashioned and misguided "thou shalt not test specific contrasts unless the overall term is statistically significant". Well, that's disappointing, especially when you end up testing with an error term that has fewer degrees of freedom than you have available. Of course, if you're going to use all the levels to get your error term, you have to be happy that the error is uniform across the levels. Hence my question about getting residuals vs predicteds out of a repeated-measures ANOVA. So far I have had no response to that query.

Will
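PS The 1.27 factor, for anyone who wants to reproduce it; just scipy's t quantiles:

from scipy.stats import t

t4, t12 = t.ppf(0.975, 4), t.ppf(0.975, 12)
print(f"t critical value, 4 df:  {t4:.2f}")        # ~2.78
print(f"t critical value, 12 df: {t12:.2f}")       # ~2.18
print(f"widening factor:         {t4 / t12:.2f}")  # ~1.27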
Error term in repeated-measures ANOVA
I do all my repeated-measures analyses with mixed modeling in SAS these days, but I get called on to help people who use standard repeated-measures analyses with other stats packages. So here's my question, which I should know the answer to but don't!

In a repeated-measures ANOVA, most stats packages do a test for sphericity, and they provide an associated adjusted p value for overall significance of the repeated-measures factor. If my understanding is correct, the adjustment takes care of non-uniformity in the within-subject error between levels of the factor. Fine, but then you want to do a specific contrast between levels of the within-subject factor, such as the last pre-treatment vs the first post-treatment (with or without a control group--it doesn't matter). Now, the p value you get for that contrast: is it based on the overall adjusted error derived from ALL levels of the repeated-measures factor, or is it nothing more than the p value for a t test of the two levels in question? I realize that some packages attempt to provide a correction for inflation of the Type I error when you have many contrasts, so the analysis will be an ANOVA rather than a simple t test, but what within-subject error term do the packages use for specific contrasts?

Supplementary question: can you get meaningful residuals out of a standard repeated-measures ANOVA, so you can see how non-uniform they are when you plot them against predicteds and label the points with the different levels of the within-subject factor? I do this sort of thing routinely with Proc Mixed, but I never tried it in the days when I was still using RM-ANOVA.

Will
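PS To show the kind of check I mean, here is a sketch of residuals vs predicteds from a mixed model, done in Python's statsmodels rather than Proc Mixed. The data frame and all its column names are invented for the example:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
subjects, levels = 5, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(subjects), levels),
    "level": np.tile([f"T{i}" for i in range(levels)], subjects),
})
# Made-up response: a between-subject effect plus within-subject noise.
df["y"] = (np.repeat(rng.normal(0, 3, subjects), levels)
           + rng.normal(10, 2, len(df)))

fit = smf.mixedlm("y ~ level", df, groups=df["subject"]).fit()

# Plot residual against fitted, labelling points by level, to see whether
# the within-subject error is uniform across levels.
print(pd.DataFrame({"level": df["level"],
                    "fitted": fit.fittedvalues,
                    "residual": fit.resid}))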
Re: stan error of r
The Fisher z transform of r is approximately normally distributed with variance 1/(N-3): z = 0.5*ln((1+r)/(1-r)).

Will

At 4:18 PM -0500 28/3/01, dennis roberts wrote:
>anyone know off hand quickly ... what the formula might be for the
>standard error for r would be IF the population rho value is
>something OTHER than zero?
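PS To spell out how that answers Dennis's question: work in z, where the sampling distribution is simple whatever the value of rho, then back-transform the confidence limits. A sketch with made-up r and N:

import numpy as np

r, N = 0.60, 30
z = 0.5 * np.log((1 + r) / (1 - r))
se_z = 1 / np.sqrt(N - 3)
lo, hi = np.tanh(z - 1.96 * se_z), np.tanh(z + 1.96 * se_z)
print(f"95% limits for rho: {lo:.2f} to {hi:.2f}")  # asymmetric about r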
Re: can you use a t-test with non-interval data?
I just thought of a new justification for doing the usual parametric analyses on the numbered levels of a Likert-scale variable. Numbering the levels is formally the same as ranking them, and a parametric analysis of a rank-transformed variable is a non-parametric analysis. So if non-parametric analyses are OK, then so are parametric analyses of Likert-scale variables.

But... an important condition is that the sampling distribution of your outcome statistic must be normal. This topic came up on this list a few weeks ago. In summary, if the majority of your responses are stacked up on one or the other extreme value of the Likert scale for one or more groups in the analysis, and if you have fewer than 10 observations in one or more of those groups, your confidence intervals or p values are untrustworthy. See http://newstats.org/modelsdetail.html#normal for more.

Will
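PS A way to check that for your own data pattern: simulate the coverage of the nominal 95% confidence interval with responses stacked on one extreme. The response probabilities and the sample size here are mine:

import numpy as np
from scipy.stats import t

rng = np.random.default_rng(3)
levels = np.array([1, 2, 3, 4, 5])
probs = [0.70, 0.15, 0.08, 0.04, 0.03]   # stacked on the first level
n, sims, cover = 8, 5000, 0

for _ in range(sims):
    a = rng.choice(levels, n, p=probs)
    b = rng.choice(levels, n, p=probs)    # true difference is zero
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    tcrit = t.ppf(0.975, 2 * n - 2)
    cover += (diff - tcrit * se <= 0 <= diff + tcrit * se)

print(f"coverage of the nominal 95% CI: {cover / sims:.1%}")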
Re: One tailed vs. Two tailed test
Responses to various folks. And to everyone touchy about one-tailed tests, let me make it quite clear that I am only promoting them as a way of making a sensible statement about probability. A two-tailed p value has no real meaning, because no real effects are ever null. A one-tailed p value, for a normally distributed statistic, does have a real meaning, as I pointed out. But precision of estimation--confidence limits--is paramount. Hypothesis testing is passé.

Donald Burrill queried my assertion about one-tailed p values representing the probability that the true value is opposite in sign to what you observed. Don restated what a one-tailed p represents, as it is defined by hypothesis testers, but he did not show that my assertion was false. He did point out that I have to know the sampling distribution of the statistic. Yes, of course. I assumed a normal (or t) distribution.

Here's one proof of my assertion, using arbitrary real values. I always find these confidence-limit machinations a bit tricky. If someone has a better way to prove this, please let me know. Suppose you observe a value of 5.3 for some normally distributed outcome statistic X, and suppose the one-tailed p is 0.04. Therefore the sampling distribution is such that, when the true value is 0, the observed values will be greater than 5.3 for 4% of the time. Therefore, when the true value is not 0 but something else, T say, X-T will be greater than 5.3 for 4% of the time. (This is the tricky bit. Don't leap to deny it without a lot of thought. It follows because the sampling distribution is normal. It doesn't follow for sampling distributions like the non-central t.) But if X-T > 5.3 for 4% of the time, then rearranging, T < X-5.3 for 4% of the time. But our observed value X is 5.3, so T < 0 for 4% of the time. That is, there is a 4% chance that the true value is less than zero. QED.

Don also wrote:
>You had in mind, I trust, the _two-sided_ 95% confidence interval!
Of course. The only thing I've got against 95% confidence intervals is that they are too damn conservative, by half. The default should be 90% confidence intervals. I think being wrong about something (here, the true value) 10% of the time is more realistic in human affairs. But obviously, in any specific instance, it depends on the cost of being wrong.

Dennis Roberts wrote:
>1. some test statistics are naturally (the way they work anyway) ONE
>sided with respect to retain/reject decisions
Look, forget test statistics. What matters is the precision of the estimate of the EFFECT statistic. If you keep that in front of everything else, the question of hypothesis testing with any number of tails just vanishes into thin air. The only use for a test statistic is to help you work out a confidence interval. Don't ever report them in your papers.

Herman Rubin wrote about my assertion:
>This is certainly not the case, except under highly dubious
>Bayesian assumptions.
Herman, see above. And the only Bayesian assumption is what you might call the null Bayesian: that there is no prior knowledge of the true value. But any Bayesian- vs frequentist-type arguments here are academic.

Jerry Dallal wrote, ironically:
>If you're doing a 1 tailed test, why test at all? Just switch from
>standard treatment to the new one. Can't do any harm. Every field
>is littered with examples where one-tailed tests would have led to
>disasters (harmful treatments missed, etc.) had they been used.
As you well know, Jerry, 5% is arbitrary.
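Finally, for anyone who prefers simulation to algebra, here's a sketch of my proof above under the null-Bayesian assumption: draw true values from a (near enough) flat prior, and among the simulated studies that happened to observe a value close to 5.3, count how often the true value is negative. The prior's range and the tolerance are mine, chosen only to make the demo work:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
se = 5.3 / norm.ppf(0.96)    # SE that makes the one-tailed p 0.04 at X = 5.3

T = rng.uniform(-50, 50, 2_000_000)   # crudely flat prior on the true value
X = rng.normal(T, se)                 # one observed value per true value
near = np.abs(X - 5.3) < 0.1          # studies that observed roughly 5.3

print(f"one-tailed p:                0.04")
print(f"P(T < 0 | X close to 5.3) = {np.mean(T[near] < 0):.3f}")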
Will
Re: One tailed vs. Two tailed test
At 7:34 PM 12/3/01, Jerry Dallal wrote:
>Don't do one-tailed tests.

If you are going to do any tests, it makes more sense to do one-tailed tests. The resulting p value actually means something that folks can understand: it's the probability that the true value of the effect is opposite to what you have observed. Example: you observe an effect of +5.3 units, one-tailed p = 0.04. Therefore there is a probability of 0.04 that the true value is less than zero. There was a discussion of this notion a month or so ago. A Bayesian on this list made the point that the one-tailed p has this meaning only if you have absolutely no prior knowledge of the true value. Sure, no problem.

But why test at all? Just show the 95% confidence limits for your effects, and interpret them: "The effect could be as big as [upper limit], which would mean... Or it could be as small as [lower limit], which would represent... Therefore..." Doing it in this way automatically addresses the question of the power of your study, which reviewers are starting to ask about. If your study turns out to be underpowered, you can really impress the reviewers by estimating the sample size you would (probably) need to get a clear-cut effect. I can explain, if anyone is listening...

Will
--
Will G Hopkins, PhD FACSM
University of Otago, Dunedin NZ
Sportscience: http://sportsci.org
A New View of Statistics: http://newstats.org
Sportscience Mail List: http://sportsci.org/forum
ACSM Stats Mail List: http://sportsci.org/acsmstats
Be creative: break rules.
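PS The example worked through, assuming a normal sampling distribution: back out the standard error from the one-tailed p, then show the confidence limits to be interpreted.

from scipy.stats import norm

effect, p_one_tailed = 5.3, 0.04
se = effect / norm.ppf(1 - p_one_tailed)    # since z = effect/SE
lo, hi = effect - 1.96 * se, effect + 1.96 * se

print(f"observed effect: {effect} units")
print(f"95% confidence limits: {lo:.1f} to {hi:.1f}")
print(f"P(true value < 0) = {norm.cdf(-effect / se):.2f}")  # the one-tailed p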
Speaking of ANOVA in SPSS...
I'm trying to reduce all stats to a few simple procedures that students can do EASILY with available stats packages. A two-way ANOVA or an ANCOVA is as complex as I want to go. I thought SPSS would do the trick, but I was amazed to discover that it can't.

Here's the example. I want students to convert repeated-measures data into unpaired t tests or non-repeated-measures ANOVA, by using change scores between the time points of interest. That's no problem when there is just the group effect: the analysis becomes a simple unpaired t test. But when you have an extra between-subjects effect (e.g. males and females in the treatment and control groups), it becomes a two-way ANOVA. You make a column of change scores between the time points of interest (e.g. post minus pre), and that's your dependent variable. The two independent effects are group (exptal and control, say) and sex (male and female). The group term gives the effect of the treatment averaged over males and females. Again, no problem there, but what I want is an appropriate customized contrast of the interaction term, which yields the difference in the overall effect between males and females.

SPSS version 10 can't do it. I checked the on-line help, and it looks like you have to use the command language. Well really, what student is going to manage that? It's out of the question. Sure, you can get a p value for the interaction, but I want confidence limits for the difference between males and females. I've got my students to convert the p value, the degrees of freedom, and the observed value of the effect into confidence limits, but I shouldn't have to resort to that. I'd also like SPSS to do an ANCOVA, but again I want to do contrasts for the interaction, and again, they ain't there.

Or did I miss something? If so, please let me know. And can you let me know of any simple, and preferably CHEAP or FREE, packages that will do what I want?

Will
--
Will G Hopkins, PhD FACSM
University of Otago, Dunedin NZ
Sportscience: http://sportsci.org
A New View of Statistics: http://newstats.org
Sportscience Mail List: http://sportsci.org/forum
ACSM Stats Mail List: http://sportsci.org/acsmstats
Be creative: break rules.
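PS The conversion I get my students to do, spelled out in Python with scipy's t quantiles. The effect, p value, and degrees of freedom are invented for the example:

from scipy.stats import t

def limits_from_p(effect, p, df, conf=0.95):
    """Confidence limits for an effect from its two-tailed p value and df."""
    tstat = t.ppf(1 - p / 2, df)   # the |t| implied by the p value
    se = abs(effect) / tstat       # since t = effect/SE
    tcrit = t.ppf(0.5 + conf / 2, df)
    return effect - tcrit * se, effect + tcrit * se

# e.g. an interaction effect of 4.0 units with p = 0.08 and 36 df:
lo, hi = limits_from_p(4.0, 0.08, 36, conf=0.90)
print(f"90% limits: {lo:.1f} to {hi:.1f}")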
Re: Simulating T tests for Likert scales
Rich Ulrich wrote:
>You can use t-tests
>effectively on outcomes that are dichotomous variables, and you use
>the pooled version (Student's t) despite any difference in variances.
>That is the test that gives you the proper p-levels.

Rich, if the sample sizes in the two groups are different, you have to use the t test jigged for unequal variances. That's what my simulations showed. Your other comments about the robustness of t tests for Likert scales are reassuring, and thanks for responding. I did find that the confidence interval went awry when responses got too stacked up on the first or last level.

Will
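PS A sketch of the kind of simulation I mean: 5-point responses with equal means but unequal spread, with unequal group sizes, comparing the Type I error rates of the pooled and unequal-variance tests. The response distributions are invented, chosen only to make the variances differ:

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(5)
levels = np.array([1, 2, 3, 4, 5])
pa = [0.30, 0.10, 0.20, 0.10, 0.30]   # mean 3, large variance (small group)
pb = [0.05, 0.15, 0.60, 0.15, 0.05]   # mean 3, small variance (large group)
n1, n2, sims = 10, 40, 5000
pooled = welch = 0

for _ in range(sims):
    a = rng.choice(levels, n1, p=pa)
    b = rng.choice(levels, n2, p=pb)
    pooled += ttest_ind(a, b, equal_var=True).pvalue < 0.05
    welch += ttest_ind(a, b, equal_var=False).pvalue < 0.05

print(f"Type I error, pooled (Student) t: {pooled / sims:.3f}")  # inflated
print(f"Type I error, unequal-variance t: {welch / sims:.3f}")   # near 0.05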