Estimating priors for Bayesian analysis

2001-05-04 Thread Will Hopkins

I've gone to a lot of trouble to add Bayesian adjustment in a spreadsheet 
for estimating confidence limits of an individual's true score when the 
subject is assessed with a noisy test.  I specify the prior belief simply 
by stating a best guess of the true score, and its x% likely limits, with 
assumption of normality.  I now realize that the adjustment is sensitive to 
the value of x, but how does a person know what x is for a given belief?

For example, I might believe that the individual's true score is 70 units, 
and that the likely range is +/- 10 units.  So what describes 
"likely"?  90%, 95%, 99%...?  Do Bayesians have any validated way to work 
that out?  If they don't, then the whole Bayesian edifice might just come 
crashing down.  I put this to a Bayesian who has been helping me, but I 
have received no reply from him since I sent the message, so I suspect the 
worst.
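
For concreteness, here is a minimal sketch in Python of the sort of normal-normal update I mean.  The assumptions are purely illustrative: the +/-10 units are treated as 95% likely limits, the observed score is 80 units, and the test's typical error is 6 units.

# Normal-normal Bayesian update for an individual's true score.
# Illustrative assumptions (not from the spreadsheet itself): the +/-10
# "likely limits" are 95% limits, the observed score is 80 units, and the
# typical (within-subject) error of the test is 6 units.
from scipy.stats import norm

prior_mean  = 70.0                        # best guess of the true score
x           = 0.95                        # how "likely" the +/-10 limits are
prior_sd    = 10.0 / norm.ppf(0.5 + x/2)  # +/-10 units as x% limits -> SD
obs_score   = 80.0                        # observed (noisy) test score
typical_err = 6.0                         # SD of the measurement error

# Conjugate update: precision-weighted average of prior and observation.
w_prior = 1 / prior_sd**2
w_obs   = 1 / typical_err**2
post_mean = (w_prior*prior_mean + w_obs*obs_score) / (w_prior + w_obs)
post_sd   = (w_prior + w_obs)**-0.5
print(post_mean, post_sd, post_mean - 1.96*post_sd, post_mean + 1.96*post_sd)

Change x from 0.95 to 0.90 and prior_sd changes, and with it the adjusted score and its limits--which is exactly the sensitivity I'm asking about.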

Will






Re: Student's t vs. z tests

2001-04-21 Thread Will Hopkins

I've joined this one at the fag end.  I'm with Dennis Roberts.  The way I 
would put it is this:  the PRINCIPLE of a sampling distribution is actually 
incredibly simple: keep repeating the study and this is the sort of spread 
you get for the statistic you're interested in.  What makes it incredibly 
simple is that I keep well away from test statistics when I teach stats to 
biomedical researchers.  I deal only with effect (outcome) statistics.  I 
even forbid my students and colleagues from putting the values of test 
statistics in their papers.  Test statistics are clutter.

The actual mathematical form of any given sampling distribution is 
incredibly complex, but only the really gifted students who want to make 
careers out of statistical research need to come to terms with that.  The 
rest of us just plug numbers into a stats package or spreadsheet.   I'm not 
sure what would be a good sequence for teaching the mathematical 
forms.  Binomial --> normal --> t is probably as good as any.

Will






Re: Reverse of Fisher's r to z

2001-04-09 Thread Will Hopkins

It's elementary algebra, Cherilyn.  BTW, it's z = 0.5log..., not sqrt.

So r = (e^(2z) - 1)/(e^(2z) + 1).
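
For anyone who wants to check it numerically, here's a short sketch in Python (the value of r is just an example):

# Fisher's r-to-z transform and its inverse; r = tanh(z) is the compact form.
import math

def r_to_z(r):
    return 0.5 * math.log((1 + r) / (1 - r))           # = atanh(r)

def z_to_r(z):
    return (math.exp(2*z) - 1) / (math.exp(2*z) + 1)   # = tanh(z)

print(z_to_r(r_to_z(0.60)))   # recovers 0.60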

Will






More Re: Error term in repeated-measures ANOVA

2001-04-03 Thread Will Hopkins

At 10:19 AM -0500 3/4/01, jim clark on edstat-l wrote:
>By default in SPSS, the error term used to test the significance
>of each contrast is specific to the particular contrast.  So with
>a two-group comparison, it amounts to a paired-difference t-test...
>Whether other packages adopt the same approach or default to
>other error terms, I do not know.

Thanks Jim.  I should have made it clear why I asked this question. 
It's all to do with precision of the estimate (or the p value 
thereof).  If the error term was based on all levels of the 
repeated-measures factor, then it would obviously have more degrees 
of freedom than the error term from just the t test for the two 
levels in question.  Now, if you have a large sample size, it makes 
no difference to the precision of the estimate (or p value) of the 
contrast of interest, but if you have a small sample size, it does 
make a difference.  The person I am helping has only 5 subjects, but 
he has 4 levels.  So the t value for the contrast of two levels would 
have 4 degrees of freedom (paired t test, because no control group), 
but if the error term was based on all four levels, there would be 12 
degrees of freedom.  The confidence limits with 4 degrees of freedom 
are wider than those for 12 degrees of freedom by a factor of 2.78/2.18, 
or 1.27.  That's a big difference.  For p-value people, it would 
probably be the difference between p=0.04 and p=0.20, for example.
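
A quick check of those numbers (Python, using scipy's t quantiles):

# 95% confidence limits use t(0.975, df); ratio of widths for 4 vs 12 df.
from scipy.stats import t

t4, t12 = t.ppf(0.975, 4), t.ppf(0.975, 12)
print(round(t4, 2), round(t12, 2), round(t4/t12, 2))   # 2.78 2.18 1.27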

So it looks like all the hoohaa about Greenhouse-Geisser corrections 
for sphericity is just for the overall significance of the 
repeated-measures factor.  The approach is therefore based on the 
old-fashioned and misguided "thou shalt not test specific contrasts 
unless the overall term is statistically significant".  Well, that's 
disappointing, especially when you end up testing with an error term 
that has fewer degrees of freedom than you have available.

Of course, if you're going to use all the levels to get your error 
term, you have to be happy that the error is uniform across the 
levels.  Hence my question about getting residuals vs predicteds out 
of a repeated-measures ANOVA.  So far I have had no response to that 
query.

Will






Error term in repeated-measures ANOVA

2001-04-02 Thread Will Hopkins

I do all my repeated measures analyses with mixed modeling in SAS 
these days, but I get called on to help people who use standard 
repeated-measures analyses with other stats packages.  So here's my 
question, which I should know the answer to but I don't!

In a repeated-measures ANOVA, most stats packages do a test for 
sphericity, and they provide an associated adjusted p value for 
overall significance of the repeated-measures factor.  If my 
understanding is correct, the adjustment takes care of non-uniformity 
in the within-subject error between levels of the factor.  Fine, but 
then you want to do a specific contrast between levels of the 
within-subject factor, such as the last pre-treatment vs the first 
post-treatment (with or without a control group--it doesn't matter). 
Now, the p value you get for that contrast... is it based on the 
overall adjusted error derived from ALL levels of the 
repeated-measures factor, or is it nothing more than the p value for 
a t test of the two levels in question?

I realize that some packages attempt to provide a correction for 
inflation of the Type I error when you have many contrasts, so the 
analysis will be an ANOVA rather than a simple t test, but what 
within-subject error term do the packages use for specific contrasts?

Supplementary question:  can you get meaningful residuals out of a 
standard repeated-measures ANOVA, so you can see how non-uniform they 
are when you plot them against predicteds and label points with the 
different levels of the within-subject factor?  I do this sort of 
thing routinely with Proc Mixed, but I never tried it in the days I 
was still using RM-ANOVA.
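
In case it helps anyone, here is a rough Python analogue (statsmodels, not my SAS code; the file and column names are made up) of the residuals-vs-predicteds plot I mean:

# Fit a random-intercept (per subject) model to repeated-measures data and
# plot residuals against predicted values, labelled by level of the
# within-subject factor, to check uniformity of the error across levels.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

data = pd.read_csv("repeated_measures.csv")   # columns: subject, level, score
fit = smf.mixedlm("score ~ C(level)", data, groups=data["subject"]).fit()

data = data.assign(predicted=fit.fittedvalues, residual=fit.resid)
for lev, grp in data.groupby("level"):
    plt.scatter(grp["predicted"], grp["residual"], label=str(lev))
plt.axhline(0, linestyle="--")
plt.xlabel("predicted"); plt.ylabel("residual"); plt.legend(); plt.show()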

Will






Re: stan error of r

2001-03-28 Thread Will Hopkins

The Fisher z transform is approximately normally distributed with variance 1/(N-3):
z = 0.5*ln((1+r)/(1-r)).
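
A sketch of the calculation in Python (the r and N are just examples): transform to z, attach the standard error 1/sqrt(N-3), then back-transform the limits to r.

# Confidence limits for r via Fisher's z, whose standard error is 1/sqrt(N-3)
# whatever the value of rho.
import math
from scipy.stats import norm

r, N = 0.60, 30
z    = 0.5 * math.log((1 + r) / (1 - r))
se_z = 1 / math.sqrt(N - 3)

def to_r(z):
    return (math.exp(2*z) - 1) / (math.exp(2*z) + 1)

zcrit = norm.ppf(0.975)
print(to_r(z - zcrit*se_z), to_r(z + zcrit*se_z))   # 95% limits for r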

Will

At 4:18 PM -0500 28/3/01, dennis roberts wrote:
>anyone know off hand quickly ... what the formula might be for the 
>standard error for r would be IF the population rho value is 
>something OTHER than zero?






Re: can you use a t-test with non-interval data?

2001-03-17 Thread Will Hopkins

I just thought of a new justification for doing the usual parametric analyses 
on the numbered levels of a Likert-scale variable.   Numbering the levels 
is formally the same as ranking them, and a parametric analysis of a 
rank-transformed variable is a non-parametric analysis.   If non-parametric 
analyses are OK, then so are parametric analyses of Likert-scale variables.

But...  an important condition is that the sampling distribution of your 
outcome statistic must be normal.  This topic came up on this list a few 
weeks ago.  In summary, if the majority of your responses are stacked up on 
one or other extreme value of the Likert scale for one or more groups in 
the analysis, and if you have fewer than 10 observations in one or more of 
those groups, your confidence intervals or p values are untrustworthy.  See 
http://newstats.org/modelsdetail.html#normal for more.
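
Here is a rough sketch in Python of the kind of simulation I mean (not the original code; the response probabilities and group size are made up).  It estimates the coverage of the usual 95% confidence interval for a difference in mean Likert scores when responses are stacked on one end of the scale and the groups are small:

# Coverage of the 95% CI for a difference in means of a 5-point Likert
# variable, with responses piled on level 1 and only 8 subjects per group.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
levels = np.arange(1, 6)
p = np.array([0.70, 0.15, 0.08, 0.04, 0.03])   # stacked on the first level
n, cover, runs = 8, 0, 5000
for _ in range(runs):
    a = rng.choice(levels, n, p=p)
    b = rng.choice(levels, n, p=p)             # same population: true diff = 0
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1)/n + b.var(ddof=1)/n)
    hw = t.ppf(0.975, 2*n - 2) * se
    cover += (diff - hw <= 0 <= diff + hw)
print(cover / runs)                            # compare with the nominal 0.95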

Will






Re: One tailed vs. Two tailed test

2001-03-13 Thread Will Hopkins

Responses to various folks.  And to everyone touchy about one-tailed 
tests, let me make it quite clear that I am only promoting them as a 
way of making a sensible statement about probability.  A two-tailed p 
value has no real meaning, because no real effects are ever null.  A 
one-tailed p value, for a normally distributed statistic, does have a 
real meaning, as I pointed out.  But precision of 
estimation--confidence limits--is paramount.  Hypothesis testing is 
passé.

Donald Burrill queried my assertion about one-tailed p values 
representing the probability that the true value is opposite in sign 
to what you observed.  Don  restated what a one-tailed p represents, 
as it is defined by hypothesis testers, but he did not show that my 
assertion was false.  He did point out that I have to know the 
sampling distribution of the statistic.  Yes, of course.  I assumed a 
normal (or t) distribution.

Here's one proof of my assertion, using arbitrary real values.  I 
always find these confidence-limit machinations a bit tricky.  If 
someone has a better way to prove this, please let me know.

Suppose you observe a value of 5.3 for some normally distributed 
outcome statistic X, and suppose the one-tailed p is 0.04.

Therefore the sampling distribution is such that, when the true value 
is 0, the observed values will be greater than 5.3 for 4% of the time.

Therefore, when the true value is not 0 but something else, T say, 
then X-T will be greater than 5.3 for 4% of the time.  (This is the 
tricky bit.  Don't leap to deny it without a lot of thought.  It 
follows, because the sampling distribution is normal.  It doesn't 
follow for sampling distributions like the non-central t.)

But if X-T > 5.3 for 4% of the time, then rearranging, T < X-5.3 for 
4% of the time.  But our observed value of X is 5.3, so T < 0 for 4% of the 
time.  That is, there is a 4% chance that the true value is less than 
zero.  QED.
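
If you prefer simulation to algebra, here is a rough Monte Carlo check of the same claim in Python, under a flat prior on the true value and a known standard error (so a z rather than a t sampling distribution); the numbers are illustrative.

# Draw true values from a very wide flat prior, draw observations around them,
# keep the draws whose observation is (approximately) the one with one-tailed
# p = 0.04, and see how often the true value is below zero.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
se = 1.0
x_obs = norm.ppf(0.96) * se            # observed value giving one-tailed p = 0.04

T = rng.uniform(-50, 50, 5_000_000)    # wide flat prior on the true value
X = rng.normal(T, se)                  # sampling distribution around each T
keep = np.abs(X - x_obs) < 0.02        # condition on this observation
print(np.mean(T[keep] < 0))            # roughly 0.04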

Don also wrote
>You had in mind, I trust, the _two-sided_ 95% confidence interval!

Of course.  The only thing I've got against 95% confidence intervals is 
that they are too damn conservative, by half.  The default should be 
90% confidence intervals.  I think being wrong about something (here, 
the true value) 10% of the time is more realistic in human affairs. 
But obviously, in any specific instance, it depends on the cost of 
being wrong.

Dennis Roberts  wrote:
>1. some test statistics are naturally (the way they work anyway) ONE 
>sided with respect to retain/reject decisions

Look, forget test statistics.  What matters is the precision of the 
estimate of the EFFECT statistics.  If you keep that in front of 
everything else, the question of hypothesis testing with any number 
of tails just vanishes into thin air.  The only use for a test 
statistic is to help you work out a confidence interval.  Don't ever 
report them in your papers.

Herman Rubin wrote about my assertion:
>This is certainly not the case, except under highly dubious
>Bayesian assumptions.

Herman, see above.  And the only Bayesian assumption is what you 
might call the null Bayesian:  that there is no prior knowledge of 
the true value.  But any Bayesian- vs frequentist-type arguments here 
are academic.

Jerry Dallal wrote, ironically:
>If you're doing a 1 tailed test, why test at all?  Just switch from
>standard treatment to the new one.  Can't do any harm. Every field
>is littered with examples where one-tailed tests would have led to
>disasters (harmful treatments missed, etc.) had they been used.

As you well know, Jerry, 5% is arbitrary.

Will






Re: One tailed vs. Two tailed test

2001-03-12 Thread Will Hopkins

At 7:34 PM + 12/3/01, Jerry Dallal wrote:
>Don't do one-tailed tests.

If you are going to do any tests, it makes more sense to do one-tailed 
tests.  The resulting p value actually means something that folks can 
understand:  it's the probability the true value of the effect is 
opposite to what you have observed.

Example:  you observe an effect of +5.3 units, one-tailed p = 0.04. 
Therefore there is a probability of 0.04 that the true value is less 
than zero.

There was a discussion of this notion a month or so ago.  A Bayesian 
on this list made the point that the one-tailed p has this meaning 
only if you have absolutely no prior knowledge of the true value. 
Sure, no problem.

But why test at all?  Just show the 95% confidence limits for your 
effects, and interpret them:  "The effect could be as big as [the upper 
limit], which would mean...  Or it could be as small as [the lower limit], 
which would represent...  Therefore..."  Doing it 
in this way automatically addresses the question of the power of your 
study, which reviewers are starting to ask about. If your study turns 
out to be underpowered, you can really impress the reviewers by 
estimating the sample size you would (probably) need to get a 
clear-cut effect.  I can explain, if anyone is listening...

Will
-- 
Will G Hopkins, PhD FACSM
University of Otago, Dunedin NZ
Sportscience: http://sportsci.org
A New View of Statistics: http://newstats.org
Sportscience Mail List:  http://sportsci.org/forum
ACSM Stats Mail List:  http://sportsci.org/acsmstats

Be creative: break rules.






Speaking of ANOVA in SPSS...

2001-03-12 Thread Will Hopkins

I'm trying to reduce all stats to a few simple procedures that 
students can do EASILY with available stats packages.  A two-way 
ANOVA or an ANCOVA is as complex as I want to go. I thought SPSS 
would do the trick, but I was amazed to discover that it can't.

Here's the example.  I want students to convert repeated-measures 
data into unpaired t tests or non-repeated measures ANOVA, by using 
change scores between the time points of interest.  That's no problem 
when there is just the group effect:  the analysis becomes a simple 
unpaired t test.  But when you have an extra between-subjects effect 
(e.g. males and females in the treatment and control groups) it 
becomes a two-way ANOVA.  You make a column of change scores between 
the time points of interest (e.g., post and pre), and that's your 
dependent variable.  The two independent effects are group (exptal 
and control, say) and sex (male and female).  The group term gives 
the effect of the treatment averaged for males and females.  Again, 
no problem there, but what I want is an appropriate customized 
contrast of the interaction term, which yields the difference in the 
overall effect between males and females.  SPSS version 10 can't do 
it.  I checked the on-line help, and it looks like you have to use 
the command language.  Well really, what student is going to manage 
that?  It's out of the question.  Sure, you can get a p value for the 
interaction, but I want confidence limits for the difference between 
males and females.  I've got my students to convert the p value, the 
degrees of freedom, and the observed value of the effect into 
confidence limits, but I shouldn't have to resort to that.
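
For what it's worth, here is a sketch in Python (statsmodels, not SPSS; the file and column names are made up) of the change-score analysis with the confidence limits for the interaction read straight off the fitted model:

# Two-way ANOVA on change scores via ordinary least squares.  With treatment
# coding, the group:sex interaction coefficient is the difference between
# males and females in the effect of the treatment, and conf_int() gives its
# confidence limits directly.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("trial.csv")              # columns: group, sex, pre, post
data["change"] = data["post"] - data["pre"]

fit = smf.ols("change ~ C(group) * C(sex)", data).fit()
print(fit.params)                            # look at the interaction row
print(fit.conf_int(alpha=0.10))              # 90% confidence limits

# The hand conversion mentioned above: SE = |effect| / t_(1-p/2, df), then
# limits = effect +/- t_(0.95, df) * SE for a 90% interval.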

I'd also like SPSS to do an ANCOVA, but again I want to do contrasts 
for the interaction, and again, they ain't there.  Or did I miss 
something?  If so, please let me know.  And can you let me know of 
any simple, and preferably CHEAP or FREE, packages that will do what 
I want?

Will
-- 
Will G Hopkins, PhD FACSM
University of Otago, Dunedin NZ
Sportscience: http://sportsci.org
A New View of Statistics: http://newstats.org
Sportscience Mail List:  http://sportsci.org/forum
ACSM Stats Mail List:  http://sportsci.org/acsmstats

Be creative: break rules.






Re: Simulating T tests for Likert scales

2001-02-13 Thread Will Hopkins

Rich Ulrich wrote:
>You can use t-tests
>effectively on outcomes that are dichotomous variables, and you use
>the pooled version (Student's t) despite any difference in variances.
>That is the test that gives you the proper p-levels.

Rich, if the sample sizes in the two groups are different, you have to use 
the t test jigged for unequal variances.  That's what my simulations showed.

Your other comments about the robustness of t tests for Likert scales are 
reassuring, and thanks for responding.  I did find that the confidence 
interval went awry when responses got too stacked up on the first or last 
level.
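
For the record, here is a rough sketch in Python of the kind of simulation I mean (not the original code; the group sizes and proportions are made up).  It compares the coverage of the pooled and unequal-variance confidence intervals for the difference between two dichotomous (0/1) outcomes when the group sizes and proportions differ:

# Coverage of 95% confidence intervals for a difference in proportions,
# treated as a difference in means of 0/1 variables, with unequal n.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
n1, n2, p1, p2 = 10, 40, 0.7, 0.3
true_diff = p1 - p2
cover_pooled = cover_welch = 0
runs = 10_000
for _ in range(runs):
    a = rng.binomial(1, p1, n1).astype(float)
    b = rng.binomial(1, p2, n2).astype(float)
    d = a.mean() - b.mean()
    v1, v2 = a.var(ddof=1), b.var(ddof=1)
    sp2 = ((n1-1)*v1 + (n2-1)*v2) / (n1 + n2 - 2)          # pooled variance
    hw_p = t.ppf(0.975, n1 + n2 - 2) * np.sqrt(sp2*(1/n1 + 1/n2))
    se_w = np.sqrt(v1/n1 + v2/n2)                          # Welch version
    df_w = se_w**4 / ((v1/n1)**2/(n1-1) + (v2/n2)**2/(n2-1))
    hw_w = t.ppf(0.975, df_w) * se_w
    cover_pooled += abs(d - true_diff) <= hw_p
    cover_welch  += abs(d - true_diff) <= hw_w
print(cover_pooled/runs, cover_welch/runs)    # compare with the nominal 0.95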

Will


