RE: Analysis of covariance

2001-10-02 Thread Bruce Weaver


On 27 Sep 2001, Paul R. Swank wrote:

 Some years ago I did a simulation on the pretest-posttest control group
 design looking at three methods of analysis: ANCOVA, repeated measures
 ANOVA, and treatment by block factorial ANOVA (blocking on the pretest using
 a median split). I found that with typical sample sizes, the repeated
 measures ANOVA was a bit more powerful than the ANCOVA procedure when the
 correlation between pretest and posttest was fairly high (say .90). As noted
 below, this is because the ANCOVA and ANOVA methods approach the same
 solution, but ANCOVA loses a degree of freedom estimating the regression
 parameter while the ANOVA doesn't. Of course this effect diminishes as the
 sample size gets larger, because the loss of one df matters less. On the
 other hand, the treatment by block design tends to have a bit more power
 when the correlation between pretest and posttest is low (say .30). I tried
 to publish the results at the time but aimed a bit too high and received
 such a scathing review (what kind of idiot would do this kind of study?)
 that I shoved it in a drawer and it has never seen the light of day since. I
 did the study because it seemed at the time that everyone was using this
 design but was unsure of the analysis, and I thought a demonstration would
 be helpful. So, to make a long story even longer: the ANCOVA seems to be
 most powerful in those circumstances one is likely to run into, but it does
 have somewhat rigid assumptions about homogeneity of regression slopes. Of
 course the repeated measures ANOVA indirectly makes the same assumption,
 but at such high correlations this is really a homogeneity of variance
 issue as well. The second thought is for you reviewers out there trying to
 soothe your own egos by dumping on someone else's. Remember, the researcher
 you squelch today might be turned off to research and fail to solve a meaty
 problem tomorrow.

 Paul R. Swank, Ph.D.
 Professor
 Developmental Pediatrics
 UT Houston Health Science Center


Paul's post reminded me of something I read in Keppel's Design and
Analysis.  Here's an excerpt from my notes on ANCOVA:


Keppel (1982, p. 512) says:

If the choice is between blocking and the analysis of covariance, Feldt
(1958) has shown that blocking is more precise when the correlation
between the covariate and the dependent variable is less than .4, while
the analysis of covariance is more precise with correlations greater than
.6.  Since we rarely obtain correlations of this latter magnitude in the
behavioral sciences, we will not find a unique advantage in the analysis
of covariance in most research applications.

Keppel (1982, p. 513) also prefers the Treatments X Blocks design
to ANCOVA on the grounds that the underlying assumptions are less
stringent:

Both within-subjects designs and analyses of covariance require a number
of specialized statistical assumptions.  With the former, homogeneity of
between-treatment differences and the absence of differential carryover
effects are assumptions that are critical for an unambiguous
interpretation of the results of an experiment.  With the latter, the most
stringent is the assumption of homogeneous within-group regression
coefficients.  Both the analysis of covariance and the analysis of
within-subjects designs are sensitive only to the linear relationship
between X and Y, in the first case, and between pairs of treatment
conditions in the second case.  In contrast, the Treatments X Blocks
design is sensitive to any type of relationship between treatments and
blocks--not just linear.  As Winer puts it, the Treatments X Blocks design
is a function-free regression scheme (1971, p. 754).  This is a major
advantage of the Treatments X Blocks design.  In short, the Treatments X
Blocks design does not have restrictive assumptions and, for this reason,
is to be preferred for its relative freedom from statistical assumptions
underlying the data analysis.
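
Incidentally, Paul's high-correlation result is easy to see from the error
variances alone.  For a pretest-posttest design with equal variances and
pre-post correlation rho, the gain-score (repeated measures) error variance
is 2*sigma^2*(1 - rho), while the ANCOVA residual variance is
sigma^2*(1 - rho^2).  Here is a minimal SPSS sketch (the rho values are
just illustrative):

DATA LIST FREE / rho.
BEGIN DATA
.30 .60 .90
END DATA.
* Error variance for the gain-score analysis, in units of sigma-squared.
COMPUTE err_gain = 2*(1 - rho).
* Residual variance for ANCOVA, in the same units.
COMPUTE err_ancv = 1 - rho**2.
EXECUTE.
LIST.

At rho = .90 the two error terms are nearly identical (.20 vs. .19), so the
df spent estimating the regression slope can tip the balance toward the
repeated measures analysis, just as Paul found.  (The blocking comparison
at low correlations is a separate matter -- that's Feldt's result above.)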

-- 
Bruce Weaver
E-mail: [EMAIL PROTECTED]
Homepage:   http://www.angelfire.com/wv/bwhomedir/






Introducing inference using the binomial (was: Student's t vs. z)

2001-04-20 Thread Bruce Weaver


On 19 Apr 2001, Paul Swank wrote:

 I agree. I normally start inference by using the binomial and
 then the normal approximation to the binomial for large n. It might be
 best to begin all graduate students with nonparametric statistics
 followed by linear models. Then we could get them to where they can do
 something interesting without taking four courses.
 
 
 
 At 01:28 PM 4/19/01 -0500, you wrote:
 
Why not introduce hypothesis testing in a binomial setting where there are
no nuisance parameters and p-values, power, alpha, beta,... may be obtained
easily and exactly from the Binomial distribution?

Jon Cryer


I concur with Jon and Paul.  (I'll refrain from making a crack about
Ringo.)  When I was an undergrad, the approach was z-test, t-test, ANOVA,
simple linear regression, and if there was time, a bit on tests for
categorical data (chi-squares) and rank-based tests.  I got great marks,
but came away with very little understanding of the logic of hypothesis
testing.

The stats class in 1st year grad school (psychology again) was different,
and it was there that I first started to feel like I was achieving some
understanding.  The first major chunk of the course was all about simple
rules of probability, and how we could use them to generate discrete
distributions, like the binomial.  Then, with a good understanding of
where the numbers came from, and with some understanding of conditional
probability etc, we went on to hypothesis testing in that context.  One
thing I found particularly beneficial was that we started with the case
where the sampling distribution could be specified under both the null and
alternative hypotheses.  This allowed us to calculate the likelihood
ratio, and to use a decision rule to minimize the overall probability of
error.  We could also talk about alpha, beta, and power in this simple
context.  Then we moved on to the more common case where the distribution
cannot be specified under the alternative hypothesis, and came up with a
different decision rule--i.e., one that controlled the level of alpha.  
The other thing I found useful was that all of this was without reference
to any of the standard statistical tests--although we found out that the
sign test was the same thing when we did get to our first test with a
proper name.  We followed that with the Wilcoxon signed ranks test and
Mann-Whitney U before ever getting to z- and t-tests.  By the time we got
to these, we already had a good understanding of the logic:  Calculate a
statistic, and see where it lies in its sampling distribution under a true
null hypothesis.
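
For anyone who wants to play with this in SPSS, here is a minimal sketch
of the exact calculation behind the sign test mentioned above (the counts
-- 10 "successes" in 12 trials -- are hypothetical):

DATA LIST FREE / k n.
BEGIN DATA
10 12
END DATA.
* Exact upper-tail probability of k or more successes when H0: p = .5.
COMPUTE p_upper = 1 - CDF.BINOM(k - 1, n, .5).
* Two-tailed p-value by the doubling rule (capped at 1).
COMPUTE p_2tail = MIN(1, 2*p_upper).
EXECUTE.
LIST.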

An undergrad text that takes a similar approach (in terms of order of
topics) is Understanding Statistics in the Behavioral Sciences, by Robert
R. Pagano.  Not only is the ordering of topics good, but the explanations
are generally quite clear.  I would certainly use Pagano's book again (and
supplement certain sections with my own notes) for a psych-stats class.

-- 
Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED]) 
Homepage:   http://www.angelfire.com/wv/bwhomedir/






Re: The meaning of the p value

2001-01-31 Thread Bruce Weaver

On 30 Jan 2001, Will Hopkins wrote:

-- 8< ---

 I haven't followed this thread closely, but I would like to state the 
 only valid and useful interpretation of the p value that I know.  If 
 you observe a positive effect, then p/2 is the probability that the 
 true value of the effect is negative.  Equivalently, 1-p/2 is the 
 probability that the true value is positive.
 
 The probability that the null hypothesis is true is exactly 0.  The 
 probability that it is false is exactly 1.


Suppose you were conducting a test with someone who claimed to have ESP,
such that they were able to predict accurately which card would be turned
up next from a well-shuffled deck of cards.  The null hypothesis, I think, 
would be that the person does not have ESP.  Is this null false? 

And what about when one has a one-tailed alternative hypothesis, e.g., mu >
100?  In this case, the null covers a whole range of values (mu < or =
100).  Is this null false?  In such a case, one still uses the point null
(mu = 100) for testing, because it is the most extreme case. If you can 
reject the point null of mu=100, you will certainly be able to reject the 
null if mu is actually some value less than 100.  But the point is, the 
null can be true.  

With a two-tailed alternative, the point null may not be true, but as one
of the regulars in these newsgroups often points out, we don't know the
direction of the difference.  So again, it makes sense to use the point 
null for testing purposes.
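
Either way, the test is run against the point value.  In SPSS, for
example, it would be something like this sketch (the variable name Y and
the test value 100 are just placeholders):

* One-sample t-test of the point null mu = 100.
T-TEST
  /TESTVAL = 100
  /VARIABLES = y.

The significance SPSS prints is two-tailed; for the one-tailed alternative
mu > 100, halve it when the sample mean falls above 100.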


 Estimation is the name of the game.  Hypothesis testing belongs in 
 another century--the 20th.  Unless, that is, you base hypotheses not 
 on the null effect but on trivial effects...


Bob Frick has a paper with some interesting comments on this in the
context of experimental psychology.  In that context, he argues, models
that make "ordinal" predictions are more useful than ones that try to
estimate effect sizes, and certainly more generalizable.  (An ordinal
prediction is something like: performance will be impaired in condition B
relative to condition A.  Impairment might be indicated by slower
responding and more errors, for example.)

A lot of cognitive psychologists use reaction time as their primary DV. 
But note that they are NOT primarily interested in explaining all (or as
much as they can) of the variation in reaction time.  RT is just a tool
they use to make inferences about some underlying construct that really
interests them.  Usually, they are trying to test some theory which leads
them to expect slower responding in one condition relative to another, for
example--such as slower responding when distractors are present compared
to when only a target item appears.  The difference between these
conditions almost certainly will explain next to none of the overall
variation in RT, so eta-squared and omega-squared measures will not be
very impressive looking.  But that's fine, because the whole point is to
test the ordinal prediction of the theory--not to explain all of the
variation in RT.  If one was able to measure the underlying construct
directly, THEN it might make some sense to try estimating parameters.  But
with indirect measurements like RT, I think Frick's recommended approach
is a better one. 

There's my two cents.
-- 
Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED]) 
Homepage:   http://www.angelfire.com/wv/bwhomedir/





Re: Odd description of LSD approach to multiple comparisons

2000-10-19 Thread Bruce Weaver

On 18 Oct 2000, Karl L. Wuensch wrote:

 I suggest that we not use the phrase "LSD" to describe the "protected t
 test," or "Fisher's procedure" (the procedure that requires having first
 obtained a significant omnibus ANOVA effect).  After all, one can compute a
 "least significant difference" (between means to be "significant" at an
 adjusted criterion of significance) for any of the paranoid alpha-adjustment
 procedures:  Fisher's, Bonferroni, Tukey a or b, Newman-Keuls, REGWQ, etc.


You are absolutely right, Karl.  But we can't revise all of the textbooks
that are already out there.  When our students pull books off the shelf in
the library, they are going to find references to the "LSD"  method of
multiple comparisons.  And MOST of the time, this will be referring to
Fisher's protected t.  The Kleinbaum et al. book is the first I've seen
where it does not.
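
And to be fair to the textbooks, the "least significant difference" itself
is easy to compute for the protected-t case.  A sketch for equal group
sizes (the MSE, error df, and n values are made up):

DATA LIST FREE / mse dfe n.
BEGIN DATA
12.5 36 10
END DATA.
* LSD at alpha = .05: t(.975, dfe) * sqrt(2*MSE/n).
COMPUTE lsd = IDF.T(.975, dfe) * SQRT(2*mse/n).
EXECUTE.
LIST.

Any pair of means differing by more than LSD is declared significant, once
the omnibus F has been rejected.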

Cheers,
Bruce






Re: I need help!!! SPSS and Panel Data

2000-07-03 Thread Bruce Weaver

On Sun, 2 Jul 2000 [EMAIL PROTECTED] wrote:

 Help!
  I'm a Norwegian student who can't figure out how
 to work SPSS 9.0 properly for running a multiple
 regression on panel data (longitudinal data or
 cross-sectional time-series data). My data set
 consists of financial data from about 300 Norwegian
 municipalities. For each municipality I have
 observations for 7 fiscal years. My problem is
 that I don't know how to "tell" SPSS that the
 cases are grouped 7 by 7, i.e. that they are panel
 data.
 Can somebody please help me!
 
 Ketil Pedersen
 

Hi Ketil,
I'm not familiar with time series terminology, but if I followed 
you, you have a data file that looks something like this:


MUNICIP  YEAR   Y
  1   1   
  1   2 
  1   3   
etc
  1   7  
  2   1  
  2   2  
  2   3  
etc
  2   7  
  3   1
  3   2
etc
  3   7
etc


I think you may have one or more "between-groups" variables too, but
wasn't sure about this.  Anyway, if this is more or less accurate, then I
think you would find it easier to use UNIANOVA rather than REGRESSION.  In
the pulldown menus, you find it under GLM--Univariate, I think.  Here's
an example of some syntax for the data shown above with SIZE included as a
between-municipalities variable: 

UNIANOVA
  y  BY municip year size
  /RANDOM = municip
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(year)
  /EMMEANS = TABLES(size)
  /EMMEANS = TABLES(year*size)
  /CRITERIA = ALPHA(.05)
  /print = etasq
  /plot = resid
  /DESIGN = size municip(size)
year year*size .


Note that municip is a random factor here (i.e., it is treated the same
way Subjects are usually treated).  And the notation "municip(size)" 
indicates that municip is nested in the size groups.  The output from this
syntax will give you an F-test for size with municip(size) as the error
term; and for the year and year*size F-tests, the error term (called
"residual") will be Year*municip(size), because that's all that is left
over. 

You can get the same F-tests using REGRESSION, but not as easily.  For 
one thing, you have to compute your own dummy variables for MUNICIP and 
YEAR; and if you have a mixed design (between- and within-municipalities 
variables), you pretty much have to do two separate analyses, as far as I 
can tell.
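
If you do go the REGRESSION route, the dummy coding for YEAR might look
something like this sketch (variable names are hypothetical; year 1 is the
reference category):

* Dummy variables for YEAR, with year 1 as the reference.
DO REPEAT d = yr2 yr3 yr4 yr5 yr6 yr7
        / j =   2   3   4   5   6   7.
COMPUTE d = (year = j).
END REPEAT.
EXECUTE.

You would still need codes for the 300 municipalities, though, which is
exactly why UNIANOVA is less painful here.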

Hope this helps.
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Repeated Measures ANOVA

2000-06-13 Thread Bruce Weaver

On Tue, 13 Jun 2000 [EMAIL PROTECTED] wrote:

 Hi.
 
 I have conducted an experiment with 4 within subject variables.
 1) Colour
 2) Shape
 3) Pattern
 4) Movement
 
 Each of these 4 factors has 2 levels, so each subject would be exposed
 to 16 conditions in total. However, I have made each subject do 10
 replications per condition and I have 10 subjects so I have a total of
 1600 data points.
 
 I have tried using SPSS repeated measures in GLM to analyse my data but
 I don't know how to include my replications. SPSS requires that I
 select 16 columns of dependent variables, each representing a
 combination of my factors. However, I am only allowed one row per
 subject, so how do I input the 10 replications that each subject
 performed for each combination?
 
 Thanks !
 
 Alfred
 

Hi Alfred,
You might be better off using UNIANOVA for this analysis instead 
of GLM.  For example, here's the GLM syntax for a mixed-design (A and B as 
between subjects variables; C and D within-subjects):

GLM
  c1d1 c1d2 c2d1 c2d2 c3d1 c3d2 BY a b
  /WSFACTOR = c 3 Polynomial d 2 Polynomial
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = c d c*d
  /DESIGN = a b a*b .

This analysis required the 6 repeated measures (3*2) to be strung out
across one row for each subject.  But I was able to produce exactly the
same results using 6 rows per subject (one for each of the c*d
combinations) and the following syntax: 

UNIANOVA
  y  BY subj a b c d
  /RANDOM = subj
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a b a*b subj(a*b) 
c c*a c*b c*a*b  c*subj(a*b)
d d*a d*b d*a*b  d*subj(a*b)
c*d c*d*a c*d*b c*d*a*b  c*d*subj(a*b).

Note that SUBJ is now listed explicitly as one of the variables.  And you 
must explicitly list each of the error terms for within-subjects 
effects.  If you do not list these error terms, a pooled error term is 
used for tests of the within-subjects effects.  Finally, note as well 
that SUBJ appears on the /Random line; and the nesting of subjects within 
a*b cells is indicated as subj(a*b).

I haven't tried this with a completely within-subjects design.  But if you
let y=DV, a=colour, b=shape, c=pattern, d=movement, e=repetition (as
suggested by Donald Burrill), your syntax should look something like this,
I think: 

UNIANOVA
  y  BY subj a b c d e
  /RANDOM = subj e
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /EMMEANS = TABLES(e)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a a*subj
b b*subj
c c*subj
d d*subj
e e*subj
a*b a*b*subj
a*c a*c*subj
etc...
a*b*c*d*e a*b*c*d*e*subj .

Your data file would have 2*2*2*2*10 = 160 rows per subject with variables
that code for a-e and another for the DV. 

Hope this helps.
Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: SPSS GLM - between * within factor interactions

2000-05-09 Thread Bruce Weaver

On Tue, 9 May 2000, Johannes Hartig wrote:

 I have tried modifying the syntax, but I'm not getting any further.
 The within- and between-subject effects are defined separately
 in /WSDESIGN and /DESIGN, and mixing them only gives me
 cryptic error messages. Could it be possible to customize within *
 between interactions with /LMATRIX or /KMATRIX? I am already
 checking the syntax guide, but no success so far :(
 Thanks for any advice,
 Johannes
 

How about generating your own dummy variables for the various main
effects and interactions of interest (including dummy variables for
subject), and using REGRESSION instead of GLM repeated measures?  You can
use the /TEST subcommand to compare the full model to various reduced
models to produce tests for the main effects and interactions of
interest.  For a between-within design, subject will be nested in the
between subjects variables, so I think you'll have to enter those
between subjects variables on one step, and the dummy variables for
subject on the next step.  (If you enter the dummy variables for subject
first, you won't be able to enter the between Ss variables, because
they'll provide no further information.  It would be like entering codes
for city, and then trying to enter codes for country:  Once you know the
city, you already know the country.)
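
A rough sketch of what I mean, with hypothetical variable names (g1 is a
between-Ss dummy, s2 to s5 are subject-within-group dummies, w1 is a
within-Ss dummy, and gw1 = g1*w1 carries the between*within interaction):

* Product term for the between*within interaction.
COMPUTE gw1 = g1*w1.
REGRESSION
  /STATISTICS = R CHA
  /DEPENDENT = y
  /METHOD = ENTER g1
  /METHOD = ENTER s2 s3 s4 s5
  /METHOD = ENTER w1
  /METHOD = ENTER gw1.

The R-squared change F at each step gives you the model comparison,
although you still have to assemble the appropriate error terms (e.g.,
subjects-within-groups for the between-Ss effect) by hand.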

Good luck.
Bruce






Re: SPSS GLM - between * within factor interactions

2000-05-08 Thread Bruce Weaver

On Mon, 8 May 2000, Johannes Hartig wrote:

  Click on the Model box in the pull-down menu.  The default model
  is the full-factorial, but you can opt for other custom models with only
  the effects you are interested in.
 
 Thanks for your answer, but - I can't! - or am I missing something obvious?
 I can only customize within- and between-factor effects separately, _not_
 interactions between both. WHY?
 
 Johannes
 


Sorry Johannes, I didn't know that.  I wonder if this is a peculiarity of
using the GUI.  Have you tried pasting the syntax, and then modifying it
to include only the interactions of interest?  It probably won't work 
that way either, but it's worth a try.  

Bruce






Re: hyp testing

2000-04-17 Thread Bruce Weaver



On 15 Apr 2000, Donald F. Burrill wrote:

 (2) My second objection is that if the positive-discrete 
   probability is retained for the value "0" (or whatever value the former 
   "no" is held to represent), the distribution of the observed quantity 
   cannot be one of the standard distributions.  (In particular, it is not 
   normal.)  One then has no basis for asserting the probability of error 
   in rejecting the null hypothesis (at least, not by invoking the standard 
   distributions, as computers do, or the standard tables, as humans do 
   when they aren't relying on computers).  Presumably one could derive the 
   sampling distribution in enough detail to handle simple problems, but 
   that still looks like a lot more work than one can imagine most 
   investigators -- psychologists, say -- cheerfully undertaking.
  
  This would not be a problem if the alternative was one-tailed, would it?
 
 Sorry, Bruce, I do not see your point.  How does 1-tailed vs. 2-tailed 
 make a difference in whatever the underlying probability distribution is? 
 

Donald,
It was clear at the time, but now I'm not sure if I can see my
point either!  I think what I was driving at was the idea that a point
null hypothesis is often false a priori.  But if you have a one-tailed
alternative, then you don't have a point null, because the null
encompasses a whole range of values.  For example, if your alternative is
that a treatment improves performance, then the null states that
performance remains the same or worsens as a result of the treatment.  It
seems that this kind of null hypothesis certainly can be true.  And I
think it is perfectly legitimate to use the appropriate continuous
distribution (e.g., t-distribution) in carrying out a test.  Or am I
missing something? 

Cheers,
Bruce






Re: Nonpar Repeated Measures

2000-04-14 Thread Bruce Weaver

On Thu, 13 Apr 2000, Rich Ulrich wrote:

 On Thu, 13 Apr 2000 11:53:05 GMT, Chuck Cleland [EMAIL PROTECTED]
 wrote:
 
I have an ordinal response variable measured at four different times
  as well as a 3 level between subjects factor.  I looked at the time
  main effect with the Friedman Two-Way Analysis of Variance by Ranks. 
  That effect was statistically significant and was followed up by
  single df comparisons of time one with each of the three other time
  points (Siegel and Castellan, 1988, pp. 181-183).
 I would like to bring in the between subjects factor now as I expect an
  interaction between this factor and the time effect.  Could anyone
  suggest ways of doing this with the ordinal (0 to 3) response
  variable?  I have already looked at the simple main effect of time
  within each group with the Friedman test, but I would like to test the
  interaction.
 
 An "ordinal (0 to 3) response variable"  has to give you a WHOLE lot
 of ties.  (As I have posted before,) For simple analyses, forcing the
 rank-transformation is more likely to do harm than good when you start with
 just a few ordinal categories.  Using the scores of 0-3 or some
 other rational scoring, you can probably be quite safe in doing the
 two-way ANOVA -- safer, I suspect, than anything you can do with
 ranking as the first step.
 

Good point, Rich.  I didn't think about ties.  If the ordinal data are
generated by having people rank order objects, you could avoid ties
completely by simply disallowing tied ranks.  But in the situation Chuck
described (time as the repeated measure), there may well be a LOT of 
ties, as you say.

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: hyp testing

2000-04-12 Thread Bruce Weaver

On 11 Apr 2000, Donald F. Burrill wrote:

 On Mon, 10 Apr 2000, Bruce Weaver wrote in part, quoting Bob Frick:
 
-- 8< ---

  start quote
  To put this argument another way, suppose the question is whether one 
  variable influences another.  This is a discrete probability space with 
  only two answers: yes or no.  Therefore, it is natural that both 
  answers receive a nonzero probability. 
 
 It may be (or seem) "natural";  that doesn't mean that it's so, 
 especially in view of the subsequent refinement:
 
  Now suppose the question is changed into 
  one concerning the size of the effect.  This creates a continuous 
  probability space, with the possible answer being any of an infinite 
  number of real numbers and each one of these real numbers receiving an 
  essentially zero probability.  A natural tendency is to include 0 in this 
  continuous probability space and assign it an essentially zero 
  probability.  However, the "no" answer, which corresponds to a size of 
  zero, does not change probability just because the question is phrased 
  differently.  Therefore, it still has its nonzero probability; only the 
  nonzero probability of the "yes" answer is spread over the real numbers.
  end quote
 
 To this I have two objections:  (1) It is not clear that the "no" answer 
 "does not change probability ...", as Bob puts it.  If the question is 
 one that makes sense in a continuous probability space, it is entirely 
 possible (and indeed more usual than not, one would expect) that 
 constraining it to a two-value discrete situation ("yes" vs. "no") may 
 have entailed condensing a range of what one might call "small" values 
 onto the answer "no".  That is, the question may already, and perhaps 
 unconsciously, have been "coarsened" to permit the discrete expression 
 of the question with which Bob started.

I see your point.  But one of the examples Frick gives concerns the
existence of ESP.  In the discrete space, it does or does not exist.  For
this particular example, I think one could justify using a 1-tailed test
when moving to the continuous space; and so the null hypothesis would
encompass "less than or equal to 0", and the alternative "greater than 0". 
It seems to me that with a one-tailed alternative like this, the null
hypothesis can certainly be true.  


   (2) My second objection is that if the positive-discrete 
 probability is retained for the value "0" (or whatever value the former 
 "no" is held to represent), the distribution of the observed quantity 
 cannot be one of the standard distributions.  (In particular, it is not 
 normal.)  One then has no basis for asserting the probability of error 
 in rejecting the null hypothesis (at least, not by invoking the standard 
 distributions, as computers do, or the standard tables, as humans do 
 when they aren't relying on computers).  Presumably one could derive the 
 sampling distribution in enough detail to handle simple problems, but 
 that still looks like a lot more work than one can imagine most 
 investigators -- psychologists, say -- cheerfully undertaking.

This would not be a problem if the alternative was one-tailed, would it?

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: hyp testing

2000-04-10 Thread Bruce Weaver

On 7 Apr 2000, dennis roberts wrote:

 i was not suggesting taking away from our arsenal of tricks ... but, since 
 i was one of those old guys too ... i am wondering if we were mostly led
 astray ...?
 
 the more i work with statistical methods, the less i see any meaningful (at 
 the level of dominance that we see it) applications of hypothesis testing ...
 
 here is a typical problem ... and we teach students this!
 
 1. we design a new treatment
 2. we do an experiment
 3. our null hypothesis is that both 'methods', new and old, produce the 
 same results
 4. we WANT to reject the null (especially if OUR method is better!)
 5. we DO a two sample t test (our t was 2.98 with 60 df)  and reject the 
 null ... and in our favor!
 6. what has this told us?
 
 if this is ALL you do ... what it has told you AT BEST is that ... the 
 methods probably are not the same ... but, is that the question of interest 
 to us?
 
 no ... the real question is: how much difference is there in the two methods?
-- 8< ---

In one of his papers, Bob Frick argues very persuasively that very
often (in experimental psychology, at least), this is NOT the real
question at all.  I think that is especially the case when you are testing
theories.  Suppose, for example, that my theory of selective attention
posits that inhibition of the internal representations of distracting
items is an important mechanism of selection.  This idea has been tested
in so-called "negative priming" experiments.  (Negative priming refers to
the fact that subjects respond more slowly to an item that was previously
ignored, or is semantically related to a previously ignored item, than
they do to a novel item.) Negative priming is measured as a response time
difference between 2 conditions in an experiment.  The difference is
typically between about 20 and 40 milliseconds.  I think the important
thing to remember about this is that the researcher is not trying to
account for variability in response time per se, even though response time
is the dependent variable:  He or she is just using response time to
indirectly measure the object of real interest.  If one were trying to
account for overall variability in response time, the conditions of this
experiment would almost certainly not make the list of important
variables.  The researcher KNOWS that a lot of other things affect
response time, and some of them a LOT more than his experimental
conditions do.  However, because one is interested in testing a theory of
selective attention, this small difference between conditions is VERY
important, provided it is statistically significant (and there is
sufficient power);  and measures of effect size are not all that relevant. 

Just my 2 cents.
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Combining 2x2 tables

2000-03-31 Thread Bruce Weaver

On Thu, 30 Mar 2000, JohnPeters wrote:

 Hi,
 I was wondering if someone could help me.  I am interested in combining
 2x2 tables from multiple studies.  The test used is the McNemar's
 chi-sq.  I have the raw data from each of these studies.  What is the
 proper correction that should be used when combining the results.
 Thanks!!!


Meta-analysis is a common way to combine information from 2x2 tables, but
I'm not sure how you would do this with McNemar's chi-square as your
measure of "effect size" for each table.  It might be possible if you
are willing to use something else. 

It's Friday afternoon, and this is off the top of my head, but here goes 
anyway.  I wonder if you could write the tables this way:

                  Change
                 Yes    No
Before   -        a      b
         +        c      d


Cell a:  change from - to +
Cell b:  no change, - before and after
Cell c:  change from + to -
Cell d:  no change, + before and after

Suppose we're talking about change in opinion after hearing a political
speech.  The odds ratio for this table would give you the odds of changing
from a negative to a positive opinion over the odds of changing from
positive to negative. If you're the speaker, you're hoping for an odds 
ratio greater than 1 (i.e., greater change in those who were negative 
before the speech).  If the amount of change is similar in both groups, 
the odds ratio will be about 1.  

If this is a legitimate way to analyze the data for one such table, and I 
can't see why not, then you could pool the tables meta-analytically with 
ln(OR) as your measure of effect size.  Here's a paper that describes how 
to go about it:

Fleiss, JL. (1993). The statistical basis of meta-analysis. Statistical 
Methods in Medical Research, 2, 121-145.
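
Here is a sketch of the fixed-effect (inverse-variance) pooling with
ln(OR), along the lines Fleiss describes.  The two tables' cell counts
below are entirely made up:

DATA LIST FREE / study a b c d.
BEGIN DATA
1 12 20 5 30
2  8 25 7 28
END DATA.
* ln(odds ratio) and its variance; .5 is added to guard against zero cells.
COMPUTE lnor = LN(((a + .5)*(d + .5)) / ((b + .5)*(c + .5))).
COMPUTE v = 1/(a + .5) + 1/(b + .5) + 1/(c + .5) + 1/(d + .5).
COMPUTE w = 1/v.
COMPUTE wl = w*lnor.
* Pool across studies: weighted mean ln(OR), its SE, and a z-test.
COMPUTE const = 1.
AGGREGATE /OUTFILE = * /BREAK = const
  /sw = SUM(w) /swl = SUM(wl).
COMPUTE pooled = swl/sw.
COMPUTE se = SQRT(1/sw).
COMPUTE z = pooled/se.
COMPUTE p = 2*(1 - CDF.NORMAL(ABS(z), 0, 1)).
EXECUTE.
LIST.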

There are also free programs available for performing this kind of 
meta-analysis.  I have links to some in the statistics section of my 
homepage.

Hope this helps. Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: Normality & parametric tests (WAS: Kruskal-Wallis equal va

2000-03-24 Thread Bruce Weaver

On Fri, 24 Mar 2000, Bernard Higgins wrote:

 
 
 Hi Bruce

Hello Bernard.

 
 The point I was making is that when developing hypothesis tests, 
 from a theoretical point of view, the sampling distribution of the 
 test statistic from which critical values or p-values etc. are
 obtained is determined by the null hypothesis. We need a probability
 model to enable us to determine how likely observed patterns are.
 These probability models will often work well in practice even if we 
 relax the usual assumptions. When using distribution-free tests as 
 an alternative to a parametric test we may need to specify 
 restrictions in order that the tests can be considered "equivalent". 

Agreed.

 
 In my view the t-test is fairly robust and will work well in most 
 situations where the distribution is not too skewed, and constant 
 variance is reasonable. Indeed I have no problems in using it for the 
 majority of problems. When comparing two independent samples using 
 t-tests, lack of normality and constant variance are often not too 
 serious if the samples are of similar size, always a good idea in 
 planned experiments.

Agreed here too.

 
 As you say, when samples are fairly large, some say 30+ or even
 less, the sampling distribution of the mean can often be approximated
 by a normal distribution (Central Limit Theorem), and hence an
 (asymptotic) Z-test is frequently used. It would not, I think, be
 strictly correct to call such a statistic t, although from a
 practical point of view there may be little difference. The formal
 definition of the single-sample t statistic is the ratio of a
 Standard Normal random variable to the square root of an independent
 Chi-squared random variable divided by its degrees of freedom, and it
 does, in theory, require independent observations from a normal
 distribution.


I think we are no longer in complete agreement here.  I am not a 
mathematician, but for what it's worth, here is my understanding of t- 
and z-tests:

numerator = (statistic - parameter|H0)
denominator = SE(statistic)

test statistic = z if SE(statistic) is based on pop. SD
test statistic = t if SE(statistic) is based on sample SD

The most common 'statistics' in the numerator are Xbar and (Xbar1 - 
Xbar2); but others are certainly possible (e.g., for large-sample 
versions of rank-based tests).

An assumption of both tests is that the statistic in the numerator has a
sampling distribution that is normal.  This is where the CLT comes into
play:  It lays out the conditions under which the sampling distribution of
the statistic is approximately normal--and those conditions can vary
depending on what statistic you're talking about.  But having a normal
sampling distribution does not mean that we can or should use a critical
z-value rather than a critical t when the population variance is unknown
(which is what I thought you were suggesting).  

As you say, one can substitute critical z for critical t when n gets
larger, because the differences become negligible.  But nowadays, most of
us are using computer programs that give us more or less exact p-values
anyway, so this is less of an issue than it once was. 
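
A quick sketch of how fast that difference vanishes (two-tailed .05
critical values; the df values are arbitrary):

DATA LIST FREE / df.
BEGIN DATA
10 30 100 1000
END DATA.
* Critical t shrinks toward critical z as df grows.
COMPUTE tcrit = IDF.T(.975, df).
COMPUTE zcrit = IDF.NORMAL(.975, 0, 1).
EXECUTE.
LIST.

That gives t = 2.23, 2.04, 1.98, 1.96 against a constant z = 1.96.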


Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Multiple Comparison Correction in Multiple Regression

2000-03-17 Thread Bruce Weaver

On Fri, 17 Mar 2000, Rich Ulrich wrote:

-- 8< ---

  2) When performing a multiple linear regression we have performed partial
  f-tests with the sequential SS (Type I SS) to examine if a particular
  variable "should be added" to a simpler model.  If a series of these tests
  are used to find a parsimonious model that still fits should we correct for
  multiple comparisons?
 
 "Stepwise inclusion" is usually a bad idea.  See the comments in my
 stats-FAQ, and their references.  (If you are worried about correcting
 for multiple tests, then you probably *shouldn't*  add the variable
 because it is probably capitalizing on chance.)


Rich,
Is there not an important distinction to be made between the 
following situations:

1.  A computer algorithm determines (based on the magnitude of partial or 
semi-partial correlations) the order in which variables are entered or 
removed, and which ones end up in the final model.

2.  The investigator determines a priori the order in which variables are 
to be entered or removed.


Some of my textbooks refer to situation 1 as "stepwise"  regression and
situation 2 as "hierarchical" regression.  One is less likely to
capitalize on chance with hierarchical regression, I think, especially if
the decisions about order are theoretically motivated, and the number of
variables is not too large.  

Here's another observation that is relevant to this thread, I think.  When
one performs a 2-factor ANOVA, there are 3 independent F-tests:  one for
each main effect, and one for the interaction.  One can arrive at these
same F-tests using the same regression model comparison approach that is
described above (e.g., compare the FULL regression model to one without
the AxB interaction to get F for the interaction term).  I don't think
I have EVER seen anyone correct for multiple comparisons in this case.
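
For a 2x2 design, for example, the three F-tests can be produced with
effect-coded predictors and the TEST method (variable names hypothetical;
a1 and b1 are -1/+1 codes for factors A and B):

* ab1 carries the A*B interaction.
COMPUTE ab1 = a1*b1.
REGRESSION
  /STATISTICS = R CHA
  /DEPENDENT = y
  /METHOD = TEST (a1) (b1) (ab1).

Each subset's R-squared change F (removing it from the full model) matches
the corresponding ANOVA F with balanced data -- and nobody corrects those
three tests for multiple comparisons.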

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/





Re: ANOVA causal direction

2000-02-11 Thread Bruce Weaver

On 10 Feb 2000, Richard M. Barton wrote:

 --- Alex Yu wrote:
 
 A statistical procedure alone cannot determine casual relationships. 
 ---
 
 
 Correct.  A lot depends on eye contact.
 
 rb


And also, at least 2 statistical procedures are required...


