RE: Analysis of covariance

2001-10-02 Thread Bruce Weaver


On 27 Sep 2001, Paul R. Swank wrote:

> Some years ago I did a simulation on the pretest-posttest control group
> design looking at three methods of analysis: ANCOVA, repeated measures
> ANOVA, and treatment by block factorial ANOVA (blocking on the pretest using
> a median split). I found that with typical sample sizes, the repeated
> measures ANOVA was a bit more powerful than the ANCOVA procedure when the
> correlation between pretest and posttest was fairly high (say .90). As noted
> below, this is because the ANCOVA and ANOVA methods are approaching the same
> solution but ANCOVA loses a degree of freedom estimating the regression
> parameter when the ANOVA doesn't. Of course this effect diminishes as the
> sample size gets larger, because the loss of one df matters less. On the
> other hand, the treatment by block design tends to have a bit more power
> when the correlation between pretest and posttest is low (< .30). I tried to
> publish the results at the time but aimed a bit too high and received such a
> scathing review (what kind of idiot would do this kind of study?) that I
> shoved it in a drawer and it has never seen the light of day since. I did the
> study because it seemed at the time that everyone was using this design but
> was unsure of the analysis, and I thought a demonstration would be helpful.
> SO, to make a long story even longer, the ANCOVA seems to be most powerful
> in those circumstances one is likely to run into but does have somewhat
> rigid assumptions about homogeneity of regression slopes. Of course the
> repeated measures ANOVA indirectly makes the same assumption but at such
> high correlations, this is really a homogeneity of variance issue as well.
> The second thought is for you reviewers out there trying to soothe your own
> egos by dumping on someone else's. Remember, the researcher you squelch
> today might be turned off to research and fail to solve a meaty problem
> tomorrow.
>
> Paul R. Swank, Ph.D.
> Professor
> Developmental Pediatrics
> UT Houston Health Science Center
>
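
For concreteness, here is a minimal sketch -- in Python, with made-up
inputs, and not Paul's original code -- of the kind of simulation he
describes.  The repeated-measures interaction is computed via the
equivalent t-test on gain scores, and the median-split blocking variant
is omitted for brevity:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_run(n=20, effect=0.5, rho=0.9):
    # Bivariate-normal pretest/posttest with correlation rho; the
    # treatment adds `effect` SDs to the posttest only.
    cov = [[1.0, rho], [rho, 1.0]]
    ctrl = rng.multivariate_normal([0, 0], cov, n)
    trt = rng.multivariate_normal([0, 0], cov, n)
    trt[:, 1] += effect
    pre = np.concatenate([ctrl[:, 0], trt[:, 0]])
    post = np.concatenate([ctrl[:, 1], trt[:, 1]])
    grp = np.repeat([0.0, 1.0], n)

    # ANCOVA: regress posttest on group + pretest, test the group term.
    X = np.column_stack([np.ones(2 * n), grp, pre])
    beta = np.linalg.lstsq(X, post, rcond=None)[0]
    resid = post - X @ beta
    df = 2 * n - 3                       # one df lost to the slope
    se = np.sqrt(resid @ resid / df * np.linalg.inv(X.T @ X)[1, 1])
    p_ancova = 2 * stats.t.sf(abs(beta[1] / se), df)

    # Repeated-measures ANOVA: the group-by-time interaction F equals
    # the squared t from a two-sample t-test on the gain scores.
    p_rm = stats.ttest_ind(post[grp == 1] - pre[grp == 1],
                           post[grp == 0] - pre[grp == 0]).pvalue
    return p_ancova, p_rm

power = np.mean([[p < .05 for p in one_run()] for _ in range(2000)], axis=0)
print(f"power -- ANCOVA: {power[0]:.2f}, RM ANOVA: {power[1]:.2f}")

With rho = .9 the two powers come out nearly identical, which is Paul's
point about the two analyses converging at high pretest-posttest
correlations.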

Paul's post reminded me of something I read in Keppel's Design and
Analysis.  Here's an excerpt from my notes on ANCOVA:


Keppel (1982, p. 512) says:

If the choice is between blocking and the analysis of covariance, Feldt
(1958) has shown that blocking is more precise when the correlation
between the covariate and the dependent variable is less than .4, while
the analysis of covariance is more precise with correlations greater than
.6.  Since we rarely obtain correlations of this latter magnitude in the
behavioral sciences, we will not find a unique advantage in the analysis
of covariance in most research applications.

Keppel (1982, p. 513) also prefers the Treatments X Blocks design
to ANCOVA on the grounds that the underlying assumptions are less
stringent:

Both within-subjects designs and analyses of covariance require a number
of specialized statistical assumptions.  With the former, homogeneity of
between treatment differences and the absence of differential carryover
effects are assumptions that are critical for an unambiguous
interpretation of the results of an experiment.  With the latter, the most
stringent is the assumption of homogeneous within-group regression
coefficients.  Both the analysis of covariance and the analysis of
within-subjects designs are sensitive only to the linear relationship
between X and Y, in the first case, and between pairs of treatment
conditions in the second case.  In contrast, the Treatments X Blocks
design is sensitive to any type of relationship between treatments and
blocks--not just linear.  As Winer puts it, the Treatments X Blocks design
"is a function-free regression scheme" (1971, p. 754).  This is a major
advantage of the Treatments X Blocks design.  In short, the Treatments X
Blocks design does not have restrictive assumptions and, for this reason,
is to be preferred for its relative freedom from statistical assumptions
underlying the data analysis.

-- 
Bruce Weaver
E-mail: [EMAIL PROTECTED]
Homepage:   http://www.angelfire.com/wv/bwhomedir/






Re: The meaning of the p value

2001-01-31 Thread Bruce Weaver

On 30 Jan 2001, Will Hopkins wrote:

-- >8 ---

> I haven't followed this thread closely, but I would like to state the 
> only valid and useful interpretation of the p value that I know.  If 
> you observe a positive effect, then p/2 is the probability that the 
> true value of the effect is negative.  Equivalently, 1-p/2 is the 
> probability that the true value is positive.
> 
> The probability that the null hypothesis is true is exactly 0.  The 
> probability that it is false is exactly 1.


Suppose you were conducting a test with someone who claimed to have ESP,
such that they were able to predict accurately which card would be turned
up next from a well-shuffled deck of cards.  The null hypothesis, I think, 
would be that the person does not have ESP.  Is this null false? 
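
As an aside, that point null can be exactly true.  A minimal sketch of
how such a test might be scored, assuming the person calls the top card
of a freshly shuffled 52-card deck on each trial (Python; the counts
are invented):

from scipy.stats import binomtest   # scipy >= 1.7

# Under the null (no ESP), the success probability is exactly 1/52.
result = binomtest(k=7, n=200, p=1/52, alternative="greater")
print(f"P(7 or more correct out of 200 | no ESP) = {result.pvalue:.3f}")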

And what about when one has a one-tailed alternative hypothesis, e.g.,
mu > 100?  In this case, the null covers a whole range of values (mu <=
100).  Is this null false?  In such a case, one still uses the point null
(mu = 100) for testing, because it is the most extreme case. If you can 
reject the point null of mu=100, you will certainly be able to reject the 
null if mu is actually some value less than 100.  But the point is, the 
null can be true.  

With a two-tailed alternative, the point null may not be true, but as one
of the regulars in these newsgroups often points out, we don't know the
direction of the difference.  So again, it makes sense to use the point 
null for testing purposes.


> Estimation is the name of the game.  Hypothesis testing belongs in 
> another century--the 20th.  Unless, that is, you base hypotheses not 
> on the null effect but on trivial effects...


Bob Frick has a paper with some interesting comments on this in the
context of experimental psychology.  In that context, he argues, models
that make "ordinal" predictions are more useful than ones that try to
estimate effect sizes, and certainly more generalizable.  (An ordinal
prediction is something like "performance will be impaired in condition B
relative to condition A."  Impairment might be indicated by slower
responding and more errors, for example.)

A lot of cognitive psychologists use reaction time as their primary DV. 
But note that they are NOT primarily interested in explaining all (or as
much as they can) of the variation in reaction time.  RT is just a tool
they use to make inferences about some underlying construct that really
interests them.  Usually, they are trying to test some theory which leads
them to expect slower responding in one condition relative to another, for
example--such as slower responding when distractors are present compared
to when only a target item appears.  The difference between these
conditions almost certainly will explain next to none of the overall
variation in RT, so eta-squared and omega-squared measures will not be
very impressive looking.  But that's fine, because the whole point is to
test the ordinal prediction of the theory--not to explain all of the
variation in RT.  If one was able to measure the underlying construct
directly, THEN it might make some sense to try estimating parameters.  But
with indirect measurements like RT, I think Frick's recommended approach
is a better one. 
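
To put some invented numbers on that last point: a 30 ms condition
effect buried in 150 ms of trial-to-trial noise can be detected very
reliably while explaining about 1% of the RT variance.  A minimal
Python sketch:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000                               # trials per condition
rt_target = rng.normal(600, 150, n)    # target-only RTs (ms)
rt_distract = rng.normal(630, 150, n)  # +30 ms with distractors present

t, p = stats.ttest_ind(rt_distract, rt_target)
eta_sq = t**2 / (t**2 + 2 * n - 2)     # eta-squared for two groups
print(f"t = {t:.2f}, p = {p:.2g}, eta-squared = {eta_sq:.3f}")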

There's my two cents.
-- 
Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED]) 
Homepage:   http://www.angelfire.com/wv/bwhomedir/





Normality assumption for ANOVA (was: Effect statistics for non-normality)

2001-01-19 Thread Bruce Weaver

[...] Does the CLT no longer apply because I've added a 3rd
population?  I think not.  Given large enough samples (and similarly
shaped populations with more or less equal variances), the F-statistic I
calculate can still be referred to the appropriate F-distribution, I should
think. 

By the way, other good examples are the large sample z-test versions of 
various non-parametric tests (e.g., Mann-Whitney U).  The important thing 
for those tests is that the sampling distribution of the statistic (e.g., 
the sampling distribution of U) is normal when the numbers are large 
enough.  I don't recall ever seeing anyone claim that the underlying 
raw-score populations had to be normal.
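
A quick Python sketch of that point, using a decidedly non-normal
(exponential) parent population whose skewness is 2:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
means = rng.exponential(scale=1.0, size=(100_000, 50)).mean(axis=1)
# Skewness of means of n=50 drops to about 2/sqrt(50) = 0.28.
print(f"skewness of the sample means: {stats.skew(means):.2f}")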

Oops!  This rant ended up being a bit longer than I anticipated.  Looking 
forward to the comments of others.

Cheers,
-- 
Bruce Weaver
New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED]) 
Homepage:   http://www.angelfire.com/wv/bwhomedir/







Re: Odd description of LSD approach to multiple comparisons

2000-10-19 Thread Bruce Weaver

On 18 Oct 2000, Karl L. Wuensch wrote:

> I suggest that we not use the phrase "LSD" to describe the "protected t
> test," or "Fisher's procedure" (the procedure that requires having first
> obtained a significant omnibus ANOVA effect).  After all, one can compute a
> "least significant difference" (between means to be "significant" at an
> adjusted criterion of significance) for any of the paranoid alpha-adjustment
> procedures:  Fisher's, Bonferroni, Tukey a or b, Newman-Keuls, REGWQ, etc.


You are absolutely right, Karl.  But we can't revise all of the textbooks
that are already out there.  When our students pull books off the shelf in
the library, they are going to find references to the "LSD"  method of
multiple comparisons.  And MOST of the time, this will be referring to
Fisher's protected t.  The Kleinbaum et al. book is the first I've seen
where it does not. 

Cheers,
Bruce






Re: Proper way to correct for multiple comparisons

2000-08-11 Thread Bruce Weaver

On Fri, 11 Aug 2000, jazz wrote:

> Hi, I'm not feeling confident about my method here, and would appreciate
> it if somebody lets me know if I'm wrong, thanks.
> 
> I'm doing a 2x2 ANOVA (type: logic, math)(difficulty: hard, easy). The
> hypothesis is that harder logic will produce a larger DV than easy logic,
> but this will not occur in math problems (which constitute a control). I
> found a typeXdifficulty interaction (p < .05). Now, I do a post-test
> comparing hard logic to easy logic and find an effect at .025 (p < .025).
> I do a similar post-test for hard and easy math and p > .025 so the hard
> math doesn't produce a significantly larger DV than easy math.
> 
> My reasoning is, I plan the two post-anova comparisons, so I divide my
> alpha .05 by two, to get the .025.
> 
> 
> Thank you for any advice.
> 
> Jim


Some authors would call your contrasts of easy and hard for logic and math
the "simple main effects" of difficulty.  Given that the interaction is 
significant, and that these contrasts are planned, I think most folks 
would be happy sticking with alpha = .05.  
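
For what it's worth, a minimal Python sketch of the two planned
contrasts (all data invented), showing the .05 and Bonferroni .025
criteria side by side:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
cells = {"logic": (rng.normal(10, 2, 20), rng.normal(12, 2, 20)),
         "math":  (rng.normal(10, 2, 20), rng.normal(10, 2, 20))}

for name, (easy, hard) in cells.items():
    p = stats.ttest_ind(hard, easy).pvalue
    print(f"{name}: p = {p:.4f}  (sig at .05: {p < .05}; "
          f"at .025: {p < .025})")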

-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: I need help!!! SPSS and Panel Data

2000-07-03 Thread Bruce Weaver

On Sun, 2 Jul 2000 [EMAIL PROTECTED] wrote:

> Help!
>  I'm a Norwegian student who can't figure out how
> to work SPSS 9.0 properly for running a multiple
> regression on panel data (longitudinal data or
> cross-sectional time-series data). My data set
> consists of financial data from about 300 Norw.
> municipalities. For each municipality I have
> observations for 7 fiscal years. My problem is
> that I don't know how to "tell" SPSS that the
> cases are grouped 7 by 7, i.e. that they are panel
> data.
> Can somebody please help me!
> 
> Ketil Pedersen
> 

Hi Ketil,
I'm not familiar with time series terminology, but if I followed 
you, you have a data file that looks something like this:


MUNICIP  YEAR   Y
  1   1   
  1   2 
  1   3   
etc
  1   7  
  2   1  
  2   2  
  2   3  
etc
  2   7  
  3   1
  3   2
etc
  3   7
etc


I think you may have one or more "between-groups" variables too, but I
wasn't sure about this.  Anyway, if this is more or less accurate, then I
think you would find it easier to use UNIANOVA rather than REGRESSION.  In
the pulldown menus, you find it under GLM-->Univariate, I think.  Here's
an example of some syntax for the data shown above with SIZE included as a
between-municipalities variable: 

UNIANOVA
  y  BY municip year size
  /RANDOM = municip
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(year)
  /EMMEANS = TABLES(size)
  /EMMEANS = TABLES(year*size)
  /CRITERIA = ALPHA(.05)
  /print = etasq
  /plot = resid
  /DESIGN = size municip(size)
year year*size .


Note that municip is a random factor here (i.e., it is treated the same
way Subjects are usually treated).  And the notation "municip(size)" 
indicates that municip is nested in the size groups.  The output from this
syntax will give you an F-test for size with municip(size) as the error
term; and for the year and year*size F-tests, the error term (called
"residual") will be Year*municip(size), because that's all that is left
over. 

You can get the same F-tests using REGRESSION, but not as easily.  For 
one thing, you have to compute your own dummy variables for MUNICIP and 
YEAR; and if you have a mixed design (between- and within-municipalities 
variables), you pretty much have to do two separate analyses, as far as I 
can tell.
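
A rough modern alternative, not part of the SPSS discussion above: fit
the same kind of model as a mixed model, with a random intercept per
municipality standing in for the municip(size) error term.  A Python
sketch with invented data (the expected-mean-squares F-tests and the
mixed-model tests are close cousins, not identical):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_munic, n_years = 30, 7
df = pd.DataFrame({
    "municip": np.repeat(np.arange(n_munic), n_years),
    "year": np.tile(np.arange(1, n_years + 1), n_munic),
})
df["size"] = np.where(df["municip"] < n_munic // 2, "small", "large")
df["y"] = (rng.normal(0, 1, n_munic)[df["municip"]]   # municipality effect
           + 0.3 * df["year"] + rng.normal(0, 1, len(df)))

fit = smf.mixedlm("y ~ C(year) * C(size)", df, groups=df["municip"]).fit()
print(fit.summary())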

Hope this helps.
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Repeated Measures ANOVA

2000-06-13 Thread Bruce Weaver

On Tue, 13 Jun 2000 [EMAIL PROTECTED] wrote:

> Hi.
> 
> I have conducted an experiment with 4 within subject variables.
> 1) Colour
> 2) Shape
> 3) Pattern
> 4) Movement
> 
> Each of these 4 factors have 2 levels so each subject would be exposed
> to 16 conditions in total. However, I have made each subject do 10
> replications per condition and I have 10 subjects so I have a total of
> 1600 data points.
> 
> I have tried using SPSS repeated measures in GLM to analyse my data but
> I don't know how to include my replications. SPSS requires that I
> select 16 columns of dependant variables each representing a
> combination of my factors. However, I am only allowed one row per
> subject, so how do I input the 10 replications that each subject
> performed for each combination?
> 
> Thanks !
> 
> Alfred
> 

Hi Alfred,
You might be better off using UNIANOVA for this analysis instead 
of GLM.  For example, here's the GLM syntax for a mixed-design (A and B as 
between subjects variables; C and D within-subjects):

GLM
  c1d1 c1d2 c2d1 c2d2 c3d1 c3d2 BY a b
  /WSFACTOR = c 3 Polynomial d 2 Polynomial
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = c d c*d
  /DESIGN = a b a*b .

This analysis required the 6 repeated measures (3*2) to be strung out
across one row for each subject.  But I was able to produce exactly the
same results using 6 rows per subject (one for each of the c*d
combinations) and the following syntax: 

UNIANOVA
  y  BY subj a b c d
  /RANDOM = subj
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a b a*b subj(a*b) 
c c*a c*b c*a*b  c*subj(a*b)
d d*a d*b d*a*b  d*subj(a*b)
c*d c*d*a c*d*b c*d*a*b  c*d*subj(a*b).

Note that SUBJ is now listed explicitly as one of the variables.  And you 
must explicitly list each of the error terms for within-subjects 
effects.  If you do not list these error terms, a pooled error term is 
used for tests of the within-subjects effects.  Finally, note as well 
that SUBJ appears on the /RANDOM line; and the nesting of subjects within
a*b cells is indicated as subj(a*b).

I haven't tried this with a completely within-subjects design.  But if you
let y=DV a=colour b=shape c=pattern d=movement e = repetition (as
suggested by Donald Burrill), your syntax should look something like this,
I think: 

UNIANOVA
  y  BY subj a b c d e
  /RANDOM = subj e
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(a)
  /EMMEANS = TABLES(b)
  /EMMEANS = TABLES(c)
  /EMMEANS = TABLES(d)
  /EMMEANS = TABLES(e)
  /CRITERIA = ALPHA(.05)
  /DESIGN = a a*subj
b b*subj
c c*subj
d d*subj
e e*subj
a*b a*b*subj
a*c a*c*subj
etc...
a*b*c*d*e a*b*c*d*e*subj .

Your data file would have 2*2*2*2*10 = 160 rows per subject with variables
that code for a-e and another for the DV. 
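
For later readers: Python's statsmodels offers AnovaRM for fully
within-subjects designs, and it will average the 10 replications per
cell itself.  A sketch with invented data and the same factor names:

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(11)
rows = [(s, a, b, c, d, rng.normal(10 + 0.5 * a, 2))
        for s in range(10)
        for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)
        for _ in range(10)]                    # 10 replications per cell
df = pd.DataFrame(rows, columns=["subj", "a", "b", "c", "d", "y"])

res = AnovaRM(df, depvar="y", subject="subj",
              within=["a", "b", "c", "d"], aggregate_func="mean").fit()
print(res)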

Hope this helps.
Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: SPSS GLM - between * within factor interactions

2000-05-09 Thread Bruce Weaver

On Tue, 9 May 2000, Johannes Hartig wrote:

> I have tried modifying the syntax, but I'm not getting any further.
> The within- and between-subject effects are defined separately
> in /WSDESIGN and /DESIGN, and mixing them only gives me
> cryptic error messages. Could it be possible to customize within *
> between interactions with /LMATRIX or /KMATRIX? I am
> already checking the syntax guide, but no success so far :(
> Thanks for any advice,
> Johannes
> 

How about generating your own dummy variables for the various main
effects and interactions of interest (including dummy variables for
subject), and using REGRESSION instead of GLM repeated measures?  You can
use the /TEST subcommand to compare the full model to various reduced
models to produce tests for the main effects and interactions of
interest.  For a between-within design, subject will be nested in the
between subjects variables, so I think you'll have to enter those
between subjects variables on one step, and the dummy variables for
subject on the next step.  (If you enter the dummy variables for subject
first, you won't be able to enter the between Ss variables, because
they'll provide no further information.  It would be like entering codes
for City, and then trying to enter codes for country:  Once you know the
city, you already know country.)
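
A minimal Python sketch of that entry-order point (invented data;
statsmodels' compare_f_test plays the role of REGRESSION's /TEST
subcommand, and one reference subject is dropped per group so that the
subject dummies stay nested within groups):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
df = pd.DataFrame({"subj": np.repeat(np.arange(12), 2),
                   "group": np.repeat(["A", "B"], 12),  # between-Ss
                   "time": np.tile([0, 1], 12)})        # within-Ss
df["y"] = rng.normal(10, 2, len(df)) + (df["group"] == "B") * 1.5

g = pd.get_dummies(df["group"], drop_first=True, dtype=float)
s = (pd.get_dummies(df["subj"], prefix="s", dtype=float)
     .drop(columns=["s_0", "s_6"]))   # one reference subject per group

step1 = sm.OLS(df["y"], sm.add_constant(g)).fit()       # group entered first
step2 = sm.OLS(df["y"],
               sm.add_constant(pd.concat([g, s], axis=1))).fit()
print(step2.compare_f_test(step1))    # (F, p, df) for adding subjects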

Good luck.
Bruce






Re: SPSS GLM - between * within factor interactions

2000-05-08 Thread Bruce Weaver

On Mon, 8 May 2000, Johannes Hartig wrote:

> > Click on the Model box in the pull-down menu.  The default model
> > is the full-factorial, but you can opt for other custom models with only
> > the effects you are interested in.
> 
> Thanks for your answer, but - I can't! - or am I missing something obvious?
> I can only customize within- and between-factor effects separately, _not_
> interactions between both. WHY?
> 
> Johannes
> 


Sorry Johannes, I didn't know that.  I wonder if this is a peculiarity of
using the GUI.  Have you tried pasting the syntax, and then modifying it
to include only the interactions of interest?  It probably won't work 
that way either, but it's worth a try.  

Bruce






Re: hyp testing

2000-04-17 Thread Bruce Weaver



On 15 Apr 2000, Donald F. Burrill wrote:

> > >   (2) My second objection is that if the positive-discrete 
> > > probability is retained for the value "0" (or whatever value the former 
> > > "no" is held to represent), the distribution of the observed quantity 
> > > cannot be one of the standard distributions.  (In particular, it is not 
> > > normal.)  One then has no basis for asserting the probability of error 
> > > in rejecting the null hypothesis (at least, not by invoking the standard 
> > > distributions, as computers do, or the standard tables, as humans do 
> > > when they aren't relying on computers).  Presumably one could derive the 
> > > sampling distribution in enough detail to handle simple problems, but 
> > > that still looks like a lot more work than one can imagine most 
> > > investigators -- psychologists, say -- cheerfully undertaking.
> > 
> > This would not be a problem if the alternative was one-tailed, would it?
> 
> Sorry, Bruce, I do not see your point.  How does 1-tailed vs. 2-tailed 
> make a difference in whatever the underlying probability distribution is? 
> 

Donald,
It was clear at the time, but now I'm not sure if I can see my
point either!  I think what I was driving at was the idea that a point
null hypothesis is often false a priori.  But if you have a one-tailed
alternative, then you don't have a point null, because the null
encompasses a whole range of values.  For example, if your alternative is
that a treatment improves performance, then the null states that
performance remains the same or worsens as a result of the treatment.  It
seems that this kind of null hypothesis certainly can be true.  And I
think it is perfectly legitimate to use the appropriate continuous
distribution (e.g., t-distribution) in carrying out a test.  Or am I
missing something? 
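
A tiny Python sketch of such a test, with invented scores:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
scores = rng.normal(103, 15, 40)   # hypothetical treated-group scores

# Directional alternative mu > 100, tested at the point null mu = 100
# with the ordinary t distribution.
res = stats.ttest_1samp(scores, popmean=100, alternative="greater")
print(f"t = {res.statistic:.2f}, one-tailed p = {res.pvalue:.4f}")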

Cheers,
Bruce






Re: Nonpar Repeated Measures

2000-04-14 Thread Bruce Weaver

On Thu, 13 Apr 2000, Rich Ulrich wrote:

> On Thu, 13 Apr 2000 11:53:05 GMT, Chuck Cleland <[EMAIL PROTECTED]>
> wrote:
> 
> >   I have an ordinal response variable measured at four different times
> > as well as a 3 level between subjects factor.  I looked at the time
> > main effect with the Friedman Two-Way Analysis of Variance by Ranks. 
> > That effect was statistically significant and was followed up by
> > single df comparisons of time one with each of the three other time
> > points (Siegel and Castellan, 1988, pp. 181-183).
> >   I would like bring in the between subjects factor now as I expect an
> > interaction between this factor and the time effect.  Could anyone
> > suggest ways of doing this with the ordinal (0 to 3) response
> > variable?  I have already looked at the simple main effect of time
> > within each group with the Friedman test, but I would like to test the
> > interaction.
> 
> An "ordinal (0 to 3) response variable"  has to give you a WHOLE lot
> of ties.  (As I have posted before,) For simple analyses, forcing the
> rank-transformation is more likely to do harm than good when you start with
> just a few ordinal categories.  Using the scores of 0-3 or using some
> other rational scoring, you can probably be quite safe in doing the
> two-way ANOVA -- safer, I suspect, than anything you can do with
> ranking as the first step.
> 

Good point Rich.  I didn't think about ties.  If the ordinal data are
generated by having people rank-order objects, you could avoid ties
completely by simply disallowing tied ranks.  But in the situation Chuck
described (time as the repeated measure), there may well be a LOT of 
ties, as you say.
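
To quantify the problem (Python; responses invented):

import numpy as np

rng = np.random.default_rng(4)
x = rng.integers(0, 4, size=100)    # 100 responses on a 0-3 scale
values, counts = np.unique(x, return_counts=True)
print(dict(zip(values, counts)))    # four big clumps of tied values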

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Nonpar Repeated Measures

2000-04-13 Thread Bruce Weaver

On Thu, 13 Apr 2000, Chuck Cleland wrote:

> Hello:
>   I have an ordinal response variable measured at four different times
> as well as a 3 level between subjects factor.  I looked at the time
> main effect with the Friedman Two-Way Analysis of Variance by Ranks. 
> That effect was statistically significant and was followed up by
> single df comparisons of time one with each of the three other time
> points (Siegel and Castellan, 1988, pp. 181-183).
>   I would like bring in the between subjects factor now as I expect an
> interaction between this factor and the time effect.  Could anyone
> suggest ways of doing this with the ordinal (0 to 3) response
> variable?  I have already looked at the simple main effect of time
> within each group with the Friedman test, but I would like to test the
> interaction.
> 
> thanks,
> 
> Chuck 
>  


Chuck,
There is a thread from a year or 2 ago on this topic.  Search for
"Nonparametric test for mixed model" at www.deja.com/usenet. 

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: hyp testing

2000-04-12 Thread Bruce Weaver

On 12 Apr 2000, Herman Rubin wrote:

> >I have often wondered if an integrated course/course sequence might not be
> >better.
> 
> A course sequence of a rather different kind is definitely
> in order.  It would be at least three courses.
> 
> The first course would be a general probability only course,
> with the emphasis on understanding probability, not in carrying
> out computations.  This has nothing to do with the discipline
> of the individual student, although the level should be such
> that it uses as much mathematics as the student is going to know.
> One might, at this stage, introduce the ideas of statistical
> decision making, but most will need a full course in probability
> first to understand probability well enough to use it in any
> sensible manner.  If probability is presented as merely the
> limit of relative frequency, this might be quite difficult.
> 
> The second course should be a course in probability modeling
> in the student's department of application.  The construction
> of probability models, the making of assumptions, and the
> meaning of those assumptions, is almost totally absent in
> those using statistics today.  There should be strong warnings
> about the dangers of those assumptions being false, and that
> in practice these assumptions might not be quite true.
> 
> Only after this can one reasonably deal with the uncertainties
> of inference.


Dr. Rubin,
Are there any textbooks that you would deem suitable for the 3
courses you describe above? 

-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: hyp testing

2000-04-12 Thread Bruce Weaver

On 11 Apr 2000, Donald F. Burrill wrote:

> On Mon, 10 Apr 2000, Bruce Weaver wrote in part, quoting Bob Frick:
> 
-- >8 ---

> > 
> > To put this argument another way, suppose the question is whether one 
> > variable influences another.  This is a discrete probability space with 
> > only two answers: yes or no.  Therefore, it is natural that both 
> > answers receive a nonzero probability. 
> 
> It may be (or seem) "natural";  that doesn't mean that it's so, 
> especially in view of the subsequent refinement:
> 
> > Now suppose the question is changed into 
> > one concerning the size of the effect.  This creates a continuous 
> > probability space, with the possible answer being any of an infinite 
> > number of real numbers and each one of these real numbers receiving an 
> > essentially zero probability.  A natural tendency is to include 0 in this 
> > continuous probability space and assign it an essentially zero 
> > probability.  However, the "no" answer, which corresponds to a size of 
> > zero, does not change probability just because the question is phrased 
> > differently.  Therefore, it still has its nonzero probability; only the 
> > nonzero probability of the "yes" answer is spread over the real numbers.
> > 
> 
> To this I have two objections:  (1) It is not clear that the "no" answer 
> "does not change probability ...", as Bob puts it.  If the question is 
> one that makes sense in a continuous probability space, it is entirely 
> possible (and indeed more usual than not, one would expect) that 
> constraining it to a two-value discrete situation ("yes" vs. "no") may 
> have entailed condensing a range of what one might call "small" values 
> onto the answer "no".  That is, the question may already, and perhaps 
> unconsciously, have been "coarsened" to permit the discrete expression 
> of the question with which Bob started.

I see your point.  But one of the examples Frick gives concerns the
existence of ESP.  In the discrete space, it does or does not exist.  For
this particular example, I think one could justify using a 1-tailed test
when moving to the continuous space; and so the null hypothesis would
encompass "less than or equal to 0", and the alternative "greater than 0". 
It seems to me that with a one-tailed alternative like this, the null
hypothesis can certainly be true.  


>   (2) My second objection is that if the positive-discrete 
> probability is retained for the value "0" (or whatever value the former 
> "no" is held to represent), the distribution of the observed quantity 
> cannot be one of the standard distributions.  (In particular, it is not 
> normal.)  One then has no basis for asserting the probability of error 
> in rejecting the null hypothesis (at least, not by invoking the standard 
> distributions, as computers do, or the standard tables, as humans do 
> when they aren't relying on computers).  Presumably one could derive the 
> sampling distribution in enough detail to handle simple problems, but 
> that still looks like a lot more work than one can imagine most 
> investigators -- psychologists, say -- cheerfully undertaking.

This would not be a problem if the alternative was one-tailed, would it?

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: hyp testing

2000-04-10 Thread Bruce Weaver

On Mon, 10 Apr 2000, Rich Ulrich wrote:

-- >8 ---

> > the term 'null' means a hypothesis that is the straw dog case ... for which 
> > we are hoping that sample data will allow us to NULLIFY ...
> 
>  - this seemed okay in the first sentence.  However, I think that
> "straw dog case" is what I would call "straw man argument"  and that
> is *not*  the quality of argument of the null.The point-null is
> always false, but we state the null so that it is "reasonable" to
> accept it, or to require data in order to reject it.
> 
-- >8 ---

Rich, I do not agree that the point-null is always false.  But I guess it
depends on how you define "point-null".  Bob Frick has some very
interesting things to say about all of this.  For example, the following
is taken from his 1995 Memory & Cognition paper (Vol 23, pp.  132-138),
"Accepting the null hypothesis": 


To put this argument another way, suppose the question is whether one 
variable influences another.  This is a discrete probability space with 
only two answers: yes or no.  Therefore, it is natural that both answers 
receive a nonzero probability.  Now suppose the question is changed into 
one concerning the size of the effect.  This creates a continuous 
probability space, with the possible answer being any of an infinite 
number of real numbers and each one of these real numbers receiving an 
essentially zero probability.  A natural tendency is to include 0 in this 
continuous probability space and assign it an essentially zero 
probability.  However, the "no" answer, which corresponds to a size of 
zero, does not change probability just because the question is phrased 
differently.  Therefore, it still has its nonzero probability; only the 
nonzero probability of the "yes" answer is spread over the real numbers.


Frick's 1996 paper in Psychological Methods (Vol 1, pp.  379-390),
"The appropriate use of null hypothesis testing" is also very interesting
and topical.  From the abstract of that paper:  "This article explores
when and why [null hypothesis testing] is appropriate. Null hypothesis
testing is insufficient when the size of effect is important, but is ideal
for testing ordinal claims relating the order of conditions, which are
common in psychology." 

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: hyp testing

2000-04-10 Thread Bruce Weaver

On 7 Apr 2000, dennis roberts wrote:

> i was not suggesting taking away from our arsenal of tricks ... but, since 
> i was one of those old guys too ... i am wondering if we were mostly led
> astray ...?
> 
> the more i work with statistical methods, the less i see any meaningful (at 
> the level of dominance that we see it) applications of hypothesis testing ...
> 
> here is a typical problem ... and we teach students this!
> 
> 1. we design a new treatment
> 2. we do an experiment
> 3. our null hypothesis is that both 'methods', new and old, produce the 
> same results
> 4. we WANT to reject the null (especially if OUR method is better!)
> 5. we DO a two sample t test (our t was 2.98 with 60 df)  and reject the 
> null ... and in our favor!
> 6. what has this told us?
> 
> if this is ALL you do ... what it has told you AT BEST is that ... the 
> methods probably are not the same ... but, is that the question of interest 
> to us?
> 
> no ... the real question is: how much difference is there in the two methods?
-- >8 ---

In one of his papers, Bob Frick argues very persuasively that very
often (in experimental psychology, at least), this is NOT the real
question at all.  I think that is especially the case when you are testing
theories.  Suppose, for example that my theory of selective attention
posits that inhibition of the internal representations of distracting
items is an important mechanism of selection.  This idea has been testing
in so-called "negative priming" experiments.  (Negative priming refers to
the fact that subjects respond more slowly to an item that was previously
ignored, or is semantically related to a previously ignored item, than
they do to a novel item.) Negative priming is measured as a response time
difference between 2 conditions in an experiment.  The difference is
typically between about 20 and 40 milliseconds.  I think the important
thing to remember about this is that the researcher is not trying to
account for variability in response time per se, even though response time
is the dependent variable:  He or she is just using response time to
indirectly measure the object of real interest.  If one was trying to
account for overall variability in response time, the conditions of this
experiment would almost certainly not make the list of important
variables.  The researcher KNOWS that a lot of other things affect
response time, and some of them a LOT more than his experimental
conditions do.  However, because one is interested in testing a theory of
selective attention, this small difference between conditions is VERY
important, provided it is statistically significant (and there is
sufficient power);  and measures of effect size are not all that relevant. 

Just my 2 cents.
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Combining 2x2 tables

2000-03-31 Thread Bruce Weaver

On Thu, 30 Mar 2000, JohnPeters wrote:

> Hi,
> I was wondering if someone could help me.  I am interested in combining
> 2x2 tables from multiple studies.  The test used is the McNemar's
> chi-sq.  I have the raw data from each of these studies.  What is the
> proper correction that should be used when combining the results?
> Thanks!!!


Meta-analysis is a common way to combine information from 2x2 tables, but
I'm not sure how you would do this with McNemar's chi-square as your
measure of "effect size" for each table.  It might be possible if you
are willing to use something else. 

It's Friday afternoon, and this is off the top of my head, but here goes 
anyway.  I wonder if you could write the tables this way:

                  Change
                Yes    No
  Before   -     a      b
           +     c      d


Cell a:  change from - to +
Cell b:  no change, - before and after
Cell c:  change from + to -
Cell d:  no change, + before and after

Suppose we're talking about change in opinion after hearing a political
speech.  The odds ratio for this table would give you the odds of changing
from a negative to a positive oppion over the odds of changing from
positive to negative. If you're the speaker, you're hoping for an odds 
ratio greater than 1 (i.e., greater change in those who were negative 
before the speech).  If the amount of change is similar in both groups, 
the odds ratio will be about 1.  

If this is a legitimate way to analyze the data for one such table, and I 
can't see why not, then you could pool the tables meta-analytically with 
ln(OR) as your measure of effect size.  Here's a paper that describes how 
to go about it:

Fleiss, JL. (1993). The statistical basis of meta-analysis. Statistical 
Methods in Medical Research, 2, 121-145.

There are also free programs available for performing this kind of 
meta-analysis.  I have links to some in the statistics section of my 
homepage.
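
To make the pooling step concrete, here is a minimal Python sketch of
the fixed-effect, inverse-variance method Fleiss describes; the three
2x2 tables (a, b, c, d, in the layout above) are invented:

import numpy as np
from scipy import stats

tables = [(30, 70, 12, 88), (22, 58, 10, 90), (41, 89, 15, 105)]
log_or = np.array([np.log(a * d / (b * c)) for a, b, c, d in tables])
var = np.array([1/a + 1/b + 1/c + 1/d for a, b, c, d in tables])  # Woolf

w = 1 / var                          # inverse-variance weights
pooled = np.sum(w * log_or) / np.sum(w)
se = np.sqrt(1 / np.sum(w))
lo, hi = np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se)
p = 2 * stats.norm.sf(abs(pooled / se))
print(f"pooled OR = {np.exp(pooled):.2f}, "
      f"95% CI ({lo:.2f}, {hi:.2f}), p = {p:.4f}")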

Hope this helps. Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)

2000-03-24 Thread Bruce Weaver

On Fri, 24 Mar 2000, Bernard Higgins wrote:

> 
> 
> Hi Bruce

Hello Bernard.

> 
> The point I was making is that when developing hypothesis tests, 
> from a theoretical point of view, the sampling distribution of the 
> test statistic from which critical values or p-values etc are 
> obtained, is determined by the null hypothesis. We need a probability 
> model to enable us to determine how likely observed patterns are. 
> These probability models will often work well in practice even if we 
> relax the usual assumptions. When using distribution-free tests as 
> an alternative to a parametric test we may need to specify 
> restrictions in order that the tests can be considered "equivalent". 

Agreed.

> 
> In my view the t-test is fairly robust and will work well in most 
> situations where the distribution is not too skewed, and constant 
> variance is reasonable. Indeed I have no problems in using it for the 
> majority of problems. When comparing two independent samples using 
> t-tests, lack of normality and constant variance are often not too 
> serious if the samples are of similar size, always a good idea in 
> planned experiments.

Agreed here too.

> 
> As you say, when samples are fairly large, some say 30+ or even 
> less, the sampling distribution of the mean can often be approximated 
> by a normal distribution (Central Limit Theorem), and hence an 
> (asymptotic) Z-test is frequently used. It would not, I think, be 
> strictly correct to call such a statistic t, although from a 
> practical point of view there may be little difference. The formal 
> definition of the single sample t-test is derived from the ratio of a 
> Standard Normal random variable to a Chi-squared random variable and 
> does, in theory, require independent observations from a normal 
> distribution.


I think we are no longer in complete agreement here.  I am not a 
mathematician, but for what it's worth, here is my understanding of t- 
and z-tests:

numerator = (statistic - parameter|H0)
denominator = SE(statistic)

test statistic = z if SE(statistic) is based on pop. SD
test statistic = t if SE(statistic) is based on sample SD

The most common 'statistics' in the numerator are Xbar and (Xbar1 - 
Xbar2); but others are certainly possible (e.g., for large-sample 
versions of rank-based tests).

An assumption of both tests is that the statistic in the numerator has a
sampling distribution that is normal.  This is where the CLT comes into
play:  It lays out the conditions under which the sampling distribution of
the statistic is approximately normal--and those conditions can vary
depending on what statistic you're talking about.  But having a normal
sampling distribution does not mean that we can or should use a critical
z-value rather than a critical t when the population variance is unknown
(which is what I thought you were suggesting).  

As you say, one can substitute critical z for critical t when n gets
larger, because the differences become negligible.  But nowadays, most of
us are using computer programs that give us more or less exact p-values
anyway, so this is less of an issue than it once was. 
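
The convergence is easy to check (Python sketch):

from scipy import stats

# Two-tailed .05 critical value of t approaches z = 1.96 as df grows.
for df in (5, 30, 100, 1000):
    print(f"df = {df:>4}: t_crit = {stats.t.ppf(0.975, df):.3f}")
print(f"normal:    z_crit = {stats.norm.ppf(0.975):.3f}")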


Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)

2000-03-24 Thread Bruce Weaver

On 24 Mar 2000, Bernard Higgins wrote:

> These are my thoughts:
> 
> The sampling distribution of a test statistic is determined by the 
> null hypothesis. So analysis of variance is used to test that a 
> number of samples come from an identical Normal distribution
> against the alternative that the "subpopulations" have different 
> means (but the same variances). The mean and standard deviation of 
> normally distributed random variables are independent of one another.
> 
> Distribution free (non-parametric) procedures do not require the 
> underlying distribution to be normal. For the majority of these
-- >8 ---


I think it is overly restrictive to say that the samples must come from
normally distributed populations under a true null hypothesis.  Take the
simplest parametric test, a single-sample t-test.  The assumption is that
the sampling distribution of X-bar is (approximately) normal, not that the
population from which you've sampled is normal.  If the population is
normal, then of course the sampling distribution of X-bar will be too, for
any size sample (even n=1).  But if your sample size is large enough
(e.g., some authors suggest around n=300), the sampling distribution of
X-bar will be close to normal no matter what the population distribution
looks like. For populations that are not normal, but are reasonably
symmetrical, the sampling distribution of X-bar will be near enough to
normal with samples somewhere between these extremes.

-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: Multiple Comparison Correction in Multiple Regression

2000-03-17 Thread Bruce Weaver

On Fri, 17 Mar 2000, Rich Ulrich wrote:

-- >8 ---

> > 2) When performing a multiple linear regression we have performed partial
> > f-tests with the sequential SS (Type I SS) to examine if a particular
> > variable "should be added" to a simpler model.  If a series of these tests
> > are used to find a parsimonious model that still fits should we correct for
> > multiple comparisons?
> 
> "Stepwise inclusion" is usually a bad idea.  See the comments in my
> stats-FAQ, and their references.  (If you are worried about correcting
> for multiple tests, then you probably *shouldn't*  add the variable
> because it is probably capitalizing on chance.)


Rich,
Is there not an important distinction to be made between the 
following situations:

1.  A computer algorithm determines (based on the magnitude of partial or 
semi-partial correlations) the order in which variables are entered or 
removed, and which ones end up in the final model

2.  The investigator determines a priori the order in which variables are 
to be entered or removed.


Some of my textbooks refer to situation 1 as "stepwise" regression and
situation 2 as "hierarchical" regression.  One is less likely to
capitalize on chance with hierarchical regression, I think, especially if
the decisions about order are theoretically motivated, and the number of
variables is not too large.  

Here's another observation that is relevant to this thread, I think.  When
one performs a 2-factor ANOVA, there are 3 independent F-tests:  one for
each main effect and one for the interaction.  One can arrive at the
same F-tests using the regression model comparison approach described
above (e.g., compare the FULL regression model to one without
the AxB interaction to get F for the interaction term).  I don't think
I have EVER seen anyone correct for multiple comparisons in this case.
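
A minimal Python sketch of that model-comparison route to the
interaction F (data invented):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
df = pd.DataFrame({"A": np.repeat(["a1", "a2"], 40),
                   "B": np.tile(np.repeat(["b1", "b2"], 20), 2)})
df["y"] = (rng.normal(0, 1, len(df)) + (df["A"] == "a2") * 0.5
           + (df["B"] == "b2") * 0.3)

full = smf.ols("y ~ C(A) * C(B)", df).fit()      # with A*B
reduced = smf.ols("y ~ C(A) + C(B)", df).fit()   # without A*B
f, p, _ = full.compare_f_test(reduced)
print(f"interaction: F(1, {int(full.df_resid)}) = {f:.3f}, p = {p:.4f}")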

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/





Re: ANOVA causal direction

2000-02-11 Thread Bruce Weaver

On 10 Feb 2000, Richard M. Barton wrote:

> --- Alex Yu wrote:
> 
> A statistical procedure alone cannot determine casual relationships. 
> ---
> 
> 
> Correct.  A lot depends on eye contact.
> 
> rb


And also, at least 2 statistical procedures are required...


