Fwd: Re: diff in proportions

2001-11-17 Thread Rich Strauss

This is true.  I simulated the null distributions, those obtained when the
null hypothesis is true, which is what the centered t-distribution
represents.  I didn't look at the sampling distributions for different
effect sizes.

>Date: Sat, 17 Nov 2001 00:19:06 -0600
>From: jim clark <[EMAIL PROTECTED]>
>Subject: Re: diff in proportions
>Organization: The University of Winnipeg
>
>Hi
>
>On 16 Nov 2001, Rich Strauss wrote:
>> I've just done some quick simulations in Matlab, constructing randomized
>> null distributions of the t-statistic under both scenarios: (1) sample
>> variances based on sample means vs. (2) variances about the pooled mean.
>> Assuming I've done everything correctly, the result is that the null
>> distribution of the t-statistic in the second case consistently
>> approximates the theoretical t-distribution more closely than that of the
>> first case.  This seems to be true regardless of sample sizes and of
>> whether the two sample sizes are identical or different.  This result
>> implies that the t-statistic should indeed be calculated about a pooled
>> estimate of the common mean, as Jerry Dallal suggested.
>> 
>> I could pass on the details of my simulation if anyone is interested, but
>> mostly I'd appreciate it if someone could repeat this simulation
>> independently of mine to see whether it holds up.
>
>This simply cannot be generally true.  It probably only applies
>when the null is in fact true, which may be the case for your
>simulations.  To appreciate the illogical nature of this
>recommendation, consider creating a real difference of x between
>your population means, then 2x, then 3x, and so on.  By the
>common mean approach, you are treating the variability between
>groups as though it were noise (i.e., a component in your
>estimate of sigma^2, the variance about the null-hypothesis of
>a common mean).  It is critical to keep in mind that the null
>hypothesis is in fact just that, a hypothesis that may or may
>not be correct.  Computing the within-group variance about the
>group means is the correct way to estimate sigma^2, however,
>irrespective of whether the Ho about the means is true or not.
>
>Best wishes
>Jim
>
>
>James M. Clark                (204) 786-9757
>Department of Psychology      (204) 774-4134 Fax
>University of Winnipeg        4L05D
>Winnipeg, Manitoba  R3B 2E9   [EMAIL PROTECTED]
>CANADA                        http://www.uwinnipeg.ca/~clark
>
>
>
>
>=
>Instructions for joining and leaving this list and remarks about
>the problem of INAPPROPRIATE MESSAGES are available at
>  http://jse.stat.ncsu.edu/
>=
> 





Re: diff in proportions

2001-11-16 Thread jim clark

Hi

On 16 Nov 2001, Rich Strauss wrote:
> I've just done some quick simulations in Matlab, constructing randomized
> null distributions of the t-statistic under both scenarios: (1) sample
> variances based on sample means vs. (2) variances about the pooled mean.
> Assuming I've done everything correctly, the result is that the null
> distribution of the t-statistic in the second case consistently
> approximates the theoretical t-distribution more closely than that of the
> first case.  This seems to be true regardless of sample sizes and of
> whether the two sample sizes are identical or different.  This result
> implies that the t-statistic should indeed be calculated about a pooled
> estimate of the common mean, as Jerry Dallal suggested.
> 
> I could pass on the details of my simulation if anyone is interested, but
> mostly I'd appreciate it if someone could repeat this simulation
> independently of mine to see whether it holds up.

This simply cannot be generally true.  It probably only applies
when the null is in fact true, which may be the case for your
simulations.  To appreciate the illogical nature of this
recommendation, consider creating a real difference of x between
your population means, then 2x, then 3x, and so on.  By the
common mean approach, you are treating the variability between
groups as though it were noise (i.e., a component in your
estimate of sigma^2, the variance about the null-hypothesis of
a common mean).  It is critical to keep in mind that the null
hypothesis is in fact just that, a hypothesis that may or may
not be correct.  Computing the within-group variance about the
group means is the correct way to estimate sigma^2, however,
irrespective of whether the Ho about the means is true or not.
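
To make the x, 2x, 3x argument concrete, here is a minimal Python sketch. The data values, the unit difference x = 2.0, and the function name are all hypothetical; group B is simply shifted by k*x so that the within-group scatter stays fixed while the true difference grows.

```python
import statistics

def variance_estimates(group_a, group_b):
    """Return (within-group variance, variance about the common mean)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled within-group variance: deviations taken about each group's own mean.
    within = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    # Variance about the common mean: both groups treated as a single sample.
    common = statistics.variance(group_a + group_b)
    return within, common

base_a = [9.0, 10.0, 11.0, 10.5, 9.5]   # hypothetical data
base_b = [10.2, 9.8, 10.0, 11.1, 8.9]   # hypothetical data
x = 2.0  # hypothetical size of one "unit" of true difference

results = []
for k in range(4):
    shifted_b = [v + k * x for v in base_b]
    results.append(variance_estimates(base_a, shifted_b))

for k, (within, common) in enumerate(results):
    print(f"difference {k}x: within={within:.3f}  about common mean={common:.3f}")
```

The within-group estimate is unchanged by the shift, while the estimate about the common mean grows with the separation between the groups.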

Best wishes
Jim


James M. Clark                (204) 786-9757
Department of Psychology      (204) 774-4134 Fax
University of Winnipeg        4L05D
Winnipeg, Manitoba  R3B 2E9   [EMAIL PROTECTED]
CANADA                        http://www.uwinnipeg.ca/~clark







Re: diff in proportions

2001-11-16 Thread Rich Strauss

At 05:12 PM 11/16/01 +, you wrote:
>>On Thu, 15 Nov 2001, Jerry Dallal wrote:
>>> But, if the null hypothesis is that the means are the same, why
>>> isn't(aren't) the sample variance(s) calculated about a pooled
>>> estimate of the common mean?

I've just done some quick simulations in Matlab, constructing randomized
null distributions of the t-statistic under both scenarios: (1) sample
variances based on sample means vs. (2) variances about the pooled mean.
Assuming I've done everything correctly, the result is that the null
distribution of the t-statistic in the second case consistently
approximates the theoretical t-distribution more closely than that of the
first case.  This seems to be true regardless of sample sizes and of
whether the two sample sizes are identical or different.  This result
implies that the t-statistic should indeed be calculated about a pooled
estimate of the common mean, as Jerry Dallal suggested.

I could pass on the details of my simulation if anyone is interested, but
mostly I'd appreciate it if someone could repeat this simulation
independently of mine to see whether it holds up.
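
For anyone who wants to repeat the comparison, the following is a rough Python sketch of one way to set it up. It is not the Matlab code described above. The divisor N - 1 for the variance about the pooled mean, the sample sizes, the seed, and all function names are assumptions made for illustration.

```python
import math
import random

def two_t_statistics(a, b):
    """Return (t_within, t_common) for two samples.

    t_within uses the usual pooled within-group variance, SSE / (N - 2).
    t_common uses the variance about the pooled mean, SST / (N - 1);
    that divisor is an assumption, since the thread does not specify one.
    """
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    grand_mean = (sum(a) + sum(b)) / (n_a + n_b)
    sse = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    sst = sum((x - grand_mean) ** 2 for x in a + b)
    scale = math.sqrt(1 / n_a + 1 / n_b)
    diff = mean_a - mean_b
    t_within = diff / (math.sqrt(sse / (n_a + n_b - 2)) * scale)
    t_common = diff / (math.sqrt(sst / (n_a + n_b - 1)) * scale)
    return t_within, t_common

def simulate_null(n_a=10, n_b=10, reps=20000, seed=1):
    """Draw both samples from one normal population, so the null is true."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_a)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        pairs.append(two_t_statistics(a, b))
    return pairs

# Two-sided 5% critical value of Student's t with 18 degrees of freedom.
T_CRIT_18 = 2.101

pairs = simulate_null()
rate_within = sum(abs(t) > T_CRIT_18 for t, _ in pairs) / len(pairs)
rate_common = sum(abs(t) > T_CRIT_18 for _, t in pairs) / len(pairs)
print(f"rejection rate, within-group variance:      {rate_within:.4f}")
print(f"rejection rate, variance about common mean: {rate_common:.4f}")
```

Comparing the two rejection rates (or the full empirical quantiles of each statistic) against the theoretical t-distribution is one way to check the result. With this choice of divisor the second statistic can never exceed sqrt(N - 1) in absolute value, which is worth keeping in mind when reading the tails.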

Rich Strauss






Re: diff in proportions

2001-11-16 Thread Radford Neal

>On Thu, 15 Nov 2001, Jerry Dallal wrote:
>> But, if the null hypothesis is that the means are the same, why
>> isn't(aren't) the sample variance(s) calculated about a pooled
>> estimate of the common mean?

Another thought on this...  A simpler question is, for a one-sample
test of the null hypothesis that the mean is zero, why don't we find a
p-value based on something like a t statistic, but in which the
variance is estimated by the average squared differences of the data
points from zero, rather than from their sample mean?  I investigated
this once, and came to the conclusion that the final result (after
finding the distribution of the test statistic, and calculating
p-values on that basis) is no different from the usual t test.
Perhaps the same is the case for a two-sample test, which would
explain why no one talks about the possibility of doing it this way.
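
A small Python sketch of the one-sample case, under the assumption that the alternative variance estimate is the plain average of the squared data values (divisor n). The data and the function name are hypothetical. It checks numerically that the alternative statistic t0 satisfies t0^2 = n t^2 / (n - 1 + t^2), an increasing function of the usual t^2, which is consistent with the two tests agreeing once each is referred to its own null distribution.

```python
import math

def one_sample_statistics(x):
    """Return (t, t0): the usual one-sample t and the variant using deviations from zero."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)   # variance about the sample mean
    s2_zero = sum(v ** 2 for v in x) / n             # average squared distance from zero
    t = mean / math.sqrt(s2 / n)
    t0 = mean / math.sqrt(s2_zero / n)
    return t, t0

x = [0.8, -0.3, 1.9, 0.4, 1.1, -0.6, 2.2, 0.7]   # hypothetical data
t, t0 = one_sample_statistics(x)
n = len(x)

# The identity t0^2 = n * t^2 / (n - 1 + t^2), evaluated from the usual t.
t0_from_t = math.sqrt(n * t ** 2 / (n - 1 + t ** 2))
print(t, t0, t0_from_t)
```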

   Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford






Re: diff in proportions

2001-11-16 Thread Robert J. MacG. Dawson

> Jerry Dallal wrote:
> 
>But, if the null hypothesis is that the means are the same, why
>isn't(aren't) the sample variance(s) calculated about a pooled
>estimate of the common mean?

I looked at this some years ago.  The answer is straightforward: it
would be logically valid to do so but you would lose a *lot* of power. A
hypothesis test is essentially a proof by contradiction; in such an
argument you are permitted to run with the hare and hunt with the
hounds, changing sides as often as you like.  Thus, at any stage, you
may appeal to the null hypothesis or to the data; any inconsistency
between the two, no matter how byzantine the argument, is evidence
against the null.

If you think about the two-sample-T as a two-level ANOVA (a roughly
correct idea), the pooled estimate of the mean gives you the SST; the
usual method gives you the SSE. As you expect the SSTr to be nonzero,
you have 

SSE < SST

and substituting one for the other is a Bad Thing.  In an extreme case:


A   B
10  20
11  21  
12  22


one method estimates the SD as 1, the other as 5.55.
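
The two figures can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the divisors (N - 2 for the within-group estimate, N - 1 for the estimate about the common mean) are assumptions, chosen because they reproduce the values quoted above.

```python
import math

def sd_within(groups):
    """SD from squared deviations about each group's own mean: sqrt(SSE / (N - k))."""
    n_total = sum(len(g) for g in groups)
    sse = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return math.sqrt(sse / (n_total - len(groups)))

def sd_about_common_mean(groups):
    """SD from squared deviations about the pooled mean: sqrt(SST / (N - 1))."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    sst = sum((x - grand_mean) ** 2 for x in values)
    return math.sqrt(sst / (len(values) - 1))

a = [10, 11, 12]
b = [20, 21, 22]
print(sd_within([a, b]))                        # 1.0
print(round(sd_about_common_mean([a, b]), 2))   # 5.55
```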

-Robert Dawson





Re: diff in proportions

2001-11-16 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Jerry Dallal  <[EMAIL PROTECTED]> wrote:
>Radford Neal wrote:


>> The difference is that when dealing with real data, it is possible for
>> two populations to have the same mean (as assumed by the null), but
>> different variances.  In contrast, when dealing with binary data, if
>> the means are the same in the two populations, the variances must
>> necessarily be the same as well.  So one can argue on this basis that
>> the distribution of the p-values if the null is true will be close to
>> correct when using the pooled estimate (apart from the use of a normal
>> approximation, etc.)


>But, if the null hypothesis is that the means are the same, why
>isn't(aren't) the sample variance(s) calculated about a pooled
>estimate of the common mean?

I suspect that much of the confusion comes from the overuse of
the normal distribution.  With a normal distribution, each 
sample has a mean and variance, and these are the sufficient
statistics.

Now SOME of this may carry over to SOME other problems, but 
when one is doing statistical inference, the probability 
model for the actual situation should be used, and there
should not be an attempt to connect the inference with that
from a normal model.

In the case of the binomial, it is the case that the sample
mean is a sufficient statistic.  But it is not a "measure of
central tendency", the individual Bernoulli trials are all
0 or 1.  Do the actual problem, not force it into an 
inappropriate mold.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: diff in proportions

2001-11-16 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
dennis roberts <[EMAIL PROTECTED]> wrote:
>At 08:03 PM 11/15/01 +, Radford Neal wrote:
>>Radford Neal:

>> >> The difference is that when dealing with real data, it is possible for
>> >> two populations to have the same mean (as assumed by the null), but
>> >> different variances.  In contrast, when dealing with binary data, if
>> >> the means are the same in the two populations, the variances must
>> >> necessarily be the same as well.  So one can argue on this basis that
>> >> the distribution of the p-values if the null is true will be close to
>> >> correct when using the pooled estimate (apart from the use of a normal
>> >> approximation, etc.)

>>Jerry Dallal:

>> >But, if the null hypothesis is that the means are the same, why
>> >isn't(aren't) the sample variance(s) calculated about a pooled
>> >estimate of the common mean?


>>An interesting question.


>i think what  this shows (ie, these small highly technical distinctions) is 
>that ... that most null hypotheses that we use for our array of 
>significance tests ... have rather little meaning

>null hypothesis testing is a highly overrated activity in statistical work

Agreed.  The question is how to act.

>in the case of differences between two proportions ... the useful question 
>is: i wonder how much difference (since i know there is bound to be some 
>[even though it could be trivial]) there is between the proportions of A 
>population versus B population?

Now this is a difficult problem.  It is only in translation
parameter problems that it is even clear that this is what
should be asked.  Confidence intervals for a binomial proportion
are a major headache, although for large samples, the usual
asymptotic expressions give a good approximation.

>to seek an answer to the real question ... no notion of null has to even be 
>entertained

If the means are far apart, one definitely should NOT use the
pooled mean to estimate the precision; the estimate of 
precision from that is always too large.  If the means are 
close, the difference might be unimportant.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: diff in proportions

2001-11-15 Thread jim clark

Hi

On Thu, 15 Nov 2001, Jerry Dallal wrote:
> But, if the null hypothesis is that the means are the same, why
> isn't(aren't) the sample variance(s) calculated about a pooled
> estimate of the common mean?

What you are testing is whether there is more variability between
groups than you would expect by chance given the variability
within groups.  This is most clear with the F test, of course,
(i.e., F = n*Vmeans/Vwithin) but t is simply a variation of this.  
Is the difference between X1 and X2 (i.e., variation in Xjs)
greater than expected given variation within groups.  Taking the
common mean to calculate a variance would conflate the within and
between group factors that you want to contrast.
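
As a rough illustration of the F = n*Vmeans/Vwithin form and its link to t, here is a short Python sketch for two groups of equal size. The function name is hypothetical, and the data are borrowed from the small A/B example Robert Dawson gives elsewhere in this thread.

```python
import statistics

def f_and_t(group_a, group_b):
    """Return (F, t) for two groups of equal size n, with F = n * Vmeans / Vwithin."""
    n = len(group_a)
    if len(group_b) != n:
        raise ValueError("this form of F assumes equal group sizes")
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    v_means = statistics.variance([mean_a, mean_b])   # variance of the group means
    # Within-group variance: average of the two variances about their own means.
    v_within = (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    f = n * v_means / v_within
    t = (mean_a - mean_b) / (2 * v_within / n) ** 0.5
    return f, t

f, t = f_and_t([10.0, 11.0, 12.0], [20.0, 21.0, 22.0])
print(f, t ** 2)
```

For two groups the squared t and F coincide, and both put only the within-group variation in the denominator.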

Best wishes
Jim


James M. Clark                (204) 786-9757
Department of Psychology      (204) 774-4134 Fax
University of Winnipeg        4L05D
Winnipeg, Manitoba  R3B 2E9   [EMAIL PROTECTED]
CANADA                        http://www.uwinnipeg.ca/~clark







Re: diff in proportions

2001-11-15 Thread dennis roberts

At 08:03 PM 11/15/01 +, Radford Neal wrote:
>Radford Neal:
>
> >> The difference is that when dealing with real data, it is possible for
> >> two populations to have the same mean (as assumed by the null), but
> >> different variances.  In contrast, when dealing with binary data, if
> >> the means are the same in the two populations, the variances must
> >> necessarily be the same as well.  So one can argue on this basis that
> >> the distribution of the p-values if the null is true will be close to
> >> correct when using the pooled estimate (apart from the use of a normal
> >> approximation, etc.)
>
>Jerry Dallal:
>
> >But, if the null hypothesis is that the means are the same, why
> >isn't(aren't) the sample variance(s) calculated about a pooled
> >estimate of the common mean?
>
>
>An interesting question.


i think what  this shows (ie, these small highly technical distinctions) is 
that ... that most null hypotheses that we use for our array of 
significance tests ... have rather little meaning

null hypothesis testing is a highly overrated activity in statistical work

in the case of differences between two proportions ... the useful question 
is: i wonder how much difference (since i know there is bound to be some 
[even though it could be trivial]) there is between the proportions of A 
population versus B population?

to seek an answer to the real question ... no notion of null has to even be 
entertained


==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: diff in proportions

2001-11-15 Thread Jerry Dallal

Radford Neal wrote:
> 

> The difference is that when dealing with real data, it is possible for
> two populations to have the same mean (as assumed by the null), but
> different variances.  In contrast, when dealing with binary data, if
> the means are the same in the two populations, the variances must
> necessarily be the same as well.  So one can argue on this basis that
> the distribution of the p-values if the null is true will be close to
> correct when using the pooled estimate (apart from the use of a normal
> approximation, etc.)
> 

But, if the null hypothesis is that the means are the same, why
isn't(aren't) the sample variance(s) calculated about a pooled
estimate of the common mean?





Re: diff in proportions

2001-11-15 Thread Robert J. MacG. Dawson



Dennis Roberts wrote:
> 
> At 08:51 AM 11/15/01 -0600, jim clark wrote:
> 
> >The Ho in the case of means is NOT about the variances, so the
> >analogy breaks down.  That is, we are not hypothesizing
> >Ho: sig1^2 = sig2^2, but rather Ho: mu1 = mu2.  So there is no
> >direct link between Ho and the SE, unlike the proportions
> >example.
> 
> would it be correct then to say ... that the test of differences in
> proportions is REALLY a test about the differences between two population
> variances?

No, because it would reject the null
(with large enough samples) when pi_1 = 1-pi_2,
despite the fact that the variances would be
equal!
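
A minimal Python sketch of this counterexample. The value pi_1 = 0.3, the sample size of 1000 per group, and the function names are hypothetical; the counts are simply the expected counts under those proportions.

```python
import math

def bernoulli_variance(p):
    """Variance of a single 0/1 trial with success probability p."""
    return p * (1 - p)

def pooled_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled standard error."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

pi_1 = 0.3          # hypothetical value
pi_2 = 1 - pi_1     # so the two population variances coincide
var_1, var_2 = bernoulli_variance(pi_1), bernoulli_variance(pi_2)

n = 1000                       # hypothetical sample size per group
z = pooled_z(300, n, 700, n)   # counts equal to their expected values
print(var_1, var_2)
print(z)
```

The variances match, yet the z statistic is far beyond any conventional critical value, so the test is responding to the difference in proportions rather than to a difference in variances.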

-Robert Dawson





Re: diff in proportions

2001-11-15 Thread Rolf Dalin

I'm not really arguing for using the pooled stdev in this case, I'm just 
trying to find out the reasons for significance testing procedures. 

I think that what we're discussing here is if we should use CIs BOTH 
for stating effect sizes with errors AND for hypothesis testing. I read 
a book by Michael Smithson called Statistics with Confidence 
(SAGE, 2000). He's using CIs through the whole book in formulations 
of hypothesis testing. It was really nice reading and I believe 
students would appreciate the clearness of using fewer formulae for 
SEs. But then I think we also have to kill darlings like Pearson's Chi 
Sq. 

Rolf D
 


> At 04:26 PM 11/15/01 +0100, Rolf Dalin wrote:
> 
> 
> >The significance test produces a p-value UNDER THE CONDITION
> >that the null is true. In my opinion it does not matter whether we
> >know it isn't true. It is just an assumption for the calculations. And
> >these calculations do not produce exactly the same information as the CI
> >for the difference. They state in some sense, if the procedure was
> repeated, how probable it would be to ... etc.
> 
> this might make sense if the sample p*q values were the same for BOTH
> samples ... but if they are not (which will almost always be the case in
> real data) ... then you already have SOME evidence that the null is
> perhaps not true (of course, we know that it is not exactly true anyway
> ... so that sort of tosses out the notion of pooling so as to get a better
> estimate of a COMMON variance)
> 
> earlier in their presentation, moore and mccabe say that they prefer to
> use a CI to test some null in this case ... but, if one did a z test with
> the unpooled estimator for standard error, this would lead to a "valid"
> significance test ... HOWEVER ... then they go on to say that INSTEAD,
> they will adopt the pooled standard error approach since it is the " ...
> more common practice"
> 
> that logic escapes me
> 
> if we can build a CI using the un pooled standard error formula and, find
> that to be ok to see if some null value like 0 difference in population
> proportions is inside or outside of the CI, i don't see any need to switch
> the denominator formula in the z test JUST because we want to use the z
> test STATISTIC to test the null
> 
> a little more consistency in logic would seem to be in the best interests
> of students trying to learn this ...
> 
> i would still argue that the extent to which you would not be willing to
> use the pooled standard error formula in the case of differences in means,
> would be the same extent to which you would not be willing to use the
> pooled standard error formula when it comes to differences in proportions
> ... i don't see that the logic really is any different
> 
> but, this is just my opinion
> 
> 
> _
> dennis roberts, educational psychology, penn state university
> 208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
> http://roberts.ed.psu.edu/users/droberts/drober~1.htm
> 


**
Rolf Dalin
Department of Information Technology and Media
Mid Sweden University
S-870 51 SUNDSVALL
Sweden
Phone: 060 148690, international: +46 60 148690
Fax: 060 148970, international: +46 60 148970
Mobile: 0705 947896, international: +46 70 5947896

mailto:[EMAIL PROTECTED]
http://www.itk.mh.se/~roldal/
**





Re: diff in proportions

2001-11-15 Thread Radford Neal

In article <[EMAIL PROTECTED]>,
dennis roberts <[EMAIL PROTECTED]> wrote:

>in the moore and mccabe book (IPS), in the section on testing for 
>differences in population proportions, when it comes to doing a 'z' test 
>for significance, they argue for (and say this is commonly done) that the 
>standard error for the difference in proportions formula should be a POOLED 
>one ... 
>
>in their discussion of differences in means ... they present FIRST the NON 
>pooled version of the standard error and that is their preferred way to 
>build CIs and do t tests ... though they also bring in later the pooled 
>version as a later topic (and of course if we KNEW that populations had the 
>same variances, then the pooled version would be useful)
>
>it seems to me that this same logic should hold in the case of differences 
>in proportions

The difference is that when dealing with real data, it is possible for
two populations to have the same mean (as assumed by the null), but
different variances.  In contrast, when dealing with binary data, if
the means are the same in the two populations, the variances must
necessarily be the same as well.  So one can argue on this basis that
the distribution of the p-values if the null is true will be close to
correct when using the pooled estimate (apart from the use of a normal
approximation, etc.)

   Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford






Re: diff in proportions

2001-11-15 Thread Jerry Dallal

dennis roberts wrote:
> 
> in the moore and mccabe book (IPS), in the section on testing for
> differences in population proportions, when it comes to doing a 'z' test
> for significance, they argue for (and say this is commonly done) that the
> standard error for the difference in proportions formula should be a POOLED
> one ... since if one is testing the null of equal proportions, then that
> means your null is assuming that the p*q combinations are the SAME for both
> populations thus, this is a case of pooling sample variances to estimate a
> single common population variance
> 
> but since this is just a null ... and we have no way of knowing if the null
> is true (not that we can in any case) ... i don't see any logical
> progression that would then lead one to also assume that the p*q
> combinations are the same in the two populations ... hence, i don't see why
> the pooled variance version of the standard error of a difference in
> proportions formula would be the recommended way to go
> 
> in their discussion of differences in means ... they present FIRST the NON
> pooled version of the standard error and that is their preferred way to
> build CIs and do t tests ... though they also bring in later the pooled
> version as a later topic (and of course if we KNEW that populations had the
> same variances, then the pooled version would be useful)
> 
> it seems to me that this same logic should hold in the case of differences
> in proportions
> 

Either form is valid, that is, either produces a test of the
requisite size under the null.  To my knowledge, neither test has
been proven uniformly superior in terms of power.  There are some
alternatives where each is the better.

While I don't have the text and it may be using a version of the
test that is different from the way I usually see it constructed,
the way it's typically formulated, the square of the pooled
statistic is equal to the usual Pearson chi-square statistic for
homogeneity of proportions.
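
A short Python sketch of the two forms of the z statistic, together with a numerical check that the square of the pooled version matches the Pearson chi-square for the 2x2 table (computed without a continuity correction). The counts and function names are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z_pooled, z_unpooled) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se_pooled, (p1 - p2) / se_unpooled

def pearson_chi_square(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) for the 2x2 success/failure table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 45 successes out of 100 versus 60 out of 120.
z_pooled, z_unpooled = two_proportion_z(45, 100, 60, 120)
chi2 = pearson_chi_square(45, 100, 60, 120)
print(z_pooled ** 2, chi2)   # these two agree
print(z_unpooled)            # the unpooled statistic generally differs
```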





Re: diff in proportions

2001-11-15 Thread Dennis Roberts

At 08:51 AM 11/15/01 -0600, jim clark wrote:

>The Ho in the case of means is NOT about the variances, so the
>analogy breaks down.  That is, we are not hypothesizing
>Ho: sig1^2 = sig2^2, but rather Ho: mu1 = mu2.  So there is no
>direct link between Ho and the SE, unlike the proportions
>example.

would it be correct then to say ... that the test of differences in 
proportions is REALLY a test about the differences between two population 
variances?


>Best wishes
>Jim
>
>
>James M. Clark                (204) 786-9757
>Department of Psychology      (204) 774-4134 Fax
>University of Winnipeg        4L05D
>Winnipeg, Manitoba  R3B 2E9   [EMAIL PROTECTED]
>CANADA                        http://www.uwinnipeg.ca/~clark
>
>
>
>

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: diff in proportions

2001-11-15 Thread Dennis Roberts

At 04:26 PM 11/15/01 +0100, Rolf Dalin wrote:


>The significance test produces a p-value UNDER THE CONDITION
>that the null is true. In my opinion it does not matter whether we
>know it isn't true. It is just an assumption for the calculations. And
>these calculations do not produce exactly the same information as
>the CI for the difference. They state in some sense, if the procedure
>was repeated, how probable it would be to ... etc.

this might make sense if the sample p*q values were the same for BOTH 
samples ... but if they are not (which will almost always be the case in 
real data) ... then you already have SOME evidence that the null is perhaps 
not true (of course, we know that it is not exactly true anyway ... so that 
sort of tosses out the notion of pooling so as to get a better estimate of 
a COMMON variance)

earlier in their presentation, moore and mccabe say that they prefer to use 
a CI to test some null in this case ... but, if one did a z test with the 
unpooled estimator for standard error, this would lead to a "valid" 
significance test ... HOWEVER ... then they go on to say that INSTEAD, they 
will adopt the pooled standard error approach since it is the " ... more 
common practice"

that logic escapes me

if we can build a CI using the un pooled standard error formula and, find 
that to be ok to see if some null value like 0 difference in population 
proportions is inside or outside of the CI, i don't see any need to switch 
the denominator formula in the z test JUST because we want to use the z 
test STATISTIC to test the null

a little more consistency in logic would seem to be in the best interests 
of students trying to learn this ...

i would still argue that the extent to which you would not be willing to 
use the pooled standard error formula in the case of differences in means, 
would be the same extent to which you would not be willing to use the 
pooled standard error formula when it comes to differences in proportions 
... i don't see that the logic really is any different

but, this is just my opinion


_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: diff in proportions

2001-11-15 Thread jim clark

Hi

On 15 Nov 2001, dennis roberts wrote:

> in the moore and mccabe book (IPS), in the section on testing for 
> differences in population proportions, when it comes to doing a 'z' test 
> for significance, they argue for (and say this is commonly done) that the 
> standard error for the difference in proportions formula should be a POOLED 
> one ... since if one is testing the null of equal proportions, then that 
> means your null is assuming that the p*q combinations are the SAME for both 
> populations thus, this is a case of pooling sample variances to estimate a 
> single common population variance
> 
> but since this is just a null ... and we have no way of knowing if the null 
> is true (not that we can in any case) ... i don't see any logical 
> progression that would then lead one to also assume that the p*q 
> combinations are the same in the two populations ... hence, i don't see why 
> the pooled variance version of the standard error of a difference in 
> proportions formula would be the recommended way to go

The p value that one is calculating assumes that the Ho is true,
doesn't it?  That is, what is p(zobt > zalpha | Ho true)?  So
assuming equality is correct assuming Ho true; that is, p1 = p2
in the population.

> in their discussion of differences in means ... they present FIRST the NON 
> pooled version of the standard error and that is their preferred way to 
> build CIs and do t tests ... though they also bring in later the pooled 
> version as a later topic (and of course if we KNEW that populations had the 
> same variances, then the pooled version would be useful)
> 
> it seems to me that this same logic should hold in the case of differences 
> in proportions

The Ho in the case of means is NOT about the variances, so the
analogy breaks down.  That is, we are not hypothesizing
Ho: sig1^2 = sig2^2, but rather Ho: mu1 = mu2.  So there is no
direct link between Ho and the SE, unlike the proportions
example.

Best wishes
Jim


James M. Clark                (204) 786-9757
Department of Psychology      (204) 774-4134 Fax
University of Winnipeg        4L05D
Winnipeg, Manitoba  R3B 2E9   [EMAIL PROTECTED]
CANADA                        http://www.uwinnipeg.ca/~clark







RE: diff in proportions

2001-11-15 Thread Kaplon, Howard

Dennis,


I am not sure about this, but here goes anyway.  Since the decision-making
process is based on Type I error (critical point and p-value), and since
Type I error is computed under the assumption that the null hypothesis is
true, the "pooled" formula is appropriate.  However, when one is doing
power calculations, one would not use the "pooled" formula (similar to
using a noncentral t with continuous data).
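One way to look at this numerically is to compute exact rejection probabilities for the z test by enumerating every possible pair of binomial outcomes, once with Ho true (Type I error) and once with a real difference (power). The sketch below is only an illustration under stated assumptions: the sample sizes (20 and 20), the true proportions (0.3 and 0.7), and the helper names are all made up, and the critical value 1.959964 is the two-sided 5% point of the standard normal. It is not the analytic power formula, which would use the unpooled variance under the alternative.

```python
import math

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def rejects(x1, n1, x2, n2, pooled):
    """True if the two-sided z test rejects Ho: p1 = p2 for these counts."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    if pooled:
        p = (x1 + x2) / (n1 + n2)
        var = p * (1 - p) * (1 / n1 + 1 / n2)
    else:
        var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    if var == 0:
        # Degenerate case (all successes or all failures): reject only
        # when the sample proportions actually differ.
        return diff != 0
    return abs(diff) / math.sqrt(var) > Z_CRIT

def rejection_rate(n1, n2, true_p1, true_p2, pooled):
    """Exact P(reject Ho), summing over all (x1, x2) outcomes."""
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if rejects(x1, n1, x2, n2, pooled):
                total += binom_pmf(x1, n1, true_p1) * binom_pmf(x2, n2, true_p2)
    return total

# Ho true (both populations at 0.3): these are Type I error rates.
size_pooled = rejection_rate(20, 20, 0.3, 0.3, pooled=True)
size_unpooled = rejection_rate(20, 20, 0.3, 0.3, pooled=False)
# Ho false (0.3 vs. 0.7): these are power values.
power_pooled = rejection_rate(20, 20, 0.3, 0.7, pooled=True)
power_unpooled = rejection_rate(20, 20, 0.3, 0.7, pooled=False)

print(f"Type I error: pooled {size_pooled:.4f}, unpooled {size_unpooled:.4f}")
print(f"Power:        pooled {power_pooled:.4f}, unpooled {power_unpooled:.4f}")
```

Changing the sample sizes and true proportions shows how close each version stays to the nominal 5% level when Ho is true.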

Howard Kaplon


-----Original Message-----
From: dennis roberts [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 15, 2001 8:30 AM
To: [EMAIL PROTECTED]
Subject: diff in proportions

in the moore and mccabe book (IPS), in the section on testing for
differences in population proportions, when it comes to doing a 'z' test
for significance, they argue for (and say this is commonly done) that the
standard error for the difference in proportions formula should be a POOLED
one ... since if one is testing the null of equal proportions, then that
means your null is assuming that the p*q combinations are the SAME for both
populations thus, this is a case of pooling sample variances to estimate a
single common population variance

but since this is just a null ... and we have no way of knowing if the null
is true (not that we can in any case) ... i don't see any logical
progression that would then lead one to also assume that the p*q
combinations are the same in the two populations ... hence, i don't see why
the pooled variance version of the standard error of a difference in
proportions formula would be the recommended way to go

in their discussion of differences in means ... they present FIRST the NON
pooled version of the standard error and that is their preferred way to
build CIs and do t tests ... though they also bring in later the pooled
version as a later topic (and of course if we KNEW that populations had the
same variances, then the pooled version would be useful)

it seems to me that this same logic should hold in the case of differences
in proportions

comments?

==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm









diff in proportions

2001-11-15 Thread dennis roberts

in the moore and mccabe book (IPS), in the section on testing for 
differences in population proportions, when it comes to doing a 'z' test 
for significance, they argue for (and say this is commonly done) that the 
standard error for the difference in proportions formula should be a POOLED 
one ... since if one is testing the null of equal proportions, then that 
means your null is assuming that the p*q combinations are the SAME for both 
populations thus, this is a case of pooling sample variances to estimate a 
single common population variance

but since this is just a null ... and we have no way of knowing if the null 
is true (not that we can in any case) ... i don't see any logical 
progression that would then lead one to also assume that the p*q 
combinations are the same in the two populations ... hence, i don't see why 
the pooled variance version of the standard error of a difference in 
proportions formula would be the recommended way to go

in their discussion of differences in means ... they present FIRST the NON 
pooled version of the standard error and that is their preferred way to 
build CIs and do t tests ... though they also bring in later the pooled 
version as a later topic (and of course if we KNEW that populations had the 
same variances, then the pooled version would be useful)

it seems to me that this same logic should hold in the case of differences 
in proportions

comments?

==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm


