Cramer-von-Mises Criterion

2001-12-10 Thread Chia C Chong

Hi!

Any idea where I can get a good reference on the Cramer-von-Mises
criterion?

I am trying to test the goodness of fit between some theoretical
distributions and the empirical distribution of my data.

Any other suggestions on goodness-of-fit tests are welcome and
appreciated.


Thanks.

Cheers,
CCC
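[A pointer that may help later readers: SciPy ships a one-sample Cramer-von Mises test. The sketch below assumes SciPy is available, and the data are simulated stand-ins, not CCC's data.]

```python
# Sketch: one-sample Cramer-von Mises goodness-of-fit test against a
# fully specified normal distribution, using scipy.stats.cramervonmises.
# The data are simulated here purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=200)

# H0: the sample comes from a standard normal distribution.
res = stats.cramervonmises(data, "norm")
print(res.statistic, res.pvalue)
```

Note that the test as implemented assumes the null distribution is fully specified; estimating parameters from the same data changes the null distribution of the statistic.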




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: When to Use t and When to Use z Revisited

2001-12-10 Thread kjetil halvorsen



Ronny Richardson wrote:
> 
> A few weeks ago, I posted a message about when to use t and when to use z.
> In reviewing the responses, it seems to me that I did a poor job of
> explaining my question/concern so I am going to try again.
> 
> I have included a few references this time since one responder doubted the
> items to which I was referring. The specific references are listed at the
> end of this message.
> 
> Bluman has a figure (2, page 333) that is supposed to show the student "When
> to Use the z or t Distribution." I have seen a similar figure in several
> different textbooks. The figure is a logic diagram and the first question
> is "Is sigma known?" If the answer is yes, the diagram says to use z. I do
> not question this; however, I doubt that sigma is ever known in a business
> situation and I only have experience with business statistics books.
> 
> If the answer is no, the next question is "Is n>=30?" If the answer is yes,
> the diagram says to use z and estimate sigma with s. This is the option I
> question and I will return to it briefly.
> 
> In the diagram, if the answer is no to the question about n>=30, you are to
> use t. I do not question this either.
> 
> Now, regarding using z when n>=30. If we always use z when n>=30, then you
> would never need a t table with greater than 28 degrees of freedom. (n<=29
> would always yield df<=28.) Bluman cuts his off at 28 except for the
> infinity row so he is consistent. (The infinity row shows that t becomes z
> at infinity.)
> 
> However, other authors go well beyond 30. Aczel (3, inside cover) has
> values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4,
> pages E7-E8) has values for 29-100 and then 110 and 112, along with
> infinity. I could go on, but you get the point. If you always switch to z
> at 30, then why have t tables that go above 28? Again, the infinity entry I
> understand, just not the others.
> 
> Berenson states (1, page 373), "However, the t distribution has more area
> in the tails and less in the center than does the normal distribution. This
> is because sigma is unknown and we are using s to estimate it. Because we
> are uncertain of the value of sigma, the values of t that we observe will
> be more variable than for Z." So, Berenson seems to me to be saying that
> you always use t when you must estimate sigma using s.
> 
> Levine (4, page 424) says roughly the same thing, "However, the t
> distribution has more area in the tails and less in the center than does
> the normal distribution. This is because sigma is unknown and we are using
> s to estimate it. Because we are uncertain of the value sigma, the values
> of t that we observe will be more variable than for Z."
> 
> So, I conclude 1) we use z when we know the sigma and either the data is
> normally distributed or the sample size is greater than 30 so we can use
> the central limit theorem.
> 
> 2) When n<30 and the data is normally distributed, we use t.
> 
> 3) When n is greater than 30 and we do not know sigma, we must estimate
> sigma using s so we really should be using t rather than z.
> 
> Now, every single business statistics book I have examined, including the
> four referenced below, use z values when performing hypothesis testing or
> computing confidence intervals when n>30.
> 
> Are they
> 
> 1. Wrong
> 2. Just oversimplifying it without telling the reader 

They are not oversimplifying, they are complexifying. To quote Polya's
"How to Solve It": "If you need rules, use this one first: 1) Use your
own brains first."

Sigma is hardly ever known, so you must use t. Then why not simply tell
the students: "Use the t table as far as it goes (usually around n=120),
and after that use the n=\infty line (which corresponds to the normal
distribution)." Then there is no need for a rule for "when to use z,
when to use t".
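[This rule of thumb is easy to check numerically; a sketch, assuming SciPy, of how the two-sided 95% t critical value approaches the normal value:]

```python
# Two-sided 95% critical values: t approaches z = 1.96 as df grows.
from scipy import stats

z = stats.norm.ppf(0.975)          # 1.95996...
for df in (10, 30, 120, 1000):
    print(df, round(stats.t.ppf(0.975, df), 4))
print("z:", round(z, 4))
```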

Kjetil Halvorsen
> 
> or am I overlooking something?
> 
> Ronny Richardson
> 
> References
> --
> (1) Basic Business Statistics, Seventh Edition, Berenson and Levine.
> 
> (2) Elementary Statistics: A Step by Step Approach, Third Edition, Bluman.
> 
> (3) Complete Business Statistics, Fourth Edition, Aczel.
> 
> (4) Statistics for Managers Using Microsoft Excel, Second Edition, Levine,
> Berenson, Stephan.
> 
> =
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>   http://jse.stat.ncsu.edu/
> =


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Art Kendall


If your conclusion differs depending on whether you use t or z, your
decision is "at the edge".

The total uncertainty (T) in a decision has two parts: sampling error (S)
and everything else (N). We can get a rough handle on the sampling error,
which the t or z help put in perspective. The "nonsampling" uncertainty
has to be taken into account subjectively. However, think of both as
nonnegative, so the sampling uncertainty (S) is the bare minimum of the
total uncertainty (T).

If you always use t, your confidence intervals will be a little wider,
and you will make slightly more conservative decisions. The question you
have to ask yourself is whether with 30 cases it really makes a
difference if, for example, your margin of error is +/- $1000 or
+/- $1042. If you are doing "back of the envelope" calculations, you can
(a) look up a t, (b) use 1.96, or (c) commit heresy for both camps and
use 2, which is easier to multiply by.

If you are writing a program as an exercise, use t.

Run the following SPSS syntax to get a handle on how far off you might
be by using t instead of z.

* Your margin of error (one side of the confidence interval) will be
* a small percentage wider by using t instead of z. See the variable mult.
new file.
input program.
loop df = 30 to 200.
compute t=idf.t(.975,df).
compute z=idf.normal(.975,0,1).
end case.
end loop.
end file.
end input program.
compute mult = (100*t)/z.
formats t z (f6.2) mult (pct8.2).
list.
execute.
cache.
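[For readers without SPSS, here is a rough Python translation of the computation above - my sketch, assuming SciPy, not Art Kendall's code:]

```python
# Ratio (as a percent) of the t-based to the z-based 95% margin of error,
# mirroring the variable `mult` in the SPSS syntax above.
from scipy import stats

z = stats.norm.ppf(0.975)
for df in range(30, 201):
    mult = 100 * stats.t.ppf(0.975, df) / z
    if df in (30, 60, 120, 200):
        print(df, f"{mult:.2f}%")
```

At df = 30 the ratio is about 104.2%, which is exactly the $1000-versus-$1042 margin-of-error comparison above.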

Ronny Richardson wrote:

> A few weeks ago, I posted a message about when to use t and when to use z.
> In reviewing the responses, it seems to me that I did a poor job of
> explaining my question/concern so I am going to try again.
>
> [remainder of message snipped - it is quoted in full in the first reply
> earlier in this digest]

Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Robert J. MacG. Dawson



Ronny Richardson wrote:
> 

> Are they
> 
> 1. Wrong
> 2. Just oversimplifying it without telling the reader

Neither, really. The MAIN objection to "z over 30" is that it adds an
unnecessary step to the decision process. If it actually simplified
things greatly, I reckon we could live with the slightly wonky p-values
(as we do when we use ANOVA in the knowledge that we do not have perfect
homoscedasticity). But it makes things more complicated...

A true cynic might say that there is one advantage to keeping the
procedure in the textbooks - it will occasionally give the readers of
articles warning that the writer has learned statistics by rote.

-Robert Dawson





Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Dennis Roberts

At 04:14 AM 12/10/01 +, Jim Snow wrote:
>"Ronny Richardson" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]...
>
> > A few weeks ago, I posted a message about when to use t and when to use z.
>
>I did not see the earlier postings, so forgive me if I repeat advice already
>given.:-)
>
> 1. The consequences of using the t distribution instead of the normal
>distribution for sample sizes greater than 30 are of no importance in
>practice.

what's magical about 30? i say 33 ... no actually, i amend that to 28

> 2. There is no good reason for statistical tables for use in practical
>analysis of data to give figures for t on numbers of degrees of freedom over
>30 except that it makes it simple to routinely use one set of tables when
>the variance is estimated from the sample.

with software, there is no need for tables ... period!


> 3. There are situations where the error variance is known. They
>generally arise when the errors in the data arise from the use of a
>measuring instrument with known accuracy or when the figures available are
>known to be truncated to a certain number of decimal places. For example:
> Several drivers use cars in a car pool. The distance travelled on each
>trip by a driver is recorded, based on the odometer reading. Each
>observation has an error which is uniformly distributed on (0, 0.2). The
>variance of this error is (0.2)^2/12 = 0.0033 and the standard deviation is
>0.0577. To calculate confidence limits for the average distance travelled
>by each driver, the z statistic should be used.

this is pure speculation ... i have yet to hear of any convincing case 
where the variance is known but, the mean is not
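[Whatever one thinks of the example, the uniform-error arithmetic quoted above is easy to verify: the variance of a Uniform(0, 0.2) error is (0.2)^2/12.]

```python
# Variance and standard deviation of a Uniform(0, 0.2) rounding error.
import math

var = 0.2 ** 2 / 12    # (b - a)**2 / 12 for Uniform(a, b)
sd = math.sqrt(var)
print(round(var, 5), round(sd, 4))
```

This gives variance about 0.00333 and standard deviation about 0.0577.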


_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Gus Gassmann

Dennis Roberts wrote:

> this is pure speculation ... i have yet to hear of any convincing case
> where the variance is known but, the mean is not

What about that other application used so prominently in texts of
business statistics, testing for a proportion?








Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jon Cryer

But then you should use a binomial (or hypergeometric) distribution.

Jon Cryer

p.s. Of course, you might approximate by an appropriate normal
distribution.
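[This point can be sketched with invented numbers: the exact binomial tail probability versus the normal approximation that uses the known H0 variance. The n, k, and p0 below are hypothetical.]

```python
# Exact binomial upper-tail probability vs. its normal approximation
# for a test of H0: p = 0.5 (n and k are invented for illustration).
import math
from scipy import stats

n, k, p0 = 100, 60, 0.5
exact = stats.binom.sf(k - 1, n, p0)            # P(X >= 60) under H0
zstat = (k - n * p0) / math.sqrt(n * p0 * (1 - p0))
approx = stats.norm.sf(zstat)                   # normal-approx tail
print(round(exact, 4), round(approx, 4))
```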
At 11:39 AM 12/10/01 -0400, you wrote:
Dennis Roberts wrote:
> this is pure speculation ... i have yet to hear of any convincing
case
> where the variance is known but, the mean is not
What about that other application used so prominently in texts of
business statistics, testing for a proportion?



Jon Cryer, Professor Emeritus
Dept. of Statistics and Actuarial Science
The University of Iowa, Iowa City, IA 52242
www.stat.uiowa.edu/~jcryer
office 319-335-0819   home 319-351-4639   FAX 319-335-3017
"It ain't so much the things we don't know that get us into trouble.
It's the things we do know that just ain't so." --Artemus Ward


Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jerry Dallal

Dennis Roberts wrote:

> this is pure speculation ... i have yet to hear of any convincing case
> where the variance is known but, the mean is not

A scale (weighing device) with known precision.





Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jon Cryer

I always thought that the precision of a scale was proportional to the
amount weighed. So don't you have to know the mean before you know the
standard deviation? But wait a minute - we are trying to assess the size
of the mean!
Jon Cryer
At 03:42 PM 12/10/01 +, you wrote:
Dennis Roberts wrote:
> this is pure speculation ... i have yet to hear of any convincing
case
> where the variance is known but, the mean is not
A scale (weighing device) with known precision.




Jon Cryer, Professor Emeritus
Dept. of Statistics and Actuarial Science
The University of Iowa, Iowa City, IA 52242
www.stat.uiowa.edu/~jcryer
office 319-335-0819   home 319-351-4639   FAX 319-335-3017
"It ain't so much the things we don't know that get us into trouble.
It's the things we do know that just ain't so." --Artemus Ward


Re: Cramer-von-Mises Criterion

2001-12-10 Thread Clay S. Turner


You have probably thought of this, but the age-old standard is the
chi-square test.

One thing about empirical distributions is that they may not be one of
the standard forms. This is why the jackknife method, and later the
bootstrapping methods, were developed. Thus you can extract the
distribution for your data set.

Clay



Chia C Chong wrote:
> 
> Hi!
> 
> Any idea where I can get a good reference on the Cramer-von-Mises
> criterion?
> 
> I am trying to test the goodness of fit between some theoretical
> distributions and the empirical distribution of my data.
> 
> Any other suggestions on goodness-of-fit tests are welcome and
> appreciated.
> 
> Thanks.
> 
> Cheers,
> CCC






Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Art Kendall

The sample mean of the dichotomous (one-zero, dummy) variable is known;
it is the proportion.

Gus Gassmann wrote:

> Dennis Roberts wrote:
>
> > this is pure speculation ... i have yet to hear of any convincing case
> > where the variance is known but, the mean is not
>
> What about that other application used so prominently in texts of
> business statistics, testing for a proportion?






Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Gus Gassmann

Art Kendall wrote:

(putting below the previous quotes for readability)

> Gus Gassmann wrote:
>
> > Dennis Roberts wrote:
> >
> > > this is pure speculation ... i have yet to hear of any convincing case
> > > where the variance is known but, the mean is not
> >
> > What about that other application used so prominently in texts of
> > business statistics, testing for a proportion?

> the sample mean of the dichotomous (one_zero, dummy) variable is known, It
> is the proportion.

Sure. But when you test Ho: p = p0, you know (or pretend to know) the
population variance. So if the CLT applies, you should use a z-table, no?








Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Art Kendall

Usually I would use software. As I tried to show in the sample syntax I
posted earlier, it doesn't usually make much difference whether you use z or t.

Gus Gassmann wrote:

> Art Kendall wrote:
>
> (putting below the previous quotes for readability)
>
> > Gus Gassmann wrote:
> >
> > > Dennis Roberts wrote:
> > >
> > > > this is pure speculation ... i have yet to hear of any convincing case
> > > > where the variance is known but, the mean is not
> > >
> > > What about that other application used so prominently in texts of
> > > business statistics, testing for a proportion?
>
> > the sample mean of the dichotomous (one_zero, dummy) variable is known, It
> > is the proportion.
>
> Sure. But when you test Ho: p = p0, you know (or pretend to  know) the
> population variance. So if the CLT applies, you should use a z-table, no?






Re: Cramer-von-Mises Criterion

2001-12-10 Thread Chia C Chong

Hi!!

Thanks for your reply... do you mean that the jackknife and bootstrapping
methods are also some kind of goodness-of-fit tests?

Cheers,
CCC

"Clay S. Turner" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
>
> You have probably thought of this, but the age old standard is the Chi
> Square test.
>
> One thing about empirical distributions is that they may not be one of
> the standard forms.  This is why the Jackknife method and then later the
> Bootstrapping methods were developed. Thus you can extract the
> distribution for your data set.
>
> Clay
>
>
>
> Chia C Chong wrote:
> >
> > [original message snipped]
>







Re: Cramer-von-Mises Criterion

2001-12-10 Thread Clay S. Turner

Hello Chia,
No, actually they are used to extract the distribution from the data.
They do this by a process known as resampling.

Clay
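[The resampling idea can be sketched in a few lines - a minimal bootstrap of the sample mean, with invented data and assuming NumPy is available:]

```python
# Minimal bootstrap sketch: approximate the sampling distribution of the
# mean by resampling with replacement. Data are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=50)

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(lo, 3), round(hi, 3))   # a rough 95% percentile interval
```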



Chia C Chong wrote:
> 
> Hi!!
> 
> Thanks for your reply... do you mean that the jackknife and bootstrapping
> methods are also some kind of goodness-of-fit tests?
> 
> Cheers,
> CCC
> 
> "Clay S. Turner" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> >
> > You have probably thought of this, but the age old standard is the Chi
> > Square test.
> >
> > One thing about empirical distributions is that they may not be one of
> > the standard forms.  This is why the Jackknife method and then later the
> > Bootstrapping methods were developed. Thus you can extract the
> > distribution for your data set.
> >
> > Clay
> >
> >
> >
> > Chia C Chong wrote:
> > >
> > > [original message snipped]
> >






What is the difference between Statistics and Mathematical Statistics?

2001-12-10 Thread Andreas Karlsson

What is (are) the difference(s) between Statistics and Mathematical
Statistics?





Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jon Cryer

Only as an approximation.

At 12:57 PM 12/10/01 -0400, you wrote:
>Art Kendall wrote:
>
>(putting below the previous quotes for readability)
>
> > Gus Gassmann wrote:
> >
> > > Dennis Roberts wrote:
> > >
> > > > this is pure speculation ... i have yet to hear of any convincing case
> > > > where the variance is known but, the mean is not
> > >
> > > What about that other application used so prominently in texts of
> > > business statistics, testing for a proportion?
>
> > the sample mean of the dichotomous (one_zero, dummy) variable is known, It
> > is the proportion.
>
>Sure. But when you test Ho: p = p0, you know (or pretend to  know) the
>population variance. So if the CLT applies, you should use a z-table, no?
>
>
>
>
>






Re: What is the difference between Statistics and Mathematical Statistics?

2001-12-10 Thread Kevin C. Heslin

Mathematical statistics will require that you take 5, rather than 2, Advil
or Tylenol.



At 06:24 PM 12/10/2001 +, Andreas Karlsson wrote:
>What is (are) the difference(s) between Statistics and Mathematical
>Statistics?
>
>







Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Gus Gassmann

Jon Cryer wrote:

>  But then you should use a binomial (or hypergeometric) distribution.
>
> Jon Cryer
>
> p.s. Of course, you might approximate by an appropriate normal
> distribution.

Quite, and then you are in a situation where you know (or at least
pretend to know) the population variance, the situation Dennis Roberts
was interested in.






Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Dennis Roberts

At 03:42 PM 12/10/01 +, Jerry Dallal wrote:
>Dennis Roberts wrote:
>
> > this is pure speculation ... i have yet to hear of any convincing case
> > where the variance is known but, the mean is not
>
>A scale (weighing device) with known precision.

as far as i know ... knowing the precision is expressed in terms of ... 
'accurate to within' ... and if there is ANY 'within' attached ... then 
accuracy for SURE is not known






_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Slutsky's theorem

2001-12-10 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
kjetil halvorsen <[EMAIL PROTECTED]> wrote:
>Slutsky's theorem says that if Xn ->(D) X and Yn ->(P) y0, y0 a
>constant, then

>Xn + Yn ->(D) X+y0. It is easy to make a counterexample if both Xn and
>Yn converge in distribution. Anybody have a counterexample when Yn
>converges in probability to a non-constant random variable?

This is very easy.  For example, let Xn all be the same 
nontrivial normal random variable Z, let Yn and y0 be -Z,
and let X be an independent normal random variable.  Then 
Xn + Yn = 0 for all n, but X - Z is not zero.
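[The counterexample can be illustrated numerically - a sketch assuming NumPy: with Xn = Z and Yn = -Z, the sum Xn + Yn is identically zero, while X + y0 = X - Z has variance 2.]

```python
# Numerical illustration: Xn = Z converges in distribution to an
# independent copy X; Yn = -Z converges in probability to the
# non-constant limit y0 = -Z. Yet Xn + Yn = 0 while X + y0 ~ N(0, 2).
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=100_000)   # the common variable behind Xn and Yn
X = rng.normal(size=100_000)   # an independent normal with the same law

print((Z - Z).var())             # Xn + Yn: identically 0
print(round((X - Z).var(), 2))   # X + y0: variance close to 2
```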


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Rich Ulrich

On Mon, 10 Dec 2001 12:57:29 -0400, Gus Gassmann
<[EMAIL PROTECTED]> wrote:

> Art Kendall wrote:
> 
> (putting below the previous quotes for readability)
> 
> > Gus Gassmann wrote:
> >
> > > Dennis Roberts wrote:
> > >
> > > > this is pure speculation ... i have yet to hear of any convincing case
> > > > where the variance is known but, the mean is not
> > >
> > > What about that other application used so prominently in texts of
> > > business statistics, testing for a proportion?
> 
> > the sample mean of the dichotomous (one_zero, dummy) variable is known, It
> > is the proportion.
GG > 
> Sure. But when you test Ho: p = p0, you know (or pretend to  know) the
> population variance. So if the CLT applies, you should use a z-table, no?
> 

That is the textbook justification for chi-squared and z tests in the
sets of 'nonparametric tests' which are based on rank-order
transformations and dichotomizing.

The variance is known, so the test statistic has the shorter tails.
(It works for ranks when you don't have ties.)

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





test-ignore

2001-12-10 Thread Jim Snow

test please ignore







Re: Sorry for question, but how is the english word for @

2001-12-10 Thread Richard Wright

The name given to the symbol @ in international standard character
sets is 'commercial at'.

See

http://www.quinion.com/words/articles/whereat.htm

for a history of the symbol.

Richard Wright



On Mon, 10 Dec 2001 23:34:19 +0100, "Nathaniel" <[EMAIL PROTECTED]>
wrote:

>Hi,
>
>Sorry for the question, but what is the English word for @?
>Please forgive me.
>
>N.
>
>








Re: Sorry for question, but how is the english word for @

2001-12-10 Thread Art Kendall

"at" usually indicates some kind of rate or unit price: 10 pounds @ $1
per pound.

On the net it is used as a separator between the id of an individual and
his/her location.

[EMAIL PROTECTED] is spoken as john dot smith at harvard dot e d u.

Until the early '80s or so, dot was spoken as point, as in filename point
ext (extension indicating type). Sometimes addresses were given as
john.smith at harvard.edu.
Nathaniel wrote:

> Hi,
>
> Sorry for the question, but what is the English word for @?
> Please forgive me.
>
> N.






RE: Question about concatenating probability distributions

2001-12-10 Thread David Heiser


RE: The Poisson process and Lognormal action time.

This kind of problem arises a lot in the actuarial literature (a
process for the number of claims and a process for the claim size),
and the Poisson and the lognormal have been used in this context - it
might be worth your while to look there for results.

Glen
...
This is a very general and important event process. It is also used to
describe the general failure-repair process that occurs at any repair shop.
The Poisson is a good approximation of the arrival times of equipment to be
repaired, and the log-normal is a good approximation of the time it takes
to repair it.

From an operations standpoint, the downtime is approximated by the
exponential distribution (occurrence) and a log-normal repair time, which
includes diagnosis, replacement and validation.

In the Air Force (1982-1995), where the reliability and maintainability of
equipment has to be characterized, the means are determined and used in a
form called availability. We never got beyond the use of availability. They
never got into the distribution and confidence interval aspects.

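A minimal Monte Carlo sketch of the failure-repair process described above, with hypothetical parameters (exponential up-times with a 100-hour mean, lognormal repair times with a 4-hour median), estimates long-run availability as up-time over total time:

```python
import math
import random

random.seed(7)

# Hypothetical parameters: Poisson failure arrivals (exponential up-times,
# mean 100 hours) and lognormal repair times (median 4 hours, sigma 0.8).
MEAN_UPTIME = 100.0
REPAIR_MU, REPAIR_SIGMA = math.log(4.0), 0.8

up_time = down_time = 0.0
for _ in range(10_000):  # simulate 10,000 failure/repair cycles
    up_time += random.expovariate(1.0 / MEAN_UPTIME)
    down_time += random.lognormvariate(REPAIR_MU, REPAIR_SIGMA)

# Long-run availability: fraction of total time the equipment is up.
availability = up_time / (up_time + down_time)
print(round(availability, 3))
```

With these assumed parameters the simulated value should land near MTBF / (MTBF + mean repair time), i.e. roughly 100 / 105.5.
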
As a general approximation, the log-normal distribution approximates human
reaction times to events.

 DAHeiser



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



RE: When does correlation imply causation?

2001-12-10 Thread David Heiser



-Original Message-
Dennis Roberts makes a good point here

>i repeat ... the r value shows the extent to which a straight line (in a 2
>variable problem) can pass through a scatterplot and, be close TO the data
>points

>in that sense, r is an index value for the extent to which a straight line
>MODEL fits the data ...

>knowing how the dots on the scatterplot got to be ... is totally outside
>the realm of what r can know

We are however looking at a very large area of practice here involved in
causality.

The fundamental starting point is measurement theory, which takes this as
the statement of reality:
X(obs) = X(true) + e(uncertainty, error, ...)

This applies to either experimental or observational data.

Then we take a theory regarding two different variables and say
Y(true) = a * X(true) + e(effects from unknown variables, unknown
probabilistic structures, etc.)

And combine, ending up with
Y(obs) = a * X(obs) + E(a combination of the two e's from above)

Then with the assumption that E is uncorrelated with either X(obs) or
Y(obs), we make a statement that X causes Y.
(Stan Mulaik's position)

My point here is that we are imposing a directionality on a basic
mathematical structure of EQUIVALENCE. From this viewpoint X(obs) = Y(obs)/a
+ error terms is equally true.

Mathematically we have not shown any causality, only an equivalence
expressed as a correlation coefficient. The linear expression (regression)
is strictly a convenience in terms of how we understand what is X and Y.

Observational data is the primary source. It is not unusual to have data
sets exceeding 5000 observations. As such, asymptotic probability methods
are used to make conclusions. It is difficult to argue about whether r=0.1
is significant causality or not when there are 2 observations. If one is
able to target a market segment here, the money says "go with it".

DAHeiser




At 10:06 AM 12/6/01 -0700, Alex Yu wrote:

>Whether we can get causal inferences out of correlation and equations has
>been a dispute between two camps:
>
>For causation: Clark Glymour (Philosopher), Pearl (Computer scientist),
>James Woodward (Philosopher)
>
>Against: Nancy Cartwright (Economist and philosopher), David Freedman
>(Mathematician)
>
>One comment from this list is that causal inferences cannot be
>drawn from a non-experimental design. Clark Glymour asserts that, using the
>Causal Markov condition and the faithfulness assumption, we can give a
>causal interpretation to non-experimental data.
>
>
>Chong-ho (Alex) Yu, Ph.D., MCSE, CNE, CCNA
>Psychometrician and Data Analyst
>Assessment, Research and Evaluation
>Cisco Systems, Inc.
>Email: [EMAIL PROTECTED]
>URL: http://seamonkey.ed.asu.edu/~alex/
>
>
>
>
>=
>Instructions for joining and leaving this list and remarks about
>the problem of INAPPROPRIATE MESSAGES are available at
>   http://jse.stat.ncsu.edu/
>=

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=






Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Vadim and Oxana Marmer

besides, who needs those tables? we have computers now, don't we?
I was told that there were tables for logarithms once. I have never seen
one in my life. Isn't it the same kind of thing?

>
>   3.  Outdated.
>
> on the grounds that when sigma is unknown, the proper distribution is t
> (unless N is small and the parent population is screwy) regardless how
> large the sample size may be.  The main (if not the only) reason for the
> apparent logical bifurcation at N = 30 or thereabouts was that, when
> one's only sources of information about critical values were printed
> tables, 30 lines was about what fit on one page (plus maybe a few extra
> lines for 40, 60, 120 d.f.) and one could not (or at any rate did not)
> expect one's business students to have convenient access to more
> extensive tables of the t distribution.  And, one suspects latterly,
> authors were skeptical that students would pay attention to (or perhaps
> be able to master?) the technique of interpolating by reciprocals between
> 30 df and larger numbers of df (particularly including infinity).
>
> But currently, _I_ would not expect business students to carry out the
> calculations for hypothesis tests, or confidence intervals, by hand,
> except maybe half a dozen times in class for the good of their souls:
> I'd expect them to learn to invoke a statistical package, or else
> something like Excel that pretends to supply adequate statistical
> routines.  And for all the packages I know of, there is a built-in
> function for calculating, or approximating, the cumulative distribution
> of t for ANY number of df.  The advice in any _current_ business-
> statistics text ought to be, therefore, to use t _whenever_ sigma is not
> known.  And if the textbook isn't up to that standard, the instructor
> jolly well should be.
>

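The point about computers replacing tables is easy to demonstrate: a short simulation (assumed setup: normal samples, stdlib only) recovers t critical values without any table, and shows them collapsing onto the normal value 1.96 as n grows past the mythical 30:

```python
import math
import random
import statistics

random.seed(42)

def empirical_975_quantile(n, reps=10_000):
    """Empirical 97.5% point of the t statistic sqrt(n)*(xbar - mu)/s
    for samples of size n drawn from N(0, 1) with mu = 0."""
    ts = []
    for _ in range(reps):
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = statistics.stdev(xs)
        ts.append(math.sqrt(n) * xbar / s)
    ts.sort()
    return ts[int(0.975 * reps)]

print(round(empirical_975_quantile(5), 2))    # near the t table's 2.78 for 4 df
print(round(empirical_975_quantile(200), 2))  # near the normal value 1.96
```

Any statistical package does the same thing exactly rather than by simulation, for any number of df, which is why the n=30 cutoff is a typesetting relic.
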


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Vadim and Oxana Marmer

> 3) When n is greater than 30 and we do not know sigma, we must estimate
> sigma using s so we really should be using t rather than z.


you are wrong. you use the t-distribution not because you don't know sigma,
but because your statistic has an EXACT t-distribution under certain
conditions. I know that the textbook says "if we knew sigma then the
distribution would be normal, but because we used s instead the
distribution turned out to be t". It does not say how exactly it becomes
t, so you draw the conclusion: use t instead of normal whenever you use s
instead of sigma. But that's wrong; it does not go like this.

when you don't know the underlying distribution of the sample you may use
the normal distribution (under certain regularity conditions) as an
APPROXIMATION to the actual distribution of your statistic. the
approximate distribution in most cases is not parameter-free; it may
depend, for example, on the unknown sigma. in such a situation you may
replace the unknown parameter by its consistent estimator; the approximate
distribution is still normal. think about it as an iterated approximation:
first you approximate the actual distribution by N(0, sigma^2), then you
approximate it by N(0, S^2), where S^2 is a consistent estimator for
sigma^2. there are formal theorems that allow you to do this kind of thing.

The essential difference between the two approaches is that the first one
tries to derive the EXACT distribution, while the second says: I will use
an APPROXIMATION.

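A small simulation illustrates the approximation argument above (assumed setup for the sketch: Exp(1) data, so clearly non-normal, with sigma replaced by its estimate S):

```python
import math
import random
import statistics

random.seed(0)

def coverage(n, reps=5_000):
    """How often the interval xbar +/- 1.96*S/sqrt(n) covers the true mean
    (1.0) when the data are Exp(1), i.e. decidedly non-normal."""
    hits = 0
    for _ in range(reps):
        xs = [random.expovariate(1.0) for _ in range(n)]
        xbar = sum(xs) / n
        half = 1.96 * statistics.stdev(xs) / math.sqrt(n)
        hits += (xbar - half) <= 1.0 <= (xbar + half)
    return hits / reps

print(coverage(200))  # close to the nominal 0.95
```

The interval uses the normal critical value 1.96 and the estimated S, yet the coverage is near 95%: the approximation, not any exact t result, is doing the work here.
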
the number 30 has no importance at all; throw away all the tables you
have. I cannot believe they still teach you this stuff. I wish it were
that simple: 30!

Your confusion is the result of oversimplification and of the desire to
provide students with the simple strategies presented in basic statistics
textbooks. I guess it makes teaching very simple, but it misleads
students; your confusion is an example. The problem is that there are no
simple strategies, and things are much, much more complicated than they
appear in basic textbooks. Basic textbooks don't tell you the whole story,
and they don't even try, because you simply cannot do this at their level.
Don't draw any strong conclusions after reading only basic textbooks.

In practice, in business and economics statistics, nobody uses
t-tests, but normal and chi-square approximations are used a lot. The
assumptions that you have to make for a t-test are too strong.







=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Vadim and Oxana Marmer

>
> Sigma is hardly ever known, so you must use t. Then why not simply tell
> the students: "use the t table as far as it goes, (usually around
> n=120), and after that, use the n=\infty line (which corresponds to the
> normal distribution). Then there is no need for a rule for "when to use
> z, when to use t".
>

but the data is not normal either in 99.9(9)% of the cases. Furthermore,
the data that you see in economics/business is very often not an iid
sample either. So, one way or another you end up with the normal or the
chi-square.

actually, there is an alternative to both approaches: the bootstrap. but
it does not always work and should not be used blindly.

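A minimal sketch of the percentile bootstrap mentioned above, on a hypothetical skewed sample (stdlib only):

```python
import random
import statistics

random.seed(1)

# A hypothetical skewed, non-normal sample.
data = [random.expovariate(1.0) for _ in range(50)]

def percentile_bootstrap_ci(xs, reps=2_000, alpha=0.05):
    """Percentile bootstrap CI for the mean: resample with replacement,
    recompute the mean each time, take the empirical alpha/2 quantiles."""
    means = sorted(
        statistics.mean(random.choices(xs, k=len(xs))) for _ in range(reps)
    )
    return means[int(reps * alpha / 2)], means[int(reps * (1 - alpha / 2))]

lo, hi = percentile_bootstrap_ci(data)
print(round(lo, 3), round(hi, 3))
```

No normality or t-distribution assumption enters anywhere; the interval comes entirely from resampling the observed data, which is exactly where the "should not be used blindly" caveat applies (dependent or heavy-tailed data can break it).
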


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=