Re: E as a % of a standard deviation

2001-09-29 Thread John Jackson

Here is my solution using figures which are self-explanatory:

Sample Size Determination

pi = 50%                   central area   0.99
confid level = 99%         2-tail area    0.01
sampling error = 2%        1-tail area    0.005
z  = 2.5758
n1 = 4,146.82
n  = 4,147

(z is the Excel function for the central interval,
NORMSINV($B$10+(1-$B$10)/2), with the confidence level 0.99 in cell $B$10.)

The algebraic formula for n was:   n = π(1-π)*(z/e)^2



If you can't read the above:

  n = pi(1-pi)*(z/e)^2

  Let me know if this makes sense.
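If you want to check the arithmetic outside Excel, here is a minimal
sketch in Python (assuming SciPy is available; scipy.stats.norm.ppf plays
the role of NORMSINV, and the function name sample_size is just my label):

  # Sample size for estimating a proportion (normal approximation):
  #   n = pi*(1-pi)*(z/e)^2, rounded up.
  import math
  from scipy.stats import norm

  def sample_size(conf_level=0.99, e=0.02, pi=0.5):
      z = norm.ppf(conf_level + (1 - conf_level) / 2)  # central interval
      return math.ceil(pi * (1 - pi) * (z / e) ** 2)

  print(sample_size())  # -> 4147, matching the spreadsheet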



It is simply amazing to me that you can do a random sample of 4,147 people
out of 50 million and get a valid answer. What is the reason for taking
multiple samples of the same n - to achieve more accuracy?  Is there a rule
of thumb on how many repetitions of the same sample you would take?



"John Jackson" <[EMAIL PROTECTED]> wrote in message
s1ot7.61225$[EMAIL PROTECTED]">news:s1ot7.61225$[EMAIL PROTECTED]...
> Donald - Thank you for your cogent explanation of a concept that is a bit
> hard to grasp.
> After researching it more, I determined that there is a gaping hole in my
> knowldege relating to the area of inferences on a population proportion so
I
> am somethat admittedly in the dark and have to study up a bit.
>
> Having said that, here are some answers to ?s you posed and some
additional
> comments.
>
> Instead of a warehouse full of CDs, lets work w/a much larger population.
>
> Revised fact pattern:
>
> Suppose you want to estimate the % of voters who acutally  voted in the
2000
> U.S. Presidential election who failed to make a choice for any candidate
> (blank ballot).  Assume (forgetting about politics) that this was simply a
> matter of inadvertance, error on the part of the voter, that all voting
> machines worked properly, and that the problem manifested itself the same
> way all over the country. You want to estimate how many ballots were blank
> and be 98% confident that the error of estimate is 2% or less. So you have
a
> universe of 50m voters or however many went to the polls. Assume you don't
> really know if its is 50m or 75m or 100m. You just know its in the tens of
> millions.
>
> So you want to estimate the proportion of blank ballots, knowing that a
huge
> number of people went to the polls.  You mention and I see it stated in
some
> books that when you don't know the SD and don't know the exact population
> size, other than that is in the millions, the safest choice is p = .5 -
that
> apparently is a sort of worse case scenario it seems. I have to
> reread my material and also revisit the binomial distribution area which I
> have studied extensively. However that knowledge has been pushed out of
the
> way by this complex area of sampling.
>
> Anyway, if you have some further thoughts given my clarification, I would
> welcome your insights.
>
>
> "Donald Burrill" <[EMAIL PROTECTED]> wrote in message
> [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> > On Fri, 28 Sep 2001, John Jackson wrote in part:
> >
> > > My formula is a rearrangement of the confidence interval formula shown
> > > below for ascertaining the maximum error.
> > E = Z(a/2) x SD/SQRT N
> > > The issue is you want to solve for N, but you have no standard
> > > deviation value.
> > Oh, but you do.  In the problem you formulated, unless I
> > misunderstood egregiously, you are seeking to estimate the proportion of
> > defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
> > There is then a maximum value for the SD of a proportion:
> > SD = SQRT[p(1-p)/n]
> > where  p  is the proportion in question,  n  is the sample size.
> > This value is maximized for  p = 0.5  (and it doesn't change much
> > between  p = 0.3  and  p = 0.7 ).  If you have a guess as to the value
> > of  p,  you can get a smaller value of  SD,  but using  p = 0.5  will
> > give you a conservative estimate.
> > You then have to figure out what that "5% error" means:  it might
> > mean "+/- 0.05 on the estimated proportion p" (but this is probably not
a
> > useful error bound if, say, p = 0.03), or it might mean "5% of the
> > estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> > (In the latter case, E is a function of p, so the formula for n
> > can be solved without using a guesstimated value for p until the last
> > step.)
> > Notice that throughout this analysis, you're using the normal
> > distribution as an approximation to the binomial b(n,p;k) distribution
> > that presumably "really" applies.  That's probably reasonable;  but the
> > approximation may be quite lousy if  p  is very close to 0 (or 1).
> > Thbe thing is, of course, that if there is NO pirating of the CDs, p=0,
> > and this is a desirable state of affairs from your clients' perspective.
> > So you might want to be in the business of expressing the minimum  p
> > that you could expect to detect with, say, 80% probability, using the
> > sample size eventually chosen

Re: definition of " metric" as a noun

2001-09-29 Thread Neville X. Elliven

Herman Rubin wrote:

>>>>> The OED cites the following use of metric as a noun:
>>>>> 1921 Proc. R. Soc. A. XCIX. 104 "In the non-Euclidean
>>>>> geometry of Riemann, the metric is defined by certain quantities ...
>>>>
>>>> A good example of bad usage: *what* metric, *what* quantities?
>>>> The reader should not be left hanging with those questions unanswered.
>>>
>>>This is not bad usage at all.  In mathematics, the word
>>>"metric" as a noun refers to a general type of distance,
>>>not necessarily the type in common use.
>>
>>It is certainly bad usage, for the following reason: the phrase,
>>"the metric", implies that there is *one* metric function on
>>Riemannian geometry, which is false. This reason has nothing
>>to do with distance measure in general, as commonly understood,
>>or otherwise.
>
>It is not bad usage, because a PARTICULAR Riemannian
>geometry is given by a particular metric; in fact, by the
>local quadratic form defining the differential metric.

It *is* bad usage, because it requires the type of exegesis you
have just provided to make it meaningful.


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



chaos vs LRD

2001-09-29 Thread Priya Ranjan


Hi All, 
Is anyone aware of a chaotic map which generates an LRD (long-range
dependent) process?  The logistic map at r = 4 (full chaos) has the
invariant density f(x) = 1/(pi*sqrt(x(1-x))); does the sequence it
generates exhibit LRD?  Is there a proof either way?
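A quick numerical check is easy to run (not a proof; a sketch assuming
NumPy, estimating sample autocorrelations of the iterates):

  # Iterate the logistic map at r = 4 and estimate autocorrelations.
  # For r = 4, successive iterates are known to be uncorrelated at every
  # lag >= 1, so the slow correlation decay typical of LRD should not
  # appear here.
  import numpy as np

  n = 100_000
  x = np.empty(n)
  x[0] = 0.1234                      # generic initial point in (0, 1)
  for i in range(n - 1):
      x[i + 1] = 4.0 * x[i] * (1.0 - x[i])

  xc = x - x.mean()
  denom = xc @ xc
  for lag in (1, 2, 5, 10, 100):
      print(lag, round((xc[:-lag] @ xc[lag:]) / denom, 4))  # all near 0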
thanks, 
-priya 






Re: definition of " metric" as a noun

2001-09-29 Thread Gordon D. Pusch

[EMAIL PROTECTED] (Neville X. Elliven) writes:

> Herman Rubin wrote:
> 
>> The OED cites the following use of metric as a noun:
>> 1921 Proc. R. Soc. A. XCIX. 104 "In the non-Euclidean
>> geometry of Riemann, the metric is defined by certain quantities ...
>
> A good example of bad usage: *what* metric, *what* quantities?
> The reader should not be left hanging with those questions unanswered.

>>>> This is not bad usage at all.  In mathematics, the word
>>>> "metric" as a noun refers to a general type of distance,
>>>> not necessarily the type in common use.
>>>
>>> It is certainly bad usage, for the following reason: the phrase,
>>> "the metric", implies that there is *one* metric function on
>>> Riemannian geometry, which is false. This reason has nothing
>>> to do with distance measure in general, as commonly understood,
>>> or otherwise.
>>
>> It is not bad usage, because a PARTICULAR Riemannian
>> geometry is given by a particular metric; in fact, by the
>> local quadratic form defining the differential metric.
> 
> It *is* bad usage, because it requires the type of exegesis you
> have just provided to make it meaningful.

Statisticians, of course, always use completely unambiguous terminology 
that anyone can intuitively understand without need for exegesis --- 
like ``statistically significant at the 95% confidence level''... ;-)


-- Gordon D. Pusch   

perl -e '$_ = "gdpusch\@NO.xnet.SPAM.com\n"; s/NO\.//; s/SPAM\.//; print;'





Re: E as a % of a standard deviation

2001-09-29 Thread Donald Burrill

On Sun, 30 Sep 2001, John Jackson wrote:

> Here is my solution using figures which are self-explanatory:
> 
> Sample Size Determination
> 
> pi = 50%                   central area   0.99
> confid level = 99%         2-tail area    0.01
> sampling error = 2%        1-tail area    0.005
> z  = 2.5758  (Excel: NORMSINV($B$10+(1-$B$10)/2))
> n1 = 4,146.82
> n  = 4,147
> 
> The algebraic formula for n was:  
> 
>   n = pi(1-pi)*(z/e)^2
> 
> 
> It is simply amazing to me that you can do a random sample of 4,147 
> people out of 50 million and get a valid answer. 

It is not clear what part of this you find "amazing".  
(Would you otherwise expect an INvalid answer, in some sense?)
The hard part, of course, is taking the random sample in the first 
place.  The equation you used, I believe, assumes a "simple random 
sample", sometimes known in the trade as a SRS;  but it seems to me 
VERY unlikely that any real sampling among the ballots cast in a 
national election would be done that way.  I'd expect it to involve 
stratifying on (e.g.) states, and possibly clustering within states; 
both of which would affect the precision of the estimate, and therefore 
the minimum sample size desired.
As to what may be your concern, that 4,000 looks like a small 
part of 50 million, the precision of an estimate depends principally 
on the amount of information available -- that is, on the size of the 
sample;  not on the proportion that amount bears to the total amount 
of information that may be of interest.  Rather like a hologram, in 
some respects;  and very like the resolving power of an optical 
instrument (e.g., a telescope), which is a function of the amount of 
information the instrument can receive (the area of the primary lens 
or reflector), not on how far away the object in view may be nor what 
its absolute magnitude may be.
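To put a number on that, here is a small sketch (Python, standard library
only) comparing the standard error of a proportion from a SRS with and
without the finite population correction:

  # SE of p-hat, with and without the fpc sqrt((N-n)/(N-1)).
  import math

  p, n = 0.5, 4147
  se = math.sqrt(p * (1 - p) / n)
  for N in (100_000, 1_000_000, 50_000_000):
      fpc = math.sqrt((N - n) / (N - 1))
      print(N, round(se, 6), round(se * fpc, 6))
  # Once N is in the millions the correction is negligible:
  # n, not N, drives the precision.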

> What is the reason for taking multiple samples of the same n - 
> to achieve more accuracy? 

I, for one, don't understand the point of this question at all.
Multiple samples?  Who takes them, or advocates taking them?

< snip, the rest >

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Re: Help with Minitab Problem?

2001-09-29 Thread Donald Burrill

I second Dennis' question.  While indeed "MINITAB recognizes the missing
values", what it does with them depends on the procedure being used: 
e.g., for CORRelation it uses all cases for which each pair of variables
is complete ("pairwise deletion of missing data"), and therefore, for a
data set like yours, the numbers of cases (as well as the particular set
of cases) used for each correlation coefficient are possibly different; 
whereas for REGRession, where any of the variables named on the REGRession
command is missing, the case is deleted ("listwise deletion").  Whether it
is even useful to construct a subset of the data for which all variables
are non-missing depends on how badly infected the variables are with
missing data, and on whether the missing data occur in (useful?) patterns. 
If you have about 10% missing in each column, unsystematically spread
through the set of columns, you could end up with a subset containing zero
cases. 
To answer your question however, on the (possibly unjustified) 
assumption that it's a useful thing to do:

COPY c1-c35 to c41-c75;   #  Always retain the original data
OMIT c1 = '*';
OMIT c2 = '*';
. . . ;
OMIT c35 = '*'.

There is probably a limit on the number of subcommands that MINITAB 
can handle (or on the number of OMIT subcommands that COPY can handle), 
but I don't know offhand what it is.  (It is also imaginable that the 
OMIT subcommand permits naming more than one column, which would greatly 
simplify things, but I am inclined to suspect not.)  If 35 subcommands 
are too many, proceed in batches of, say, 10 (or whatever):  
copy c1-c35 to c41-c75, omitting '*' in c1-c10;  
then copy c41-c75 to c81-c115, omitting '*' in c51-c60;  
then copy c81-c115 back to c41-c75, omitting '*' in c101-c110; 
then copy c41-c75 to c81-c115, omitting '*' in c71-c75.
 Finally, to check that no missing values have been retained, count the 
number of missing values in that set of columns:
NMISS c81
NMISS c82
. . . 
NMISS c115
To avoid having to inspect the result for each column, store the NMISSes 
in 35 constants:
NMISS c81 k1
NMISS c82 k2
. . .
NMISS c115 k35
 copy them into an unused column somewhere (e.g., c116):
COPY k1-k35 c116
 and verify that they're all zero by  
SSQ c116  
which will return "0" iff all values in the column are 0.

An easier way of verifying that there are no missing values in c81-c115 
is to call for the INFO window (or give the INFO command:
INFO c81-c115 )
which will report, inter alia, the number of missing values in each 
column.  (I prefer the command in this situation, to avoid being 
confused by information about columns not relevant to the question.)
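For anyone working outside MINITAB, the same listwise-deletion check is
short in, e.g., Python with pandas (a sketch; it assumes missing values
are coded as NaN, and the toy column names are made up):

  import numpy as np
  import pandas as pd

  df = pd.DataFrame({"c1": [1.0, np.nan, 3.0],
                     "c2": [4.0, 5.0, np.nan],
                     "c3": [7.0, 8.0, 9.0]})

  complete = df.dropna()           # listwise deletion: rows with no NaN
  print(df.isna().sum())           # per-column missing counts (cf. NMISS)
  assert complete.isna().sum().sum() == 0   # cf. the SSQ-equals-zero check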

On Fri, 28 Sep 2001, John Spitzer wrote:

> I have a dataset which has about 35 columns.  Many of the cells have
> missing values.  Since MINITAB recognizes the missing values, I can
> perform the statistical work I need to do and don't need to worry 
> about the missing values. 
Perhaps you "don't need to", but you probably should.

> However, I would like to be able to obtain the subset of observations 
> which MINITAB used for its calculations. 
As remarked above, this subset may vary from one pair of columns 
to another, or from one list of columns to another, depending on the 
calculations being performed.  Yes, you definitely should worry about 
the missing values.

> I would like to be able to create a worksheet with only the rows from 
> my dataset which do NOT contain any missing values.
Which may or may not correspond to any particular subset of the 
data that MINITAB defined for its work.

< snip, hypothetical example >

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Re: E as a % of a standard deviation

2001-09-29 Thread Donald Burrill

On Fri, 28 Sep 2001, John Jackson wrote in part:

> My formula is a rearrangement of the confidence interval formula shown 
> below for ascertaining the maximum error.
E = Z(a/2) x SD/SQRT N
> The issue is you want to solve for N, but you have no standard 
> deviation value.
Oh, but you do.  In the problem you formulated, unless I 
misunderstood egregiously, you are seeking to estimate the proportion of 
defective (or pirated, or whatever) CDs in a universe of 10,000 CDs. 
There is then a maximum value for the SD of a proportion:  
SD = SQRT[p(1-p)/n]
where  p  is the proportion in question,  n  is the sample size.
This value is maximized for  p = 0.5  (and it doesn't change much 
between  p = 0.3  and  p = 0.7 ).  If you have a guess as to the value 
of  p,  you can get a smaller value of  SD,  but using  p = 0.5  will 
give you a conservative estimate.
You then have to figure out what that "5% error" means:  it might 
mean "+/- 0.05 on the estimated proportion p" (but this is probably not a 
useful error bound if, say, p = 0.03), or it might mean "5% of the 
estimated proportion" (which would mean +/- 0.0015 if p = 0.03). 
(In the latter case, E is a function of p, so the formula for n 
can be solved without using a guesstimated value for p until the last 
step.) 
Notice that throughout this analysis, you're using the normal 
distribution as an approximation to the binomial b(n,p;k) distribution 
that presumably "really" applies.  That's probably reasonable;  but the 
approximation may be quite lousy if  p  is very close to 0 (or 1).
The thing is, of course, that if there is NO pirating of the CDs, p=0, 
and this is a desirable state of affairs from your clients' perspective. 
So you might want to be in the business of expressing the minimum  p 
that you could expect to detect with, say, 80% probability, using the 
sample size eventually chosen:  that is, to report a power analysis.
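One simple version of that calculation, as a sketch (Python, standard
library only), under the assumed decision rule "flag the batch if the
sample contains at least one pirated CD":

  # Smallest true proportion p detectable with 80% probability when we
  # reject as soon as the sample contains >= 1 pirated CD:
  #   P(at least one) = 1 - (1-p)^n >= 0.80  =>  p >= 1 - 0.20**(1/n)
  for n in (100, 500, 1000):
      print(n, round(1 - 0.20 ** (1 / n), 5))
  # n = 500, for example, can detect p of about 0.003 with 80% probability.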

> The formula then translates into n = (Z(a/2)*SD/E)^2   
>   Note: ^2 stands for squared.
> 
> You have only the confidence interval, let's say 95% and E of 1%.  
> Let's say that you want to find out how many people in the US have 
> fake driver's licenses using these numbers.  How large (N) must your 
> sample be?

Again, you're essentially trying to estimate a proportion.  (If it is 
the number of instances that is of interest, the distribution is still 
inherently binomial, but instead of  p  you're estimating  np,  with 
SD = SQRT[np(1-p)]
 and you still have to decide whether that 1% means "+/- 0.01 on the 
proportion p" or "1% of the value of np".
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Re: p value

2001-09-29 Thread Magenta


"Dennis Roberts" <[EMAIL PROTECTED]> wrote in message
news:">
news:[EMAIL PROTECTED]...
> let's say that you do a simple (well executed) 2 group study ...
> treatment/control ... and, are interested in the mean difference ... and
> find that a simple t test shows a p value (with mean in favor of
> treatment) of .009
>
> while it generally seems to be held that such a p value would suggest that
> our null model is not likely to be correct (ie, some other alternative
> model might make more sense), does it say ANYthing more than that?

You could use it in conjunction with your sample/group sizes to get an idea
of effect size.  For example, if you got that p-value with group sizes of 40
that could be a very interesting result.  However, if each group contained
100,000 subjects it may not be so interesting because the effect size will
be so much smaller.
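To make that concrete, here is a sketch (assuming SciPy) that backs the
standardized effect size out of the p-value and the group sizes for a
two-sided, equal-n two-sample t test:

  # Same p-value, very different effect sizes: d ~ t*sqrt(2/n)
  # for two groups of size n.
  import math
  from scipy.stats import t as t_dist

  p = 0.009
  for n in (40, 100_000):
      t_val = t_dist.ppf(1 - p / 2, df=2 * n - 2)
      print(n, round(t_val * math.sqrt(2.0 / n), 4))
  # n = 40 gives d near 0.6; n = 100,000 gives d near 0.01.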

cheers
Michelle







IT TRAINING SCHOLARSHIPS FOR FACULTY, STUDENTS AND STAFF

2001-09-29 Thread digitaldivide


IT TRAINING SCHOLARSHIPS FOR FACULTY, STUDENTS AND STAFF

If you wish to apply for a Personal Computing (300+ IT courses including 
Microsoft Office, Web Design, Lotus etc.) scholarship, click below:
http://www.cyberlearning.org/welcomepc12.htm

If you wish to apply for an IT (650+ courses including the above and 
20+ certifications in Microsoft,Cisco, Novell, Oracle, Web Master etc.) scholarship, 
click below:
http://www.cyberlearning.org/welcomeit12.htm

If you wish to apply for the Harvard Management Skills (24 courses) scholarship, click 
below:
http://www.cyberlearning.org/welcomeharvard.htm

Note: The first 5,000 applicants will receive full tuition scholarships. Every 
twentieth 
applicant will receive a two-for-one deal, allowing him/her to gift free access to a 
friend/family/colleague.

If you are interested in receiving Digital Divide scholarship information on IT (650+ 
courses) 
and Harvard Management Skills (24 courses), please click below:
http://www.cyberlearning.org

Please Note: This mail is not considered a spam as we have included contact 
information and
a remove link. To be removed from our mailing list, reply with "REMOVE" in the subject 
heading
and include your "Original Email Address/Addresses" in the subject heading. We will 
immediately 
update accordingly, and we apologize for any inconvenience.

National Education Foundation CyberLearning
FORD FOUNDATION Leadership Award Nominee
1428 Duke St
Alexandria, VA 22314
Tel: (703) 821-2100





Re: What is a confidence interval?

2001-09-29 Thread dennis roberts

At 02:16 AM 9/29/01 +, John Jackson wrote:

>For any random interval selected, there is a .05 probability that the
>sample will NOT yield an interval that contains the parameter being estimated,
>and additionally such an interval will not include any values in the area
>represented by the left tail.  Can you make different statements about the
>left and right tails?

unless CIs work differently than i think ... about 1/2 the time the CI will 
miss to the right ... and 1/2 the time they will miss to the left ... thus, 
what if we labelled EACH CI with a tag called HIT ... or MISSleft ... or 
MISSright ... for 95% CIs ... the p of grabbing a CI that is HIT from all 
possible is about .95 ... the p for getting MISSleft PLUS MISSright is 
about .05 ... thus, about 1/2 of the .05 will be MISSleft and about 1/2 of 
the .05 will be MISSright

so, i don't see that you can say anything differentially important about 
one end or the other
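a quick simulation shows the tag counts (a sketch, assuming NumPy; 95% z
intervals for a mean with known sigma):

  # tag 100,000 intervals as HIT / MISSleft / MISSright
  import numpy as np

  rng = np.random.default_rng(1)
  mu, sigma, n, reps, z = 0.0, 1.0, 30, 100_000, 1.96
  half = z * sigma / np.sqrt(n)
  xbar = rng.normal(mu, sigma / np.sqrt(n), size=reps)
  miss_left = np.mean(xbar + half < mu)    # CI entirely below mu
  miss_right = np.mean(xbar - half > mu)   # CI entirely above mu
  print(1 - miss_left - miss_right, miss_left, miss_right)
  # about .95, .025, .025 ... the misses split evenly between the tails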




>"Michael F." <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]...
> > (Warren) wrote in message:
> >
> > > So, what is your best way to explain a CI?  How do you explain it
> > > without using some esoteric discussion of probability?
> >
> > I prefer to focus on the reliability of the estimate and say it is:
> >
> > "A range of values for an estimate that reflect its unreliability and
> > which contain the parameter of interest 95% of the time in the long run."
>
>
>
>

==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Confidence intervals

2001-09-29 Thread Herman Rubin

In article <9p2d8l$clk$[EMAIL PROTECTED]>,
Ronald Bloom  <[EMAIL PROTECTED]> wrote:
>Herman Rubin <[EMAIL PROTECTED]> wrote:

>> Teaching people to use something without any understanding
>> can only be ritual; this is what most uses of statistics
>> are these days.  

>> If one does not use numbers, it is opinion.  I hope that the
>> pediatricians you have in your classes do not misuse data in
>> the manner you seem to be suggesting.


>  If one *does* use numbers it still is "opinion".  Anchoring
>numbers and mathematical models to claims about the way in
>which the world behaves (as opposed to claims about the way
>in which *numbers* behave) cannot but involve strong and
>deep assertions; which themselves *cannot* be theorems
>of mathematics.  The specific nature of the tasks to which we
>put "numbers" to work reflects corresponding "opinions" we
>have about the relevance of specific mathematical models,
>and the underlying assumptions corresponding to each, in
>those specific instances.

>The use of Mathematics does not vanquish proto-mathematical hypotheses
>from the empirical questions upon which it is brought to bear.

One of the mistakes that philosophers of science make is to suppose that
one can just look at the data.  It is the other way around;
the investigator must make the assumptions using whatever 
information from the field of investigation is available, and
may consider alternative models.  All one can do with the
numbers is to attempt to make decisions based on the assumptions.

The user needs to understand probability and probability
modeling, and to evaluate the consequences of the actions in
the various states of nature, as well as to formulate the
model, which describes how nature generates the data.  It is
only then that the power of mathematics and statistics can
be applied.  It is necessary to make approximations, as we
do not have infinitely large and infinitely fast computers
operating at zero cost; it does not take too large a problem
before we cannot just "put it on the computer".

Now the statistician can help by pointing out where one can
approximate with little risk of adverse consequences, and 
when one cannot, and certainly can advise on how to calculate
the costs and procedures, which may take a numerical analyst
to carry out.  What should be done is not likely to be at all
like what the ritual cookbooks indoctrinate.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: p value

2001-09-29 Thread Herman Rubin

In article ,
Magenta <[EMAIL PROTECTED]> wrote:

>"Dennis Roberts" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]...
>> let's say that you do a simple (well executed) 2 group study ...
>> treatment/control ... and, are interested in the mean difference ... and
>> find that a simple t test shows a p value (with mean in favor of
>> treatment) of .009

>> while it generally seems to be held that such a p value would suggest that
>> our null model is not likely to be correct (ie, some other alternative
>> model might make more sense), does it say ANYthing more than that?

>You could use it in conjunction with your sample/group sizes to get an idea
>of effect size.  For example, if you got that p-value with group sizes of 40
>that could be a very interesting result.  However, if each group contained
>100,000 subjects it may not be so interesting because the effect size will
>be so much smaller.

What should be done is to give the likelihood function,
which contains the relevant information.

One can carry out a simple calculation to show that the
idea of a nearly constant p value is WRONG.  Feel free to
change the model and weights; the results will be somewhat
similar, and this one is easy to calculate without using
numerical methods.

Suppose that one wishes to test that the mean \mu of a
distribution is 0.  The importance of rejecting the
hypothesis if it is true is one; the importance of
accepting the hypothesis if it is false, and \mu lies
in a set of area A, is A/(2\pi).

Let the data be summarized by a normal vector with mean
\mu and covariance matrix vI.  Then it can be shown that
the optimal procedure is to use a p value of v, assuming
v < 1.  If v >= 1, just reject.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: What is a confidence interval?

2001-09-29 Thread John Jackson

Great explanation

"dennis roberts" <[EMAIL PROTECTED]> wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> At 02:16 AM 9/29/01 +, John Jackson wrote:
>
> >For any random inverval selected, there is a .05% probability that the
> >sample will NOT yield an interval that yields the parameter being
estimated
> >and additonally such interval will not include any values in area
> >represented by the left tail.  Can you make different statements about
the
> >left and right tail?
>
> unless CIs work differently than i think ... about 1/2 the time the CI
will
> miss to the right ... and 1/2 the time they will miss to the left ...
thus,
> what if we labelled EACH CI with a tag called HIT ... or MISSleft ... or
> MISSright ... for 95% CIs ... the p of grabbing a CI that is HIT from all
> possible is about .95 ... the p for getting MISSleft PLUS MISSright is
> about .05 ... thus, about 1/2 of the .05 will be MISSleft and about 1/2 of
> the .05 will be MISSright
>
> so, i don't see that you can say anything differentially important about
> one end or the other
>
>
>
>
> >"Michael F." <[EMAIL PROTECTED]> wrote in message
> >[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> > > (Warren) wrote in message:
> > >
> > > > So, what is your best way to explain a CI?  How do you explain it
> > > > without using some esoteric discussion of probability?
> > >
> > > I prefer to focus on the reliability of the estimate and say it is:
> > >
> > > "A range of values for an estimate that reflect its unreliability and
> > > which contain the parameter of interest 95% of the time in the long
run."
> >
> >
> >
> >
> >=
> >Instructions for joining and leaving this list and remarks about
> >the problem of INAPPROPRIATE MESSAGES are available at
> >   http://jse.stat.ncsu.edu/
> >=
>
> ==
> dennis roberts, penn state university
> educational psychology, 8148632401
> http://roberts.ed.psu.edu/users/droberts/drober~1.htm
>
>
>
> =
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>   http://jse.stat.ncsu.edu/
> =







Re: E as a % of a standard deviation

2001-09-29 Thread John Jackson

Donald - Thank you for your cogent explanation of a concept that is a bit
hard to grasp.
After researching it more, I determined that there is a gaping hole in my
knowledge relating to the area of inferences on a population proportion, so I
am admittedly somewhat in the dark and have to study up a bit.

Having said that, here are some answers to ?s you posed and some additional
comments.

Instead of a warehouse full of CDs, lets work w/a much larger population.

Revised fact pattern:

Suppose you want to estimate the % of voters who actually voted in the 2000
U.S. Presidential election but failed to make a choice for any candidate
(blank ballot).  Assume (forgetting about politics) that this was simply a
matter of inadvertence, error on the part of the voter, that all voting
machines worked properly, and that the problem manifested itself the same
way all over the country. You want to estimate how many ballots were blank
and be 98% confident that the error of estimate is 2% or less. So you have a
universe of 50m voters or however many went to the polls. Assume you don't
really know if it is 50m or 75m or 100m. You just know it's in the tens of
millions.

So you want to estimate the proportion of blank ballots, knowing that a huge
number of people went to the polls.  You mention, and I see it stated in some
books, that when you don't know the SD and don't know the exact population
size, other than that it is in the millions, the safest choice is p = .5 -
that apparently is a sort of worst-case scenario. I have to
reread my material and also revisit the binomial distribution area, which I
have studied extensively. However, that knowledge has been pushed out of the
way by this complex area of sampling.

Anyway, if you have some further thoughts given my clarification, I would
welcome your insights.


"Donald Burrill" <[EMAIL PROTECTED]> wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> On Fri, 28 Sep 2001, John Jackson wrote in part:
>
> > My formula is a rearrangement of the confidence interval formula shown
> > below for ascertaining the maximum error.
> E = Z(a/2) x SD/SQRT N
> > The issue is you want to solve for N, but you have no standard
> > deviation value.
> Oh, but you do.  In the problem you formulated, unless I
> misunderstood egregiously, you are seeking to estimate the proportion of
> defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
> There is then a maximum value for the SD of a proportion:
> SD = SQRT[p(1-p)/n]
> where  p  is the proportion in question,  n  is the sample size.
> This value is maximized for  p = 0.5  (and it doesn't change much
> between  p = 0.3  and  p = 0.7 ).  If you have a guess as to the value
> of  p,  you can get a smaller value of  SD,  but using  p = 0.5  will
> give you a conservative estimate.
> You then have to figure out what that "5% error" means:  it might
> mean "+/- 0.05 on the estimated proportion p" (but this is probably not a
> useful error bound if, say, p = 0.03), or it might mean "5% of the
> estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> (In the latter case, E is a function of p, so the formula for n
> can be solved without using a guesstimated value for p until the last
> step.)
> Notice that throughout this analysis, you're using the normal
> distribution as an approximation to the binomial b(n,p;k) distribution
> that presumably "really" applies.  That's probably reasonable;  but the
> approximation may be quite lousy if  p  is very close to 0 (or 1).
> Thbe thing is, of course, that if there is NO pirating of the CDs, p=0,
> and this is a desirable state of affairs from your clients' perspective.
> So you might want to be in the business of expressing the minimum  p
> that you could expect to detect with, say, 80% probability, using the
> sample size eventually chosen:  that is, to report a power analysis.
>
> > The formula then translates into n = (Z(a/2)*SD)/E)^2
> > Note: ^2 stands for squared.
> >
> > You have only the confidence interval, let's say 95% and E of 1%.
> > Let's say that you want to find out how many people in the US have
> > fake driver's licenses using these numbers.  How large (N) must your
> > sample be?
>
> Again, you're essentially trying to estimate a proportion.  (If it is
> the number of instances that is of interest, the distribution is still
> inherently binomial, but instead of  p  you're estimating  np,  with
> SD = SQRT[np(1-p)]
>  and you still have to decide whether that 1% means "+/- 0.01 on the
> proportion p" or "1% of the value of np".
> -- DFB.
>  
>  Donald F. Burrill [EMAIL PROTECTED]
>  184 Nashua Road, Bedford, NH 03110  603-471-7128
>
>
>
> =
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are av

ADV: I bet that I make more money in the Web design business than you do. Time:12:14:01 PM

2001-09-29 Thread owner-edstat

I bet that I make more money in the Web design business than you do.

>From the customers I received last month I made $1560 income.
I also profited on these people $1000 up front.
And you know the funniest part?
I didn't even design their sites!
They did it for themselves!

I bet your sales pitch doesn't come anywhere near mine.

My sales pitch looks like this:

Free Website! 
Free .com, .net, or .org name! 
Free First Month! 
Free Shopping Cart for E-commerce! 
Free Secure Credit Card Transaction Server Access! 
Free Website Editor! (Allows you to control your entire site from anywhere in the 
world 
with nothing more than your Internet browser!) 
Free Website Statistics Analysis! 
Unlimited everything! 
Unlimited Email Addresses! 
Unlimited Hosting Space! 
Unlimited Bandwidth! 
Unlimited Pages! 
Unlimited Capacity of items in the Shopping Cart! 
Fastest Websites!!! (Hosted on the best servers and bandwidth anywhere!) 
Website Promotion Options...
There is nothing left to add to this service! 
If you can use a word processor,
You can manage your own website! 
Only $35/month after your first FREE month! 
Everything you need to be doing business online NOW is here for only $25! (Limited 
time offer)

I have been advertising this pitch on the front of my website for my design business 1 
month, I have received over 40 signups.

People SIGNUP EVERY SINGLE DAY.  Almost, they bunch up on the weekends often.

1 month= $1560 income that comes in every month with no work!
I will beat that number this month easily, but assuming I just keep up the same pace, 
next month will net $3,120 PROFIT.
FOR A FACT I will be netting at least $10,720 a month by the end of the year.  I got 
that number after subtracting $8000 to account for cancellations down the line.  

That is a ton of money!  I can not even think of a way to not hit that number unless I 
completely stopped doing everything.

My service is also better.
You can't give anyone the as much value as I can.
You can't give them the power to control their site as I can.
You can't give them the prices that I can.
You can't get them online as fast as I can.
And even if somehow you found a way to do all that, you won't able to keep your 
customers as long as I do.
Wow.  Don't believe me?

The interface I give my customers is easier to use than any other I have seen.
It is by far the best web based interface you will ever see.  A monkey would have a 
hard 
time making a site look bad with the software I include for my Customers.

I charge them $35 a month and I only pay $10!  I know I could charge a lot more for 
the service, but I am more interested in getting as many customers as possible now, 
than I am on making more on them.

If you did the numbers to make sure I wasn't making them up, you'll see $560 missing 
this month.  Where did it come from?  There is an optional search engine submission 
program, that 70 percent of the people that signup opt for, I charge them $30/month.  
I pay $10.

If they do decide they would like custom work done, no problem.  I do it for them, and 
they don't try to bother me to change little things all the time on their site, 
because I give them the power to do it themselves, which they prefer.  I like it to, 
keeps my time free for things I enjoy.

In addition to being able to get at customers you can't, and being able to upsell them 
to all the custom design work I like, when ever I like,

I bet I have a whole bunch of other things you DO NOT HAVE.

Private Labeled to me Website Builder/Store Builder (Best Anywhere)
Private Labeled to me Shopping Cart
Private Labeled to me WebMail and Pop3 Service
Private Labeled to me Secure Server Hosting
Private Labeled to me Domain Name Registration
Private Labeled to me Search Engine Submission
Private Labeled to me Control Panel for FTP, email, user access...

I can make as many new templates as I like to start them out from too.

I also never have to pay for custom CGI work to provide E-Commerce solutions anymore.
It is all done for me already, even the payment gateway integration.

I use the same service my end-users use to do design work and It has cut my design 
time in more than half.
I can make a complete E-Commerce enabled site in 15-30 minutes, email, shopping cart, 
ftp, running on the net!
Can you do that??

Long story short.  Unless you have some plans I don't know about, My business will be 
beating yours for sure in about 12 months.

Can you compete?
Are you getting customers as fast as I am?
Are you making as much on them as I am?
Is that money you are making staying with you every month?
Is there a way for you to provide my customers something I don't?
Can you say the same for yourself?

I am going to let you in on SECRET now.  

Even though I know that my business will most likely be making a lot more than yours 
in 12 months, I am not greedy.
I know that BIG money is not in being greedy.
I know that No matter how much money my design company makes next year, If I combined 
4-
