Re: variance

2001-06-12 Thread Chris

Duane Allen [EMAIL PROTECTED] wrote:
Kelly [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
 Hi, I'm trying to calculate sample size using the sample size formula
 for simple random sampling, which requires an estimate of the
 variance. But I don't know the variance; instead I want to use the
 maximum variance the random variable can take. I know the range of the
 variable, say a to b. How can I use the range to calculate the maximum
 variance the random variable can possibly take?
 Thanks in advance for your help.


The bounding formulas are dependent on sample size.

The upper bounding formulas are
s^2 = [n/(4(n-1))] * R^2   for n even

OK, so this applies the sample variance formula with (n-1) in the denominator.
The first part of your formula - in square brackets - can be rewritten as
{ n/(n-1) } * 1/4

and 
s^2 = [(n+1)/(4n)] * R^2   for n odd
Here the first part can be rewritten as
{ (n+1)/n } * 1/4


Forget about the 1/4; that comes from the denominator of the deviation 
scores - each is half the range, and squaring gives the 1/4 - and is 
perfectly all right.
I'm concerned about the first part of the two rewritten formulas, the part in 
curly braces.
Both the numerator and the denominator are different. Why is that?

The denominator in the second one should still be (n-1), as always in the 
sample variance calculation. The numerator, however, should be (n-1) too, 
instead of (n+1).

With an even sample size, variance is maximal when half of the scores - n/2 - 
equal the minimum, and the other half - n/2 - equal the maximum. The mean then 
of course equals (a+b)/2. [a is minimum, b is maximum]. 
All deviation scores of course have the same magnitude - half are on the 
negative side, half on the positive side - and each equals half the range, (b-a)/2.
Summing the squared deviation scores over the n scores yields
n * {(b-a)/2}^2  ==  n * { (R^2)/4 }
where R is the range (b-a).
Subsequent division by (n-1) yields
n/(n-1) * 1/4 * R^2.
So far, so good.

With an odd sample size, the variance is maximal when _one_ score is exactly 
in the middle, and half of the others - (n-1)/2 - equal the minimum, and the 
others - also (n-1)/2 - equal the maximum.
The deviation scores of the scores at the extremes are of course exactly the 
same as in the even-sized-sample example. The one in the middle is by 
definition at the mean and consequently has a deviation score of zero.
To calculate the variance, we have to sum the squared deviation scores. 
Since the one in the middle has a zero deviation and adds nothing to the sum, 
we effectively sum over (n-1) scores, yielding
(n-1) * {(b-a)/2}^2  == (n-1) * { (R^2)/4 }
Subsequent division by (n-1)
(n-1)/(n-1) * 1/4 * R^2  == 1/4 * R^2
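
For concreteness, here is a minimal numpy sketch (illustrative only; it takes
a = 0, b = 1, so R = 1) that just recomputes the sample variance of the two
configurations described above:

import numpy as np

def sample_variance(x):
    # Sample variance with the usual (n-1) denominator.
    return np.asarray(x, dtype=float).var(ddof=1)

a, b = 0.0, 1.0              # assumed range endpoints
R = b - a

# Even n: half of the scores at the minimum, half at the maximum.
n = 10
even = np.array([a] * (n // 2) + [b] * (n // 2))
print(sample_variance(even), n / (4 * (n - 1)) * R**2)   # 0.2778  0.2778

# Odd n: one score at the midpoint, the rest split over the two extremes.
m = 11
odd = np.array([a] * (m // 2) + [(a + b) / 2] + [b] * (m // 2))
print(sample_variance(odd), R**2 / 4)                     # 0.25  0.25

This only checks the arithmetic for the configurations as described; it does
not by itself settle where the (n+1)/n factor comes from, which is the
question below.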


Where in your derivation does the part (n+1)/n come from?

Chris


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: multivariate techniques for large datasets

2001-06-12 Thread Donald Burrill

On 11 Jun 2001, srinivas wrote:

   I have a problem in identifying the right multivariate tools to 
 handle a dataset of dimension 100,000 x 500.  The problem is further
 complicated by a lot of missing data.

So far, you have not described the problem you want to address, nor the 
models you think may be appropriate to the situation.  Consequently, 
no-one will be able to offer you much assistance. 

 Can anyone suggest a way to reduce the data set and also to 
 estimate the missing values? 

There are a variety of ways of estimating missing values, all of which 
depend on the model you have in mind for the data, and the reason(s) you 
think you have for substituting estimates for the missing data.
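
As a toy illustration of that dependence (invented numbers, Python): two
common choices already disagree about the same missing cell, because they
assume different models.

import numpy as np

# One variable with a missing entry, plus a covariate t we may or may not trust.
x = np.array([1.0, 2.0, 4.0, np.nan, 16.0])
t = np.arange(len(x))
obs = ~np.isnan(x)

# Model 1: observations are exchangeable -> impute the observed mean.
mean_impute = np.nanmean(x)                              # 5.75

# Model 2: x depends linearly on t -> impute from a fitted straight line.
slope, intercept = np.polyfit(t[obs], x[obs], deg=1)
reg_impute = slope * t[~obs] + intercept                 # about 10.57

print(mean_impute, reg_impute)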

 I need to know which clustering tool is appropriate for grouping the
 observations (based on 500 variables).

No answer is possible without context.  No context has been supplied.

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






About kendall

2001-06-12 Thread Monica De Stefani

When I apply Kendall's tau or Kendall's partial tau to a time series, do
I have to calculate ranks or not?
In fact, a time series has a natural temporal order.

Thanks,
Monica De Stefani.





3 Biopharmaceutical Statistics Conferences

2001-06-12 Thread Frank E Harrell Jr

Dear Colleagues,

Please note that the following Statistical Conferences will take place
in
Washington DC.


Title: Statistical Issues in Drug Development
Content: A two-day intensive course avoiding technical detail and
concentrating on PRACTICAL and PHILOSOPHICAL issues that determine which
data are collected, how the study should be organized, and what the
outcome means. It will ensure that participants acquire a well-grounded
appreciation of all the important statistical issues surrounding drug
development.
Date: 14 & 15 June 2001
Venue: Georgetown University Conference Center, Washington DC
Course Leader: Professor Stephen Senn, University College London, UK
Speakers: Professor Peter Lachenbruch, Director of Biostatistics,
CBER, US Food and Drug Administration
Dr Richard Simon, Head of Molecular Statistics and
Bioinformatics, National Cancer Institute, NIH

ALL DELEGATES RECEIVE A FREE HARDBACK COPY OF "STATISTICAL ISSUES IN DRUG
DEVELOPMENT" BY AUTHOR AND COURSE DIRECTOR PROFESSOR STEPHEN SENN



Title: Statistics of Optimal Dosing
Content: Many statisticians feel that dose selection is very poorly done.
This briefing will cover methods appropriate for establishing the optimal
dose. It will examine the problems of dosing and suggest practical
solutions. Dose assessment must be done correctly: if assessed incorrectly,
the resulting project delays and extra costs can be very high.
Venue: Washington DC, USA
Date:  26th July 2001



Title: Statistics of Multi-center Trials
Content: Views differ enormously on how multi-center trials should be
designed and how data from them should be analyzed. Section 3.2 of the ICH
guideline, which has now been adopted in the US, Europe and Japan, is open
to interpretation. The FDA provides specific guidelines for a strong
statistical basis for the design and analysis of clinical trials. Clare
Gnecco, Senior Biostatistician, CBER, US FDA, will discuss, along with
other experts, the statistical issues surrounding multi-center trials.
Venue: Washington DC, USA
Date:  14th September 2001



For more information, please go to our website at
www.henrystewart.co.uk or contact

Dr Carlos Horkan
Henry Stewart Conference Studies
Russell House
28-30 Little Russell Street
London WC1A 2HN
Tel: +44 207 404 3040
Fax: +44 207 404 2081
Email: [EMAIL PROTECTED]



-- 
Frank E Harrell Jr, Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem., Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat





Re: Need Good book on foundations of statistics

2001-06-12 Thread Sidney Thomas

I just posted this to sci.stat.edu. With apologies to those who would
see it twice, I post it again, this time cross-posted to
comp.ai.fuzzy, where it may also be of interest.

Neville X. Elliven wrote:
 
 R. Jones wrote:
 
 Can anyone refer me to a good book on the foundations of statistics?
 I want to know of the limitations, assumptions, and philosophy
 behind statistics.
 
 Probability, Statistics, and Truth by Richard von Mises is available
 in paperback [ISBN 0-486-24214-5] and might be just what you seek.

May I suggest my own Fuzziness and Probability (ACG Press, 1995).
In attempting to reconcile the competing paradigmatic claims to
representing uncertainty, of fuzzy set theory (FST) on the one hand,
and probability/statistical inference theory (PST) on the other, I
was driven to look deeply into the foundations, not only of these
two, but also of measurement theory, deductive and inductive logic,
decision analysis, and the relevant aspects of semantics. I also
found it necessary to be clear as to the notion of what constitutes a
model, and logically prior to that, what constitutes a phenomenon,
which competing models seek in some way to represent. I think I have
succeeded, not only in reconciling the competing claims of FST and
PST, but also in finding the extended likelihood calculus which
eluded Fisher, and the generations of statisticians since. Likelihood
theory thus far has been considered inadequate because simple
maximization rules for marginalization and set evaluation fail in
significant cases, which may in part have tempted Bayesians to
substitute a probabilistic model, now necessarily subjectivist, for
what is in actuality a possibilistic sort of uncertainty.
Classicists, quite rightly, have never accepted this insistent
Bayesian subjectivism, while Bayesians, quite understandably, have
been impatient with the cautious, indirect characterizations of
statistical uncertainty that are the hallmark of classical
(Neyman-Pearson) statistical method. An extended likelihood calculus
which is as easy to manipulate as the probability calculus, but
without the injection of subjective priors, seems to me to offer a
solution to the disagreements that beset the foundations of
statistical inference. At any rate, the original poster may want to
take a look-see. Be all that as it may, I would also commend to the
original poster the following two sources, which I found to be
very helpful when I was asking the sorts of questions which the
original poster now poses:

1) Sir Ronald A. Fisher.  (1951). Statistical Methods and Scientific
Inference.  Collier MacMillan, 1973 (third edition).

2) V.P. Godambe and D.A. Sprott (eds.).  (1971). Foundations of
Statistical Inference: A Symposium. Toronto, Montreal: Holt, Rinehart
and Winston.

There are many other worthwhile references, but these two helped me
enormously in framing the core issues. The latter was especially
useful for the informal commentaries and rejoinders which saw
respective champions of the three main schools of thought --
classical, Bayesian, and likelihood -- going at each other in
vigorous debate.

 A discussion of how the quantum world may have
 different laws of statistics might be a plus.
 
 The statistical portion of statistical mechanics is fairly simple, and
 no different conceptually from other statistics.

But from the standpoint of one whose interest is in the
quantum-theoretic application domain, there is a very real question
of where fuzziness ends, and probability begins. I remember reading
Penrose's The Emperor's New Mind, and thinking -- idly, it's not my
field and I haven't tried to follow up -- that at least some of the
uncertainty in the quantum world is of the fuzzy rather than
probabilistic sort. The original poster is certainly well-advised to
explore the foundations of uncertainty, period, as distinct from
purely statistical uncertainty.

Hope this is of some help.

Regards,
S. F. Thomas





Re: About kendall

2001-06-12 Thread Rich Ulrich

On 12 Jun 2001 08:43:53 -0700, [EMAIL PROTECTED] (Monica De Stefani)
wrote:

 When I apply Kendall's tau or Kendall's partial tau to a time series, do
 I have to calculate ranks or not?
 In fact, a time series has a natural temporal order.

 ... but you are not partialing out time.  Surely.

Your program that does the Kendall tau must do some
ranking, as part of the algorithm.  Why do you think you 
might have to calculate ranks?
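
A small scipy sketch of that point (made-up data): tau computed on the raw
values and on hand-computed ranks comes out identical, because Kendall's tau
only looks at the ordering of the pairs.

import numpy as np
from scipy.stats import kendalltau, rankdata

rng = np.random.default_rng(0)
x = rng.normal(size=30)                 # e.g. the series values
y = 0.5 * x + rng.normal(size=30)       # a second, related series

tau_raw, p_raw = kendalltau(x, y)                         # raw observations
tau_rnk, p_rnk = kendalltau(rankdata(x), rankdata(y))     # pre-ranked by hand

print(tau_raw, tau_rnk)   # identical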

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





MDS algorithm needed

2001-06-12 Thread jms1.nospam


I need an algorithm for MDS (multidimensional scaling) with real city 
distances. 
So it must be metric and absolute or ratio, for symmetric matrices 
with missing entries. What is best for that? Somewhere I read about 
SMACOF; is this any good?
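
One possibility, sketched below with made-up distances: scikit-learn's MDS
class runs a SMACOF-style metric MDS on a precomputed symmetric dissimilarity
matrix. Note that it expects a complete matrix, so missing entries would first
need imputation or a weighted SMACOF implementation.

import numpy as np
from sklearn.manifold import MDS

# Toy symmetric "city distance" matrix; the values are invented.
D = np.array([
    [0.0, 2.0, 3.0, 7.0],
    [2.0, 0.0, 2.5, 6.0],
    [3.0, 2.5, 0.0, 5.0],
    [7.0, 6.0, 5.0, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", metric=True,
          random_state=0)
coords = mds.fit_transform(D)    # one 2-D point per city
print(coords)
print(mds.stress_)               # residual stress of the embedding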

Thanks!
Jens
(Software Developer)

f'up2 sci.stat.math





Re: multivariate techniques for large datasets

2001-06-12 Thread Sidney Thomas

srinivas wrote:
 
 Hi,
 
   I have a problem in identifying the right multivariate tools to
 handle a dataset of dimension 100,000 x 500. The problem is further
 complicated by a lot of missing data. Can anyone suggest a way to
 reduce the data set and also to estimate the missing values? I need to
 know which clustering tool is appropriate for grouping the
 observations (based on 500 variables).

This may not be the answer to your question, but clearly you need a
good statistical package that would allow you to manipulate the data
in ways that make sense and that would allow you to devise
simplification strategies appropriate in context. I recently went
through a similar exercise, smaller than yours, but still complex ...
approx. 5,000 cases by 65 variables. I used the statistical package
R, and I can tell you it was a god-send. In previous incarnations
(more than 10 years ago) I had used at various times (which varied
with employer) BMDP, SAS, SPSS, and S. I had liked S best of the lot
because of the advantages I found in the Unix environment. Nowadays,
I have Linux on the desktop, and looked for the package closest to S
in spirit, which turned out to be R. That it is freeware was a bonus.
That it is a fully extensible programming language in its own right
gave me everything I needed, as I tend to roll my own when I do
statistical analysis, combining elements of possibilistic analysis of
the likelihood function derived from fuzzy set theory.

At any rate,
if that was indeed your question, and if you're on a tight budget, I
would say get a Linux box (a fast one, with lots of RAM and hard disk
space) and download a copy of R, and start with the graphing tools
that allow you as a first step to look at the data. Sensible ways
of grouping and simplifying will suggest themselves to you, and
inevitably thereafter you'll want to fit some regression models
and/or do some analysis of variance.

If you're *not* on a tight
budget, and/or you have access to a fancy workstation, then you might
also have access to your choice of expensive stats packages. If I
were you, I would still opt for R, essentially because of its
programmability, which in my recent work I found to be indispensable.
Hope this is of help. Good luck.

S. F. Thomas

