Re: multivariate techniques for large datasets

2001-06-12 Thread Sidney Thomas

srinivas wrote:
> 
> Hi,
> 
>   I have a problem identifying the right multivariate tools to
> handle a dataset of dimension 100,000 x 500. The problem is further
> complicated by a lot of missing data. Can anyone suggest a way to
> reduce the data set and also to estimate the missing values? I need
> to know which clustering tool is appropriate for grouping the
> observations (based on the 500 variables).

This may not be the answer to your question, but clearly you need a
good statistical package that will let you manipulate the data in
ways that make sense and devise simplification strategies appropriate
to the context. I recently went through a similar exercise, smaller
than yours but still complex: approximately 5,000 cases by 65
variables. I used the statistical package R, and I can tell you it
was a godsend.

In previous incarnations (more than 10 years ago) I had used at
various times (depending on the employer) BMDP, SAS, SPSS, and S. I
liked S best of the lot because of the advantages I found in the
Unix environment. Nowadays I have Linux on the desktop, so I looked
for the package closest to S in spirit, which turned out to be R.
That it is freeware was a bonus. That it is a fully extensible
programming language in its own right gave me everything I needed,
as I tend to "roll my own" when I do statistical analysis, combining
elements of possibilistic analysis of the likelihood function
derived from fuzzy set theory.

At any rate, if that was indeed your question, and if you're on a
tight budget, I would say get a Linux box (a fast one, with lots of
RAM and hard disk space), download a copy of R, and start with the
graphing tools that allow you, as a first step, to "look at" the
data. Sensible ways of grouping and simplifying will suggest
themselves, and inevitably thereafter you'll want to fit some
regression models and/or do some analysis of variance.
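
Purely by way of illustration, such a first pass might look
something like the sketch below, assuming the 500 variables are
numeric and sit in a hypothetical file mydata.csv. The file name,
the 30% missingness cutoff, the median fill, the ten components, and
the five clusters are all placeholders, not recommendations:

    ## Read the data; blank cells and "NA" are treated as missing.
    dat <- read.csv("mydata.csv", na.strings = c("", "NA"))

    ## First look: how much is missing, variable by variable?
    miss <- colMeans(is.na(dat))
    summary(miss)
    hist(miss, main = "Proportion missing per variable")

    ## Crude simplification: drop variables that are mostly missing,
    ## then fill the remaining gaps with column medians (a stand-in
    ## for a more careful imputation).
    keep <- dat[, miss < 0.30, drop = FALSE]
    for (j in seq_along(keep)) {
      keep[[j]][is.na(keep[[j]])] <- median(keep[[j]], na.rm = TRUE)
    }
    keep <- keep[, sapply(keep, sd) > 0, drop = FALSE]  # drop constant columns

    ## Reduce dimension with principal components, then cluster the
    ## observations on the leading components. Ten components and
    ## five clusters are arbitrary placeholders.
    pc <- prcomp(keep, center = TRUE, scale. = TRUE)
    plot(pc)                     # scree plot of component variances
    km <- kmeans(pc$x[, 1:10], centers = 5, nstart = 20)
    table(km$cluster)

With 100,000 rows you would probably run this on a random subsample
first, and replace the median fill with something more principled
once the pattern of missingness is clear, but even a crude pass of
this sort tells you a great deal about the data.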
If you're *not* on a tight budget, and/or you have access to a fancy
workstation, then you might also have access to your choice of
expensive stats packages. If I were you, I would still opt for R,
essentially because of its programmability, which in my recent work
I found to be indispensable. Hope this is of help. Good luck.

S. F. Thomas





Re: Need Good book on foundations of statistics

2001-06-12 Thread Sidney Thomas

I just posted this to sci.stat.edu. With apologies to those who would
see it twice, I post it again, this time cross-posted to
comp.ai.fuzzy, where it may also be of interest.

"Neville X. Elliven" wrote:
> 
> R. Jones wrote:
> 
> >Can anyone refer me to a good book on the foundations of statistics?
> >I want to know of the limitations, assumptions, and philosophy
> >behind statistics.
> 
> "Probability, Statistics, and Truth" by Richard von Mises is available
> in paperback [ISBN 0-486-24214-5] and might be just what you seek.

May I suggest my own "Fuzziness and Probability" (ACG Press, 1995).
In attempting to reconcile the competing claims of fuzzy set theory
(FST), on the one hand, and probability/statistical inference theory
(PST), on the other, to the representation of uncertainty, I was
driven to look deeply into the foundations not only of these two,
but also of measurement theory, deductive and inductive logic,
decision analysis, and the relevant aspects of semantics. I also
found it necessary to be clear about what constitutes a model, and,
logically prior to that, what constitutes a phenomenon, which
competing models seek in some way to represent. I think I have
succeeded not only in reconciling the competing claims of FST and
PST, but also in finding the extended likelihood calculus that
eluded Fisher and the generations of statisticians since.

Likelihood theory has thus far been considered inadequate because
the simple maximization rules of marginalization and set evaluation
fail in significant cases, which may in part explain the temptation
of Bayesians to substitute a probabilistic model, now necessarily
subjectivist, for what is in actuality a possibilistic sort of
uncertainty. Classicists, quite rightly, have never accepted this
insistent Bayesian subjectivism, while Bayesians, quite
understandably, have been impatient with the cautious, indirect
characterizations of statistical uncertainty that are the hallmark
of the classical (Neyman-Pearson) statistical method. An extended
likelihood calculus that is as easy to manipulate as the probability
calculus, but without the injection of subjective priors, seems to
me to offer a solution to the disagreements that beset the
foundations of statistical inference. At any rate, the original
poster may want to take a look.
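
To make the point concrete (my gloss here, not a quotation from the
book): suppose a likelihood function L(\theta, \phi \mid x) over a
parameter of interest \theta and a nuisance parameter \phi,
normalized so that its supremum is 1. The two maximization rules
referred to above are, in LaTeX notation,

    L(\theta \mid x) = \sup_{\phi} L(\theta, \phi \mid x)      % marginalization
    L(A \mid x)      = \sup_{\theta \in A} L(\theta \mid x)    % set evaluation

These are essentially the combination rules of the possibility
calculus; the Bayesian alternative replaces each supremum with
integration against a prior. The cases in which the supremum rules
give unsatisfactory answers are the ones an extended calculus would
have to handle.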
Be all that as it may, I would also commend to the original poster
the following two sources, which I found very helpful when I was
asking the sorts of questions the original poster now poses:

1) Sir Ronald A. Fisher (1956). Statistical Methods and Scientific
Inference. Third edition, Collier Macmillan, 1973.

2) V.P. Godambe and D.A. Sprott (eds.).  (1971). Foundations of
Statistical Inference: A Symposium. Toronto, Montreal: Holt, Rinehart
and Winston.

There are many other worthwhile references, but these two helped me
enormously in framing the core issues. The latter was especially
useful for its informal commentaries and rejoinders, which saw the
respective champions of the three main schools of thought --
classical, Bayesian, and likelihood -- going at each other in
vigorous debate.

> >A discussion of how the quantum world may have
> >different laws of statistics might be a plus.
> 
> The statistical portion of statistical mechanics is fairly simple, and
> no different conceptually from other statistics.

But from the standpoint of one whose interest is in the
quantum-theoretic application domain, there is a very real question
of where fuzziness ends and probability begins. I remember reading
Penrose's "The Emperor's New Mind" and thinking -- idly, since it is
not my field and I haven't tried to follow up -- that at least some
of the uncertainty in the quantum world is of the fuzzy rather than
the probabilistic sort. The original poster is certainly
well-advised to explore the foundations of uncertainty in general,
as distinct from purely statistical uncertainty.

Hope this is of some help.

Regards,
S. F. Thomas




