Re: multivariate techniques for large datasets
srinivas wrote:
>
> Hi,
>
> I have a problem in identifying the right multivariate tools to
> handle dataset of dimension 1,00,000*500. The problem is still
> complicated with lot of missing data. can anyone suggest a way out to
> reduce the data set and also to estimate the missing value. I need to
> know which clustering tool is appropriate for grouping the
> observations( based on 500 variables ).

This may not be the answer to your question, but clearly you need a good statistical package that would allow you to manipulate the data in ways that make sense and that would allow you to devise simplification strategies appropriate in context.

I recently went through a similar exercise, smaller than yours, but still complex ... approx. 5,000 cases by 65 variables. I used the statistical package R, and I can tell you it was a godsend. In previous incarnations (more than 10 years ago) I had used, at various times depending on employer, BDMS, SAS, SPSS, and S. I had liked S best of the lot because of the advantages I found in the Unix environment. Nowadays, I have Linux on the desktop, and looked for the package closest to S in spirit, which turned out to be R. That it is freeware was a bonus. That it is a fully extensible programming language in its own right gave me everything I needed, as I tend to "roll my own" when I do statistical analysis, combining elements of possibilistic analysis of the likelihood function derived from fuzzy set theory.

At any rate, if that was indeed your question, and if you're on a tight budget, I would say get a Linux box (a fast one, with lots of RAM and hard disk space), download a copy of R, and start with the graphing tools that allow you as a first step to "look at" the data. Sensible ways of grouping and simplifying will suggest themselves to you, and inevitably thereafter you'll want to fit some regression models and/or do some analysis of variance.
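
As a rough illustration of the "look at the data first" advice applied to the missing-data question, here is a minimal sketch. It is written in Python rather than R, it is not taken from the original exchange, and the function names, the `None` marker for missing values, and the 50% threshold are all assumptions chosen for the example. It tabulates how much of each variable is missing, drops the variables that are mostly empty, and fills the remaining gaps with the column mean. Mean imputation is only a crude baseline; whether it is appropriate depends on why the values are missing.

```python
from statistics import mean

def missing_fraction(rows, n_vars):
    """Return, for each variable (column), the fraction of cases where it is None."""
    n = len(rows)
    return [sum(1 for r in rows if r[j] is None) / n for j in range(n_vars)]

def reduce_and_impute(rows, max_missing=0.5):
    """Drop variables missing in more than max_missing of the cases,
    then replace the remaining gaps with the column mean (a simple baseline)."""
    n_vars = len(rows[0])
    frac = missing_fraction(rows, n_vars)
    # Keep a variable only if it is observed often enough (and at least once).
    keep = [j for j in range(n_vars) if frac[j] <= max_missing and frac[j] < 1.0]
    # Column means computed from the observed values only.
    col_means = {j: mean(r[j] for r in rows if r[j] is not None) for j in keep}
    completed = [
        [r[j] if r[j] is not None else col_means[j] for j in keep]
        for r in rows
    ]
    return keep, completed

# Toy data: 4 cases by 3 variables, None marks a missing value (hypothetical).
rows = [
    [1.0, None, 3.0],
    [2.0, None, None],
    [3.0, 5.0, 5.0],
    [None, None, 7.0],
]
keep, completed = reduce_and_impute(rows, max_missing=0.5)
print("kept variables:", keep)
print("completed data:", completed)
```

On a real 100,000 by 500 table, the per-variable missing fractions are themselves worth plotting before choosing any threshold, and the completed table is what would then be handed to a dimension-reduction or clustering routine.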

If you're *not* on a tight budget, and/or you have access to a fancy workstation, then you might also have access to your choice of expensive stats packages. If I were you, I would still opt for R, essentially because of its programmability, which in my recent work I found to be indispensable.

Hope this is of help. Good luck.

S. F. Thomas

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =

Re: Need Good book on foundations of statistics
I just posted this to sci.stat.edu. With apologies to those who would see it twice, I post it again, this time cross-posted to comp.ai.fuzzy, where it may also be of interest.

"Neville X. Elliven" wrote:
>
> R. Jones wrote:
>
> >Can anyone refer me to a good book on the foundations of statistics?
> >I want to know of the limitations, assumptions, and philosophy
> >behind statistics.
>
> "Probability, Statistics, and Truth" by Richard von Mises is available
> in paperback [ISBN 0-486-24214-5] and might be just what you seek.

May I suggest my own "Fuzziness and Probability" (ACG Press, 1995). In attempting to reconcile the competing paradigmatic claims to representing uncertainty, of fuzzy set theory (FST) on the one hand, and probability/statistical inference theory (PST) on the other, I was driven to look deeply into the foundations, not only of these two, but also of measurement theory, deductive and inductive logic, decision analysis, and the relevant aspects of semantics. I also found it necessary to be clear as to the notion of what constitutes a model, and logically prior to that, what constitutes a phenomenon, which competing models seek in some way to represent.

I think I have succeeded, not only in reconciling the competing claims of FST and PST, but also in finding the extended likelihood calculus which eluded Fisher, and the generations of statisticians since. Likelihood theory thus far has been considered inadequate because simple maximization rules of marginalization and set evaluation fail in significant cases, which may in part have tempted Bayesians to substitute a probabilistic model, now necessarily subjectivist, for what is in actuality a possibilistic sort of uncertainty. Classicists, quite rightly, have never accepted this insistent Bayesian subjectivism, while Bayesians, quite understandably, have been impatient with the cautious, indirect characterizations of statistical uncertainty that are the hallmark of classical (Neyman-Pearson) statistical method. An extended likelihood calculus which is as easy to manipulate as the probability calculus, but without the injection of subjective priors, seems to me to offer a solution to the disagreements that beset the foundations of statistical inference. At any rate, the original poster may want to take a look.

Be all that as it may, I would also commend to the original poster the following two sources, which I found to be very helpful when I was asking the sorts of questions which the original poster now poses:

1) Sir Ronald A. Fisher. (1951). Statistical Methods and Scientific Inference. Collier MacMillan, 1973 (third edition).

2) V.P. Godambe and D.A. Sprott (eds.). (1971). Foundations of Statistical Inference: A Symposium. Toronto, Montreal: Holt, Rinehart and Winston.

There are many other worthwhile references, but these two helped me enormously in framing the core issues. The latter was especially useful for the informal commentaries and rejoinders which saw respective champions of the three main schools of thought -- classical, Bayesian, and likelihood -- going at each other in vigorous debate.

> >A discussion of how the quantum world may have
> >different laws of statistics might be a plus.
>
> The statistical portion of statistical mechanics is fairly simple, and
> no different conceptually from other statistics.

But from the standpoint of one whose interest is in the quantum-theoretic application domain, there is a very real question of where fuzziness ends, and probability begins. I remember reading Penrose's "The Emperor's New Mind", and thinking -- idly, it's not my field and I haven't tried to follow up -- that at least some of the uncertainty in the quantum world is of the fuzzy rather than probabilistic sort. The original poster is certainly well-advised to explore the foundations of uncertainty, period, as distinct from purely statistical uncertainty.

Hope this is of some help.

Regards,
S. F. Thomas