In article <[EMAIL PROTECTED]>, Rich Ulrich <[EMAIL PROTECTED]> wrote:

>Note to sci.stat.edu readers - The thread by the old name has
>been active in sci.stat.math for a couple of weeks.
>On 13 Jul 2003 19:34:30 -0500, [EMAIL PROTECTED] (Herman
>Rubin) wrote:

>> In article <[EMAIL PROTECTED]>,
>> Rich Ulrich <[EMAIL PROTECTED]> wrote:

>[ ... ]

>> >On 2 Jul 2003 16:47:31 GMT, [EMAIL PROTECTED]
>> >(Radford Neal) wrote:

[ I am snipping everything else. ]

>> >'Meanwhile, statisticians like to think that statistics
>> >(a) is a mathematically precise discipline, and
>> >(b) hasn't been barking up the wrong tree for the past 100 years.'

>ru >
>> >I'm a statistician. I think that the statistics that I use did not
>> >exist 100 years ago, for the most part.
>> >Regression was first described in (IIRC) 1998.

>HR >
>> Do you mean 1898? I am not sure what year, but Galton
>> introduced the term "regression" in the 19th century.
>> However, using multiple regression (least squares) goes
>> back to the end of the 18th century, with massive use,
>> mainly nonlinear, in the 19th. Even the question of
>> outliers is 19th century. Maximum likelihood goes back
>> to the middle of the 18th century.

>oops, certainly 18xx not 19xx. The citation was supposed
>to be to Yule, who apparently published the first 'multiple
>regression', with coefficients, etc., in 1896 and 1897.

>==== here are a couple of citations from an AOL website that
>==== google showed me

>Partial correlation. G. U. Yule introduced "net coefficients" for
>"coefficients of correlation between any two of the variables while
>eliminating the effects of variations in the third" in "On the
>Correlation of Total Pauperism with Proportion of Out-Relief" (in
>Notes and Memoranda), Economic Journal, Vol. 6 (1896), pp. 613-623.
>Pearson argued that partial and total are more appropriate than net
>and gross in Karl Pearson & Alice Lee, "On the Distribution of
>Frequency (Variation and Correlation) of the Barometric Height at
>Divers Stations," Phil. Trans. R. Soc., Ser. A, 190 (1897), pp.
>423-469. Yule went fully partial with his 1907 paper "On the Theory of
>Correlation for any Number of Variables, Treated by a New System of
>Notation," Proc. R. Soc. Series A, 79, pp. 182-193.

>Multiple correlation. At first multiple correlation referred only to
>the general approach, e.g. by Yule in Economic Journal (1896). The
>coefficient arrives later. "On the Theory of Correlation" (J. Royal
>Statist. Soc., 1897, p. 833) refers to a coefficient of double
>correlation R_1 (the correlation of the first variable with the other
>two). Yule (1907) discussed the coefficient of n-fold correlation
>R^2_1(23...n). Pearson used the phrases "coefficient of multiple
>correlation" in his 1914 "On Certain Errors with Regard to Multiple
>Correlation Occasionally Made by Those Who Have not Adequately Studied
>this Subject," Biometrika, 10, pp. 181-187, and "multiple correlation
>coefficient" in his 1915 paper "On the Partial Correlation Ratio,"
>Proc. R. Soc. Series A, 91, pp. 492-498.

>[This entry was largely contributed by John Aldrich.]

The obsession (this is correct) with multivariate normality and
correlations did come at this time. These people ignored the
statistics that had been done by mathematicians and physicists, mainly
before 1850, and essentially started over.

There was a large overuse even then of normality for ERRORS, which
Gauss tried to justify (which is why the normal distribution is called
Gaussian in much of the literature) until he proved a version of what
is now called the Gauss-Markov Theorem, which states that one does as
well without normality as with it. The normal distribution was
obtained as a limit of the binomial by de Moivre in 1731.
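For concreteness, here is a minimal LaTeX sketch of the two results
just mentioned; the matrix notation (y, X, beta, epsilon) is the
modern form, not the historical one, and is supplied here only as an
illustration.

% Gauss-Markov: in the linear model with only second-moment
% assumptions (no normality anywhere),
\[
  y = X\beta + \varepsilon, \qquad
  E[\varepsilon] = 0, \qquad
  \mathrm{Cov}(\varepsilon) = \sigma^2 I,
\]
% the least-squares estimator
\[
  \hat\beta = (X^\top X)^{-1} X^\top y
\]
% is best linear unbiased: for any other linear unbiased estimator
% \tilde\beta = Cy, the matrix Cov(\tilde\beta) - Cov(\hat\beta) is
% positive semidefinite.
%
% de Moivre's limit of the binomial (the normal as a limit law):
\[
  \Pr\!\left( \frac{S_n - np}{\sqrt{np(1-p)}} \le x \right)
  \;\longrightarrow\; \Phi(x)
  \qquad (n \to \infty,\ S_n \sim \mathrm{Binomial}(n,p)).
\]

Nothing in the first display uses normality, which is the precise
sense in which least squares does as well without normality as with
it, among linear unbiased estimators.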
Normality of data was a later artifact. It was not until the mid 19th
century that the biologists got into it, and Quetelet claimed that
this distribution was the distribution of a "normal person". Together
with the Central Limit Theorem, which does not really apply under
biological conditions, it was assumed that everything was normal, of
course with no proof, as it is false. This led to the overuse of
correlation, which is still prevalent, and the overuse of multivariate
normality. Psychologists and others distort their data by forcing
normality, which can only destroy the form of the relationships. BTW,
at least asymptotically, normality is generally of little consequence.

>[snip, some]

>ru >  Oh, the whole
>> >matter of 'national economic statistics' was an invention
>> >of the war effort, for figuring out how to compare resources.

>HR >
>> Try the 17th century for the systematic use of this. The
>> word "statistics" comes from "state", and gathering state
>> statistics of this type goes WAY back, such as 11th
>> century.

>Well, governments were figuring how to collect their
>own 35-50% of incomes for *taxes* going back at least
>10 centuries BCE. Here is what I had in mind --

>If you are doing research in economics today, I believe
>that you will find that the series of data on prices for goods,
>stocks, interest rates, or for employment, or for whatever, are
>seriously inferior -- mostly, estimated long after the fact -- for
>the data before WW II. At least, that is my impression about
>economic data, and that is what I was referring to.

Is it that much more accurate now? I worked with the older
econometricians, and know something about their methods, as well as
the current methods. Besides, the models are not any good, and the
estimation methods cannot do a good job. This is due to a lack of
precision.

>[ snip, quality control; Meehl;]

>ru >
>> >Following disputes about what could or should be said about
>> >cigarettes causing lung cancer and heart disease, the logic
>> >of epidemiological evidence was written out in the 1960s.

>HR >
>> The first use of statistics for epidemiological evidence
>> which I know about is the London cholera epidemic of, I
>> believe, the 17th century.

>John Snow removed the pump handle in 1854. His evidence
>was more in the form of a map, I think, than in numbers, but
>it was certainly before 1963. It was not, however, a deeply
>articulated document on the "logic of epidemiological evidence".
>The logic was finally laid out and widely accepted.

The logic of inference in epidemic models could not be developed until
the mathematics of such models was in place. Probability was in the
doldrums during the last half of the 19th century, and the needed
mathematics, measure theory and integration based on it, was developed
in 1890-1910.

>One could say, equally as well, that Darwin did not invent
>"evolution." But it was his presentation in 1848 that led to
>the conversion, in rather short order, of all his contemporaries.

>ru >
>> >Also in the 1960s, the name of Bayes was attached to an
>> >alternative movement for doing certain things in statistics.

>HR >
>> Try the late 18th and early 19th century.

>I don't find the name in the 18th and 19th century, though
>I am not looking in those original documents.

The two papers of Bayes were published in the 1760s. Laplace did take
it up. The problem with using it in the early 19th century was the
near-insistence on there being "objective" priors; these do not exist,
and modern attempts to produce them necessarily fail.
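One standard way to see the difficulty with "objective" priors (a
textbook illustration, not anything specific to this thread) is that
flatness is not preserved under reparameterization, so "uninformative"
is not a well-defined property of a prior. A short LaTeX sketch:

% A prior flat in theta is not flat in a transformed parameter phi:
\[
  \pi(\theta) \propto 1 \quad \mbox{on } (0,1), \qquad \phi = \theta^2,
\]
\[
  \pi(\phi) \;=\; \pi\big(\theta(\phi)\big)\,
  \left|\frac{d\theta}{d\phi}\right|
  \;=\; \frac{1}{2\sqrt{\phi}}, \qquad \phi \in (0,1),
\]
% which is far from flat: the same "state of ignorance" assigns
% different relative weights depending on how the parameter is written.

The same issue arises for any non-trivial transformation, which is one
way to see that a flat prior cannot serve as a parameterization-free
description of ignorance.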
Because of this, and the belief that science can proceed purely
objectively, Bayesian methods dropped out of use, and remained so
until mathematical arguments, starting in the 1940s, showed the need
for them.

>I found a book entitled Empirical Bayes Methods, which is
>dated 1969. Just about all its citations are in the 1960s.
>It has a pointer to a theoretical volume, by Savage, which
>was written in 1954, and Savage scarcely mentions
>Bayes except to quote his Theorem.

Empirical Bayes is later, and is an attempt to come close to Bayes
when the prior is unclear, and the results may be good enough that it
is not worth making it clear. Savage was a philosophical Bayesian. I
seem to have the honor of being the first "rational" Bayesian; in 1947
(not published until 1987, but fairly well known) I showed that
self-consistent behavior had to be Bayesian, but I always considered
the prior measure as a weight measure, forced by the logic to exist. I
have never approved of the attempts to specify a prior for someone
else, or of the use of conjugate or reference priors.

You will find lots of this starting with the late 1940s; it will not
all have "Bayes" in the title. For example, _Theory of Games and
Statistical Decisions_ is mainly Bayesian, and _Decision Analysis_ is
likewise mainly Bayesian. The decision approach is what leads
logically to Bayesian behavior. BTW, the decision approach shows that
one cannot separate the prior from the loss, and this is explicitly
pointed out in my 1987 paper.

>The local library online search for book titles has the same
>sort of evidence -- 227 hits for books with Bayes in the title
>or keywords, abstracts, or whatever; 90% are after 1970,
>and the rest are in the 1960s.

>So, I am really curious about who was using the name of
>Bayes, for what purposes, before the 1960s. Like I said --
>and I will repeat -- it is my impression that the name of Bayes
>was not attached to any movement until the 1960s.

>[ snip, rest, for now]

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
[EMAIL PROTECTED]   Phone: (765)494-6054   FAX: (765)494-0558