Martin Maechler says, in reply to Sergueï Sokol:

> Note the 'Subject' you've chosen for this thread,
> "... does not produce the correct Tukey line",

The choice of title was mine, not Serguei's; I posted the original message
where the error was pointed out.

I agree with Martin's assessment that the correct split (both by Tukey's
lights and by general practice) for 19 points would be 6,7,6, and I also
agree that it's better to "fix more" in this instance, where possible
(e.g. Johnstone & Velleman's standard errors would be a nice thing to add
if feasible) -- but if any blame is attached to the choice of title, it
really should be aimed at me.

Glen

On Wed, May 31, 2017 at 2:51 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

> >>>>> Serguei Sokol <so...@insa-toulouse.fr>
> >>>>>     on Tue, 30 May 2017 16:01:17 +0200 writes:
>
>     > On 30/05/2017 at 09:33, Martin Maechler wrote: ...
>
>     >> However, even after the patch, the example from the SO
>     >> post differs from the result of Richie Cotton's
>     >> function...
>
>     > The explanation is quite simple. In the SO function, the first
>     > 1/3 quantile of the example used counts 6 points (of 19 in
>     > total), while line()'s definition of quantile leads to 8
>     > points. The same numbers (6 and 8) are on the other end of
>     > the sample.
>
> so the numbers of obs. in the three thirds are
> {8, 3, 8} in line()  [also, after your patch, right?]
>
> whereas in MMline() they are as they should be, namely
>
> {6, 7, 6}
>
> But the {8, 3, 8} split is not at all what all "the literature",
> including Tukey himself, says "should" be done.
> (Other literature on the topic suggests that the optimal sizes
>  of the split in three groups depend on the distribution of x ..)
>
> OTOH, MMline() does exactly what "the literature" and also the
> reference on the ?line help page say.
>
>     > In the x sample, there are a few repeated values; this
>     > is certainly the reason for the different quantiles.
>
>     > I am not sure that one quantile definition is better or
>     > more correct than the other.
>
>     > So I would leave line()'s definition as is.
>
> you mean _after_ applying your patch, I assume.
>
> I currently tend to disagree.  If we change line() we should
> rather fix more ..
> Note the 'Subject' you've chosen for this thread,
> "... does not produce the correct Tukey line",
> so I think we should get better.
>
> Apart from Richie / my MMline() function, I've also noticed
> that ACSWR::resistant_line()
> exists.
>
> However "the literature" (see references below), notably the two
> with Hoaglin, strongly recommends smarter iterations, and
> -- lo and behold! -- when this topic came up last (for me) in
> Dec. 2014, I did spend about 2 days' work (or more?) to get the
> FORTRAN code from the 1981 book (which is abbreviated the
> "ABC of EDA") from a somewhat useful OCR scan into compilable
> Fortran code, then f2c'ed it, wrote an R interface function,
> found problems, i.e., bugs, including infinite loops, fixed most
> AFAICS, but somehow did not finish making the result available.
>
> Yes, and I have too many other things on my desk... this will
> have to wait!
>
> References:
>
>   Tukey, J. W. (1977). _Exploratory Data Analysis_. Reading,
>   Massachusetts: Addison-Wesley.
>
>   Velleman, P. F. and Hoaglin, D. C. (1981). _Applications, Basics
>   and Computing of Exploratory Data Analysis_. Duxbury Press.
>
>   Emerson, J. D. and Hoaglin, D. C. (1983). Resistant Lines for y
>   versus x. Chapter 5 of _Understanding Robust and Exploratory Data
>   Analysis_, eds. David C. Hoaglin, Frederick Mosteller and John W.
>   Tukey. Wiley.
>
>   Johnstone, I. M. and Velleman, P. F. (1985). The Resistant Line
>   and Related Regression Methods. _Journal of the American
>   Statistical Association_ *80*, 1041-1054.
>   <URL: https://dx.doi.org/10.1080/01621459.1985.10478222>
>
>     > Best, Sergueï.
>
> Martin Maechler, ETH Zurich (and R core team)

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
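
For readers following the {6, 7, 6} versus {8, 3, 8} discussion above, here is
a minimal sketch of the allocation rule usually attributed to Tukey and to
Velleman & Hoaglin: with n = 3k + r points, a remainder of 1 goes to the middle
group and a remainder of 2 is shared by the two outer groups. It is written in
Python purely for illustration. It is not the code of R's line(), of MMline(),
or of ACSWR::resistant_line(); the names tukey_third_sizes and resistant_slope
are hypothetical, ties in x are assumed absent, and the slope is the plain
one-pass median-of-thirds estimate with none of the iterations that the
references recommend.

```python
from statistics import median


def tukey_third_sizes(n):
    """Return (left, middle, right) group sizes for n distinct x values.

    Assumed rule: n = 3k + r; r == 1 adds one point to the middle
    group, r == 2 adds one point to each outer group.
    """
    k, r = divmod(n, 3)
    if r == 0:
        return (k, k, k)
    if r == 1:
        return (k, k + 1, k)
    return (k + 1, k, k + 1)


def resistant_slope(x, y):
    """One-pass resistant-line slope from the medians of the outer thirds.

    Illustration only: no iteration and no special handling of tied x.
    """
    pts = sorted(zip(x, y))  # order the pairs by x
    n_left, _, n_right = tukey_third_sizes(len(pts))
    left = pts[:n_left]
    right = pts[len(pts) - n_right:]
    x_left = median(p[0] for p in left)
    x_right = median(p[0] for p in right)
    y_left = median(p[1] for p in left)
    y_right = median(p[1] for p in right)
    return (y_right - y_left) / (x_right - x_left)


print(tukey_third_sizes(19))  # (6, 7, 6)
```

Under this rule 19 points split as 6, 7, 6, which matches the split the thread
calls correct. The {8, 3, 8} split reported for line() comes from its
quantile-based definition combined with repeated x values, which this sketch
does not try to reproduce.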