On 30/05/2017 at 18:51, Martin Maechler wrote:
Serguei Sokol <so...@insa-toulouse.fr>
     on Tue, 30 May 2017 16:01:17 +0200 writes:
     > On 30/05/2017 at 09:33, Martin Maechler wrote: ...
     >> However, even after the patch, The example from the SO
     >> post differs from the result of Richie Cotton's
     >> function...
     > The explanation is quite simple. In the SO function, the
     > first 1/3 quantile of the example used contains 6 points
     > (of 19 in total), while line()'s definition of quantile
     > leads to 8 points. The same numbers (6 and 8) occur at the
     > other end of the sample.

so the number of obs. for the three thirds for line() are
    {8, 3, 8}  in line()  [also, after your patch, right?]

whereas in MMline() they are as they should be, namely

    {6, 7, 6}

But the  {8, 3, 8}  split is not at all what "the literature",
including Tukey himself, says "should" be done.
(Other literature on the topic suggests that the optimal sizes
  of the split in three groups depends on the distribution of x ..)

OTOH, MMline() does exactly what "the literature" and also  the
reference on the  ?line  help pages says.
Well, what I have seen so far in the "literature" mentions 1/3 quantiles
(but yes, I could have overlooked something, as I did not spend much time on it).
So the distribution of the sample into three groups boils down to which quantile
definition is used. It turns out that line()'s version (you are right, this is
_after_ the patch, but my patch left this definition untouched) is consistent
with R's own.
If you run sum(dfr$time <= quantile(dfr$time, 1./3.)) in R, you get 8, not 6
(and the same at the 2/3 end).
To my mind, consistency with the rest of R, namely with its quantile definition,
is a good enough argument for leaving line()'s definition as is.
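
To illustrate how the group sizes depend on the splitting rule, here is a
minimal sketch. The vector x is hypothetical, made up so that ties sit on the
tertile boundaries; it is not the dfr$time data from the SO post. The
rank-based split is only an illustration of "groups as equal as possible" and
is not taken from the MMline() code.

```r
# Hypothetical data: 19 values with ties around the 1/3 and 2/3 quantiles.
# This is NOT the data from the SO post.
x <- c(1, 2, 3, 4, 5, 5, 5, 5, 9, 10, 11, 15, 15, 15, 15, 16, 17, 18, 19)
n <- length(x)

# Split by value, using R's default quantile() definition (type 7).
q <- quantile(x, c(1, 2) / 3, names = FALSE)
by_value <- c(left   = sum(x <= q[1]),
              middle = sum(x > q[1] & x < q[2]),
              right  = sum(x >= q[2]))

# Split by rank: each outer group gets floor(n / 3) points,
# the middle group gets the remainder.
k <- n %/% 3
by_rank <- c(left = k, middle = n - 2 * k, right = k)

print(by_value)
print(by_rank)
```

With this particular x, the value-based split should come out as 8, 3, 8 and
the rank-based split as 6, 7, 6, because every point tied with a quantile
value is counted in the outer group.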

Serguei.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
