>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]>
>>>>>     on 28 Jun 2005 14:57:42 +0200 writes:

PD> "Liaw, Andy" <[EMAIL PROTECTED]> writes:
>> The issue is not with boxplot, but with split. boxplot.formula()
>> calls boxplot(split(mf[[response]], mf[-response]), ...),
>> but look at what split() returns when there are empty levels in
>> the factor:
>>
>> > f <- factor(gl(3, 6), levels=1:5)
>> > y <- rnorm(f)
>> > split(y, f)
>> $"1"
>> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
>>
>> $"2"
>> [1] -1.1296642 -0.4808355 -0.2789933 0.1220718 0.1287742 -0.7573801
>>
>> $"3"
>> [1] 1.2320902 0.5090700 -1.5508074 2.1373780 1.1681297 -0.7151561
>>
>> The "culprit" is the following in split.default():
>>
>>     f <- factor(f)
>>
>> which drops empty levels in f, if there are any. BTW, ?split doesn't
>> mention what it does in such a situation. Perhaps it should?
>>
>> If this is to be "fixed", I suppose an additional argument, e.g.,
>> drop=TRUE, can be added, and the corresponding line mentioned
>> above changed to something like:
>>
>>     if (drop || !is.factor(f)) f <- factor(f)
>>
>> Then this additional argument can be passed on from boxplot.formula() to
>> split().

PD> Alternatively, I suspect that the intention was as.factor() rather
PD> than factor().

At first I thought Peter was right; but the real source of
split.default contains a comment (!) and that line is

    f <- factor(f) # drop extraneous levels

so it seems this was done there very much on purpose.
OTOH, S(-plus) has implemented it quite a bit differently, and
actually does keep the empty levels in the example

    f <- factor(rep(1:3, each=6), levels=1:5); y <- rnorm(f); split(y, f)

PD> It does require a bit of care to fix it that way,
PD> though. There could be problems with empty levels popping up in
PD> unexpected places.

Indeed!
Given the new facts, I think we want to go in Andy's direction
with a new argument, 'drop'.

As Peter mentioned, the real question is about its default.
"drop = TRUE"  would be fully compatible with previous versions of R.
"drop = FALSE" would be compatible with S and S-plus.

I'm going to implement it, and try to see if 'drop = FALSE' gives
changes for R and its standard packages; if 'yes', that would be
an indication that such an R-back-compatibility-breaking change
was not a good idea. If 'no', I could commit it and see if it
has an effect on the CRAN packages....

Of course, since split() and split()<- are S3 generics, and
since there's also unsplit(), this entails a whole slew of
changes {adding a "drop = FALSE" argument everywhere!}
and I presume it will break the code of everyone who has written
their own split.foobar methods....

great...

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel