From my side: it would be great if you (or R core) could prepare a patch; it
would probably take me quite a bit longer than you, since I don't have
experience creating patches for R.

Best,
Martin

On Sun, Oct 18, 2020, at 21:49, Gabriel Becker wrote:
> Peter et al,
>
> I had the same thought, in particular for any() and all(), which inasmuch
> as they should work on data.frames in the first place (which to be
> perfectly honest I do find quite debatable myself), should certainly
> work on "logical" data.frames if they are going to work on "numeric"
> ones.
>
> I can volunteer to prepare a patch if Martin (the reporter) did not
> want to take a crack at it, and further if it is not already being done
> within R-core.
>
> Best,
> ~G
>
> On Sun, Oct 18, 2020 at 12:19 AM peter dalgaard <pda...@gmail.com> wrote:
> > Hmm, yes, this is probably wrong. E.g., we are likely to get
> > inconsistencies out of boundary cases like this
> >
> > > a <- na.omit(airquality)
> > > sum(a)
> > [1] 37495.3
> > > sum(a[FALSE,])
> > Error in FUN(X[[i]], ...) :
> >   only defined on a data frame with all numeric variables
> >
> > Or, closer to an actual use case:
> >
> > > sum(subset(a, Ozone>100))
> > [1] 3330.5
> > > sum(subset(a, Ozone>200))
> > Error in FUN(X[[i]], ...) :
> >   only defined on a data frame with all numeric variables
> >
> >
> > However, given that numeric summaries generally treat logicals as 0/1,
> > wouldn't it be easiest just to extend the check inside Summary.data.frame
> > with "&& !is.logical(x)"?
> >
> > > sum(as.matrix(a[FALSE,]))
> > [1] 0
> >
> > -pd
> >
> > > On 17 Oct 2020, at 21:18 , Martin <r...@mb706.com> wrote:
> > >
> > > The "Summary" group generics always throw errors for a data.frame with
> > > zero rows, for example:
> > >> sum(data.frame(x = numeric(0)))
> > > #> Error in FUN(X[[i]], ...) :
> > > #>   only defined on a data frame with all numeric variables
> > > Same behaviour for min, max, any, all, ... . I believe this is
> > > inconsistent with what these methods do for other empty objects (vectors,
> > > matrices), where the return value is chosen to ensure transitivity:
> > > sum(numeric(0)) == 0.
> > >
> > > The reason for this is that the return type of as.matrix() for empty (no
> > > rows or no columns) data.frame objects is always a matrix of type
> > > "logical". The Summary method for data.frame, in turn, throws an error
> > > when the data.frame, converted to a matrix, is not of numeric type.
> > >
> > > I suggest two ways that make sum, min, max, ... more consistent. IMHO it
> > > would be fitting to implement both of these fixes, because they also make
> > > other things more consistent.
> > >
> > > 1. Make the return type of as.matrix() for zero-row data.frames
> > > consistent with the type that would have been returned, had the
> > > data.frame had more than zero rows. "as.matrix(data.frame(x =
> > > numeric(0)))" should then be numeric, if there is an empty "character"
> > > column the return matrix should be a character etc. This would make
> > > subsetting by row and conversion to matrix commute (except for row names
> > > sometimes):
> > >> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, ,
> > >> drop = FALSE])
> > > Furthermore, this change would make as.matrix.data.frame obey the
> > > documentation, which indicates that the coercion hierarchy is used for
> > > the return type.
> > >
> > > 2. Make the Summary.data.frame method accept data.frames that produce
> > > non-numeric matrices. Next to the main focus of this message, I believe
> > > it would e.g. be fitting to have any() and all() work on logical
> > > data.frame objects. The current behaviour is such that
> > >> any(data.frame(x = 1))
> > > #> [1] TRUE
> > > #> Warning message:
> > > #> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical
> > > and
> > >> any(data.frame(x = TRUE))
> > > #> Error in FUN(X[[i]], ...) :
> > > #>   only defined on a data frame with all numeric variables
> > > So a numeric data.frame warns about implicit coercion, while a logical
> > > data.frame (which would not need coercion) does not work at all.
> > >
> > > (I feel more strongly about fixing 1. than 2., because I don't know the
> > > discussion that led to the behaviour described in 2.)
> > >
> > > Best,
> > > Martin
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Office: A 4.23
> > Email: pd....@cbs.dk Priv: pda...@gmail.com
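
For readers following the thread, here is a minimal sketch of the second
suggestion discussed above (accepting logical as well as numeric columns,
and not failing on zero rows). The helper name summary_df_sketch is
hypothetical and is not R's actual Summary.data.frame. It checks column
types directly instead of going through as.matrix(), which is an assumption
of this sketch and not necessarily how a patch to R would be written.

```r
# Hypothetical helper (not part of base R): apply a Summary-style function
# such as sum, any or all to a data frame, accepting numeric AND logical
# columns, and checking types per column so zero-row input is not rejected.
summary_df_sketch <- function(df, FUN, ...) {
  stopifnot(is.data.frame(df))
  # Check each column's own type; this avoids relying on the type that
  # as.matrix() reports for an empty data frame.
  ok <- vapply(df, function(col) is.numeric(col) || is.logical(col),
               logical(1))
  if (!all(ok))
    stop("only defined on a data frame with all numeric or logical variables")
  # Pass every column as a separate unnamed argument to FUN, so that column
  # names cannot be mistaken for argument names such as na.rm.
  do.call(FUN, c(unname(as.list(df)), list(...)))
}

# Zero-row numeric data frame: behaves like sum(numeric(0)).
print(summary_df_sketch(data.frame(x = numeric(0)), sum))

# Logical data frames work with any() and all().
print(summary_df_sketch(data.frame(x = TRUE), any))
print(summary_df_sketch(data.frame(x = logical(0)), all))

# Related to the first suggestion: compare the matrix type obtained by
# subsetting before versus after conversion. The result depends on the
# R version in use, so no expected output is given here.
df <- data.frame(x = c(1, 2))
rows <- integer(0)
print(typeof(as.matrix(df[rows, , drop = FALSE])))
print(typeof(as.matrix(df)[rows, , drop = FALSE]))
```

Checking the columns one by one keeps the behaviour for empty data frames in
line with what sum(), any() and all() already do for empty vectors, which is
the consistency argument made in the original report.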