Hi all, I wanted to share with you an experimental package that I’m currently working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for vctrs is to think deeply about the output “type” of functions like `c()`, `ifelse()`, and `rbind()`, with an eye to implementing one strategy throughout the tidyverse (i.e. all the functions listed at <https://github.com/r-lib/vctrs#tidyverse-functions>). Because this is going to be a big change, I thought it would be very useful to get comments from a wide audience, so I’m reaching out to R-devel to get your thoughts.
There is quite a lot already in the readme (<https://github.com/r-lib/vctrs#vctrs>), so here I’ll try to motivate vctrs as succinctly as possible by comparing `base::c()` to its equivalent `vctrs::vec_c()`. I think the drawbacks of `c()` are well known, but to refresh your memory, I’ve highlighted a few at <https://github.com/r-lib/vctrs#compared-to-base-r>. I think they arise because of two main challenges: `c()` has to both combine vectors *and* strip attributes, and it only dispatches on the first argument.

The design of vctrs is largely driven by a pair of principles:

- The type of `vec_c(x, y)` should be the same as `vec_c(y, x)`

- The type of `vec_c(x, vec_c(y, z))` should be the same as `vec_c(vec_c(x, y), z)`

i.e. finding the common type should be commutative and associative. I think these are good principles because they make types simpler to understand and to implement.

Method dispatch for `vec_c()` is quite simple because associativity and commutativity mean that we can determine the output type by considering only a pair of inputs at a time. To this end, vctrs provides `vec_type2()`, which takes two inputs and returns their common type (represented as a zero-length vector):

    str(vec_type2(integer(), double()))
    #> num(0)

    str(vec_type2(factor("a"), factor("b")))
    #> Factor w/ 2 levels "a","b":

    # NB: not all types have a common/unifying type
    str(vec_type2(Sys.Date(), factor("a")))
    #> Error: No common type for date and factor

(`vec_type2()` currently implements double dispatch through a combination of S3 dispatch and if-else blocks, but this will change to a pure S3 approach in the near future.)

To find the common type of multiple vectors, we can use `Reduce()`:

    vecs <- list(TRUE, 1:10, 1.5)

    type <- Reduce(vec_type2, vecs)
    str(type)
    #> num(0)

There’s one other piece of the puzzle: casting one vector to another type.
That’s implemented by `vec_cast()` (which also uses double dispatch):

    str(lapply(vecs, vec_cast, to = type))
    #> List of 3
    #>  $ : num 1
    #>  $ : num [1:10] 1 2 3 4 5 6 7 8 9 10
    #>  $ : num 1.5

All up, this means that we can implement the essence of `vec_c()` in only a few lines:

    vec_c2 <- function(...) {
      args <- list(...)
      # Common type of all inputs, found pairwise
      type <- Reduce(vec_type2, args)
      # Cast every input to that type, then concatenate
      cast <- lapply(args, vec_cast, to = type)
      unlist(cast, recursive = FALSE)
    }

    vec_c(factor("a"), factor("b"))
    #> [1] a b
    #> Levels: a b

    vec_c(Sys.Date(), Sys.time())
    #> [1] "2018-08-06 00:00:00 CDT" "2018-08-06 11:20:32 CDT"

(The real implementation is a little more complex: <https://github.com/r-lib/vctrs/blob/master/R/c.R>)

On top of this foundation, vctrs expands in a few different ways:

- To consider the “type” of a data frame, and what the common type of two data frames should be. This leads to a natural implementation of `vec_rbind()` which includes all columns that appear in any input.

- To create a new “list\_of” type, a list where every element is of fixed type (enforced by `[<-`, `[[<-`, and `$<-`)

- To think a little about the “shape” of a vector, and to consider recycling as part of the type system. (This thinking is not yet fully fleshed out)

Thanks for making it to the bottom of this long email :) I would love to hear your thoughts on vctrs. It’s something that I’ve been having a lot of fun exploring, and I’d like to make sure it is as robust as possible (and the motivations are as clear as possible) before we start using it in other packages.

Hadley

-- 
http://hadley.nz
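
To make the two principles concrete without depending on vctrs itself, here is a minimal base-R sketch of the same idea for a tiny hierarchy (logical < integer < double). It is an illustration under stated assumptions, not the vctrs implementation: `common_type2()`, `cast_to()`, and `combine()` are hypothetical names invented for this example, and dispatch is replaced by a simple rank lookup.

```r
# Hypothetical helpers for illustration only; base R, not vctrs.
# A tiny type hierarchy: logical < integer < double.
type_rank <- c(logical = 1L, integer = 2L, double = 3L)

# Common type of two vectors, returned as a zero-length prototype.
common_type2 <- function(x, y) {
  tx <- typeof(x)
  ty <- typeof(y)
  if (!tx %in% names(type_rank) || !ty %in% names(type_rank)) {
    stop("No common type for ", tx, " and ", ty, call. = FALSE)
  }
  # Picking the higher rank is a max(), which is commutative and associative.
  winner <- names(type_rank)[max(type_rank[[tx]], type_rank[[ty]])]
  vector(winner, 0L)
}

# Cast x to the type of the zero-length prototype `to`.
cast_to <- function(x, to) {
  as.vector(x, mode = typeof(to))
}

# Find the common type pairwise, cast every input, then concatenate.
combine <- function(...) {
  args <- list(...)
  type <- Reduce(common_type2, args)
  cast <- lapply(args, cast_to, to = type)
  unlist(cast, recursive = FALSE)
}

x <- TRUE
y <- 1:3
z <- 1.5

# Commutative: argument order does not change the common type.
stopifnot(identical(common_type2(x, y), common_type2(y, x)))

# Associative: grouping does not change the common type.
stopifnot(identical(
  common_type2(x, common_type2(y, z)),
  common_type2(common_type2(x, y), z)
))

str(combine(x, y, z))
#> num [1:5] 1 1 2 3 1.5
```

Because the pairwise rule here is just a maximum over ranks, both properties hold by construction, which is why `Reduce()` can fold over the inputs in any order and still arrive at the same type.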