paleolimbot opened a new pull request #11652: URL: https://github.com/apache/arrow/pull/11652
This PR updates the floor division dplyr translation to better respect the input types (as determined by how this would be done in R). The main change is the output type: `integer_type_1 %/% integer_type_2` will now have the same type as `integer_type_1` and everything else has the same type as `floor(arg1 / arg2)`. As a side effect, floor division by zero is `Inf` rather than the maximum integer value (unless you try to floor divide by `0L`...see below). A few things that need some hashing out: - Floor division by `0L` results in the max integer value rather than `NA`. This is, I think, because it's how cast (even with `safe = TRUE`) to integer from `Inf`. That is perhaps a different issue than this one? - There's [some tests for floor division for arrays outside a dplyr verb](https://github.com/apache/arrow/blob/master/r/tests/testthat/test-compute-arith.R#L64-L94) that appear to be using a [completely different translation logic](https://github.com/apache/arrow/blob/master/r/R/compute.R). I didn't update those tests or that logic because it seemed like a different issue to me (maybe needs to implement the Math and/or Ops group generics or more S3 methods for the array class?). Reprex before this PR: <details> ``` r # remotes::install_github("apache/arrow/r") library(arrow, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) tbl <- tibble::tibble( integers = c(1:4, NA_integer_), doubles = c(as.numeric(1:4), NA_real_) ) tbl %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0 ) %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <dbl> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> NA, NA, NA, NA, NA #> $ int_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_dbl <dbl> 0, 1, 1, 2, NA #> $ dbl_div_int <dbl> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA RecordBatch$create(!!! tbl) %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0, ) %>% collect() %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <int> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ int_div_zero_dbl <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ dbl_div_dbl <int> 0, 1, 1, 2, NA #> $ dbl_div_int <int> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ dbl_div_zero_dbl <int> 2147483647, 2147483647, 2147483647, 2147483647, NA ``` <sup>Created on 2021-11-09 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup> </details> Reprex after this PR: <details> ``` r # remotes::install_github("paleolimbot/arrow/r@r-floor-div") library(arrow, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) tbl <- tibble::tibble( integers = c(1:4, NA_integer_), doubles = c(as.numeric(1:4), NA_real_) ) tbl %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0 ) %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <dbl> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> NA, NA, NA, NA, NA #> $ int_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_dbl <dbl> 0, 1, 1, 2, NA #> $ dbl_div_int <dbl> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA RecordBatch$create(!!! tbl) %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0, ) %>% collect() %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <dbl> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ int_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_dbl <dbl> 0, 1, 1, 2, NA #> $ dbl_div_int <dbl> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA ``` <sup>Created on 2021-11-09 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup> </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
