[ https://issues.apache.org/jira/browse/ARROW-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nic Crane updated ARROW-12994:
------------------------------
    Comment: was deleted

(was: [~pachamaltese] - I think this is a known bug, caused by the tests not being quite right. I'm asking because I remember seeing a similar issue, which [~jonkeane] has reported to R-devel: [https://r.789695.n4.nabble.com/Should-all-equal-POSIXt-respect-check-attributes-td4769007.html]

See also: [https://github.com/apache/arrow/pull/10334#issuecomment-848155516]

Does it run correctly if you change every `expect_equal` to `expect_equivalent` in the relevant tests (search for 'test_that("strptime"' in 'r/tests/testthat/test-dplyr-string-functions.R')?)

> [R] stringr tests: 4 hours of difference between arrow and strptime
> -------------------------------------------------------------------
>
>                 Key: ARROW-12994
>                 URL: https://issues.apache.org/jira/browse/ARROW-12994
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: R
>    Affects Versions: 4.0.1
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>
> Here's a problem I found while triaging tickets.
> This was run locally after merging from apache/arrow at commit 8773b9d and
> rebuilding both the Arrow library and the Arrow R package.
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #>     timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #>     matches
> #> The following object is masked from 'package:arrow':
> #>
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> expect_equal(
>   tstring %>%
>     Table$create() %>%
>     mutate(
>       x = strptime(x, format = "%m-%d-%Y")
>     ) %>%
>     collect(),
>   tstamp,
>   check.tzone = FALSE
> )
> #> Error: `%>%`(...) not equal to `tstamp`.
> #> Component "x": Mean absolute difference: 14400
> {code}
>
> Removing the expectation and printing both results shows that the timestamps
> differ by exactly 4 hours:
>
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #>     timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #>     matches
> #> The following object is masked from 'package:arrow':
> #>
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> tstring %>%
>   Table$create() %>%
>   mutate(
>     x = strptime(x, format = "%m-%d-%Y")
>   ) %>%
>   collect()
> #> # A tibble: 2 x 1
> #>   x
> #>   <dttm>
> #> 1 2008-08-04 20:00:00
> #> 2 NA
> tstamp
> #> # A tibble: 2 x 1
> #>   x
> #>   <dttm>
> #> 1 2008-08-05 00:00:00
> #> 2 NA
> {code}
>
> _Created on 2021-06-07 by the [reprex package|https://reprex.tidyverse.org] (v2.0.0)_
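The 14400-second mean absolute difference above is exactly 4 hours. A minimal base-R sketch of one way that number could arise, assuming (this is not confirmed in the ticket) that Arrow's `strptime` interprets the string as UTC while base R's `strptime` uses the session's local time zone, and that the reporter's local zone was at UTC-4 on that date. The fixed zone `"Etc/GMT+4"` is only a hypothetical stand-in for that local zone; no Arrow code is involved.

```r
# Hypothesis sketch: the same wall-clock string parsed in two different time zones.
# "Etc/GMT+4" is the tz database name for a fixed UTC-4 offset
# (the sign is inverted by POSIX convention). It stands in for an assumed local zone.
fmt <- "%m-%d-%Y"
parsed_utc <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "UTC"))
parsed_local <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "Etc/GMT+4"))

# Midnight at UTC-4 is 04:00 UTC, so the two instants are 4 hours apart.
offset_secs <- as.numeric(difftime(parsed_local, parsed_utc, units = "secs"))
print(offset_secs)

# Rendering the UTC instant in the UTC-4 zone gives 20:00 on the previous day,
# which matches the first row of the Arrow result in the reprex.
shown_local <- format(parsed_utc, format = "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT+4")
print(shown_local)
```

If this reading is right, it would also explain why `check.tzone = FALSE` does not help: that argument only skips the comparison of the `tzone` attribute, while the underlying instants still differ by 14400 seconds.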