[ https://issues.apache.org/jira/browse/ARROW-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nic Crane updated ARROW-12994:
------------------------------
    Summary: [R] stringr tests fail on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC  (was: [R] stringr tests fails due to strptime defaulting to local timezone and Arrow defaulting to UTC)

> [R] stringr tests fail on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12994
>                 URL: https://issues.apache.org/jira/browse/ARROW-12994
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: R
>    Affects Versions: 4.0.1
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>
> Here's the problem I detected while triaging tickets.
> This was run locally after merging from apache/arrow at commit 8773b9d and
> rebuilding both the Arrow library and the Arrow R package.
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #>     timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #>     matches
> #> The following object is masked from 'package:arrow':
> #>
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> expect_equal(
>   tstring %>%
>     Table$create() %>%
>     mutate(
>       x = strptime(x, format = "%m-%d-%Y")
>     ) %>%
>     collect(),
>   tstamp,
>   check.tzone = FALSE
> )
> #> Error: `%>%`(...) not equal to `tstamp`.
> #> Component "x": Mean absolute difference: 14400
> {code}
> Removing the expectation shows that the two timestamps differ by exactly
> 4 hours (14400 seconds):
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #>     timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #>     matches
> #> The following object is masked from 'package:arrow':
> #>
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> tstring %>%
>   Table$create() %>%
>   mutate(
>     x = strptime(x, format = "%m-%d-%Y")
>   ) %>%
>   collect()
> #> # A tibble: 2 x 1
> #>   x
> #>   <dttm>
> #> 1 2008-08-04 20:00:00
> #> 2 NA
> tstamp
> #> # A tibble: 2 x 1
> #>   x
> #>   <dttm>
> #> 1 2008-08-05 00:00:00
> #> 2 NA
> {code}
> _Created on 2021-06-07 by the [reprex package|https://reprex.tidyverse.org]
> (v2.0.0)_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)