nealrichardson commented on a change in pull request #10334:
URL: https://github.com/apache/arrow/pull/10334#discussion_r633148838
##########
File path: r/R/compute.R
##########
@@ -286,3 +286,8 @@ cast_options <- function(safe = TRUE, ...) {
)
modifyList(opts, list(...))
}
+
+strptime_arrow <- function(..., format, unit){
+ a <- collect_arrays_from_dots(list(...))
Review comment:
I don't think you want `collect_arrays_from_dots` here. This function
exists to support the base R behavior like:
```
> sum(1, 2)
[1] 3
```
But `strptime` doesn't take `...` like that.
##########
File path: r/R/compute.R
##########
@@ -286,3 +286,8 @@ cast_options <- function(safe = TRUE, ...) {
)
modifyList(opts, list(...))
}
+
+strptime_arrow <- function(..., format, unit){
Review comment:
I'm not sure how useful this function is since it is a thin wrapper
around `call_function()` and we can't set it as an S3 method.. More useful
would be to add a version of this in the `nse_funcs` in dplyr-functions.R.
In either case, we should match the `base::strptime()` signature: `function
(x, format, tz = "")` with the possible addition of `unit` if that's an Arrow
feature.
Also, should `format` and `unit` have default arguments?
##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip
(ARROW-3543)", {
expect_identical(read_feather(feather_file), df)
})
+test_that("strptime", {
+ # array of strings
+ time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+ # array of timestamps (doesn't work if tz="" is added!)
+ timestamps <- Array$create(c(as.POSIXct("2018-10-07 19:04:05"), NA))
+ # array of parsed timestamps
+ parsed_timestamps <- strptime_arrow(time_strings, format = "%Y-%m-%d
%H:%M:%S", unit = TimeUnit$MICRO)
Review comment:
`unit` should also take human-friendly strings ("s", "ms", etc.); see
how this is done in the `timestamp()` function in type.R.
##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip
(ARROW-3543)", {
expect_identical(read_feather(feather_file), df)
})
+test_that("strptime", {
+ # array of strings
+ time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+ # array of timestamps (doesn't work if tz="" is added!)
Review comment:
Why not?
##########
File path: r/src/compute.cpp
##########
@@ -233,6 +233,12 @@ std::shared_ptr<arrow::compute::FunctionOptions>
make_compute_options(
max_replacements);
}
+ if (func_name == "strptime") {
+ using Options = arrow::compute::StrptimeOptions;
Review comment:
Does `StrptimeOptions` have a `Defaults()` method in C++? If so, we
should call it.
##########
File path: r/src/compute.cpp
##########
@@ -233,6 +233,12 @@ std::shared_ptr<arrow::compute::FunctionOptions>
make_compute_options(
max_replacements);
}
+ if (func_name == "strptime") {
+ using Options = arrow::compute::StrptimeOptions;
Review comment:
Yeah looks like it. I made ARROW-12809 to evaluate whether that's
correct, but for the purposes of the PR, it's fine.
##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip
(ARROW-3543)", {
expect_identical(read_feather(feather_file), df)
})
+test_that("strptime", {
+ # array of strings
+ time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+ # array of timestamps (doesn't work if tz="" is added!)
Review comment:
You're right that strptime in the C++ library doesn't take a timezone
argument. Maybe it is expected that if there is a timezone, it will be encoded
in the string and parseable by strptime (with the right format string)? But
this gets us into the always tricky area of timezone-aware vs. timezone-naive
data. @jorisvandenbossche do you have any thoughts/experience with this code?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]