jorisvandenbossche commented on a change in pull request #10334:
URL: https://github.com/apache/arrow/pull/10334#discussion_r634269138
##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip
(ARROW-3543)", {
expect_identical(read_feather(feather_file), df)
})
+test_that("strptime", {
+ # array of strings
+ time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+ # array of timestamps (doesn't work if tz="" is added!)
Review comment:
> I'm not sure, can you cast `timestamp[ms]` to `timestamp[ms,
tz="UTC"]` or whatever (without modifying the values in the array, just to set
the tz)?
Casting actually works, but it's simply setting the tz and not changing the
actual values, so it's not necessarily the behaviour you would expect (the
behaviour would be correct if you assume the tz-naive data to be in UTC, but
that seems a wrong assumption).
> So timestamp's timezone is currently ignored and the local time is
returned. It might be good to document this or even block `%z` and `%Z` to
avoid surprises?
Indeed, it seems we now simply ignore any timezone information in `strptime`:
```
>>> pc.strptime(["2012-01-01 01:02:03+01:00"], format="%Y-%m-%d %H:%M:%S%z",
unit="s")
<pyarrow.lib.TimestampArray object at 0x7fad84855220>
[
2012-01-01 01:02:03
]
>>> pc.strptime(["2012-01-01 01:02:03+01:00"], format="%Y-%m-%d %H:%M:%S%Z",
unit="s").type
TimestampType(timestamp[s])
```
I can see some value in keeping that working, so you can at least parse
strings that include such information (otherwise you would always get an error
with arrow, or you would need to do some string preprocessing to be able to
pass them to `strptime`). But then we certainly need to document that.
On the other hand, if we want to support it in the future, that would change
behaviour and erroring now might then be better ..
It seems that at least some `strptime` implementation support `%z` offsets,
and store that in `tm->gmt_offset`, which we currently don't use
(https://code.woboq.org/userspace/glibc/time/strptime_l.c.html#776).
At least supporting fixed offsets (`%z`) seems doable (and the result could
then be a timestamp type with `tz="UTC"`), properly supporting `%Z` timezone
names will be harder.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]