jorisvandenbossche commented on a change in pull request #10334:
URL: https://github.com/apache/arrow/pull/10334#discussion_r634269138



##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip 
(ARROW-3543)", {
   expect_identical(read_feather(feather_file), df)
 })
 
+test_that("strptime", {
+  # array of strings
+  time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+  # array of timestamps (doesn't work if tz="" is added!)

Review comment:
       > I'm not sure, can you cast `timestamp[ms]` to `timestamp[ms, 
tz="UTC"]` or whatever (without modifying the values in the array, just to set 
the tz)?
   
   Casting actually works, but it's simply setting the tz and not changing the 
actual values, so it's not necessarily the behaviour you would expect (the 
behaviour would be correct if you assume the tz-naive data to be in UTC, but 
that seems a wrong assumption).
   
   > So timestamp's timezone is currently ignored and the local time is 
returned. It might be good to document this or even block `%z` and `%Z` to 
avoid surprises?
   
   Indeed, it seems we now simply ignore any timezone information in `strptime`:
   
   ```
   >>> pc.strptime(["2012-01-01 01:02:03+01:00"], format="%Y-%m-%d %H:%M:%S%z", 
unit="s")
   <pyarrow.lib.TimestampArray object at 0x7fad84855220>
   [
     2012-01-01 01:02:03
   ]
   >>> pc.strptime(["2012-01-01 01:02:03+01:00"], format="%Y-%m-%d %H:%M:%S%Z", 
unit="s").type
   TimestampType(timestamp[s])
   ```
   
   I can see some value in keeping that working, so you can at least parse 
strings that include such information (otherwise you would always get an error 
with arrow, or you would need to do some string preprocessing to be able to 
pass them to `strptime`). But then we certainly need to document that. 
   On the other hand, if we want to support it in the future, that would change 
behaviour and erroring now might then be better ..
   
   It seems that at least some `strptime` implementation support `%z` offsets, 
and store that in `tm->gmt_offset`, which we currently don't use 
(https://code.woboq.org/userspace/glibc/time/strptime_l.c.html#776). 
   At least supporting fixed offsets (`%z`) seems doable (and the result could 
then be a timestamp type with `tz="UTC"`), properly supporting `%Z` timezone 
names will be harder.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to