lidavidm commented on a change in pull request #11358:
URL: https://github.com/apache/arrow/pull/11358#discussion_r725266070
##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -1429,6 +1430,17 @@ TYPED_TEST(TestStringKernels, Strptime) {
this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1,
&options);
}
+TYPED_TEST(TestStringKernels, StrptimeZoneOffset) {
+ if (!arrow::internal::kStrptimeSupportsZone) {
+ GTEST_SKIP() << "strptime does not support %z on this platform";
+ }
+ std::string input1 = R"(["5/1/2020 +01", null, "12/11/1900 -01:30"])";
+ std::string output1 =
+ R"(["2020-04-30T23:00:00.000000", null, "1900-12-11T01:30:00.000000"])";
+ StrptimeOptions options("%m/%d/%Y %z", TimeUnit::MICRO);
+ this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1,
&options);
Review comment:
Sorry, I think I made things unclear here. This is what I'm worried
about. CSV uses the ISO8601 parser by default. All these CSV files previously
and currently result in a timestamp without timezone column:
```
ts
2010-01-01T00:00:00
```
```
ts
2010-01-01T00:00:00-0100
```
```
ts
2010-01-01T00:00:00-0100
2010-01-01T00:00:00
```
Now, I think the first one is reasonable to continue interpreting in this
way. And the second seems reasonable to interpret as `timestamp[s, "UTC"]`.
However, this is a user-visible change in behavior. Do we also want to reflect
that in other places we use the ISO8601 parser?
And what about the third? Do we error? (This wasn't previously parseable
before.) Joris touches on this above as well.
For strptime, yes, I meant that if the user provides `%z` then it seems
reasonable to expect that they wanted a timestamp with timezone as the result,
and otherwise it's reasonable to give back timestamp without timezone. I don't
think strptime is a worry here, but rather the ISO8601 parser.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]