rok commented on a change in pull request #11358:
URL: https://github.com/apache/arrow/pull/11358#discussion_r725374276
##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -1429,6 +1430,17 @@ TYPED_TEST(TestStringKernels, Strptime) {
this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1,
&options);
}
+TYPED_TEST(TestStringKernels, StrptimeZoneOffset) {
+ if (!arrow::internal::kStrptimeSupportsZone) {
+ GTEST_SKIP() << "strptime does not support %z on this platform";
+ }
+ std::string input1 = R"(["5/1/2020 +01", null, "12/11/1900 -01:30"])";
+ std::string output1 =
+ R"(["2020-04-30T23:00:00.000000", null, "1900-12-11T01:30:00.000000"])";
+ StrptimeOptions options("%m/%d/%Y %z", TimeUnit::MICRO);
+ this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1,
&options);
Review comment:
> Now, I think the first one is reasonable to continue interpreting in
this way. And the second seems reasonable to interpret as `timestamp[s,
"UTC"]`. However, this is a user-visible change in behavior. Do we also want to
reflect that in other places we use the ISO8601 parser?
Do we know how breaking this would be? On that note [comparison doesn't
error when comparing timezone-aware to timezone-naive
timestamp](https://issues.apache.org/jira/browse/ARROW-13081) at the moment.
> And what about the third? Do we error? (This wasn't previously parseable
before.) Joris touches on this above as well.
I'd find it odd to see mixed formats in my data and would want to know about
it. So I'd propose erroring and allow for error handling options in strptime
(e.g. return null if ambiguous, assume a given timezone if missing).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]