[GitHub] [arrow] rok commented on a change in pull request #11358: ARROW-12820: [C++] Support zone offset in ISO8601, strptime parser

GitBox Fri, 08 Oct 2021 16:14:39 -0700


rok commented on a change in pull request #11358:
URL: https://github.com/apache/arrow/pull/11358#discussion_r725374276




##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -1429,6 +1430,17 @@ TYPED_TEST(TestStringKernels, Strptime) {
   this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1, 
&options);
 }
 
+TYPED_TEST(TestStringKernels, StrptimeZoneOffset) {
+  if (!arrow::internal::kStrptimeSupportsZone) {
+    GTEST_SKIP() << "strptime does not support %z on this platform";
+  }
+  std::string input1 = R"(["5/1/2020 +01", null, "12/11/1900 -01:30"])";
+  std::string output1 =
+      R"(["2020-04-30T23:00:00.000000", null, "1900-12-11T01:30:00.000000"])";
+  StrptimeOptions options("%m/%d/%Y %z", TimeUnit::MICRO);
+  this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1, 
&options);

Review comment:
       > Now, I think the first one is reasonable to continue interpreting in 
this way. And the second seems reasonable to interpret as `timestamp[s, 
"UTC"]`. However, this is a user-visible change in behavior. Do we also want to 
reflect that in other places we use the ISO8601 parser?
   
   Do we know how breaking this would be? On that note [comparison doesn't 
error when comparing timezone-aware to timezone-naive 
timestamp](https://issues.apache.org/jira/browse/ARROW-13081) at the moment.
   
   > And what about the third? Do we error? (This wasn't previously parseable 
before.) Joris touches on this above as well.
   
   I'd find it odd to see mixed formats in my data and would want to know about 
it. So I'd propose erroring and allow for error handling options in strptime 
(e.g. return null if ambiguous, assume a given timezone if missing).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] rok commented on a change in pull request #11358: ARROW-12820: [C++] Support zone offset in ISO8601, strptime parser

Reply via email to