jorisvandenbossche commented on a change in pull request #11358: URL: https://github.com/apache/arrow/pull/11358#discussion_r727129048
##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -1429,6 +1430,17 @@ TYPED_TEST(TestStringKernels, Strptime) {
   this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1, &options);
 }
 
+TYPED_TEST(TestStringKernels, StrptimeZoneOffset) {
+  if (!arrow::internal::kStrptimeSupportsZone) {
+    GTEST_SKIP() << "strptime does not support %z on this platform";
+  }
+  std::string input1 = R"(["5/1/2020 +01", null, "12/11/1900 -01:30"])";
+  std::string output1 =
+      R"(["2020-04-30T23:00:00.000000", null, "1900-12-11T01:30:00.000000"])";
+  StrptimeOptions options("%m/%d/%Y %z", TimeUnit::MICRO);
+  this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1, &options);

Review comment:

> And what about the third? Do we error? (This wasn't previously parseable before.) Joris touches on this above as well.

I think we should error by default, but ideally with an option to force interpreting the values without an offset as UTC (or as any specified timezone).

For context, that is more or less what pandas does. It doesn't error, but if you convert a mixture of timezone-aware and naive timestamp strings with `pd.to_datetime(..)` or `pd.DatetimeIndex(..)`, it returns an "object"-dtype array holding a mixture of timezone-aware and naive datetime objects (which is basically useless if you want to perform timestamp operations on that column). In Arrow we of course don't have the concept of "generic objects", so I think erroring is the closest alternative.

The aforementioned pandas functions have a `utc=True` option to explicitly ask for values with a proper timezone-aware datetime64 dtype (where the naive datetimes are interpreted as UTC, and the aware datetimes with an offset are converted to UTC).
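To make the proposed semantics concrete, here is a minimal sketch in plain Python (standard library only). It is not the Arrow kernel and not the pandas implementation: the function name `strptime_utc` and the `assume_utc` parameter are hypothetical, chosen only to mirror the "error by default, opt in to UTC" behaviour discussed above. Note that Python's `%z` expects offsets such as `+0100` or `-01:30`, so the first input is written as `+0100` rather than `+01`.

```python
from datetime import datetime, timezone
from typing import List, Optional


def strptime_utc(values: List[Optional[str]],
                 fmt: str = "%m/%d/%Y",
                 assume_utc: bool = False) -> List[Optional[datetime]]:
    """Parse strings as `fmt` followed by a UTC offset and normalize to UTC.

    Hypothetical helper: strings without an offset raise ValueError unless
    assume_utc=True, in which case they are interpreted as already in UTC.
    """
    out: List[Optional[datetime]] = []
    for value in values:
        if value is None:
            out.append(None)  # nulls pass through unchanged
            continue
        try:
            parsed = datetime.strptime(value, fmt + " %z")
        except ValueError:
            # No offset matched: retry as a naive timestamp. This second
            # call still raises if the string does not match `fmt` at all.
            naive = datetime.strptime(value, fmt)
            if not assume_utc:
                raise ValueError(
                    f"{value!r} has no UTC offset; pass assume_utc=True "
                    "to interpret it as UTC"
                )
            parsed = naive.replace(tzinfo=timezone.utc)
        out.append(parsed.astimezone(timezone.utc))
    return out


result = strptime_utc(["5/1/2020 +0100", None, "12/11/1900 -01:30"])
print([None if r is None else r.isoformat() for r in result])
```

With the default `assume_utc=False`, an input such as `"5/1/2020"` raises instead of silently producing a naive value, which is the closest analogue to avoiding the mixed "object" result in pandas.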