[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

GitBox Thu, 02 Jul 2020 00:50:17 -0700


jorisvandenbossche commented on a change in pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#discussion_r448801572




##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -642,24 +641,27 @@ inline Status ConvertStruct(const PandasOptions& options, 
const ChunkedArray& da
   std::vector<OwnedRef> fields_data(num_fields);
   OwnedRef dict_item;
 
-  // XXX(wesm): In ARROW-7723, we found as a result of ARROW-3789 that second
+  // In ARROW-7723, we found as a result of ARROW-3789 that second
   // through microsecond resolution tz-aware timestamps were being promoted to
   // use the DATETIME_NANO_TZ conversion path, yielding a datetime64[ns] NumPy
   // array in this function. PyArray_GETITEM returns datetime.datetime for
   // units second through microsecond but PyLong for nanosecond (because
-  // datetime.datetime does not support nanoseconds). We inserted this hack to
-  // preserve the <= 0.15.1 behavior until a better solution can be devised
+  // datetime.datetime does not support nanoseconds).
+  // We force the object conversion to preserve the value of the timezone.
   PandasOptions modified_options = options;
-  modified_options.ignore_timezone = true;
   modified_options.coerce_temporal_nanoseconds = false;
 
   for (int c = 0; c < data.num_chunks(); c++) {
     auto arr = checked_cast<const StructArray*>(data.chunk(c).get());
     // Convert the struct arrays first
     for (int32_t i = 0; i < num_fields; i++) {
+      // Se notes above about conversion.
+      std::shared_ptr<Array> field = arr->field(static_cast<int>(i));
+      modified_options.timestamp_as_object =
+          field->type()->id() == Type::TIMESTAMP &&
+          !checked_cast<const 
TimestampType&>(*field->type()).timezone().empty();

Review comment:
       We could maybe also *always* use the object path? (so also when there is 
no timezone) 
   The PyArray_GETITEM will convert it to a python datetime.datetime object 
later anyway, if I understand correctly? So then we can maybe already do this 
conversion ourselves also in the tz-naive case.

##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -3321,9 +3321,12 @@ def test_cast_timestamp_unit():
     assert result.equals(expected)
 
 
-def test_struct_with_timestamp_tz():
+def test_nested_with_timestamp_tz():
     # ARROW-7723
     ts = pd.Timestamp.now()
+    # This is used for verifying timezone conversion to micros are not
+    # important
+    ts_dt = ts.to_pydatetime().replace(microsecond=0)

Review comment:
       Why is this needed? (or, I suppose this is only needed if the unit is 
's' or 'ms' ? If so, I would rather do this only in those cases, and for 'us' 
unit actually check the microseconds are also correct)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

Reply via email to