[GitHub] [arrow] itamarst commented on a change in pull request #7169: ARROW-5359: [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas

2020-06-17 Thread GitBox


itamarst commented on a change in pull request #7169:
URL: https://github.com/apache/arrow/pull/7169#discussion_r441594967



##
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##
@@ -1688,8 +1698,12 @@ static Status GetPandasWriterType(const ChunkedArray& 
data, const PandasOptions&
   break;
 case Type::TIMESTAMP: {
   const auto& ts_type = checked_cast(*data.type());
-  // XXX: Hack here for ARROW-7723
-  if (ts_type.timezone() != "" && !options.ignore_timezone) {
+  if (options.timestamp_as_object && ts_type.unit() != TimeUnit::NANO) {
+// Nanoseconds are never out of bounds for pandas, so in that case

Review comment:
   Oh I see what you mean. OK.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] itamarst commented on a change in pull request #7169: ARROW-5359: [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas

2020-06-17 Thread GitBox


itamarst commented on a change in pull request #7169:
URL: https://github.com/apache/arrow/pull/7169#discussion_r441564235



##
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##
@@ -1688,8 +1698,12 @@ static Status GetPandasWriterType(const ChunkedArray& 
data, const PandasOptions&
   break;
 case Type::TIMESTAMP: {
   const auto& ts_type = checked_cast(*data.type());
-  // XXX: Hack here for ARROW-7723
-  if (ts_type.timezone() != "" && !options.ignore_timezone) {
+  if (options.timestamp_as_object && ts_type.unit() != TimeUnit::NANO) {
+// Nanoseconds are never out of bounds for pandas, so in that case

Review comment:
   I don't understand this comment—is there an extra negative clause in 
here? It seems the opposite of the code.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] itamarst commented on a change in pull request #7169: ARROW-5359: [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas

2020-06-11 Thread GitBox


itamarst commented on a change in pull request #7169:
URL: https://github.com/apache/arrow/pull/7169#discussion_r438971377



##
File path: python/pyarrow/tests/test_pandas.py
##
@@ -3941,3 +3946,63 @@ def test_metadata_compat_missing_field_name():
 result = table.to_pandas()
 # on python 3.5 the column order can differ -> adding check_like=True
 tm.assert_frame_equal(result, expected, check_like=True)
+
+
+def make_df_with_timestamps():
+# Some of the milliseconds timestamps deliberately don't fit in the range
+# that is possible with nanosecond timestamps.
+df = pd.DataFrame({
+'dateTimeMs': [
+np.datetime64('0001-01-01 00:00', 'ms'),
+np.datetime64('2012-05-02 12:35', 'ms'),
+np.datetime64('2012-05-03 15:42', 'ms'),
+np.datetime64('3000-05-03 15:42', 'ms'),
+],
+'dateTimeNs': [
+np.datetime64('1991-01-01 00:00', 'ns'),
+np.datetime64('2012-05-02 12:35', 'ns'),
+np.datetime64('2012-05-03 15:42', 'ns'),
+np.datetime64('2050-05-03 15:42', 'ns'),
+],
+})
+# Not part of what we're testing, just ensuring that the inputs are what we
+# expect.
+assert (df.dateTimeMs.dtype, df.dateTimeNs.dtype) == (
+# O == object, 

[GitHub] [arrow] itamarst commented on a change in pull request #7169: ARROW-5359: [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas

2020-06-11 Thread GitBox


itamarst commented on a change in pull request #7169:
URL: https://github.com/apache/arrow/pull/7169#discussion_r438967894



##
File path: python/pyarrow/tests/test_pandas.py
##
@@ -3941,3 +3946,63 @@ def test_metadata_compat_missing_field_name():
 result = table.to_pandas()
 # on python 3.5 the column order can differ -> adding check_like=True
 tm.assert_frame_equal(result, expected, check_like=True)
+
+
+def make_df_with_timestamps():
+# Some of the milliseconds timestamps deliberately don't fit in the range
+# that is possible with nanosecond timestamps.
+df = pd.DataFrame({
+'dateTimeMs': [
+np.datetime64('0001-01-01 00:00', 'ms'),
+np.datetime64('2012-05-02 12:35', 'ms'),
+np.datetime64('2012-05-03 15:42', 'ms'),
+np.datetime64('3000-05-03 15:42', 'ms'),
+],
+'dateTimeNs': [
+np.datetime64('1991-01-01 00:00', 'ns'),
+np.datetime64('2012-05-02 12:35', 'ns'),
+np.datetime64('2012-05-03 15:42', 'ns'),
+np.datetime64('2050-05-03 15:42', 'ns'),
+],
+})
+# Not part of what we're testing, just ensuring that the inputs are what we
+# expect.
+assert (df.dateTimeMs.dtype, df.dateTimeNs.dtype) == (
+# O == object, 

[GitHub] [arrow] itamarst commented on a change in pull request #7169: ARROW-5359: [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas

2020-06-10 Thread GitBox


itamarst commented on a change in pull request #7169:
URL: https://github.com/apache/arrow/pull/7169#discussion_r438222016



##
File path: python/pyarrow/tests/test_pandas.py
##
@@ -3941,3 +3946,63 @@ def test_metadata_compat_missing_field_name():
 result = table.to_pandas()
 # on python 3.5 the column order can differ -> adding check_like=True
 tm.assert_frame_equal(result, expected, check_like=True)
+
+
+def make_df_with_timestamps():
+# Some of the milliseconds timestamps deliberately don't fit in the range
+# that is possible with nanosecond timestamps.
+df = pd.DataFrame({
+'dateTimeMs': [
+np.datetime64('0001-01-01 00:00', 'ms'),
+np.datetime64('2012-05-02 12:35', 'ms'),
+np.datetime64('2012-05-03 15:42', 'ms'),
+np.datetime64('3000-05-03 15:42', 'ms'),
+],
+'dateTimeNs': [
+np.datetime64('1991-01-01 00:00', 'ns'),
+np.datetime64('2012-05-02 12:35', 'ns'),
+np.datetime64('2012-05-03 15:42', 'ns'),
+np.datetime64('2050-05-03 15:42', 'ns'),
+],
+})
+# Not part of what we're testing, just ensuring that the inputs are what we
+# expect.
+assert (df.dateTimeMs.dtype, df.dateTimeNs.dtype) == (
+# O == object,