[ https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662474#comment-17662474 ]
Rok Mihevc commented on ARROW-5450:
-----------------------------------

This issue has been migrated to [issue #21903|https://github.com/apache/arrow/issues/21903] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5450
>                 URL: https://issues.apache.org/jira/browse/ARROW-5450
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Tim Swast
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When I attempt to roundtrip from a list of moderately large (beyond what can
> be represented in nanosecond precision, but within microsecond precision)
> datetime objects to pyarrow and back, I get an OverflowError: Python int too
> large to convert to C long.
> pyarrow version:
> {noformat}
> $ pip freeze | grep pyarrow
> pyarrow==0.13.0
> {noformat}
>
> Reproduction:
> {code:java}
> import datetime
> import pandas
> import pyarrow
> import pytz
> timestamp_rows = [
>     datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
>     None,
>     datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
>     datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> ]
> timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", tz="UTC"))
> timestamp_roundtrip = timestamp_array.to_pylist()
> # ---------------------------------------------------------------------------
> # OverflowError Traceback (most recent call last)
> # <ipython-input-25-4a798e917c20> in <module>
> # ----> 1 timestamp_roundtrip = timestamp_array.to_pylist()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
> For good measure, I also tested with timezone-naive timestamps with the same
> error:
> {code:java}
> naive_rows = [
>     datetime.datetime(1, 1, 1, 0, 0, 0),
>     None,
>     datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
>     datetime.datetime(1970, 1, 1, 0, 0, 0),
> ]
> naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
> naive_roundtrip = naive_array.to_pylist()
> # ---------------------------------------------------------------------------
> # OverflowError Traceback (most recent call last)
> # <ipython-input-27-0c32e563d44a> in <module>
> # ----> 1 naive_roundtrip = naive_array.to_pylist()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
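
The report describes datetimes that fit in microsecond precision but not in nanosecond precision. The sketch below illustrates that arithmetic using only the standard library, so it runs without pyarrow or pandas. The helper names (`to_micros`, `from_micros`, `fits_int64`) are hypothetical and are not part of either library. The sketch assumes the overflow comes from scaling microsecond values to 64-bit nanoseconds, which matches the pandas frames in the traceback but is not confirmed by the report itself.

```python
import datetime

# Bounds of a signed 64-bit integer, the storage type of an Arrow timestamp.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def to_micros(dt):
    """Microseconds since the Unix epoch, using exact integer division."""
    return (dt - EPOCH) // ONE_MICROSECOND


def from_micros(us):
    """Inverse of to_micros; never passes through a nanosecond value."""
    return EPOCH + datetime.timedelta(microseconds=us)


def fits_int64(value):
    """True if the integer is representable as a signed 64-bit value."""
    return INT64_MIN <= value <= INT64_MAX


# The same non-null values used in the reproduction (timezone-aware variant).
rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc),
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=datetime.timezone.utc),
    datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc),
]

for dt in rows:
    us = to_micros(dt)
    ns = us * 1000  # what a nanosecond-based representation would need
    print(dt.isoformat(), "us fits:", fits_int64(us), "ns fits:", fits_int64(ns))
    # The microsecond value round-trips losslessly through plain datetime.
    assert from_micros(us) == dt
```

The years 1 and 9999 both fit comfortably in int64 microseconds, while their nanosecond equivalents do not. A conversion that builds `datetime.datetime` directly from microseconds, as `from_micros` does here, would therefore sidestep the overflow for these inputs.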