Kevin Glasson created ARROW-7856:
------------------------------------

             Summary: to_pandas() Causing datetimes > pd.Timestamp.max to wrap around
                 Key: ARROW-7856
                 URL: https://issues.apache.org/jira/browse/ARROW-7856
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.1
         Environment: Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

Python 3.7.3

In [3]: pa.__version__
Out[3]: '0.15.1'

In [4]: pd.__version__
Out[4]: '0.25.2'
            Reporter: Kevin Glasson


When writing a dataframe that contains `datetime.datetime` values in an object 
column, any datetime that is greater than `pd.Timestamp.max` or less than 
`pd.Timestamp.min` is wrapped around.

 

For reference, these are the pandas timestamp min and max values:

 
{code:java}
In [43]: pd.Timestamp.max
Out[43]: Timestamp('2262-04-11 23:47:16.854775807')
In [44]: pd.Timestamp.min
Out[44]: Timestamp('1677-09-21 00:12:43.145225')
{code}
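
These bounds appear to follow from storing timestamps as signed 64-bit nanoseconds since the Unix epoch. As a minimal sketch under that assumption, the standard-library snippet below derives the same limits at microsecond resolution and checks whether a naive `datetime.datetime` fits inside them. The names `MAX_DT`, `MIN_DT` and `fits_in_ns64` are hypothetical helpers for illustration, not part of pandas or pyarrow.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Assumption: timestamps are int64 nanoseconds since the epoch, with the
# lowest int64 value reserved as a sentinel, so the usable range is
# +/- (2**63 - 1) ns. datetime only has microsecond resolution, so the
# bounds are truncated toward zero to whole microseconds.
LIMIT_US = (2**63 - 1) // 1000
MAX_DT = EPOCH + datetime.timedelta(microseconds=LIMIT_US)
MIN_DT = EPOCH - datetime.timedelta(microseconds=LIMIT_US)


def fits_in_ns64(dt):
    """Return True if a naive datetime is representable as int64 nanoseconds."""
    return MIN_DT <= dt <= MAX_DT


print(MAX_DT.isoformat(sep=" "))
print(MIN_DT.isoformat(sep=" "))
print(fits_in_ns64(datetime.datetime(2262, 4, 11)))
print(fits_in_ns64(datetime.datetime(2262, 4, 12)))
```

A check like this could be run over an object column before conversion to flag values that cannot survive a round trip through nanosecond timestamps.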
 

 

To reproduce the error using pandas:

 
{code:java}
In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})

In [50]: df
Out[50]:
                     A
0  2262-04-12 00:00:00

In [51]: df.to_parquet("datetimething.parquet")

In [52]: pd.read_parquet("datetimething.parquet")
Out[52]:
                              A
0 1677-09-21 00:25:26.290448384

{code}
I have narrowed it down to the conversion of a `pa.Table` with the 
`to_pandas()` method: the table still holds the correct value, but the 
converted dataframe does not.
{code:java}
In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
In [31]: tf = pa.Table.from_pandas(df)
In [32]: tf.columns
Out[32]: [<pyarrow.lib.ChunkedArray object at 0x7f23884deef8>
 [
   [
     2262-04-12 00:00:00.000000
   ]
 ]
]
In [33]: tf.to_pandas()
Out[33]:                      A
0 1677-09-21 00:25:26.290448384
{code}
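
The wrapped value is consistent with a signed 64-bit overflow when the timestamp is expressed in nanoseconds since the epoch. The sketch below reproduces the arithmetic with the standard library only. It is an illustration of the suspected mechanism, not the actual pyarrow conversion code, and the helper names are hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def to_epoch_ns(dt):
    """Exact nanoseconds since the Unix epoch as an unbounded Python int."""
    delta = dt - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def wrap_int64(value):
    """Reduce an integer to the signed 64-bit range (two's-complement wrap)."""
    return (value + 2**63) % 2**64 - 2**63


def describe_ns(ns):
    """Format nanoseconds since the epoch as 'YYYY-MM-DD HH:MM:SS.fffffffff'."""
    seconds, frac_ns = divmod(ns, 1_000_000_000)
    dt = EPOCH + datetime.timedelta(seconds=seconds)
    return f"{dt.isoformat(sep=' ')}.{frac_ns:09d}"


exact = to_epoch_ns(datetime.datetime(2262, 4, 12))
wrapped = wrap_int64(exact)

print(exact > 2**63 - 1)     # True: the value does not fit in int64
print(describe_ns(wrapped))  # 1677-09-21 00:25:26.290448384
```

The wrapped result matches the value returned by `to_pandas()` above, which suggests the nanosecond conversion overflows silently instead of raising an error.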



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
