Kevin Glasson created ARROW-7856:
------------------------------------

             Summary: to_pandas() Causing datetimes > pd.Timestamp.max to wrap around
                 Key: ARROW-7856
                 URL: https://issues.apache.org/jira/browse/ARROW-7856
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.1
         Environment: Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic
Python 3.7.3
In [3]: pa.__version__
Out[3]: '0.15.1'
In [4]: pd.__version__
Out[4]: '0.25.2'
            Reporter: Kevin Glasson


When writing a dataframe containing `datetime.datetime` values in an object column, any datetime that is greater than pd.Timestamp.max or less than pd.Timestamp.min is wrapped around.

For reference, these are the timestamp min and max values:

{code:java}
In [43]: pd.Timestamp.max
Out[43]: Timestamp('2262-04-11 23:47:16.854775807')

In [44]: pd.Timestamp.min
Out[44]: Timestamp('1677-09-21 00:12:43.145225')
{code}

To reproduce the error using pandas:

{code:java}
In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})

In [50]: df
Out[50]:
                     A
0  2262-04-12 00:00:00

In [51]: df.to_parquet("datetimething.parquet")

In [52]: pd.read_parquet("datetimething.parquet")
Out[52]:
                              A
0 1677-09-21 00:25:26.290448384
{code}

I have narrowed it down to the point where a `pa.Table` is converted with the `to_pandas()` method. The table itself still holds the correct value; the wrapped value only appears after the conversion:

{code:java}
In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})

In [31]: tf = pa.Table.from_pandas(df)

In [32]: tf.columns
Out[32]:
[<pyarrow.lib.ChunkedArray object at 0x7f23884deef8>
 [
   [
     2262-04-12 00:00:00.000000
   ]
 ]]

In [33]: tf.to_pandas()
Out[33]:
                              A
0 1677-09-21 00:25:26.290448384
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
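
The wrapped value reported above is consistent with a signed 64-bit nanosecond counter overflowing, which is how pandas stores timestamps. The following is only a sketch of that arithmetic, assuming the wrap-around is a plain int64 overflow; it does not trace Arrow's actual conversion code. It uses the standard library only, and the helper names (`to_epoch_ns`, `wrap_int64`, `from_epoch_ns`) are made up for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1


def to_epoch_ns(dt):
    """Exact nanoseconds since the Unix epoch for a naive datetime."""
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1_000


def wrap_int64(value):
    """Reinterpret an arbitrary Python int as a two's-complement int64."""
    return (value + 2**63) % 2**64 - 2**63


def from_epoch_ns(ns):
    """Split epoch nanoseconds into a datetime plus leftover nanoseconds."""
    micros, extra_ns = divmod(ns, 1_000)
    return EPOCH + timedelta(microseconds=micros), extra_ns


# Assumption: the value from the report is squeezed into an int64 of nanoseconds.
ns = to_epoch_ns(datetime(2262, 4, 12))
wrapped = wrap_int64(ns)
wrapped_dt, extra_ns = from_epoch_ns(wrapped)

print(ns > INT64_MAX)        # True: the value does not fit in int64
print(wrapped_dt, extra_ns)  # 1677-09-21 00:25:26.290448 384
```

The result, 1677-09-21 00:25:26.290448 plus 384 ns, matches the `1677-09-21 00:25:26.290448384` shown in `Out[33]`, which suggests the conversion casts to nanosecond resolution without a range check.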