Olaf created ARROW-8482:
---------------------------

             Summary: critical timestamp bug!
                 Key: ARROW-8482
                 URL: https://issues.apache.org/jira/browse/ARROW-8482
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python, R
            Reporter: Olaf
Hello there! First of all, thanks for making parquet files a reality in *R* and *Python*. This is really great. I found a very nasty bug when exchanging parquet files between the two platforms. Consider this:

{code:java}
import pandas as pd
import pyarrow.parquet as pq
import numpy as np

df = pd.DataFrame({'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
                                       pd.to_datetime('2018-02-01 14:01:00.456'),
                                       pd.to_datetime('2018-03-05 14:01:02.200')]})
df['timestamp_est'] = (pd.to_datetime(df.string_time_utc)
                         .dt.tz_localize('UTC')
                         .dt.tz_convert('US/Eastern')
                         .dt.tz_localize(None))
df
Out[5]:
          string_time_utc           timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
{code}

Now I simply write to disk:

{code:java}
df.to_parquet('myparquet.pq')
{code}

And then use *R* to load it:

{code:java}
test <- read_parquet('myparquet.pq')
> test
# A tibble: 3 x 2
  string_time_utc            timestamp_est
  <dttm>                     <dttm>
1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
3 2018-03-05 09:01:02.200000 2018-03-05 04:01:02.200000
{code}

As you can see, the timestamps have been shifted in the process. I first reported this bug against feather, but it is still there. This is a very dangerous, silent bug. What do you think? Thanks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
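One thing worth noting (a sketch of a possible workaround, not a verified fix for the reader side): the repro above strips the timezone with {{tz_localize(None)}}, leaving naive wall-clock values whose zone the reading platform has to guess. Keeping the timestamps timezone-aware means Parquet can record an unambiguous instant; whether R's {{read_parquet}} then displays it correctly is an assumption to be tested.

{code:java}
import pandas as pd

df = pd.DataFrame({'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
                                       pd.to_datetime('2018-02-01 14:01:00.456'),
                                       pd.to_datetime('2018-03-05 14:01:02.200')]})

# Keep the timezone attached instead of dropping it with .dt.tz_localize(None):
# the values now carry both the instant and the 'US/Eastern' zone.
df['timestamp_est'] = (pd.to_datetime(df.string_time_utc)
                         .dt.tz_localize('UTC')
                         .dt.tz_convert('US/Eastern'))

print(df['timestamp_est'].dt.tz)   # US/Eastern
{code}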