[ https://issues.apache.org/jira/browse/ARROW-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085118#comment-17085118 ]
Olaf commented on ARROW-8482:
-----------------------------

Got it, Wes. Thanks. But in a nutshell, can you just tell me if we can control the column type in arrow::read_parquet()?

> [Python][R][Parquet] Possible time zone handling inconsistencies
> -----------------------------------------------------------------
>
>                 Key: ARROW-8482
>                 URL: https://issues.apache.org/jira/browse/ARROW-8482
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>            Reporter: Olaf
>            Priority: Critical
>
> Hello there!
>
> First of all, thanks for making parquet files a reality in *R* and *Python*.
> This is really great.
> I found a very nasty bug when exchanging parquet files between the two
> platforms. Consider this.
>
> {code:java}
> import pandas as pd
> import pyarrow.parquet as pq
> import numpy as np
> df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'),
>                                         pd.to_datetime('2018-02-01 14:01:00.456'),
>                                         pd.to_datetime('2018-03-05 14:01:02.200')]})
> df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
> Out[5]:
>           string_time_utc           timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
>
> Now I simply write to disk
>
> {code:java}
> df.to_parquet('myparquet.pq')
> {code}
>
> And then use *R* to load it.
>
> {code:java}
> test <- read_parquet('myparquet.pq')
> > test
> # A tibble: 3 x 2
>   string_time_utc            timestamp_est
>   <dttm>                     <dttm>
> 1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
> 2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
> 3 2018-03-05 09:01:02.200000 2018-03-05 04:01:02.200000
> {code}
>
> As you can see, the timestamps have been shifted in the process: every value
> read in *R* is five hours earlier than the value written from *Python*. I first
> reported this bug for feather, but it is still there. This is a very
> dangerous, silent bug.
>
> What do you think?
> Thanks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
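
The five-hour shift in the R output above is consistent with one possible reading: both columns are written without a time zone, and the reader treats the stored values as UTC and prints them in a local US/Eastern session. This is an assumption about the mechanism, not something confirmed in the thread. The sketch below uses only the Python standard library to illustrate that reading; `display_as_local`, `format_ts`, and `EASTERN_STANDARD` are hypothetical names, and a fixed UTC-5 offset stands in for US/Eastern (valid for the February and early March dates in the report, before daylight saving time begins).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for US/Eastern in winter.
# It does not model daylight saving time.
EASTERN_STANDARD = timezone(timedelta(hours=-5), "EST")


def display_as_local(naive: datetime, local_tz: timezone) -> datetime:
    """Treat a zone-less timestamp as UTC and return its wall-clock time in local_tz."""
    as_utc = naive.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(local_tz).replace(tzinfo=None)


def format_ts(ts: datetime) -> str:
    """Format with microseconds, similar to the tibble printout."""
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")


# First row of the DataFrame from the report (both columns are zone-less).
string_time_utc = datetime(2018, 2, 1, 14, 0, 0, 531000)
timestamp_est = datetime(2018, 2, 1, 9, 0, 0, 531000)

shifted_utc = display_as_local(string_time_utc, EASTERN_STANDARD)
shifted_est = display_as_local(timestamp_est, EASTERN_STANDARD)

print(format_ts(shifted_utc))  # 2018-02-01 09:00:00.531000
print(format_ts(shifted_est))  # 2018-02-01 04:00:00.531000
```

Under this reading nothing in the file changes; only the displayed wall-clock time does. The 09:00 and 04:00 values match the first row of the tibble up to the floating-point rounding visible there (`.530999`).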