[ 
https://issues.apache.org/jira/browse/ARROW-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085118#comment-17085118
 ] 

Olaf commented on ARROW-8482:
-----------------------------

Got it, Wes, thanks. But in a nutshell, can you tell me whether we can control 
the column type in arrow::read_parquet()?

 

 

> [Python][R][Parquet] Possible time zone handling inconsistencies 
> -----------------------------------------------------------------
>
>                 Key: ARROW-8482
>                 URL: https://issues.apache.org/jira/browse/ARROW-8482
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>            Reporter: Olaf
>            Priority: Critical
>
> Hello there!
>  
> First of all, thanks for making parquet files a reality in *R* and *Python*. 
> This is really great.
> I found a very nasty bug when exchanging parquet files between the two 
> platforms. Consider this.
>  
>  
> {code:java}
> import pandas as pd
> import pyarrow.parquet as pq
> import numpy as np
> df = pd.DataFrame({'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
>                                        pd.to_datetime('2018-02-01 14:01:00.456'),
>                                        pd.to_datetime('2018-03-05 14:01:02.200')]})
> df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
> Out[5]: 
>  string_time_utc timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
>  
> Now I simply write to disk
>  
> {code:java}
> df.to_parquet('myparquet.pq')
> {code}
>  
> And then use *R* to load it.
>  
> {code:java}
> library(arrow)
> test <- read_parquet('myparquet.pq')
> > test
> # A tibble: 3 x 2
>  string_time_utc timestamp_est 
>  <dttm> <dttm> 
> 1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
> 2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
> 3 2018-03-05 09:01:02.200000 2018-03-05 04:01:02.200000
> {code}
>  
>  
> As you can see, the timestamps have been converted in the process. I first 
> referenced this bug in feather, but it is still there. This is a very 
> dangerous, silent bug.
>  
> What do you think?
> Thanks
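
The five-hour shift in the R output above is consistent with the naive timestamps being treated as UTC instants and then printed in the reader's local time zone. The sketch below reproduces that arithmetic with the Python standard library only. It assumes the R session ran in US/Eastern, which the report does not state, and it uses a fixed UTC-5 offset as a stand-in for that zone (valid for these dates, which fall before the 2018 DST change). The helper name `shown_in_local_zone` is hypothetical and is not part of pandas or Arrow.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for US/Eastern. This holds for
# the February and early-March 2018 dates in the report (standard time).
EASTERN_STANDARD = timezone(timedelta(hours=-5), "EST")


def shown_in_local_zone(naive_text, local_zone=EASTERN_STANDARD):
    """Treat a naive timestamp as a UTC instant and render it in local_zone."""
    naive = datetime.strptime(naive_text, "%Y-%m-%d %H:%M:%S.%f")
    as_utc = naive.replace(tzinfo=timezone.utc)  # no zone stored: read as UTC
    local = as_utc.astimezone(local_zone)        # shifted to the reader's zone
    return local.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # keep milliseconds


# First row of the example: the values written from pandas.
for written in ("2018-02-01 14:00:00.531", "2018-02-01 09:00:00.531"):
    print(written, "->", shown_in_local_zone(written))
```

Under these assumptions, both columns of the first row move back by five hours, which matches the 09:00 and 04:00 values R displayed (apart from the sub-millisecond rounding visible in the R printout).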



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
