[ 
https://issues.apache.org/jira/browse/ORC-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459183#comment-17459183
 ] 

Gang Wu commented on ORC-1055:
------------------------------

The problem is caused by inconsistent time zones used between CSVFileImport.cc 
and Writer.cc.

The C++ CSV import tool uses the local time zone to parse the timestamp: 
[orc/CSVFileImport.cc at main · apache/orc 
(github.com)|https://github.com/apache/orc/blob/main/tools/src/CSVFileImport.cc#L257]

But the C++ writer fixes the time zone to GMT for some reason: [orc/Writer.cc at 
main · apache/orc 
(github.com)|https://github.com/apache/orc/blob/main/c%2B%2B/src/Writer.cc#L63]

We may fix it by adding a conversion from the local time zone to GMT in 
CSVFileImport.cc.
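
As a rough illustration of that idea, here is a minimal sketch that turns the parsed fields directly into seconds since the epoch in GMT, so the result does not depend on the machine's local time zone. It is not the actual CSVFileImport.cc code: the names parseTimestampAsGmt and daysFromCivil are hypothetical, the input format "YYYY-MM-DD HH:MM:SS" is an assumption, and fractional seconds are ignored.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: days between 1970-01-01 and the given date in the
// proleptic Gregorian calendar (negative for earlier dates).
int64_t daysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;  // treat Jan/Feb as months 11/12 of the previous year
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);           // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          // [0, 146096]
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" and interpret the fields
// as GMT. Returns false if the text does not match the assumed format.
bool parseTimestampAsGmt(const std::string& text, int64_t* seconds) {
  int year, month, day, hour, minute, second;
  if (std::sscanf(text.c_str(), "%d-%d-%d %d:%d:%d",
                  &year, &month, &day, &hour, &minute, &second) != 6) {
    return false;
  }
  if (month < 1 || month > 12 || day < 1 || day > 31) {
    return false;
  }
  // No local time zone is consulted here, unlike mktime().
  *seconds = daysFromCivil(year, static_cast<unsigned>(month),
                           static_cast<unsigned>(day)) * 86400 +
             hour * 3600 + minute * 60 + second;
  return true;
}
```

The same effect could probably be had by keeping the existing parsing and swapping a local-time conversion for a GMT one, but doing the arithmetic by hand avoids relying on non-standard library functions.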

[~Guiyankuang] [~dongjoon] 

> [C++] Timestamp values read in Hive are different when using ORC file created 
> using CSV to ORC converter tools
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ORC-1055
>                 URL: https://issues.apache.org/jira/browse/ORC-1055
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>            Reporter: Yiqun Zhang
>            Priority: Major
>         Attachments: converted_by_cpp.orc, timestamp.csv
>
>
> I have a CSV file that has a column with timestamp values of 0001-01-01 
> 00:00:00.0. I then convert the CSV file to an ORC file using the CSV to ORC 
> converter and place the ORC file in a Hive table backed by ORC files. On 
> querying the data using Hive beeline and Spark SQL, different results are 
> obtained.
> If converted using the C++ tool, the value read using Hive beeline and Spark 
> SQL queries is 0001-01-03 00:00:00.
> Reported by [~vraval48]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
