[
https://issues.apache.org/jira/browse/ORC-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451504#comment-17451504
]
Yiqun Zhang commented on ORC-1053:
----------------------------------
Thanks for the helpful discussion, [~vraval48] and [~dongjoon].
I analyzed converted_by_java.orc and found a bug.
The tool that converts strings to timestamps using Java's date-time classes
computes the time zone offset at a different precision than ORC does
internally when reading and writing:
{code:java}
// Offset computed by the conversion tool: 17762
int toolOffset = ((LocalDateTime) temporalAccessor)
    .atZone(TimeZone.getTimeZone("America/New_York").toZoneId())
    .getOffset().getTotalSeconds();
// Offset computed internally by ORC: 18000
int orcInternalOffset = TimeZone.getTimeZone("America/New_York").getRawOffset() / 1000;
{code}
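For anyone who wants to reproduce the mismatch outside of ORC, here is a minimal standalone sketch. The class name OffsetMismatch and its two helper methods are hypothetical and are not part of ORC or the converter tools; the sketch also assumes the JDK's bundled time zone data, which uses local mean time for America/New_York at year 0001. Note that Java reports both offsets as negative values (west of UTC), so the numbers above are magnitudes.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.TimeZone;

public class OffsetMismatch {
    // Offset in seconds taken from the zone rules at a specific local date-time.
    // For very old dates this is the zone's local mean time, not the standard offset.
    static int ruleBasedOffsetSeconds(LocalDateTime local, String zoneName) {
        ZoneId zone = TimeZone.getTimeZone(zoneName).toZoneId();
        return local.atZone(zone).getOffset().getTotalSeconds();
    }

    // Raw (standard) offset in seconds; it does not depend on the date.
    static int rawOffsetSeconds(String zoneName) {
        return TimeZone.getTimeZone(zoneName).getRawOffset() / 1000;
    }

    public static void main(String[] args) {
        // The timestamp from the issue's CSV file: 0001-01-01 00:00:00.0
        LocalDateTime ts = LocalDateTime.of(1, 1, 1, 0, 0, 0);
        int byRules = ruleBasedOffsetSeconds(ts, "America/New_York");
        int raw = rawOffsetSeconds("America/New_York");
        System.out.println("rule-based offset: " + byRules);         // -17762
        System.out.println("raw offset:        " + raw);             // -18000
        System.out.println("difference:        " + (byRules - raw)); // 238
    }
}
```

The 238-second gap is 3 minutes 58 seconds, which matches the distance between the two values reported in the issue (0001-01-03 00:00:00 and 0001-01-02 23:56:02.0).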
I'm not sure whether there are other issues in the series of steps you
describe, but this one is definite.
I'll come back later to fix it.
> Timestamp values read in Hive are different when using ORC file created using
> CSV to ORC converter tools
> --------------------------------------------------------------------------------------------------------
>
> Key: ORC-1053
> URL: https://issues.apache.org/jira/browse/ORC-1053
> Project: ORC
> Issue Type: Bug
> Components: C++, Java
> Reporter: Varun Raval
> Priority: Major
> Attachments: converted_by_cpp.orc, converted_by_hive.orc,
> converted_by_java.orc, hive_table_desc.jpg, timestamp.csv
>
>
> I have a CSV file with a column whose timestamp values are 0001-01-01
> 00:00:00.0. I convert the CSV file to an ORC file using a CSV to ORC converter
> and place the ORC file in a Hive table backed by ORC files. Querying the
> data using Hive beeline and Spark SQL gives different results:
> If converted using the CPP tool, the value read using Hive beeline and Spark SQL
> queries is 0001-01-03 00:00:00.
> If converted using the Java tool, the value read using Hive beeline and Spark SQL
> queries is 0001-01-02 23:56:02.0.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)