Github user wgtmac commented on the issue:

    https://github.com/apache/orc/pull/233
  
    Thanks @majetideepak for the comment!
    
    On the Java side, the input timestamp in the writer's TimestampColumnVector 
is in UTC. It leverages java.sql.Timestamp, which knows the local timezone, so 
it can PRINT the value in the local timezone. You can print the millis variable 
at line 109 of TimestampTreeWriter.java to verify this. The name of 
SerializationUtils.convertToUtc(localTimezone, millis) at line 113 is rather 
confusing: the result is not the current timestamp in UTC. Instead, it has an 
offset for the local timezone added to it, which I think is also a problem.
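
    As a minimal sketch of the first point (this is not ORC code; the class 
TimestampZoneDemo and the helper renderIn are made up for illustration), the 
same millis value prints differently depending on the JVM default timezone, 
while the stored value itself stays unchanged:

```java
import java.sql.Timestamp;
import java.util.TimeZone;

class TimestampZoneDemo {
    // Hypothetical helper: format the same epoch millis the way
    // java.sql.Timestamp.toString() would under the given default timezone.
    static String renderIn(String zoneId, long epochMillis) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return new Timestamp(epochMillis).toString();
        } finally {
            // Restore the original default so the demo has no side effects.
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        long millis = 0L; // the epoch, 1970-01-01 00:00:00 UTC

        // The stored value is zone-independent.
        System.out.println(new Timestamp(millis).getTime()); // prints 0

        // Only the printed form depends on the default timezone.
        System.out.println(renderIn("UTC", millis));                 // 1970-01-01 00:00:00.0
        System.out.println(renderIn("America/Los_Angeles", millis)); // 1969-12-31 16:00:00.0
    }
}
```

    So a difference in printed output alone does not mean the stored value 
differs; comparing getTime() is the more reliable check.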
    
    ORC-10 fixed the bug that occurred without the writer timezone. The original 
design was meant to be resilient to moving between different reader timezones. 
However, this caused an issue in C++ between timezones with different daylight 
saving rules, so the writer timezone had to be written. The GMT offset that 
ORC-10 adds actually converts the value to the local timezone so that 
ColumnPrinter can print the same time in the local timezone. This causes a new 
problem: the C++ reader gets the timestamp value in the local timezone, not UTC, 
which differs from the Java reader. I believe this is why @owen created 
[ORC-37](https://issues.apache.org/jira/browse/ORC-37). 
The SQL type TimestampTz is a new type, distinct from the traditional SQL type 
Timestamp. I don't think it is a good idea to mix the ORC timestamp type with 
TimestampTz, and there is another open issue for it: 
[ORC-189](https://issues.apache.org/jira/browse/ORC-189)
    
    It is very confusing that an input timestamp written with the Java writer is 
read differently by the C++ reader. I think we need to fix this, and doing so 
can also resolve ORC-37. What do you think?

