Github user stiga-huang commented on the issue:

    https://github.com/apache/orc/pull/233
  
    @omalley @majetideepak @wgtmac Thanks for following up on ORC-322! If I 
understand correctly, the convention is that TimestampColumnVector should 
only accept timestamps in local time. Timestamp values stored in an ORC file are 
`local_timestamp - local_orc_epoch`. A TimestampColumnVector obtained from the Java 
reader holds timestamps in local time. However, a TimestampColumnVector obtained from 
the C++ reader holds UTC timestamps.
    
    If so, the C++ writer doesn't need to subtract gmtOffset from each timestamp, 
because after shifting, the values in the ORC file are `utc_timestamp - 
local_orc_epoch`.
    
    If not, I think the bug in ORC-320 should still be fixed (ORC-322 is aimed 
at fixing ORC-320). The root cause of ORC-320 is that the gmtOffsets obtained in the 
writer and the reader can differ, even though both use the same timezone.
    
    To be specific, the writer looks up gmtOffset for timestamp `ts`, then writes 
`ts - gmtOffset` (let's ignore the ORC epoch, since it's the same in the writer 
and the reader). The reader uses `ts - gmtOffset` to look up gmtOffset2, then reads out 
`ts - gmtOffset + gmtOffset2`. However, `gmtOffset2` may not equal 
`gmtOffset`.
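
    To illustrate the mismatch, here is a minimal sketch in Python. It is not 
the ORC code: the timezone, the `TRANSITION` instant, and the helper names 
(`gmt_offset`, `write_value`, `read_value`) are all made up for the example. 
It assumes a zone that switches from UTC-8 to UTC-7 at one instant and that the 
offset lookup is keyed on whatever value it is handed.

```python
# Hypothetical timezone: UTC-8 before TRANSITION, UTC-7 from TRANSITION onward.
TRANSITION = 1_000_000          # made-up transition instant, in seconds
STANDARD_OFFSET = -8 * 3600     # offset before the transition
DAYLIGHT_OFFSET = -7 * 3600     # offset after the transition


def gmt_offset(seconds):
    """Return the zone offset for the given value, as a timezone lookup would."""
    return STANDARD_OFFSET if seconds < TRANSITION else DAYLIGHT_OFFSET


def write_value(ts):
    """Writer side: look up gmtOffset by ts, store ts - gmtOffset."""
    return ts - gmt_offset(ts)


def read_value(stored):
    """Reader side: look up gmtOffset2 by the stored value, then add it back."""
    return stored + gmt_offset(stored)


def round_trip(ts):
    return read_value(write_value(ts))


far = TRANSITION - 100_000   # far from the transition: both lookups agree
near = TRANSITION - 3600     # close to the transition: the lookups disagree

print(round_trip(far) - far)    # 0
print(round_trip(near) - near)  # 3600
```

    In this sketch, `near` and `near - gmtOffset` fall on opposite sides of the 
transition, so the reader picks a different offset and the value comes back one 
hour off.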
    
    Thanks for your patience reading this long comment!

