Devavret Makkar created ORC-645:
-----------------------------------

             Summary: Timestamp between -1 and 0 seconds from UNIX epoch 
changes after write-read
                 Key: ORC-645
                 URL: https://issues.apache.org/jira/browse/ORC-645
             Project: ORC
          Issue Type: Bug
            Reporter: Devavret Makkar


When writing a timestamp between -1 and 0 seconds from the UNIX epoch, like 
this:
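{code:java}
import java.sql.Timestamp;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

// Single-column schema holding one timestamp value
TypeDescription schema =
  TypeDescription.fromString("struct<x:timestamp>");
Writer writer = OrcFile.createWriter(new Path("time-file.orc"),
                                     OrcFile.writerOptions(new Configuration())
                                     .setSchema(schema));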
VectorizedRowBatch batch = schema.createRowBatch();
TimestampColumnVector x = (TimestampColumnVector) batch.cols[0];
int row = batch.size++;

// This is supposed to be 1969-12-31 23:59:59.762
x.set(row, new Timestamp(-238L));

writer.addRowBatch(batch);
writer.close();
{code}
and reading it back with pyarrow.orc:
{code:python}
import pyarrow.orc as orc
pdf = orc.ORCFile("time-file.orc").read().to_pandas()
print(pdf)
{code}
I get:
{noformat}
                              x
0 1970-01-01 00:00:00.762000000
{noformat}
 

This is probably because of the special handling of negative timestamps here:

[https://github.com/apache/orc/blob/fa9c011e13e8376d2a185bd76af834bd644f4332/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L1221-L1227]

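In this particular case millis is not < 0, so it is not reduced by 
MILLIS_PER_SECOND, and the value read back is one second later than the 
value written (1970-01-01 00:00:00.762 instead of 1969-12-31 23:59:59.762).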
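
To make the arithmetic concrete, here is a minimal standalone sketch. It is not the ORC reader code. The class and method names are hypothetical, and it assumes the stored seconds value was obtained by truncating toward zero, which I have not verified against the writer. It only illustrates why a "millis < 0" guard cannot fire for values between -1 and 0 seconds, and how a floor-based split avoids the problem.

```java
import java.time.Instant;

// Hypothetical illustration only; none of these names come from ORC.
class EpochBoundaryDemo {
    static final long MILLIS_PER_SECOND = 1000L;

    // Recombine (seconds, nanos) and subtract one second only when the
    // seconds part is already negative, mirroring the guard described above.
    static long decodeWithSignGuard(long seconds, int nanos) {
        long millis = seconds * MILLIS_PER_SECOND;
        if (millis < 0 && nanos > 999_999) {
            millis -= MILLIS_PER_SECOND;
        }
        return millis + nanos / 1_000_000;
    }

    // Recombine when seconds was produced by floor division: no guard needed.
    static long decodeFloor(long floorSeconds, int nanos) {
        return floorSeconds * MILLIS_PER_SECOND + nanos / 1_000_000;
    }

    public static void main(String[] args) {
        long original = -238L; // 238 ms before the epoch

        // Nanos-of-second is always non-negative (762 ms here).
        int nanos = (int) (Math.floorMod(original, MILLIS_PER_SECOND) * 1_000_000L);

        // Assumed: seconds truncated toward zero, so -238 / 1000 == 0.
        long truncSeconds = original / MILLIS_PER_SECOND;
        // Alternative: floor division, so floorDiv(-238, 1000) == -1.
        long floorSeconds = Math.floorDiv(original, MILLIS_PER_SECOND);

        long guarded = decodeWithSignGuard(truncSeconds, nanos);
        long floored = decodeFloor(floorSeconds, nanos);

        System.out.println(Instant.ofEpochMilli(guarded)); // 1970-01-01T00:00:00.762Z
        System.out.println(Instant.ofEpochMilli(floored)); // 1969-12-31T23:59:59.762Z
    }
}
```

With truncated seconds the guard still works for values at or below -1 second (for example -1238 ms splits into seconds = -1 and nanos = 762000000, and decodes back to -1238), so only the open interval between -1 and 0 seconds is affected in this sketch.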

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)