Devavret Makkar created ORC-645:
-----------------------------------
Summary: Timestamp between -1 and 0 seconds from UNIX epoch changes after write-read
Key: ORC-645
URL: https://issues.apache.org/jira/browse/ORC-645
Project: ORC
Issue Type: Bug
Reporter: Devavret Makkar
When writing a timestamp between -1 and 0 seconds from the UNIX epoch like
this:
{code:java}
TypeDescription schema =
    TypeDescription.fromString("struct<x:timestamp>");
Writer writer = OrcFile.createWriter(new Path("time-file.orc"),
    OrcFile.writerOptions(new Configuration())
        .setSchema(schema));
VectorizedRowBatch batch = schema.createRowBatch();
TimestampColumnVector x = (TimestampColumnVector) batch.cols[0];
int row = batch.size++;
// This is supposed to be 1969-12-31 23:59:59.762
x.set(row, new Timestamp(-238L));
writer.addRowBatch(batch);
writer.close();
{code}
And reading it back with pyarrow.orc:
{code:python}
import pyarrow.orc as orc
pdf = orc.ORCFile("time-file.orc").read().to_pandas()
print(pdf)
{code}
I get:
{noformat}
x
0 1970-01-01 00:00:00.762000000
{noformat}
This is probably caused by the special handling of negative timestamps here:
[https://github.com/apache/orc/blob/fa9c011e13e8376d2a185bd76af834bd644f4332/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L1221-L1227]
In this particular case millis is not < 0 (presumably because the seconds part
is stored as 0), so it is never reduced by MILLIS_PER_SECOND and the value that
comes back is one second too late.
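To make the suspicion concrete, here is a minimal plain-Java sketch of how such a sign check could miss values in the (-1 s, 0 s) range. It is a simplified model and not the actual ORC code: the class and method names are made up for illustration, and it assumes the writer splits the value with truncating division while the reader only subtracts a second when millis < 0.

```java
// Simplified model only; NOT the actual ORC writer or reader code.
// All names here are hypothetical and exist just for this illustration.
final class EpochBoundarySketch {
    static final long MILLIS_PER_SECOND = 1000L;

    // Assumed writer behavior: seconds via truncating division (rounds toward zero).
    static long truncatedSeconds(long millis) {
        return millis / MILLIS_PER_SECOND;
    }

    // Sub-second part as non-negative nanoseconds, the way java.sql.Timestamp reports it.
    static int nanosOfSecond(long millis) {
        return (int) (Math.floorMod(millis, MILLIS_PER_SECOND) * 1_000_000L);
    }

    // Assumed reader behavior: subtract one second only when millis < 0.
    static long naiveReadMillis(long seconds, int nanos) {
        long millis = seconds * MILLIS_PER_SECOND;
        if (millis < 0 && nanos > 999_999) {
            millis -= MILLIS_PER_SECOND;
        }
        return millis + nanos / 1_000_000;
    }

    // Write-then-read under the two assumptions above.
    static long naiveRoundTrip(long millis) {
        return naiveReadMillis(truncatedSeconds(millis), nanosOfSecond(millis));
    }

    // Alternative split with floor division, which needs no sign check at all.
    static long flooredRoundTrip(long millis) {
        long seconds = Math.floorDiv(millis, MILLIS_PER_SECOND);
        return seconds * MILLIS_PER_SECOND + nanosOfSecond(millis) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println(naiveRoundTrip(-238L));   // 762: one second too late
        System.out.println(naiveRoundTrip(-1238L));  // -1238: survives the round trip
        System.out.println(flooredRoundTrip(-238L)); // -238: survives the round trip
    }
}
```

Under these assumptions only values strictly between -1000 ms and 0 ms are affected, because the truncated seconds part is 0 and the sign of the original value is lost. Whether the real writer and reader behave exactly like this would need to be checked against the linked code.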