Ruiqi Dong created AVRO-4269:
--------------------------------
Summary: TimestampNanosConversion.toLong(...) encodes pre-epoch
instants with the wrong nanosecond offset
Key: AVRO-4269
URL: https://issues.apache.org/jira/browse/AVRO-4269
Project: Apache Avro
Issue Type: Bug
Components: java
Reporter: Ruiqi Dong
*Summary*
`TimestampNanosConversion.toLong(...)` has a special path for negative epoch
seconds with positive nanoseconds. That path subtracts `1_000_000` instead of
`1_000_000_000`. As a result, an instant such as `1969-12-31T23:59:59.500Z` is
encoded as `499000000` instead of `-500000000`.
*Affected code*
File: `lang/java/avro/src/main/java/org/apache/avro/data/TimeConversions.java`
{code:java}
public static class TimestampNanosConversion extends Conversion<Instant> {
  ...
  @Override
  public Long toLong(Instant instant, Schema schema, LogicalType type) {
    long seconds = instant.getEpochSecond();
    int nanos = instant.getNano();
    if (seconds < 0 && nanos > 0) {
      long micros = Math.multiplyExact(seconds + 1, 1_000_000_000L);
      long adjustment = nanos - 1_000_000;
      return Math.addExact(micros, adjustment);
    } else {
      long micros = Math.multiplyExact(seconds, 1_000_000_000L);
      return Math.addExact(micros, nanos);
    }
  }
}
{code}
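To make the arithmetic concrete, here is a minimal stand-alone sketch that traces the negative branch for `1969-12-31T23:59:59.500Z`. The class `NegativeBranchTrace` and its two methods are hypothetical names used only for illustration; they copy the steps quoted above and are not part of Avro.

```java
import java.time.Instant;

// Hypothetical illustration class, not part of Avro.
class NegativeBranchTrace {
  // Same steps as the negative branch quoted above, including the 1_000_000 adjustment.
  static long currentEncoding(Instant instant) {
    long base = Math.multiplyExact(instant.getEpochSecond() + 1, 1_000_000_000L);
    long adjustment = instant.getNano() - 1_000_000;
    return Math.addExact(base, adjustment);
  }

  // Same steps, with the adjustment expressed as nanoseconds per second.
  static long expectedEncoding(Instant instant) {
    long base = Math.multiplyExact(instant.getEpochSecond() + 1, 1_000_000_000L);
    long adjustment = instant.getNano() - 1_000_000_000;
    return Math.addExact(base, adjustment);
  }

  public static void main(String[] args) {
    // Epoch second -1, nanosecond-of-second 500_000_000.
    Instant beforeEpoch = Instant.parse("1969-12-31T23:59:59.500Z");
    System.out.println(currentEncoding(beforeEpoch));  // 499000000
    System.out.println(expectedEncoding(beforeEpoch)); // -500000000
  }
}
```

For this input `seconds + 1` is `0`, so the result is determined entirely by the adjustment: `500_000_000 - 1_000_000` gives `499_000_000`, while `500_000_000 - 1_000_000_000` gives `-500_000_000`.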
*Reproducer*
Add this test to
`lang/java/avro/src/test/java/org/apache/avro/data/TestTimeConversions.java`
{code:java}
@Test
void timestampNanosConversionBeforeEpoch() {
  TimestampNanosConversion conversion = new TimestampNanosConversion();
  Instant beforeEpoch = Instant.ofEpochSecond(-1, 500_000_000);
  assertEquals(-500_000_000L,
      (long) conversion.toLong(beforeEpoch, TIMESTAMP_NANOS_SCHEMA,
          LogicalTypes.timestampNanos()));
  assertEquals(beforeEpoch,
      conversion.fromLong(-500_000_000L, TIMESTAMP_NANOS_SCHEMA,
          LogicalTypes.timestampNanos()));
}
{code}
Also initialize the schema constant used by the test:
{code:java}
TIMESTAMP_NANOS_SCHEMA =
LogicalTypes.timestampNanos().addToSchema(Schema.create(Schema.Type.LONG));
{code}
Run:
{code:java}
MAVEN_SKIP_RC=true \
JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.6/libexec/openjdk.jdk/Contents/Home \
PATH=/opt/homebrew/Cellar/openjdk@21/21.0.6/libexec/openjdk.jdk/Contents/Home/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin \
/opt/homebrew/bin/mvn -q -t toolchains-local.xml -pl lang/java/avro \
-Dtest=org.apache.avro.data.TestTimeConversions#timestampNanosConversionBeforeEpoch \
test
{code}
Observed behavior:
The test fails with:
{code:java}
expected: <-500000000> but was: <499000000> {code}
Expected behavior:
`Instant.ofEpochSecond(-1, 500_000_000)` should encode to `-500_000_000`
nanoseconds from the Unix epoch.
The Avro logical type `timestamp-nanos` represents an instant as a long count of
nanoseconds from the epoch. The current implementation corrupts every pre-epoch
instant that has a nonzero sub-second component, which can reorder timestamps
and break round-trip encoding. The fix is to subtract `1_000_000_000`
in the negative branch, matching the nanosecond unit used by the rest of the
method.
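The proposed fix can be sketched outside of Avro as follows. This is a minimal illustration, not the actual patch: the class `TimestampNanosSketch`, the methods `toNanos` and `fromNanos`, and the constant `NANOS_PER_SECOND` are hypothetical names, and the decoder shown is an assumed floor-based inverse rather than Avro's `fromLong`.

```java
import java.time.Instant;

// Hypothetical illustration class, not part of Avro.
class TimestampNanosSketch {
  static final long NANOS_PER_SECOND = 1_000_000_000L;

  // Encoder with the negative-branch adjustment expressed in nanoseconds per second.
  static long toNanos(Instant instant) {
    long seconds = instant.getEpochSecond();
    int nanos = instant.getNano();
    if (seconds < 0 && nanos > 0) {
      // Step one second toward zero, then subtract the remaining part of that second.
      long base = Math.multiplyExact(seconds + 1, NANOS_PER_SECOND);
      return Math.addExact(base, nanos - NANOS_PER_SECOND);
    }
    return Math.addExact(Math.multiplyExact(seconds, NANOS_PER_SECOND), nanos);
  }

  // Assumed inverse: floor division keeps the nanosecond-of-second in [0, 999_999_999].
  static Instant fromNanos(long nanosFromEpoch) {
    return Instant.ofEpochSecond(
        Math.floorDiv(nanosFromEpoch, NANOS_PER_SECOND),
        Math.floorMod(nanosFromEpoch, NANOS_PER_SECOND));
  }

  public static void main(String[] args) {
    Instant beforeEpoch = Instant.ofEpochSecond(-1, 500_000_000);
    long encoded = toNanos(beforeEpoch);
    System.out.println(encoded);                                 // -500000000
    System.out.println(fromNanos(encoded).equals(beforeEpoch));  // true
    System.out.println(encoded < toNanos(Instant.EPOCH));        // true
  }
}
```

With this adjustment the pre-epoch instant encodes to a negative value, so it sorts before the epoch and survives a round trip, which addresses both symptoms described above.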