Re: [PR] [SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps [spark]

via GitHub Tue, 26 May 2026 04:45:07 -0700


peter-toth commented on code in PR #56059:
URL: https://github.com/apache/spark/pull/56059#discussion_r3303444730



##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/TimestampNanosVal.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.types;
+
+import org.apache.spark.SparkIllegalArgumentException;
+import org.apache.spark.annotation.Unstable;
+
+import java.io.Serializable;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Physical representation for nanosecond-capable timestamp types ({@code TIMESTAMP_NTZ(p)} and
+ * {@code TIMESTAMP_LTZ(p)} with {@code p} in [7, 9]). Analogous to {@link GeometryVal} for
+ * GEOMETRY: this class is only a container for the composite value; NTZ vs LTZ semantics live in
+ * {@link org.apache.spark.sql.catalyst.util.TimestampNTZNanos} and
+ * {@link org.apache.spark.sql.catalyst.util.TimestampLTZNanos}.
+ *
+ * <p>Values are stored as two components:
+ * <ul>
+ *   <li>{@link #epochMicros} - microseconds since the Unix epoch (same unit as microsecond
+ *   timestamp types),</li>
+ *   <li>{@link #nanosWithinMicro} - additional nanoseconds within that microsecond, in [0, 999].
+ *   </li>
+ * </ul>
+ *
+ * <p>Logical row-size estimation uses 10 bytes (8 + 2). In {@code UnsafeRow}, values are stored in
+ * the variable-length region using a 16-byte payload (see
+ * {@link org.apache.spark.sql.catalyst.expressions.TimestampNanosRowValues}), the same pattern as
+ * {@link CalendarInterval}.
+ *
+ * @since 4.3.0
+ */
+@Unstable
+public final class TimestampNanosVal implements Serializable {
+  /** Size of the {@code UnsafeRow} variable-length payload for this type (two 8-byte words). */
+  public static final int SIZE_IN_BYTES = 16;
+
+  /** Maximum valid value for {@link #nanosWithinMicro} (three sub-micro decimal digits). */
+  public static final int MAX_NANOS_WITHIN_MICRO = 999;
+
+  /** Microseconds since the Unix epoch. */
+  public final long epochMicros;
+  /** Nanoseconds within {@link #epochMicros}, in [0, 999]. */
+  public final short nanosWithinMicro;
+
+  /**
+   * @param epochMicros microseconds since the Unix epoch
+   * @param nanosWithinMicro nanoseconds within {@code epochMicros}, must be in [0, 999]
+   */
+  public TimestampNanosVal(long epochMicros, short nanosWithinMicro) {

Review Comment:
   This constructor (and the `fromParts` factory at `:82` that wraps it) is
   also the read-path constructor: `TimestampNanosRowValues.readVal`
   (`TimestampNanosRowValues.java:76`) builds a fresh value through it on every
   UnsafeRow / UnsafeArrayData get. The `nanosWithinMicro` range check therefore
   runs on every cell read, even though any `TimestampNanosVal` that reaches a
   row was already validated when it was created (this constructor is the only
   way to create one). Sibling types in this package (`CalendarInterval`,
   `VariantVal`, `GeographyVal`, `GeometryVal`) all leave the constructor
   unchecked for the same reason.
   
   Consider exposing a package-private trusted factory and routing the row 
reader through it:
   
   ```java
   // in TimestampNanosVal.java
   // No range check: intended only for bytes that were validated when first written.
   static TimestampNanosVal fromTrustedRowBytes(long epochMicros, short nanosWithinMicro) {
     return new TimestampNanosVal(epochMicros, nanosWithinMicro, /*trusted*/ true);
   }

   // The unused `trusted` flag only distinguishes this overload from the validating constructor.
   private TimestampNanosVal(long epochMicros, short nanosWithinMicro, boolean trusted) {
     this.epochMicros = epochMicros;
     this.nanosWithinMicro = nanosWithinMicro;
   }
   ```
   
   and have `TimestampNanosRowValues.readVal` call `fromTrustedRowBytes`. The 
validating public constructor and `fromParts` stay for SQL-layer / user-facing 
callers where the value can come from anywhere.
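   
   To make the idea concrete, here is a minimal, self-contained sketch of the
   pattern. `NanosValSketch` and `RowValuesSketch` are hypothetical stand-ins,
   not the classes from this PR, and the payload layout (first 8-byte word for
   `epochMicros`, second for `nanosWithinMicro`) is an assumption made only for
   illustration; the PR states two 8-byte words but the order here is a guess.
   
   ```java
   import java.nio.ByteBuffer;

   // Hypothetical stand-in for TimestampNanosVal (not the class from the PR).
   final class NanosValSketch {
     static final int SIZE_IN_BYTES = 16;
     static final int MAX_NANOS_WITHIN_MICRO = 999;

     final long epochMicros;
     final short nanosWithinMicro;

     // Validating constructor for callers whose input can come from anywhere.
     NanosValSketch(long epochMicros, short nanosWithinMicro) {
       if (nanosWithinMicro < 0 || nanosWithinMicro > MAX_NANOS_WITHIN_MICRO) {
         throw new IllegalArgumentException(
             "nanosWithinMicro must be in [0, 999], got " + nanosWithinMicro);
       }
       this.epochMicros = epochMicros;
       this.nanosWithinMicro = nanosWithinMicro;
     }

     // Trusted constructor: no range check. The boolean only disambiguates the overload.
     private NanosValSketch(long epochMicros, short nanosWithinMicro, boolean trusted) {
       this.epochMicros = epochMicros;
       this.nanosWithinMicro = nanosWithinMicro;
     }

     // Factory for the read path, where the bytes were validated when they were written.
     static NanosValSketch fromTrustedRowBytes(long epochMicros, short nanosWithinMicro) {
       return new NanosValSketch(epochMicros, nanosWithinMicro, true);
     }
   }

   // Hypothetical stand-in for the row reader/writer; the word order is assumed.
   final class RowValuesSketch {
     static void writeVal(ByteBuffer buf, int offset, NanosValSketch v) {
       buf.putLong(offset, v.epochMicros);
       buf.putLong(offset + 8, v.nanosWithinMicro);
     }

     static NanosValSketch readVal(ByteBuffer buf, int offset) {
       long micros = buf.getLong(offset);
       short nanos = (short) buf.getLong(offset + 8);
       // Skips the per-read range check by going through the trusted factory.
       return NanosValSketch.fromTrustedRowBytes(micros, nanos);
     }
   }
   ```
   
   In this sketch both classes share a package, so the package-private factory
   is reachable. One thing worth checking in the PR: the Javadoc quoted above
   links the reader as
   `org.apache.spark.sql.catalyst.expressions.TimestampNanosRowValues`, while
   `TimestampNanosVal` is declared in `org.apache.spark.unsafe.types`. If the
   two really live in different packages, a package-private factory would not
   be visible to the reader and its visibility would need adjusting.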



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimestampNanosRowSuite.scala:
##########
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
+import org.apache.spark.util.ArrayImplicits._
+
+class TimestampNanosRowSuite extends SparkFunSuite with ExpressionEvalHelper {
+
+  private val ntzValue = TimestampNanosVal.fromParts(1234567890123L, 42.toShort)
+  private val ltzValue = TimestampNanosVal.fromParts(9876543210987L, 999.toShort)
+
+  test("GenerateUnsafeProjection.canSupport for nanos timestamp types") {
+    assert(GenerateUnsafeProjection.canSupport(TimestampNTZNanosType(9)))
+    assert(GenerateUnsafeProjection.canSupport(TimestampLTZNanosType(7)))
+  }
+
+  test("GenericInternalRow roundtrip for TIMESTAMP_NTZ nanos") {
+    val row = new GenericInternalRow(Array[Any](ntzValue, null))
+    val accessor = InternalRow.getAccessor(TimestampNTZNanosType(9))
+    val writer = InternalRow.getWriter(0, TimestampNTZNanosType(9))
+    assert(accessor(row, 0) === ntzValue)
+    assert(accessor(row, 1) === null)
+
+    val row2 = new GenericInternalRow(Array[Any](null, null))
+    writer(row2, ntzValue)
+    assert(accessor(row2, 0) === ntzValue)
+  }
+
+  test("GenericInternalRow roundtrip for TIMESTAMP_LTZ nanos") {
+    val row = new GenericInternalRow(Array[Any](ltzValue, null))
+    val accessor = InternalRow.getAccessor(TimestampLTZNanosType(8))
+    val writer = InternalRow.getWriter(0, TimestampLTZNanosType(8))
+    assert(accessor(row, 0) === ltzValue)
+    assert(accessor(row, 1) === null)
+
+    val row2 = new GenericInternalRow(Array[Any](null, null))
+    writer(row2, ltzValue)
+    assert(accessor(row2, 0) === ltzValue)
+  }
+
+  testBothCodegenAndInterpreted("UnsafeRow roundtrip for nanos timestamp columns") {
+    val schema = StructType(Seq(

Review Comment:
   The schema only includes top-level nanos columns, so
   `UnsafeArrayWriter.write(int, TimestampNanosVal)`
   (`UnsafeArrayWriter.java:212`) is unexercised: the codegen path through
   `GenerateUnsafeProjection.writeArrayToBuffer` for
   `ArrayType(TimestampNTZNanosType, ...)` has no test coverage. A small
   additional case would close the gap, following the same shape as the
   `CalendarInterval`-array tests in `UnsafeRowConverterSuite`:
   
   ```scala
   // Also needs: import org.apache.spark.sql.catalyst.util.GenericArrayData
   testBothCodegenAndInterpreted("UnsafeArrayWriter for nanos timestamp arrays") {
     val arrType = ArrayType(TimestampNTZNanosType(9), containsNull = true)
     val converter = UnsafeProjection.create(Array[DataType](arrType))
     // One array column holding a value, a null, and the value again.
     val input = new GenericInternalRow(Array[Any](
       new GenericArrayData(Array[Any](ntzValue, null, ntzValue))))
     val output = converter.apply(input)
     val arr = output.getArray(0)
     assert(arr.numElements() == 3)
     assert(arr.getTimestampNTZNanos(0) === ntzValue)
     assert(arr.isNullAt(1))
     assert(arr.getTimestampNTZNanos(2) === ntzValue)
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

