Jackie-Jiang commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3185794563
##########
pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java:
##########
@@ -41,94 +55,209 @@
/// - avro `double` → `Double` → `Double`
/// - avro `string` → `Utf8` → `String` (via `Utf8.toString()`)
/// - avro `bytes` → `ByteBuffer` → `byte[]` (materialized from the buffer's
remaining content)
-/// - avro `fixed` → [GenericFixed] / `GenericData.Fixed` → `byte[]`
+/// - avro `fixed` → `GenericFixed` / `GenericData.Fixed` → `byte[]`
/// - avro `enum` → `GenericData.EnumSymbol` → `String` (enum name)
/// - avro `array<T>` → `List<T>` → `Object[]` (each element recursively
converted)
/// - avro `map<string, T>` → `Map<Utf8, T>` → `Map<String, Object>` (each
value recursively converted)
-/// - avro nested `record` → [GenericRecord] → `Map<String, Object>`
+/// - avro nested `record` → `GenericRecord` → `Map<String, Object>`
/// - avro `union[null, X]` with the `null` branch selected → `null`
///
-/// **Logical types** (applied when `enableLogicalTypes = true`, the config
default; `GenericDatumReader` emits
-/// the post-conversion Java types listed below):
-/// - `decimal` → `BigDecimal` → `BigDecimal`
-/// - `uuid` → `UUID` → `String` (UUID string form via `UUID.toString()`)
-/// - `date` → `LocalDate` → `LocalDate` (passes through; TZ-independent)
-/// - `time-millis` / `time-micros` → `LocalTime` → `LocalTime` (passes
through; full ns precision preserved)
-/// - `timestamp-millis` / `timestamp-micros` → `Instant` →
`java.sql.Timestamp` (sub-millisecond nanos preserved
-/// via `Timestamp.getNanos()`)
+/// **Logical types** (the reader emits the raw Avro physical type; this
extractor applies the conversion via
+/// [#CONVERSION_MAP] to produce the Pinot contract type):
+/// - `decimal` / `big-decimal` → raw `ByteBuffer` → [BigDecimal] (always
converted; raw bytes aren't interpretable
+/// without external precision/scale)
+/// - `timestamp-millis` → raw `Long` → [Timestamp], or `Long` raw epoch
millis when `extractRawTimeValues` is `true`
+/// - `timestamp-micros` → raw `Long` → [Timestamp] (sub-millisecond micros
preserved), or `Long` raw epoch micros when
+/// `extractRawTimeValues` is `true`
+/// - `timestamp-nanos` → raw `Long` → [Timestamp] (nanosecond precision
preserved), or `Long` raw epoch nanos when
+/// `extractRawTimeValues` is `true`
+/// - `date` → raw `Integer` → [LocalDate], or `Integer` raw days-since-epoch
when `extractRawTimeValues` is `true`
+/// - `time-millis` → raw `Integer` → [LocalTime], or `Integer` raw
ms-since-midnight when `extractRawTimeValues` is
+/// `true`
+/// - `time-micros` → raw `Long` → [LocalTime], or `Long` raw
µs-since-midnight when `extractRawTimeValues` is `true`
+/// - `uuid` → raw `Utf8` / `GenericFixed` → [UUID] (always converted;
downstream type transformer adapts to the Pinot
+/// column type — `STRING` column gets canonical UUID string, `BYTES` column
gets 16-byte big-endian form)
public class AvroRecordExtractor extends BaseRecordExtractor<GenericRecord> {
- private boolean _applyLogicalTypes = true;
+ private static final Conversion<BigDecimal> DECIMAL_CONVERSION = new
Conversions.DecimalConversion();
+ private static final Conversion<BigDecimal> BIG_DECIMAL_CONVERSION = new
Conversions.BigDecimalConversion();
+ private static final Conversion<Instant> TIMESTAMP_MILLIS_CONVERSION =
+ new TimeConversions.TimestampMillisConversion();
+ private static final Conversion<Instant> TIMESTAMP_MICROS_CONVERSION =
+ new TimeConversions.TimestampMicrosConversion();
+ private static final Conversion<Instant> TIMESTAMP_NANOS_CONVERSION = new
TimeConversions.TimestampNanosConversion();
+ private static final Conversion<LocalDate> DATE_CONVERSION = new
TimeConversions.DateConversion();
+ private static final Conversion<LocalTime> TIME_MILLIS_CONVERSION = new
TimeConversions.TimeMillisConversion();
+ private static final Conversion<LocalTime> TIME_MICROS_CONVERSION = new
TimeConversions.TimeMicrosConversion();
+ private static final Conversion<UUID> UUID_CONVERSION = new
Conversions.UUIDConversion();
+
+ private static final Map<String, Conversion<?>> CONVERSION_MAP = Map.of(
+ DECIMAL_CONVERSION.getLogicalTypeName(), DECIMAL_CONVERSION,
+ BIG_DECIMAL_CONVERSION.getLogicalTypeName(), BIG_DECIMAL_CONVERSION,
+ TIMESTAMP_MILLIS_CONVERSION.getLogicalTypeName(),
TIMESTAMP_MILLIS_CONVERSION,
+ TIMESTAMP_MICROS_CONVERSION.getLogicalTypeName(),
TIMESTAMP_MICROS_CONVERSION,
+ TIMESTAMP_NANOS_CONVERSION.getLogicalTypeName(),
TIMESTAMP_NANOS_CONVERSION,
+ DATE_CONVERSION.getLogicalTypeName(), DATE_CONVERSION,
+ TIME_MILLIS_CONVERSION.getLogicalTypeName(), TIME_MILLIS_CONVERSION,
+ TIME_MICROS_CONVERSION.getLogicalTypeName(), TIME_MICROS_CONVERSION,
+ UUID_CONVERSION.getLogicalTypeName(), UUID_CONVERSION
+ );
+
+ private static final Set<String> TEMPORAL_LOGICAL_TYPES = Set.of(
+ TIMESTAMP_MILLIS_CONVERSION.getLogicalTypeName(),
+ TIMESTAMP_MICROS_CONVERSION.getLogicalTypeName(),
+ TIMESTAMP_NANOS_CONVERSION.getLogicalTypeName(),
+ DATE_CONVERSION.getLogicalTypeName(),
+ TIME_MILLIS_CONVERSION.getLogicalTypeName(),
Review Comment:
Fair. Added two end-to-end tests:
- `testTimestampNanosLogicalTypeAsTimestampPreservingNanos` — default mode:
builds a record with `Instant.ofEpochSecond(1649924302L, 123_456_789L)` (full
nanosecond precision past the micros boundary), encodes via
`TimeConversions.TimestampNanosConversion`, asserts the extractor returns
`Timestamp.from(instant)` with all nanos preserved via `Timestamp.setNanos`.
- `testTimestampNanosLogicalTypeAsRawLongNanosWhenRaw` —
`extractRawTimeValues=true` mode: same record, asserts the extractor passes
through the raw `Long` epoch nanos (`1649924302L * 1_000_000_000L +
123_456_789L`).
Matches the existing millis/micros pairs structurally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]