Copilot commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3185724224


##########
pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java:
##########
@@ -41,94 +55,209 @@
 /// - avro `double` → `Double` → `Double`
 /// - avro `string` → `Utf8` → `String` (via `Utf8.toString()`)
 /// - avro `bytes` → `ByteBuffer` → `byte[]` (materialized from the buffer's 
remaining content)
-/// - avro `fixed` → [GenericFixed] / `GenericData.Fixed` → `byte[]`
+/// - avro `fixed` → `GenericFixed` / `GenericData.Fixed` → `byte[]`
 /// - avro `enum` → `GenericData.EnumSymbol` → `String` (enum name)
 /// - avro `array<T>` → `List<T>` → `Object[]` (each element recursively 
converted)
 /// - avro `map<string, T>` → `Map<Utf8, T>` → `Map<String, Object>` (each 
value recursively converted)
-/// - avro nested `record` → [GenericRecord] → `Map<String, Object>`
+/// - avro nested `record` → `GenericRecord` → `Map<String, Object>`
 /// - avro `union[null, X]` with the `null` branch selected → `null`
 ///
-/// **Logical types** (applied when `enableLogicalTypes = true`, the config 
default; `GenericDatumReader` emits
-/// the post-conversion Java types listed below):
-/// - `decimal` → `BigDecimal` → `BigDecimal`
-/// - `uuid` → `UUID` → `String` (UUID string form via `UUID.toString()`)
-/// - `date` → `LocalDate` → `LocalDate` (passes through; TZ-independent)
-/// - `time-millis` / `time-micros` → `LocalTime` → `LocalTime` (passes 
through; full ns precision preserved)
-/// - `timestamp-millis` / `timestamp-micros` → `Instant` → 
`java.sql.Timestamp` (sub-millisecond nanos preserved
-///   via `Timestamp.getNanos()`)
+/// **Logical types** (the reader emits the raw Avro physical type; this 
extractor applies the conversion via
+///   [#CONVERSION_MAP] to produce the Pinot contract type):
+/// - `decimal` / `big-decimal` → raw `ByteBuffer` → [BigDecimal] (always 
converted; raw bytes aren't interpretable
+///   without external precision/scale)
+/// - `timestamp-millis` → raw `Long` → [Timestamp], or `Long` raw epoch 
millis when `extractRawTimeValues` is `true`
+/// - `timestamp-micros` → raw `Long` → [Timestamp] (sub-millisecond micros 
preserved), or `Long` raw epoch micros when
+///   `extractRawTimeValues` is `true`
+/// - `timestamp-nanos` → raw `Long` → [Timestamp] (nanosecond precision 
preserved), or `Long` raw epoch nanos when
+///   `extractRawTimeValues` is `true`
+/// - `date` → raw `Integer` → [LocalDate], or `Integer` raw days-since-epoch 
when `extractRawTimeValues` is `true`
+/// - `time-millis` → raw `Integer` → [LocalTime], or `Integer` raw 
ms-since-midnight when `extractRawTimeValues` is
+///   `true`
+/// - `time-micros` → raw `Long` → [LocalTime], or `Long` raw 
µs-since-midnight when `extractRawTimeValues` is `true`
+/// - `uuid` → raw `Utf8` / `GenericFixed` → [UUID] (always converted; 
downstream type transformer adapts to the Pinot
+///   column type — `STRING` column gets canonical UUID string, `BYTES` column 
gets 16-byte big-endian form)
 public class AvroRecordExtractor extends BaseRecordExtractor<GenericRecord> {
-  private boolean _applyLogicalTypes = true;
+  private static final Conversion<BigDecimal> DECIMAL_CONVERSION = new 
Conversions.DecimalConversion();
+  private static final Conversion<BigDecimal> BIG_DECIMAL_CONVERSION = new 
Conversions.BigDecimalConversion();
+  private static final Conversion<Instant> TIMESTAMP_MILLIS_CONVERSION =
+      new TimeConversions.TimestampMillisConversion();
+  private static final Conversion<Instant> TIMESTAMP_MICROS_CONVERSION =
+      new TimeConversions.TimestampMicrosConversion();
+  private static final Conversion<Instant> TIMESTAMP_NANOS_CONVERSION = new 
TimeConversions.TimestampNanosConversion();
+  private static final Conversion<LocalDate> DATE_CONVERSION = new 
TimeConversions.DateConversion();
+  private static final Conversion<LocalTime> TIME_MILLIS_CONVERSION = new 
TimeConversions.TimeMillisConversion();
+  private static final Conversion<LocalTime> TIME_MICROS_CONVERSION = new 
TimeConversions.TimeMicrosConversion();
+  private static final Conversion<UUID> UUID_CONVERSION = new 
Conversions.UUIDConversion();
+
+  private static final Map<String, Conversion<?>> CONVERSION_MAP = Map.of(
+      DECIMAL_CONVERSION.getLogicalTypeName(), DECIMAL_CONVERSION,
+      BIG_DECIMAL_CONVERSION.getLogicalTypeName(), BIG_DECIMAL_CONVERSION,
+      TIMESTAMP_MILLIS_CONVERSION.getLogicalTypeName(), 
TIMESTAMP_MILLIS_CONVERSION,
+      TIMESTAMP_MICROS_CONVERSION.getLogicalTypeName(), 
TIMESTAMP_MICROS_CONVERSION,
+      TIMESTAMP_NANOS_CONVERSION.getLogicalTypeName(), 
TIMESTAMP_NANOS_CONVERSION,
+      DATE_CONVERSION.getLogicalTypeName(), DATE_CONVERSION,
+      TIME_MILLIS_CONVERSION.getLogicalTypeName(), TIME_MILLIS_CONVERSION,
+      TIME_MICROS_CONVERSION.getLogicalTypeName(), TIME_MICROS_CONVERSION,
+      UUID_CONVERSION.getLogicalTypeName(), UUID_CONVERSION
+  );
+
+  private static final Set<String> TEMPORAL_LOGICAL_TYPES = Set.of(
+      TIMESTAMP_MILLIS_CONVERSION.getLogicalTypeName(),
+      TIMESTAMP_MICROS_CONVERSION.getLogicalTypeName(),
+      TIMESTAMP_NANOS_CONVERSION.getLogicalTypeName(),
+      DATE_CONVERSION.getLogicalTypeName(),
+      TIME_MILLIS_CONVERSION.getLogicalTypeName(),

Review Comment:
   `timestamp-nanos` was added to the Avro logical-type conversion path, but 
this file's tests still cover only millis/micros timestamps. Because nanosecond 
handling has its own conversion class and raw-mode behavior, a regression there 
would currently slip through unnoticed; please add end-to-end extractor tests 
for both the default `Timestamp` path and `extractRawTimeValues=true`.



##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetRecordReaderConfig.java:
##########
@@ -18,34 +18,31 @@
  */
 package org.apache.pinot.plugin.inputformat.parquet;
 
-import org.apache.commons.configuration2.Configuration;
 import org.apache.pinot.spi.data.readers.RecordReaderConfig;
 
 
-/**
- * Config for ParquetRecordReader
- */
+/// Config for [ParquetRecordReader] (and the underlying 
[ParquetAvroRecordReader] / [ParquetNativeRecordReader]).
+/// Three settings, all default `false`:
+/// - `useParquetAvroRecordReader` — force the parquet-avro reader.
+/// - `useParquetNativeRecordReader` — force the native parquet reader. When 
neither flag is set, the dispatcher
+///   auto-detects via the file's `avro.schema` metadata (Avro reader if 
present, native otherwise).
+/// - `extractRawTimeValues` — opt out of TIMESTAMP / DATE / TIME conversion 
at the extractor boundary,
+///   surfacing the raw underlying integer in the column's declared unit 
instead of the contract Java type.
+///   DECIMAL and UUID always convert. See [ParquetAvroRecordExtractor] / 
[ParquetNativeRecordExtractor] for
+///   the per-type matrix.
 public class ParquetRecordReaderConfig implements RecordReaderConfig {
-  private static final String USE_PARQUET_AVRO_RECORDER_READER = 
"useParquetAvroRecordReader";
-  private static final String USE_PARQUET_NATIVE_RECORDER_READER = 
"useParquetNativeRecordReader";
-
   private boolean _useParquetAvroRecordReader;
   private boolean _useParquetNativeRecordReader;
-  private Configuration _conf;
-
-  public ParquetRecordReaderConfig() {
-  }
-
-  public ParquetRecordReaderConfig(Configuration conf) {
-    _conf = conf;
-    _useParquetAvroRecordReader = 
conf.getBoolean(USE_PARQUET_AVRO_RECORDER_READER, false);
-    _useParquetNativeRecordReader = 
conf.getBoolean(USE_PARQUET_NATIVE_RECORDER_READER, false);
-  }
+  private boolean _extractRawTimeValues;

Review Comment:
   This removes the `Configuration`-accepting constructor and `getConfig()` 
accessor from a public record-reader config class. In-repo callers may not use 
them anymore, but external ingestion code that still constructs 
`ParquetRecordReaderConfig` from a Commons Configuration or inspects the 
backing config will now fail to compile on upgrade. If this break is 
intentional, please keep a compatibility shim or deprecate first rather than 
deleting the API outright in one step.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to