Jackie-Jiang opened a new pull request, #18816:
URL: https://github.com/apache/pinot/pull/18816

   ## Summary
   
   Adds an optional per-source-field data type fix during ingestion, configured 
via a new `SourceFieldConfig` listed under 
`ingestionConfig.sourceFieldConfigs`. It coerces the data type of a 
source/input field **before** other transformers consume it — useful when a 
source field arrives with a type that a downstream enricher or transform 
expression does not expect (e.g. an epoch timestamp that arrives as a `String` 
while `toEpochDays(ts)` expects a `LONG`).
   
   ```json
   "ingestionConfig": {
     "sourceFieldConfigs": [
       { "name": "ts", "dataType": "LONG" },
       { "name": "rawId", "dataType": "LONG", "preComplexTypeTransform": true }
     ]
   }
   ```
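   The sketch below shows the motivating example in plain Java. The map-based row stands in for Pinot's record type. `coerceToLong` is a hypothetical stand-in for the `DataTypeTransformer` conversion, not the PR's code. The local `toEpochDays` assumes its input is epoch milliseconds.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.concurrent.TimeUnit;

   public class SourceFieldCoercionSketch {

     // Hypothetical LONG coercion: accepts any Number or a numeric String.
     static long coerceToLong(Object value) {
       if (value instanceof Number) {
         return ((Number) value).longValue();
       }
       if (value instanceof String) {
         return Long.parseLong(((String) value).trim());
       }
       throw new IllegalArgumentException("Cannot coerce " + value + " to LONG");
     }

     // Stand-in for toEpochDays(ts), assuming ts holds epoch milliseconds.
     static long toEpochDays(long epochMillis) {
       return TimeUnit.MILLISECONDS.toDays(epochMillis);
     }

     public static void main(String[] args) {
       // A row whose "ts" field arrived as a String.
       Map<String, Object> row = new HashMap<>();
       row.put("ts", "1700000000000");

       // Source-field fix: rewrite "ts" as a LONG before any expression reads it.
       row.put("ts", coerceToLong(row.get("ts")));

       // The downstream expression now receives the type it expects.
       long days = toEpochDays((Long) row.get("ts"));
       System.out.println(days); // prints 19675
     }
   }
   ```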
   
   ### Placement in the transformer chain
   
   The fix is applied as a `DataTypeTransformer`, with 
`preComplexTypeTransform` selecting the phase (mirroring how `RecordEnricher` 
distinguishes pre- and post-complex-type enrichers):
   
   - `true` → before the `ComplexTypeTransformer` and the pre-complex-type 
`RecordEnricher`s, so the corrected value can feed complex-type flattening and 
pre-complex-type enrichment.
   - `false` (default) → after the `ComplexTypeTransformer`, before the 
post-complex-type `RecordEnricher`s and the `ExpressionTransformer`, so 
flattened/unnested fields can be fixed before expressions run.
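   The sketch below models the two placements with stage labels in place of real transformer instances. It splits the entries by `preComplexTypeTransform`. `FieldFix` and `orderedStages` are hypothetical names, not Pinot APIs. Only the stages named above are modeled, and the rest of the chain after `ExpressionTransformer` is omitted.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   public class TransformerOrderSketch {

     // Hypothetical mirror of one sourceFieldConfigs entry.
     static final class FieldFix {
       final String name;
       final String dataType;
       final boolean preComplexTypeTransform;

       FieldFix(String name, String dataType, boolean preComplexTypeTransform) {
         this.name = name;
         this.dataType = dataType;
         this.preComplexTypeTransform = preComplexTypeTransform;
       }
     }

     // Orders stage labels as described above; an empty source-field phase is skipped.
     static List<String> orderedStages(List<FieldFix> fixes) {
       Map<String, String> preFixes = new LinkedHashMap<>();
       Map<String, String> postFixes = new LinkedHashMap<>();
       for (FieldFix fix : fixes) {
         if (fix.preComplexTypeTransform) {
           preFixes.put(fix.name, fix.dataType);
         } else {
           postFixes.put(fix.name, fix.dataType);
         }
       }
       List<String> stages = new ArrayList<>();
       if (!preFixes.isEmpty()) {
         stages.add("DataTypeTransformer" + preFixes);
       }
       stages.add("RecordEnricher(pre-complex-type)");
       stages.add("ComplexTypeTransformer");
       if (!postFixes.isEmpty()) {
         stages.add("DataTypeTransformer" + postFixes);
       }
       stages.add("RecordEnricher(post-complex-type)");
       stages.add("ExpressionTransformer");
       return stages;
     }

     public static void main(String[] args) {
       // The two entries from the JSON example in the summary.
       List<FieldFix> fixes = Arrays.asList(
           new FieldFix("ts", "LONG", false),
           new FieldFix("rawId", "LONG", true));
       // Prints, one per line: DataTypeTransformer{rawId=LONG}, RecordEnricher(pre-complex-type),
       // ComplexTypeTransformer, DataTypeTransformer{ts=LONG}, RecordEnricher(post-complex-type),
       // ExpressionTransformer
       orderedStages(fixes).forEach(System.out::println);
     }
   }
   ```

   In this sketch, each phase gets its own transformer holding its own field map. That separation lets the same field be fixed once before and once after flattening.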
   
   Source fields that are not schema columns are extracted automatically (the 
transformer's `getInputColumns()` flows into 
`IngestionUtils.getFieldsForRecordExtractor` via the `TransformPipeline`).
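   The sketch below shows the idea behind that automatic extraction. The extractor's field set is the union of the schema columns and every transformer's input columns, so a source-only field such as `ts` is still read from the input. `InputColumnsProvider` and `fieldsForRecordExtractor` are hypothetical stand-ins. They do not reproduce the signature of `IngestionUtils.getFieldsForRecordExtractor`.

   ```java
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;

   public class ExtractorFieldsSketch {

     // Hypothetical stand-in for a transformer: only getInputColumns() is modeled.
     interface InputColumnsProvider {
       List<String> getInputColumns();
     }

     // Union of schema columns and all transformer input columns, in insertion order.
     static Set<String> fieldsForRecordExtractor(Set<String> schemaColumns,
         List<InputColumnsProvider> transformers) {
       Set<String> fields = new LinkedHashSet<>(schemaColumns);
       for (InputColumnsProvider transformer : transformers) {
         fields.addAll(transformer.getInputColumns());
       }
       return fields;
     }

     public static void main(String[] args) {
       // "ts" is a source field only; it is not a schema column.
       Set<String> schemaColumns = new LinkedHashSet<>(Arrays.asList("id", "day"));
       InputColumnsProvider sourceFieldFix = () -> Arrays.asList("ts");
       System.out.println(fieldsForRecordExtractor(schemaColumns,
           Collections.singletonList(sourceFieldFix))); // prints [id, day, ts]
     }
   }
   ```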
   
   ### Implementation notes
   
   - `DataTypeTransformer` gains a `Map<String, PinotDataType>` constructor so 
the source-field fix reuses the existing conversion logic.
   - `TableConfigUtils` validates that a field is configured at most once **per 
phase** — the same field may legitimately appear once pre- and once 
post-complex-type.
   - `SourceFieldConfig` requires `name` and `dataType` (validated in the 
constructor).
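   The last two notes describe the constructor check and the once-per-phase rule. The sketch below shows both in plain Java. It uses a `String` in place of Pinot's data type enum and `Objects.requireNonNull` in place of whatever precondition helper the real constructor uses. The nested `SourceFieldConfig` and `validate` are hypothetical mirrors, not the PR's classes.

   ```java
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Objects;
   import java.util.Set;

   public class SourceFieldConfigValidationSketch {

     // Hypothetical mirror: name and dataType are required; the phase flag defaults to false.
     static final class SourceFieldConfig {
       final String name;
       final String dataType;
       final boolean preComplexTypeTransform;

       SourceFieldConfig(String name, String dataType, Boolean preComplexTypeTransform) {
         this.name = Objects.requireNonNull(name, "'name' must be configured");
         this.dataType = Objects.requireNonNull(dataType, "'dataType' must be configured");
         this.preComplexTypeTransform = Boolean.TRUE.equals(preComplexTypeTransform);
       }
     }

     // A field may appear at most once per phase; once in each phase is allowed.
     static void validate(List<SourceFieldConfig> configs) {
       Set<String> preFields = new HashSet<>();
       Set<String> postFields = new HashSet<>();
       for (SourceFieldConfig config : configs) {
         Set<String> phaseFields = config.preComplexTypeTransform ? preFields : postFields;
         if (!phaseFields.add(config.name)) {
           throw new IllegalStateException("Duplicate source field config for '" + config.name
               + "' in the " + (config.preComplexTypeTransform ? "pre" : "post")
               + "-complex-type phase");
         }
       }
     }

     public static void main(String[] args) {
       // Same field once per phase: accepted.
       validate(Arrays.asList(
           new SourceFieldConfig("ts", "LONG", true),
           new SourceFieldConfig("ts", "LONG", null)));

       // Same field twice in the default (post) phase: rejected.
       try {
         validate(Arrays.asList(
             new SourceFieldConfig("ts", "LONG", null),
             new SourceFieldConfig("ts", "STRING", false)));
       } catch (IllegalStateException e) {
         // prints Duplicate source field config for 'ts' in the post-complex-type phase
         System.out.println(e.getMessage());
       }
     }
   }
   ```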
   
   ---
   
   Stacked on #18815 (the `PinotDataType.INTEGER` → `INT` rename). The first 
commit here is that rename, so please review and merge this after #18815. Once 
#18815 lands and this branch is rebased, the PR will reduce to the feature commit.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

