Jackie-Jiang opened a new pull request, #18816:
URL: https://github.com/apache/pinot/pull/18816
## Summary
Adds an optional per-source-field data type fix during ingestion, configured
via a new `SourceFieldConfig` listed under
`ingestionConfig.sourceFieldConfigs`. It coerces the data type of a
source/input field **before** other transformers consume it. This is useful
when a source field arrives with a type that a downstream enricher or transform
expression does not expect (e.g. an epoch timestamp that arrives as a `String`
while `toEpochDays(ts)` expects a `LONG`).
```json
"ingestionConfig": {
  "sourceFieldConfigs": [
    { "name": "ts", "dataType": "LONG" },
    { "name": "rawId", "dataType": "LONG", "preComplexTypeTransform": true }
  ]
}
```
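To make the motivating case concrete, here is a minimal sketch of what a `LONG` fix on `ts` amounts to. It is not Pinot's `DataTypeTransformer`: `coerceToLong` and the local `toEpochDays` are hypothetical stand-ins, and the record is modelled as a plain `Map`.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a stand-in for the coercion step, not Pinot's implementation.
final class SourceFieldCoercionSketch {
  private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // Coerces a raw source value to Long, mimicking a { "dataType": "LONG" } fix.
  static Long coerceToLong(Object raw) {
    if (raw == null) {
      return null;
    }
    if (raw instanceof Number) {
      return ((Number) raw).longValue();
    }
    // Strings such as "1700000000000" are parsed; malformed input throws NumberFormatException.
    return Long.parseLong(raw.toString().trim());
  }

  // Hypothetical downstream consumer that needs a long (epoch millis to epoch days).
  static long toEpochDays(long epochMillis) {
    return epochMillis / MILLIS_PER_DAY;
  }

  public static void main(String[] args) {
    Map<String, Object> record = new HashMap<>();
    record.put("ts", "1700000000000"); // the source delivers the timestamp as a String

    // Fix the source field in place before any expression reads it.
    record.put("ts", coerceToLong(record.get("ts")));

    System.out.println(toEpochDays((Long) record.get("ts")));
  }
}
```

Without the coercion step, the cast to `Long` in `main` would fail on the `String` value, which is the kind of mismatch the new config is meant to prevent.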
### Placement in the transformer chain
The fix is applied as a `DataTypeTransformer`, with
`preComplexTypeTransform` selecting the phase (mirroring how `RecordEnricher`
distinguishes pre/post complex-type):
- `true` → before the `ComplexTypeTransformer` and the pre-complex-type
`RecordEnricher`s, so the corrected value can feed complex-type flattening and
pre-complex-type enrichment.
- `false` (default) → after the `ComplexTypeTransformer`, before the
post-complex-type `RecordEnricher`s and the `ExpressionTransformer`, so
flattened/unnested fields can be fixed before expressions run.
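The ordering described by the two bullets above can be sketched as follows. This is a simplified model, not the actual pipeline code: stages are represented by descriptive strings, other transformers in the real chain are omitted, and `buildChain` is a hypothetical helper. It also assumes the pre-complex-type enrichers run before the `ComplexTypeTransformer`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the stage order, not Pinot's transformer wiring.
final class TransformerChainOrderSketch {
  // Returns stage labels in execution order; a source-field stage is added only when configured.
  static List<String> buildChain(boolean hasPreFixes, boolean hasPostFixes) {
    List<String> chain = new ArrayList<>();
    if (hasPreFixes) {
      chain.add("DataTypeTransformer(source fields, preComplexTypeTransform=true)");
    }
    chain.add("RecordEnricher(pre-complex-type)");
    chain.add("ComplexTypeTransformer");
    if (hasPostFixes) {
      chain.add("DataTypeTransformer(source fields, preComplexTypeTransform=false)");
    }
    chain.add("RecordEnricher(post-complex-type)");
    chain.add("ExpressionTransformer");
    return chain;
  }

  public static void main(String[] args) {
    // Print the chain for a table that configures fixes in both phases.
    for (String stage : buildChain(true, true)) {
      System.out.println(stage);
    }
  }
}
```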
Source fields that are not schema columns are extracted automatically (the
transformer's `getInputColumns()` flows into
`IngestionUtils.getFieldsForRecordExtractor` via the `TransformPipeline`).
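As a rough illustration of why non-schema source fields still get extracted, the set of fields to read can be thought of as the union of the schema columns and every transformer's input columns. The helper below is a hypothetical simplification and does not reproduce the signature or behavior of `IngestionUtils.getFieldsForRecordExtractor`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: a simplified model of how extractor fields could be collected.
final class ExtractorFieldsSketch {
  // Unions the schema columns with the input columns reported by each transformer.
  static Set<String> fieldsToExtract(Collection<String> schemaColumns,
      Collection<? extends Collection<String>> transformerInputColumns) {
    Set<String> fields = new LinkedHashSet<>(schemaColumns);
    for (Collection<String> inputs : transformerInputColumns) {
      fields.addAll(inputs);
    }
    return fields;
  }

  public static void main(String[] args) {
    // "rawId" is not a schema column, but a source-field fix declares it as an input.
    Set<String> fields = fieldsToExtract(
        Arrays.asList("ts", "daysSinceEpoch"),
        Arrays.asList(Arrays.asList("rawId"), Arrays.asList("ts")));
    System.out.println(fields);
  }
}
```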
### Implementation notes
- `DataTypeTransformer` gains a `Map<String, PinotDataType>` constructor so
the source-field fix reuses the existing conversion logic.
- `TableConfigUtils` validates that a field is configured at most once **per
phase** — the same field may legitimately appear once pre- and once
post-complex-type.
- `SourceFieldConfig` requires `name` and `dataType` (validated in the
constructor).
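The two validation rules above can be sketched as follows. This is an assumption-laden illustration rather than the code in this PR: `FieldConfig` is a hypothetical stand-in for `SourceFieldConfig`, `validate` stands in for the `TableConfigUtils` check, and the exception types and messages are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: mirrors the described rules, not the actual Pinot classes.
final class SourceFieldValidationSketch {
  // Hypothetical stand-in for SourceFieldConfig; name and dataType are mandatory.
  static final class FieldConfig {
    final String name;
    final String dataType;
    final boolean preComplexTypeTransform;

    FieldConfig(String name, String dataType, boolean preComplexTypeTransform) {
      if (name == null || name.isEmpty()) {
        throw new IllegalArgumentException("'name' must be specified");
      }
      if (dataType == null || dataType.isEmpty()) {
        throw new IllegalArgumentException("'dataType' must be specified");
      }
      this.name = name;
      this.dataType = dataType;
      this.preComplexTypeTransform = preComplexTypeTransform;
    }
  }

  // Rejects a field configured more than once within the same phase.
  static void validate(List<FieldConfig> configs) {
    Set<String> preFields = new HashSet<>();
    Set<String> postFields = new HashSet<>();
    for (FieldConfig config : configs) {
      Set<String> seen = config.preComplexTypeTransform ? preFields : postFields;
      if (!seen.add(config.name)) {
        throw new IllegalStateException("Duplicate source field config in one phase: " + config.name);
      }
    }
  }

  public static void main(String[] args) {
    // Same field once per phase is accepted.
    validate(Arrays.asList(new FieldConfig("ts", "LONG", true), new FieldConfig("ts", "LONG", false)));
    System.out.println("valid");
  }
}
```

Tracking the two phases in separate sets is what allows the same field to appear once pre- and once post-complex-type while still rejecting a repeat within either phase.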
---
Stacked on #18815 (the `PinotDataType.INTEGER` → `INT` rename). The first
commit here is that rename. Review and merge this after #18815; once #18815
lands and this branch is rebased, this PR reduces to the feature commit alone.