voonhous opened a new pull request, #18695:
URL: https://github.com/apache/hudi/pull/18695
### Describe the issue this Pull Request addresses
Variant, Blob, and Vector are recently added types. Index code (column
stats, partition stats, bloom filters, expression-index column gating) was not
taught about them. Stats on these columns are meaningless until proper support
lands. Disable index population on these columns by default for now.
### Summary and Changelog
V2 already excluded all three types. The gaps were in V1, which is still
active for:
- BLOOM_FILTERS (always V1)
- COLUMN_STATS / PARTITION_STATS / EXPRESSION_INDEX on table version 8
Changes:
- `HoodieTableMetadataUtil.isColumnTypeSupportedV1`:
  - AVRO branch now also excludes `BLOB` and `VECTOR`.
  - SPARK branch now also excludes `VECTOR` (`BLOB` and `VARIANT` were
    already excluded).
- `HoodieIndexUtils` expression-index error message now lists `VARIANT,
BLOB, VECTOR` alongside `RECORD, ARRAY, MAP`. Behavior was already correct via
`HoodieSchemaType.isComplex()`; only the message text was stale.
- `TestHoodieTableMetadataUtil`: new
`testVariantBlobVectorColumnsAreNotSupportedForV1ColumnStats` covers all three
types under both `AVRO` and `SPARK` record types in V1.
Note: `HoodieSchemaType.VECTOR.toAvroType()` is `FIXED`, but the V1 check
switches on the `HoodieSchemaType` enum, so the existing `type != FIXED` does
not catch `VECTOR`. It must be listed explicitly.
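The pitfall in the note above can be sketched in isolation. This is a minimal illustration, not Hudi's code: `SchemaType` and `AvroType` are hypothetical stand-ins for `HoodieSchemaType` and the Avro type enum, and `supportedBefore` / `supportedAfter` are made-up names. The only fact taken from this PR is that `VECTOR` maps to Avro `FIXED` while remaining its own enum constant.

```java
import java.util.EnumSet;
import java.util.Set;

public class V1TypeGateSketch {
    // Hypothetical stand-in for the Avro type enum; only the values needed here.
    enum AvroType { INT, STRING, FIXED }

    // Hypothetical stand-in for HoodieSchemaType. VECTOR is a distinct constant
    // whose Avro representation happens to be FIXED (as stated in the PR note).
    enum SchemaType {
        INT(AvroType.INT),
        STRING(AvroType.STRING),
        FIXED(AvroType.FIXED),
        VECTOR(AvroType.FIXED);

        private final AvroType avroType;

        SchemaType(AvroType avroType) {
            this.avroType = avroType;
        }

        AvroType toAvroType() {
            return avroType;
        }
    }

    // Shape of the old check: compares the schema-type enum, not the Avro type,
    // so VECTOR is not equal to FIXED and slips through.
    static boolean supportedBefore(SchemaType type) {
        return type != SchemaType.FIXED;
    }

    // Shape of the fix: the logical type is listed explicitly.
    // BLOB and VARIANT would be added to this set in the same way.
    private static final Set<SchemaType> EXCLUDED =
        EnumSet.of(SchemaType.FIXED, SchemaType.VECTOR);

    static boolean supportedAfter(SchemaType type) {
        return !EXCLUDED.contains(type);
    }

    public static void main(String[] args) {
        for (SchemaType type : SchemaType.values()) {
            System.out.println(type + " avro=" + type.toAvroType()
                + " before=" + supportedBefore(type)
                + " after=" + supportedAfter(type));
        }
    }
}
```

An alternative would be to gate on `toAvroType()` instead of the enum constant, but that would couple the check to the physical encoding; listing the logical types keeps the intent visible at the call site.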
### Impact
User-facing: indexes silently skip Variant/Blob/Vector columns instead of
indexing garbage. No on-disk format change. No public API change. Secondary
index was already protected by its allow-list. Expression index was already
blocked behaviorally; only its error wording changed.
### Risk Level
low
Single chokepoint (`isColumnTypeSupported`) feeds every column-list builder.
V2 path is unchanged. V1 change strictly narrows the supported set: only
`BLOB` and `VECTOR` are newly rejected, and no other previously-supported type
is affected.
### Documentation Update
none
### Contributor's checklist
- [x] Read through contributor's guide
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable