voonhous opened a new pull request, #18695:
URL: https://github.com/apache/hudi/pull/18695

   ### Describe the issue this Pull Request addresses
   
   Variant, Blob, and Vector are recently added types. Index code (column 
stats, partition stats, bloom filters, expression-index column gating) was not 
taught about them. Stats on these columns are meaningless until proper support 
lands. Disable index population on these columns by default for now.
   
   ### Summary and Changelog
   
   V2 already excluded all three types. The gaps were in V1, which is still 
active for:
   - BLOOM_FILTERS (always V1)
   - COLUMN_STATS / PARTITION_STATS / EXPRESSION_INDEX on table version 8
   
   Changes:
   - `HoodieTableMetadataUtil.isColumnTypeSupportedV1`:
     - AVRO branch now also excludes `BLOB` and `VECTOR`.
     - SPARK branch now also excludes `VECTOR` (`BLOB` and `VARIANT` were 
already excluded).
   - `HoodieIndexUtils` expression-index error message now lists `VARIANT, 
BLOB, VECTOR` alongside `RECORD, ARRAY, MAP`. Behavior was already correct via 
`HoodieSchemaType.isComplex()`; only the message text was stale.
   - `TestHoodieTableMetadataUtil`: new 
`testVariantBlobVectorColumnsAreNotSupportedForV1ColumnStats` covers all three 
types under both `AVRO` and `SPARK` record types in V1.
   
   Note: `HoodieSchemaType.VECTOR.toAvroType()` is `FIXED`, but the V1 check 
switches on the `HoodieSchemaType` enum, so the existing `type != FIXED` does 
not catch `VECTOR`. It must be listed explicitly.
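   
   A minimal sketch of why the explicit listing is needed. The enums and method 
names below are hypothetical stand-ins, not the actual Hudi classes; the only 
fact taken from this PR is that `VECTOR` maps to Avro `FIXED` while the check 
compares logical enum constants.

```java
import java.util.EnumSet;
import java.util.Set;

final class V1TypeGateSketch {

  // Hypothetical stand-in for Avro's physical types; only the values needed here.
  enum AvroType { INT, STRING, FIXED }

  // Hypothetical stand-in for HoodieSchemaType: each logical type knows its Avro type.
  enum LogicalType {
    INT(AvroType.INT),
    STRING(AvroType.STRING),
    FIXED(AvroType.FIXED),
    VECTOR(AvroType.FIXED); // per the note above: VECTOR is FIXED at the Avro level

    private final AvroType avroType;

    LogicalType(AvroType avroType) {
      this.avroType = avroType;
    }

    AvroType toAvroType() {
      return avroType;
    }
  }

  // Types that must be named explicitly because the enum comparison cannot see them.
  private static final Set<LogicalType> EXPLICITLY_EXCLUDED = EnumSet.of(LogicalType.VECTOR);

  // Compares the logical enum constant only, so VECTOR slips through:
  // LogicalType.VECTOR and LogicalType.FIXED are different constants.
  static boolean checkWithoutExplicitList(LogicalType type) {
    return type != LogicalType.FIXED;
  }

  // Same comparison plus the explicit exclusion list.
  static boolean checkWithExplicitList(LogicalType type) {
    return type != LogicalType.FIXED && !EXPLICITLY_EXCLUDED.contains(type);
  }

  public static void main(String[] args) {
    for (LogicalType t : LogicalType.values()) {
      System.out.println(t + " avro=" + t.toAvroType()
          + " withoutList=" + checkWithoutExplicitList(t)
          + " withList=" + checkWithExplicitList(t));
    }
  }
}
```

   In this sketch the first check accepts `VECTOR` even though its Avro type is 
`FIXED`, and only the second check rejects it.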
   
   ### Impact
   
   User-facing: indexes silently skip Variant/Blob/Vector columns instead of 
indexing garbage. No on-disk format change. No public API change. Secondary 
index was already protected by its allow-list. Expression index was already 
blocked behaviorally; only its error wording changed.
   
   ### Risk Level
   
   low
   
   Single chokepoint (`isColumnTypeSupported`) feeds every column-list builder. 
V2 path is unchanged. V1 change strictly narrows the supported set: the only 
newly rejected types are `BLOB` and `VECTOR` under `AVRO` and `VECTOR` under 
`SPARK`; no other previously supported type is rejected.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through contributor's guide
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   

