rahil-c commented on code in PR #18146:
URL: https://github.com/apache/hudi/pull/18146#discussion_r2848519697
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaType.java:
##########
@@ -119,6 +119,8 @@ public enum HoodieSchemaType {
VARIANT(Schema.Type.RECORD),
+ VECTOR(Schema.Type.FIXED),
Review Comment:
@vinothchandar when I discussed this with Tim, we concluded that we actually do not want to model
this as a RECORD with additional fields:
https://github.com/apache/hudi/pull/18146#discussion_r2793856934
I would expect the fields-based approach to carry more overhead, and I'm not sure what future
extensibility those additional fields would give us that the current model does not already capture.
In my mind, the only flexibility the dense VECTOR type needs is the
`storageBacking` field. I don't see what other evolution would be needed: we already have
all the other required info, such as dimension and element type, and those
likely do not change once a user defines the column.
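To illustrate why dimension and element type are enough for a FIXED backing (an Avro `fixed` schema must declare its byte size up front), here is a minimal sketch that assumes FLOAT32 elements encoded little-endian. `DenseVectorCodec` and its encoding are hypothetical, not Hudi's implementation; the point is only that the payload size follows entirely from `dimension * 4`, with no extra per-value fields.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical codec (not Hudi's API): packs a dense FLOAT32 vector into a
 * byte array whose length is fixed by the declared dimension, which is the
 * shape an Avro FIXED payload would need.
 */
final class DenseVectorCodec {
    private final int dimension;

    DenseVectorCodec(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        this.dimension = dimension;
    }

    /** FIXED size in bytes: determined entirely by dimension and element width. */
    int fixedSizeBytes() {
        return dimension * Float.BYTES;
    }

    /** Encodes the vector little-endian (an assumption, not a Hudi convention). */
    byte[] encode(float[] vector) {
        if (vector.length != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " elements, got " + vector.length);
        }
        ByteBuffer buf = ByteBuffer.allocate(fixedSizeBytes()).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Decodes a payload produced by encode(); rejects payloads of the wrong size. */
    float[] decode(byte[] payload) {
        if (payload.length != fixedSizeBytes()) {
            throw new IllegalArgumentException(
                "expected " + fixedSizeBytes() + " bytes, got " + payload.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }
}
```

Under these assumptions, a 768-dimension FLOAT32 column always occupies `new DenseVectorCodec(768).fixedSizeBytes()`, i.e. 3072 bytes per value.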
Regarding sparse vectors: based on RFC 99
https://github.com/apache/hudi/pull/18184/changes, we likely would not use
this `VECTOR` type, since it is meant for DENSE vector cases. Instead, we would focus on
defining a `SPARSE_VECTOR` type with a different backing and different
expectations, since it has to keep track of the indices of non-zero positions:
<img width="1277" height="190" alt="Screenshot 2026-02-24 at 9 15 42 AM"
src="https://github.com/user-attachments/assets/988997dd-c981-463a-9893-87272ddffa65"
/>
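For contrast, a sparse vector must store the positions of its non-zero entries alongside their values, so its payload size is not determined by the dimension alone. A minimal sketch, assuming a simple parallel-arrays (indices + values) layout; `SparseVector` is hypothetical and is not the `SPARSE_VECTOR` definition from RFC 99.

```java
import java.util.Arrays;

/**
 * Hypothetical sparse layout (not the RFC 99 SPARSE_VECTOR definition):
 * parallel arrays of non-zero positions and their values.
 */
final class SparseVector {
    final int dimension;  // logical length of the vector
    final int[] indices;  // strictly increasing positions of non-zero entries
    final float[] values; // values[k] is the element at position indices[k]

    SparseVector(int dimension, int[] indices, float[] values) {
        if (dimension < 0 || indices.length != values.length) {
            throw new IllegalArgumentException("invalid dimension or mismatched array lengths");
        }
        for (int k = 0; k < indices.length; k++) {
            boolean inRange = indices[k] >= 0 && indices[k] < dimension;
            boolean increasing = k == 0 || indices[k] > indices[k - 1];
            if (!inRange || !increasing) {
                throw new IllegalArgumentException("bad index at position " + k + ": " + indices[k]);
            }
        }
        this.dimension = dimension;
        this.indices = indices;
        this.values = values;
    }

    /** Keeps only the non-zero entries of a dense vector. */
    static SparseVector fromDense(float[] dense) {
        int[] idx = new int[dense.length];
        float[] vals = new float[dense.length];
        int nnz = 0;
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0f) {
                idx[nnz] = i;
                vals[nnz] = dense[i];
                nnz++;
            }
        }
        // Trim the scratch arrays down to the number of non-zero entries.
        return new SparseVector(dense.length, Arrays.copyOf(idx, nnz), Arrays.copyOf(vals, nnz));
    }

    /** Expands back to a dense array, filling unlisted positions with zero. */
    float[] toDense() {
        float[] out = new float[dimension];
        for (int k = 0; k < indices.length; k++) {
            out[indices[k]] = values[k];
        }
        return out;
    }
}
```

In this layout the stored size grows with the number of non-zero entries, which is why a fixed-size FIXED backing does not fit the sparse case.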
--
This is an automated message from the Apache Git Service.