GitHub user the-other-tim-brown added a comment to the discussion: RFC-99: Hudi Type System
From the Hudi perspective, we want a type that provides the context for what the column represents. This is what logical types do today in Parquet. For example, we store a long but we're able to interpret it as a timestamp in other systems. For blobs and vectors we'll want to do something similar, where we can potentially represent the data as raw bytes or an array in the files on storage, but the types in the Hudi schema will carry that extra context for how we want to persist and then interpret the data. Spark has a concept of a [UserDefinedType](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/UserDefinedType.html) that can help us carry this context through with the dataframe.

GitHub link: https://github.com/apache/hudi/discussions/14253#discussioncomment-14964252
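To make the UDT idea concrete, here is a minimal Scala sketch modeled on Spark's `VectorUDT` pattern: a hypothetical `Embedding` class and `EmbeddingUDT` (both names are illustrative, not part of Hudi or Spark) where the physical `sqlType` is a plain array of floats, but the Spark schema keeps the context that the column represents a vector. Note that `UserDefinedType` is a developer API whose visibility has varied across Spark releases, so treat this as a sketch of the mechanism rather than a finalized design.

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}

// Hypothetical user-facing vector class; the @SQLUserDefinedType annotation
// tells Spark which UDT carries its schema context.
@SQLUserDefinedType(udt = classOf[EmbeddingUDT])
class Embedding(val values: Array[Float]) extends Serializable

// Sketch of a UDT: the data is physically an array<float> on storage,
// but the schema retains the knowledge that it represents an embedding.
class EmbeddingUDT extends UserDefinedType[Embedding] {

  // Physical representation that lands in the Catalyst row / Parquet file.
  override def sqlType: DataType = ArrayType(FloatType, containsNull = false)

  // User object -> Catalyst internal array representation.
  override def serialize(obj: Embedding): Any =
    new GenericArrayData(obj.values.map(v => v: Any))

  // Catalyst internal array representation -> user object.
  override def deserialize(datum: Any): Embedding = datum match {
    case data: ArrayData => new Embedding(data.toFloatArray())
  }

  override def userClass: Class[Embedding] = classOf[Embedding]
}
```

On read, Spark would surface `Embedding` objects to the user while the file itself only ever contains arrays of floats, which is the same "physical bytes/array plus logical context" split described above for blobs and vectors.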
