GitHub user the-other-tim-brown added a comment to the discussion: RFC-99: Hudi Type System
From the Hudi perspective, we want a type that provides the context for what the column represents. This is what logical types do today in Parquet. For example, we store a long but we're able to interpret it as a timestamp in other systems. For blobs and vectors we'll want to do something similar, where we can potentially represent the data as raw bytes or an array in the files on storage, but the types in the Hudi schema will carry that extra context for how we want to persist and then interpret the data. Spark has a concept of a [UserDefinedType](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/UserDefinedType.html) that can help us carry this context through with the dataframe.

GitHub link: https://github.com/apache/hudi/discussions/14253#discussioncomment-14964252
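To make the UDT idea concrete, here is a minimal Scala sketch modeled on Spark's `VectorUDT` pattern: a hypothetical `Embedding` class and `EmbeddingUDT` (both names are illustrative, not part of Hudi or Spark) where the physical `sqlType` is a plain array of floats, but the Spark schema keeps the context that the column represents a vector. Note that `UserDefinedType` is a developer API whose visibility has varied across Spark releases, so treat this as a sketch of the mechanism rather than a finalized design.

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}

// Hypothetical user-facing vector class; the @SQLUserDefinedType annotation
// tells Spark which UDT carries its schema context.
@SQLUserDefinedType(udt = classOf[EmbeddingUDT])
class Embedding(val values: Array[Float]) extends Serializable

// Sketch of a UDT: the data is physically an array<float> on storage,
// but the schema retains the knowledge that it represents an embedding.
class EmbeddingUDT extends UserDefinedType[Embedding] {

  // Physical representation that lands in the Catalyst row / Parquet file.
  override def sqlType: DataType = ArrayType(FloatType, containsNull = false)

  // User object -> Catalyst internal array representation.
  override def serialize(obj: Embedding): Any =
    new GenericArrayData(obj.values.map(v => v: Any))

  // Catalyst internal array representation -> user object.
  override def deserialize(datum: Any): Embedding = datum match {
    case data: ArrayData => new Embedding(data.toFloatArray())
  }

  override def userClass: Class[Embedding] = classOf[Embedding]
}
```

On read, Spark would surface `Embedding` objects to the user while the file itself only ever contains arrays of floats, which is the same "physical bytes/array plus logical context" split described above for blobs and vectors.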
