Hi Norman,

Arrow has a concept of extension types [1], along with the possibility of proposing new canonical extension types [2]. These seem to cover the use cases you mention, but I might be misunderstanding?
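For context, here is a stdlib-only sketch of how the extension-type mechanism in [1] applies to the UUID example. An extension type is an ordinary Arrow storage type, such as a 16-byte FixedSizeBinary, plus the reserved field-metadata keys `ARROW:extension:name` and `ARROW:extension:metadata`. The `Field` class, the helper functions, the type strings, and the extension name `example.uuid` are hypothetical and chosen only for illustration. This is not the pyarrow API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Field-metadata keys reserved by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Simplified, hypothetical stand-in for an Arrow schema field."""
    name: str
    storage_type: str  # an ordinary Arrow type, e.g. "fixed_size_binary[16]"
    metadata: Dict[str, str] = field(default_factory=dict)


def make_extension_field(name: str, storage_type: str, ext_name: str,
                         ext_metadata: Optional[str] = None) -> Field:
    """Tag a field of an existing Arrow type with an extension type name."""
    md = {EXT_NAME_KEY: ext_name}
    if ext_metadata is not None:
        md[EXT_METADATA_KEY] = ext_metadata
    return Field(name, storage_type, md)


def logical_type(f: Field, known_extensions: Set[str]) -> str:
    """Type a consumer works with: the extension if it recognizes the name,
    otherwise the underlying storage type."""
    ext_name = f.metadata.get(EXT_NAME_KEY)
    if ext_name is not None and ext_name in known_extensions:
        return ext_name
    return f.storage_type
```

With `uuid_col = make_extension_field("id", "fixed_size_binary[16]", "example.uuid")`, a consumer that knows `example.uuid` can distinguish the column from plain FixedSizeBinary. A consumer that does not know the name still reads the same bytes as `fixed_size_binary[16]`.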
Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html

On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan <norman.jor...@improving.com.invalid> wrote:

> Problem Description
>
> Currently Arrow schemas can only contain columns of types supported by
> Arrow. In some cases an Arrow schema maps to an external schema. This can
> result in the Arrow schema not being able to support all the columns from
> the external schema.
>
> Consider an external system that contains a column of type UUID. To model
> the schema in Arrow, the user has two choices:
>
> 1. Do not include the UUID column in the Arrow schema
>
> 2. Map the column to an existing Arrow type. This will not include the
> original type information. A UUID can be mapped to a FixedSizeBinary, but
> consumers of the Arrow schema will be unable to distinguish a
> FixedSizeBinary field from a UUID field.
>
> Possible Solution
>
> * Add a new type code that represents unsupported types
>
> * Values for the new type are represented as variable length binary
>
> Some drivers can expose data even when they don’t understand the data
> type. For example, the PostgreSQL driver will return the raw bytes for
> fields of an unknown type. Using an explicit type lets clients know that
> they should convert values if they were able to determine the actual data
> type.
>
> Questions
>
> * What is the impact on existing clients when they encounter fields of
> the unsupported type?
>
> * Is it safe to assume that all unsupported values can safely be
> converted to a variable length binary?
>
> * How can we preserve information about the original type?
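One possible way to express the proposed "unsupported type" with extension types instead of a new type code is sketched below, again using only the standard library. The raw driver bytes go in a variable-length binary column, and the original type is recorded as JSON in `ARROW:extension:metadata`, which addresses the third question. The extension name `example.unsupported`, the JSON keys, and the helper functions are hypothetical and exist only for this sketch.

```python
import json
from typing import Dict, Optional

# Field-metadata keys reserved by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"

# Hypothetical extension name used only for this sketch.
UNSUPPORTED_EXT = "example.unsupported"


def unsupported_field_metadata(original_type: str, source_system: str) -> Dict[str, str]:
    """Metadata for a variable-length binary column holding raw values of an
    external type that has no Arrow equivalent."""
    payload = json.dumps({"original_type": original_type, "source": source_system},
                         sort_keys=True)
    return {EXT_NAME_KEY: UNSUPPORTED_EXT, EXT_METADATA_KEY: payload}


def original_type(metadata: Dict[str, str]) -> Optional[str]:
    """Recover the external type name, or None if the field is not tagged."""
    if metadata.get(EXT_NAME_KEY) != UNSUPPORTED_EXT:
        return None
    return json.loads(metadata[EXT_METADATA_KEY])["original_type"]
```

This design also bears on the first question. A client that does not recognize `example.unsupported` sees an ordinary binary column and can still pass the bytes through. A client that does recognize it can read `original_type` and decide whether it knows how to convert the values.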