Hi Norman,
Arrow has a concept of extension types [1] along with the possibility of
proposing new canonical extension types [2].  This seems to cover the
use-cases you mention but I might be misunderstanding?
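
To make the idea concrete, here is a rough sketch of how an extension type
rides on top of a storage type. It uses a toy Field class rather than a real
Arrow library, and the extension name "example.uuid" is a hypothetical name
picked for illustration. The two metadata keys are the ones described in the
columnar format document [1]. A consumer that recognizes the name can decode
the value; one that does not still sees the underlying storage bytes.

```python
import uuid
from dataclasses import dataclass, field

# Metadata keys the Arrow columnar format uses to annotate extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Toy stand-in for an Arrow field (an assumption, not a real Arrow API)."""
    name: str
    storage_type: str  # e.g. "fixed_size_binary[16]"
    metadata: dict = field(default_factory=dict)


def uuid_field(name: str) -> Field:
    # "example.uuid" is a hypothetical extension name chosen for illustration.
    return Field(
        name,
        "fixed_size_binary[16]",
        {EXT_NAME_KEY: "example.uuid", EXT_META_KEY: ""},
    )


def decode(f: Field, raw: bytes):
    """Return a uuid.UUID if the extension name is recognized, else raw bytes."""
    if f.metadata.get(EXT_NAME_KEY) == "example.uuid":
        return uuid.UUID(bytes=raw)
    return raw  # consumers unaware of the extension fall back to storage


raw = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes
tagged = uuid_field("id")
plain = Field("blob", "fixed_size_binary[16]")

print(decode(tagged, raw))                 # decoded as a UUID
print(type(decode(plain, raw)).__name__)   # left as bytes
```

The point of the sketch is that the original type information travels in the
field metadata, so a UUID column stays distinguishable from an ordinary
FixedSizeBinary column without adding a new type code.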

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html

On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
<norman.jor...@improving.com.invalid> wrote:

> Problem Description
>
> Currently Arrow schemas can only contain columns of types supported by
> Arrow. In some cases an Arrow schema maps to an external schema. This can
> result in the Arrow schema not being able to support all the columns from
> the external schema.
>
> Consider an external system that contains a column of type UUID. To model
> the schema in Arrow, the user has two choices:
>
>   1.  Do not include the UUID column in the Arrow schema
>
>   2.  Map the column to an existing Arrow type. This will not include the
> original type information. A UUID can be mapped to a FixedSizeBinary, but
> consumers of the Arrow schema will be unable to distinguish a
> FixedSizeBinary field from a UUID field.
>
> Possible Solution
>
>   *   Add a new type code that represents unsupported types
>
>   *   Values for the new type are represented as variable length binary
>
> Some drivers can expose data even when they don’t understand the data
> type. For example, the PostgreSQL driver will return the raw bytes for
> fields of an unknown type. Using an explicit type lets clients know that
> they should convert the values if they are able to determine the actual
> data type.
>
> Questions
>
>   *   What is the impact on existing clients when they encounter fields of
> the unsupported type?
>
>   *   Is it safe to assume that all unsupported values can safely be
> converted to a variable length binary?
>
>   *   How can we preserve information about the original type?
>
>
