In the past we have discussed adding canonical extension types for UUID and
JSON. I still think this is a good idea and could improve ergonomics in
downstream language bindings (e.g. by exposing JSON querying functions or
automatically boxing UUIDs in built-in UUID types, like Python's uuid
module). Does anyone know whether any work has been done on this?
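
To make the boxing idea concrete, here is a rough sketch in plain Python
(standard library only, not the pyarrow API). It assumes a field is tagged
through the extension-type metadata keys from the format docs; the `Field`
class, the `example.uuid` name, and both helper functions are hypothetical
and exist only for illustration.

```python
import uuid
from dataclasses import dataclass, field

# Metadata keys the Arrow columnar format reserves for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, used only for this sketch.
UUID_EXT_NAME = "example.uuid"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for an Arrow field: name, storage type, metadata."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def uuid_field(name: str) -> Field:
    """Describe a UUID column as 16-byte binary storage plus an extension tag."""
    return Field(
        name,
        "fixed_size_binary[16]",
        {EXT_NAME_KEY: UUID_EXT_NAME, EXT_META_KEY: ""},
    )


def box_values(f: Field, raw: list) -> list:
    """Box raw 16-byte values as uuid.UUID when the field carries the UUID tag.

    Fields without the tag are returned as plain bytes, which is what a
    consumer that does not recognize the extension would see.
    """
    if f.metadata.get(EXT_NAME_KEY) == UUID_EXT_NAME:
        return [None if v is None else uuid.UUID(bytes=v) for v in raw]
    return list(raw)
```

With something like this, a binding that recognizes the tag can hand back
uuid.UUID objects, while a consumer that ignores the metadata still reads
the same column as ordinary fixed-size binary.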

On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Norman,
> Arrow has a concept of extension types [1] along with the possibility of
> proposing new canonical extension types [2].  This seems to cover the
> use-cases you mention but I might be misunderstanding?
>
> Thanks,
> Micah
>
> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>
> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> <norman.jor...@improving.com.invalid> wrote:
>
> > Problem Description
> >
> > Currently Arrow schemas can only contain columns of types supported by
> > Arrow. In some cases an Arrow schema maps to an external schema. This can
> > result in the Arrow schema not being able to support all the columns from
> > the external schema.
> >
> > Consider an external system that contains a column of type UUID. To model
> > the schema in Arrow, the user has two choices:
> >
> >   1.  Do not include the UUID column in the Arrow schema
> >
> >   2.  Map the column to an existing Arrow type. This will not include the
> > original type information. A UUID can be mapped to a FixedSizeBinary, but
> > consumers of the Arrow schema will be unable to distinguish a
> > FixedSizeBinary field from a UUID field.
> >
> > Possible Solution
> >
> >   *   Add a new type code that represents unsupported types
> >
> >   *   Values for the new type are represented as variable length binary
> >
> > Some drivers can expose data even when they don’t understand the data
> > type. For example, the PostgreSQL driver will return the raw bytes for
> > fields of an unknown type. Using an explicit type lets clients know that
> > they should convert the values if they are able to determine the actual
> > data type.
> >
> > Questions
> >
> >   *   What is the impact on existing clients when they encounter fields
> > of the unsupported type?
> >
> >   *   Is it safe to assume that all unsupported values can safely be
> > converted to a variable length binary?
> >
> >   *   How can we preserve information about the original type?
> >
> >
>
