The OP used UUID as an example. Would that be enough, or is the request for
a flexible mechanism that allows the creation of one-off nominal types for
very specific use cases?

—
Felipe

On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:

>
> Yes, JSON and UUID are obvious candidates for new canonical extension
> types. XML also comes to mind, but I'm not sure there's much of a use
> case for it.
>
> Regards
>
> Antoine.
>
>
> On 10/04/2024 at 22:55, Wes McKinney wrote:
> > In the past we have discussed adding a canonical type for UUID and JSON. I
> > still think this is a good idea and could improve ergonomics in downstream
> > language bindings (e.g. by exposing JSON querying functions or automatically
> > boxing UUIDs in built-in UUID types, like the Python uuid library). Does
> > anyone know of any work done on this?
> >
> > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Hi Norman,
> >> Arrow has a concept of extension types [1] along with the possibility of
> >> proposing new canonical extension types [2].  This seems to cover the
> >> use-cases you mention but I might be misunderstanding?
> >>
> >> Thanks,
> >> Micah
> >>
> >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >>
> >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> >> <norman.jor...@improving.com.invalid> wrote:
> >>
> >>> Problem Description
> >>>
> >>> Currently Arrow schemas can only contain columns of types supported by
> >>> Arrow. In some cases an Arrow schema maps to an external schema. This can
> >>> result in the Arrow schema not being able to support all the columns from
> >>> the external schema.
> >>>
> >>> Consider an external system that contains a column of type UUID. To model
> >>> the schema in Arrow, the user has two choices:
> >>>
> >>>    1.  Do not include the UUID column in the Arrow schema
> >>>
> >>>    2.  Map the column to an existing Arrow type. This will not include the
> >>> original type information. A UUID can be mapped to a FixedSizeBinary, but
> >>> consumers of the Arrow schema will be unable to distinguish a
> >>> FixedSizeBinary field from a UUID field.
> >>>
> >>> Possible Solution
> >>>
> >>>    *   Add a new type code that represents unsupported types
> >>>
> >>>    *   Values for the new type are represented as variable length binary
> >>>
> >>> Some drivers can expose data even when they don’t understand the data
> >>> type. For example, the PostgreSQL driver will return the raw bytes for
> >>> fields of an unknown type. Using an explicit type lets clients know that
> >>> they should convert values if they are able to determine the actual data
> >>> type.
> >>>
> >>> Questions
> >>>
> >>>    *   What is the impact on existing clients when they encounter fields of
> >>> the unsupported type?
> >>>
> >>>    *   Is it safe to assume that all unsupported values can safely be
> >>> converted to a variable length binary?
> >>>
> >>>    *   How can we preserve information about the original type?
> >>>
> >>>
> >>
> >
>
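
To make the extension-type suggestion in the quoted thread concrete, here is a
minimal sketch of the idea in plain Python. It is not the pyarrow API: the
ArrowField class, the uuid_field and decode helpers, and the extension name
"example.uuid" are all hypothetical and exist only for this illustration. The
two metadata keys are the ones the Arrow columnar format uses to mark a field
as an extension type. The sketch assumes a UUID is stored as a 16-byte
fixed-size binary value, as in the original problem description.

```python
import uuid
from dataclasses import dataclass, field

# Field-metadata keys the Arrow columnar format uses for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, chosen only for this illustration.
UUID_EXT_NAME = "example.uuid"


@dataclass
class ArrowField:
    """Plain-Python stand-in for an Arrow schema field (hypothetical)."""
    name: str
    storage_type: str  # e.g. "fixed_size_binary[16]"
    metadata: dict = field(default_factory=dict)


def uuid_field(name: str) -> ArrowField:
    """Build a field whose storage is 16 raw bytes, tagged as a UUID."""
    return ArrowField(
        name,
        "fixed_size_binary[16]",
        {EXT_NAME_KEY: UUID_EXT_NAME, EXT_META_KEY: ""},
    )


def decode(f: ArrowField, raw: bytes):
    """Box the value as uuid.UUID if the field is tagged, else return raw bytes."""
    if f.metadata.get(EXT_NAME_KEY) == UUID_EXT_NAME:
        return uuid.UUID(bytes=raw)
    # A consumer that does not recognise the tag still sees the storage type.
    return raw


tagged = uuid_field("id")
plain = ArrowField("id", "fixed_size_binary[16]")
value = uuid.UUID("12345678-1234-5678-1234-567812345678")
```

With this layout, decode(tagged, value.bytes) yields a uuid.UUID while
decode(plain, value.bytes) yields the raw 16 bytes, so the original type
information travels with the schema and consumers that ignore the tag still
receive valid fixed-size binary data.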
