The OP used UUID as an example. Would that be enough, or is the request for a flexible mechanism that allows the creation of one-off nominal types for very specific use cases?
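To make the question concrete, here is a rough sketch of how the extension-type mechanism mentioned in the quoted thread layers a nominal type on top of a storage type. It is plain Python, not the pyarrow API: `Field`, `uuid_field`, `decode`, and the extension name `example.uuid` are hypothetical names made up for illustration. Only the two `ARROW:extension:*` metadata keys come from the Arrow format documentation.

```python
import uuid
from dataclasses import dataclass, field

# Metadata keys defined by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, used only for this illustration.
UUID_EXT_NAME = "example.uuid"


@dataclass
class Field:
    """Toy stand-in for an Arrow field: name, storage type, custom metadata."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def uuid_field(name):
    """Tag a 16-byte fixed-size binary field as carrying UUID values."""
    return Field(
        name,
        "fixed_size_binary[16]",
        {EXT_NAME_KEY: UUID_EXT_NAME, EXT_META_KEY: ""},
    )


def decode(fld, raw):
    """Box raw bytes as uuid.UUID if the field is tagged, else pass them through."""
    if fld.metadata.get(EXT_NAME_KEY) == UUID_EXT_NAME:
        return uuid.UUID(bytes=raw)
    return raw


raw = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes
tagged = uuid_field("id")
plain = Field("blob", "fixed_size_binary[16]")

print(decode(tagged, raw))  # boxed as a uuid.UUID
print(decode(plain, raw))   # raw 16 bytes, unchanged
```

Both fields share the same storage type, so a consumer that does not recognise the extension name can still read the bytes, while one that does can recover the original type. A one-off nominal type would only need a different name (and optionally serialized parameters) in the same two metadata entries.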
--
Felipe

On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:

> Yes, JSON and UUID are obvious candidates for new canonical extension
> types. XML also comes to mind, but I'm not sure there's much of a use
> case for it.
>
> Regards
>
> Antoine.
>
>
> On 10/04/2024 at 22:55, Wes McKinney wrote:
> > In the past we have discussed adding a canonical type for UUID and JSON. I
> > still think this is a good idea and could improve ergonomics in downstream
> > language bindings (e.g. by exposing JSON querying function or automatically
> > boxing UUIDs in built-in UUID types, like the Python uuid library). Has
> > anyone done any work on this to anyone's knowledge?
> >
> > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Hi Norman,
> >> Arrow has a concept of extension types [1] along with the possibility of
> >> proposing new canonical extension types [2]. This seems to cover the
> >> use-cases you mention but I might be misunderstanding?
> >>
> >> Thanks,
> >> Micah
> >>
> >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >>
> >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> >> <norman.jor...@improving.com.invalid> wrote:
> >>
> >>> Problem Description
> >>>
> >>> Currently Arrow schemas can only contain columns of types supported by
> >>> Arrow. In some cases an Arrow schema maps to an external schema. This can
> >>> result in the Arrow schema not being able to support all the columns from
> >>> the external schema.
> >>>
> >>> Consider an external system that contains a column of type UUID. To model
> >>> the schema in Arrow, the user has two choices:
> >>>
> >>>   1. Do not include the UUID column in the Arrow schema
> >>>
> >>>   2. Map the column to an existing Arrow type. This will not include the
> >>>      original type information. A UUID can be mapped to a FixedSizeBinary,
> >>>      but consumers of the Arrow schema will be unable to distinguish a
> >>>      FixedSizeBinary field from a UUID field.
> >>>
> >>> Possible Solution
> >>>
> >>>   * Add a new type code that represents unsupported types
> >>>
> >>>   * Values for the new type are represented as variable length binary
> >>>
> >>> Some drivers can expose data even when they don’t understand the data
> >>> type. For example, the PostgreSQL driver will return the raw bytes for
> >>> fields of an unknown type. Using an explicit type lets clients know that
> >>> they should convert values if they were able to determine the actual data
> >>> type.
> >>>
> >>> Questions
> >>>
> >>>   * What is the impact on existing clients when they encounter fields of
> >>>     the unsupported type?
> >>>
> >>>   * Is it safe to assume that all unsupported values can safely be
> >>>     converted to a variable length binary?
> >>>
> >>>   * How can we preserve information about the original type?