I think an "Other" extension type is slightly different from an arbitrary
extension type, though: the latter may be understood downstream, but the former
represents a point at which a component explicitly declares that it does not
know how to handle a field. In this example, the PostgreSQL ADBC driver might be
able to provide a representation regardless, but a different driver (or, say,
the JDBC adapter, which cannot necessarily get a bytestring for an arbitrary
JDBC type) may want an Other type to signal that it would fail if asked to
provide particular columns.
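
As a toy illustration of that distinction (not how any Arrow library actually
dispatches), the standard-library Python sketch below models a field as a name,
a storage type string and custom metadata. It sorts a field into the cases
discussed in this thread: a recognized extension type, an unrecognized
extension type (where a consumer falls back to the storage type, as the Arrow
format specifies), an explicit "Other" marker, and plain field metadata of the
kind Dewey describes below. The keys ARROW:extension:name and
ARROW:extension:metadata come from the Arrow format. The extension names
example.other and example.uuid are hypothetical. ADBC:postgresql:typname is an
assumed driver key, so check the driver documentation before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict

# Metadata keys defined by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical names used only for this sketch.
OTHER_EXT_NAME = "example.other"            # "producer could not interpret this"
PG_TYPNAME_KEY = "ADBC:postgresql:typname"  # assumed driver-specific metadata key

# Extensions this particular (hypothetical) consumer knows how to handle.
KNOWN_EXTENSIONS = {"example.uuid"}


@dataclass
class FieldInfo:
    """Toy stand-in for an Arrow field: name, storage type, custom metadata."""
    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def classify(f: FieldInfo) -> str:
    """Decide how a consumer might treat a field, per the cases in this thread."""
    ext_name = f.metadata.get(EXT_NAME_KEY)
    if ext_name == OTHER_EXT_NAME:
        # The producer explicitly declared that it does not understand the
        # column; in this sketch the source type name rides in the
        # extension metadata (a hypothetical convention).
        source = f.metadata.get(EXT_META_KEY, "")
        return f"opaque:{source}" if source else "opaque"
    if ext_name in KNOWN_EXTENSIONS:
        return f"extension:{ext_name}"
    if ext_name is not None:
        # Unrecognized extension: treat the column as its storage type.
        return f"storage:{f.storage_type}"
    if PG_TYPNAME_KEY in f.metadata:
        # Plain field metadata: a consumer can apply a per-type workaround.
        return f"workaround:{f.metadata[PG_TYPNAME_KEY]}"
    return f"storage:{f.storage_type}"
```

The point of the "Other" branch is that it is distinguishable from the
"unrecognized extension" branch: a consumer can tell "someone else may
understand this" apart from "the producer gave up on this column".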

On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
> Depending on where your Arrow-encoded data is used, either extension
> types or generic field metadata are options. We have this problem in
> the ADBC Postgres driver, where we can convert *most* Postgres types
> to an Arrow type, but for some others we can't, don't know how, or
> haven't implemented a conversion. Currently for these we return
> opaque binary (the Postgres COPY representation of the value) but attach
> field metadata so that a consumer can implement a workaround for an
> unsupported type. It would arguably have been better to implement this
> as an extension type; however, field metadata felt like less of a
> commitment when I first worked on this.
>
> Cheers,
>
> -dewey
>
> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
> <norman.jor...@improving.com.invalid> wrote:
>>
>> I was using UUID as an example. It looks like extension types cover my
>> original request.
>> ________________________________
>> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
>> Sent: Thursday, April 11, 2024 7:15 AM
>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>> Subject: Re: Unsupported/Other Type
>>
>> The OP used UUID as an example. Would that be enough, or is the request for
>> a flexible mechanism that allows the creation of one-off nominal types for
>> very specific use cases?
>>
>> —
>> Felipe
>>
>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
>>
>> >
>> > Yes, JSON and UUID are obvious candidates for new canonical extension
>> > types. XML also comes to mind, but I'm not sure there's much of a use
>> > case for it.
>> >
>> > Regards
>> >
>> > Antoine.
>> >
>> >
>> > On 10/04/2024 at 22:55, Wes McKinney wrote:
>> > > In the past we have discussed adding a canonical type for UUID and JSON.
>> > > I still think this is a good idea and could improve ergonomics in
>> > > downstream language bindings (e.g. by exposing JSON querying functions or
>> > > automatically boxing UUIDs in built-in UUID types, like the Python uuid
>> > > library). Has anyone done any work on this to anyone's knowledge?
>> > >
>> > > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi Norman,
>> > >> Arrow has a concept of extension types [1], along with the possibility of
>> > >> proposing new canonical extension types [2]. This seems to cover the
>> > >> use cases you mention, but I might be misunderstanding?
>> > >>
>> > >> Thanks,
>> > >> Micah
>> > >>
>> > >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>> > >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>> > >>
>> > >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>> > >> <norman.jor...@improving.com.invalid> wrote:
>> > >>
>> > >>> Problem Description
>> > >>>
>> > >>> Currently Arrow schemas can only contain columns of types supported by
>> > >>> Arrow. In some cases an Arrow schema maps to an external schema, which
>> > >>> can leave the Arrow schema unable to represent all the columns of the
>> > >>> external schema.
>> > >>>
>> > >>> Consider an external system that contains a column of type UUID. To
>> > >>> model the schema in Arrow, the user has two choices:
>> > >>>
>> > >>>    1.  Do not include the UUID column in the Arrow schema.
>> > >>>
>> > >>>    2.  Map the column to an existing Arrow type. This loses the
>> > >>>        original type information. A UUID can be mapped to a 16-byte
>> > >>>        FixedSizeBinary, but consumers of the Arrow schema will be unable
>> > >>>        to distinguish a FixedSizeBinary field from a UUID field.
>> > >>>
>> > >>> Possible Solution
>> > >>>
>> > >>>    *   Add a new type code that represents unsupported types.
>> > >>>
>> > >>>    *   Represent values of the new type as variable-length binary.
>> > >>>
>> > >>> Some drivers can expose data even when they don't understand the data
>> > >>> type. For example, the PostgreSQL driver will return the raw bytes for
>> > >>> fields of an unknown type. Using an explicit type lets clients know that
>> > >>> they should convert the values if they are able to determine the actual
>> > >>> data type.
>> > >>>
>> > >>> Questions
>> > >>>
>> > >>>    *   What is the impact on existing clients when they encounter
>> > >>>        fields of the unsupported type?
>> > >>>
>> > >>>    *   Is it safe to assume that all unsupported values can be
>> > >>>        converted to variable-length binary?
>> > >>>
>> > >>>    *   How can we preserve information about the original type?
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
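
Norman's original problem (a UUID stored as FixedSizeBinary looks like any
other binary data) and Wes's point about boxing UUIDs in Python's uuid type
can be illustrated with a small, standard-library-only sketch. It assumes the
column has already been decoded into a list of 16-byte values (None for nulls)
plus the field's metadata as a dict. ARROW:extension:name is the key the Arrow
format uses to record an extension type's name; the name example.uuid is
hypothetical.

```python
import uuid
from typing import Dict, List, Optional

EXT_NAME_KEY = "ARROW:extension:name"
UUID_EXT_NAME = "example.uuid"  # hypothetical extension name for this sketch


def to_python(values: List[Optional[bytes]], metadata: Dict[str, str]) -> list:
    """Box 16-byte FixedSizeBinary storage values as uuid.UUID when tagged.

    Without the extension tag the values are indistinguishable from any other
    fixed-size binary data, so they are returned unchanged as raw bytes.
    """
    if metadata.get(EXT_NAME_KEY) != UUID_EXT_NAME:
        return list(values)
    out = []
    for v in values:
        if v is None:
            out.append(None)  # preserve nulls
        elif len(v) != 16:
            raise ValueError(f"UUID storage must be 16 bytes, got {len(v)}")
        else:
            out.append(uuid.UUID(bytes=v))
    return out
```

The storage layout is identical in both branches; only the extension name on
the field tells the consumer that boxing into uuid.UUID is appropriate.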
