I think an "Other" extension type is slightly different from an arbitrary extension type, though: the latter may be understood downstream, but the former marks a point at which a component explicitly declares that it does not know how to handle a field. In this example, the PostgreSQL ADBC driver might be able to provide a representation regardless, but a different driver (or, say, the JDBC adapter, which cannot necessarily get a bytestring for an arbitrary JDBC type) may want an Other type to signal that it would fail if asked to provide particular columns.
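
To make the distinction concrete, here is a rough sketch in plain Python (no Arrow library involved) of how a driver might mark such columns. The only real identifier is the "ARROW:extension:name" metadata key from the columnar format; the extension name "example.other", the "example:vendor_type" key, the KNOWN_TYPES table, and the choice of null storage when no bytes are available are all assumptions made up for illustration, not anything ADBC does today.

```python
from dataclasses import dataclass, field

# Real Arrow metadata key that marks a field as an extension type.
EXTENSION_NAME_KEY = "ARROW:extension:name"
# Hypothetical extension name meaning "the driver does not understand this type".
OTHER_EXTENSION = "example.other"
# Hypothetical metadata key carrying the source system's type name.
VENDOR_TYPE_KEY = "example:vendor_type"


@dataclass
class Field:
    name: str
    storage: str  # Arrow storage type, written as a plain string in this sketch
    metadata: dict = field(default_factory=dict)


# Hypothetical table of the conversions this imaginary driver implements.
KNOWN_TYPES = {"int4": "int32", "text": "utf8", "float8": "float64"}


def map_column(name, vendor_type, can_get_bytes):
    """Map a source column to a Field, falling back to an "Other" marker."""
    if vendor_type in KNOWN_TYPES:
        return Field(name, KNOWN_TYPES[vendor_type])
    # Unknown type: expose raw bytes if the driver can fetch them; otherwise
    # use a null placeholder (an assumption of this sketch) so the schema
    # still lists the column.
    storage = "binary" if can_get_bytes else "null"
    return Field(name, storage, {
        EXTENSION_NAME_KEY: OTHER_EXTENSION,
        VENDOR_TYPE_KEY: vendor_type,
    })


def is_other(f):
    """True if the producer declared it does not know how to handle the field."""
    return f.metadata.get(EXTENSION_NAME_KEY) == OTHER_EXTENSION


schema = [
    map_column("id", "int4", can_get_bytes=True),
    map_column("shape", "polygon", can_get_bytes=True),
    map_column("payload", "vendor_blob", can_get_bytes=False),
]
for f in schema:
    print(f.name, f.storage, "OTHER" if is_other(f) else "ok")
```

A consumer can then check is_other() up front and either apply its own conversion using the recorded vendor type name or avoid requesting the column at all, instead of finding out at read time.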

On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
> Depending where your Arrow-encoded data is used, either extension
> types or generic field metadata are options. We have this problem in
> the ADBC Postgres driver, where we can convert *most* Postgres types
> to an Arrow type but there are some others where we can't or don't
> know or don't implement a conversion. Currently for these we return
> opaque binary (the Postgres COPY representation of the value) but put
> field metadata so that a consumer can implement a workaround for an
> unsupported type. It would be arguably better to have implemented this
> as an extension type; however, field metadata felt like less of a
> commitment when I first worked on this.
>
> Cheers,
>
> -dewey
>
> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
> <[email protected]> wrote:
>>
>> I was using UUID as an example. It looks like extension types covers my
>> original request.
>> ________________________________
>> From: Felipe Oliveira Carvalho <[email protected]>
>> Sent: Thursday, April 11, 2024 7:15 AM
>> To: [email protected] <[email protected]>
>> Subject: Re: Unsupported/Other Type
>>
>> The OP used UUID as an example. Would that be enough or the request is for
>> a flexible mechanism that allows the creation of one-off nominal types for
>> very specific use-cases?
>>
>> —
>> Felipe
>>
>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <[email protected]> wrote:
>>
>> >
>> > Yes, JSON and UUID are obvious candidates for new canonical extension
>> > types. XML also comes to mind, but I'm not sure there's much of a use
>> > case for it.
>> >
>> > Regards
>> >
>> > Antoine.
>> >
>> >
>> > Le 10/04/2024 à 22:55, Wes McKinney a écrit :
>> > > In the past we have discussed adding a canonical type for UUID and
>> > > JSON. I still think this is a good idea and could improve ergonomics
>> > > in downstream language bindings (e.g. by exposing JSON querying
>> > > function or automatically boxing UUIDs in built-in UUID types, like
>> > > the Python uuid library). Has anyone done any work on this to
>> > > anyone's knowledge?
>> > >
>> > > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <[email protected]>
>> > > wrote:
>> > >
>> > >> Hi Norman,
>> > >> Arrow has a concept of extension types [1] along with the possibility of
>> > >> proposing new canonical extension types [2]. This seems to cover the
>> > >> use-cases you mention but I might be misunderstanding?
>> > >>
>> > >> Thanks,
>> > >> Micah
>> > >>
>> > >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>> > >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>> > >>
>> > >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>> > >> <[email protected]> wrote:
>> > >>
>> > >>> Problem Description
>> > >>>
>> > >>> Currently Arrow schemas can only contain columns of types supported by
>> > >>> Arrow. In some cases an Arrow schema maps to an external schema. This
>> > >>> can result in the Arrow schema not being able to support all the
>> > >>> columns from the external schema.
>> > >>>
>> > >>> Consider an external system that contains a column of type UUID. To
>> > >>> model the schema in Arrow, the user has two choices:
>> > >>>
>> > >>> 1. Do not include the UUID column in the Arrow schema
>> > >>>
>> > >>> 2. Map the column to an existing Arrow type. This will not include
>> > >>> the original type information. A UUID can be mapped to a
>> > >>> FixedSizeBinary, but consumers of the Arrow schema will be unable to
>> > >>> distinguish a FixedSizeBinary field from a UUID field.
>> > >>>
>> > >>> Possible Solution
>> > >>>
>> > >>> * Add a new type code that represents unsupported types
>> > >>>
>> > >>> * Values for the new type are represented as variable length binary
>> > >>>
>> > >>> Some drivers can expose data even when they don’t understand the data
>> > >>> type. For example, the PostgreSQL driver will return the raw bytes for
>> > >>> fields of an unknown type. Using an explicit type lets clients know
>> > >>> that they should convert values if they were able to determine the
>> > >>> actual data type.
>> > >>>
>> > >>> Questions
>> > >>>
>> > >>> * What is the impact on existing clients when they encounter fields
>> > >>> of the unsupported type?
>> > >>>
>> > >>> * Is it safe to assume that all unsupported values can safely be
>> > >>> converted to a variable length binary?
>> > >>>
>> > >>> * How can we preserve information about the original type?
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>> Warning: The sender of this message could not be validated and may not be
>> the actual sender.
