For an unsupported/other extension type.
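The sketch below is a minimal, illustrative take on how an "Other" type could use Arrow's existing extension-type mechanism instead of a new type code. It relies on the reserved field-metadata keys `ARROW:extension:name` and `ARROW:extension:metadata` from the Columnar format's extension-type section. The `Field` class is a simplified stand-in, not a real Arrow API. The extension name `example.other` and the `source_type` metadata key are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Set

# Reserved custom-metadata keys the Arrow format uses to mark a field as an
# extension type: the extension name plus its optional serialized parameters.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name for an "unsupported/other" type.
OTHER_EXT_NAME = "example.other"


@dataclass
class Field:
    """Simplified stand-in for an Arrow field (not a real Arrow API)."""
    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def make_other_field(name: str, source_type: str) -> Field:
    """Producer side: a column the driver cannot map to a native Arrow type.

    Values keep a variable-length binary storage type, and the original
    type name goes into the serialized extension metadata under a
    hypothetical "source_type" key."""
    return Field(
        name=name,
        storage_type="binary",
        metadata={
            EXT_NAME_KEY: OTHER_EXT_NAME,
            EXT_META_KEY: json.dumps({"source_type": source_type}),
        },
    )


def consumer_view(f: Field, known_extensions: Set[str]) -> str:
    """Consumer side: describe how a reader would treat the field."""
    ext_name = f.metadata.get(EXT_NAME_KEY)
    if ext_name is None or ext_name not in known_extensions:
        # No extension tag, or one this reader does not understand:
        # fall back to the plain storage type.
        return f"plain {f.storage_type}"
    if ext_name == OTHER_EXT_NAME:
        meta = json.loads(f.metadata.get(EXT_META_KEY, "{}"))
        source = meta.get("source_type", "unknown")
        return f"unsupported {source} (raw {f.storage_type})"
    return f"extension {ext_name} on {f.storage_type}"
```

A reader that knows `example.other` can report the original source type and decide whether to attempt a conversion. A reader that does not know it just sees a binary column, so existing clients keep working.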
On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
> What is "this proposal"?
>
>
> On 17/04/2024 at 10:38, David Li wrote:
>> Should I take it that this proposal is dead in the water? While we could
>> define our own Unknown/Other type for, say, the ADBC PostgreSQL driver, it
>> might be useful to have a single type for consumers to latch on to.
>>
>> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
>>> I think an "Other" extension type is slightly different from an
>>> arbitrary extension type, though: the latter may be understood
>>> downstream, but the former represents a point at which a component
>>> explicitly declares that it does not know how to handle a field. In this
>>> example, the PostgreSQL ADBC driver might be able to provide a
>>> representation regardless, but a different driver (or, say, the JDBC
>>> adapter, which cannot necessarily get a bytestring for an arbitrary
>>> JDBC type) may want an Other type to signal that it would fail if asked
>>> to provide particular columns.
>>>
>>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>>>> Depending on where your Arrow-encoded data is used, either extension
>>>> types or generic field metadata are options. We have this problem in
>>>> the ADBC Postgres driver, where we can convert *most* Postgres types
>>>> to an Arrow type, but there are some others where we can't, don't know
>>>> how to, or don't implement a conversion. Currently, for these we return
>>>> opaque binary (the Postgres COPY representation of the value) but
>>>> attach field metadata so that a consumer can implement a workaround for
>>>> an unsupported type. It would arguably have been better to implement
>>>> this as an extension type; however, field metadata felt like less of a
>>>> commitment when I first worked on this.
>>>>
>>>> Cheers,
>>>>
>>>> -dewey
>>>>
>>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>>>> <norman.jor...@improving.com.invalid> wrote:
>>>>>
>>>>> I was using UUID as an example.
>>>>> It looks like extension types cover my
>>>>> original request.
>>>>> ________________________________
>>>>> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
>>>>> Sent: Thursday, April 11, 2024 7:15 AM
>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>>>>> Subject: Re: Unsupported/Other Type
>>>>>
>>>>> The OP used UUID as an example. Would that be enough, or is the request
>>>>> for a flexible mechanism that allows the creation of one-off nominal
>>>>> types for very specific use-cases?
>>>>>
>>>>> --
>>>>> Felipe
>>>>>
>>>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
>>>>>
>>>>>>
>>>>>> Yes, JSON and UUID are obvious candidates for new canonical extension
>>>>>> types. XML also comes to mind, but I'm not sure there's much of a use
>>>>>> case for it.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>>
>>>>>> On 10/04/2024 at 22:55, Wes McKinney wrote:
>>>>>>> In the past we have discussed adding a canonical type for UUID and
>>>>>>> JSON. I still think this is a good idea and could improve ergonomics
>>>>>>> in downstream language bindings (e.g. by exposing JSON querying
>>>>>>> functions or automatically boxing UUIDs in built-in UUID types, like
>>>>>>> the Python uuid library). Has anyone done any work on this, to
>>>>>>> anyone's knowledge?
>>>>>>>
>>>>>>> On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Norman,
>>>>>>>> Arrow has a concept of extension types [1] along with the possibility
>>>>>>>> of proposing new canonical extension types [2]. This seems to cover
>>>>>>>> the use-cases you mention, but I might be misunderstanding?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>>>>>>>> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>>>>>>>>
>>>>>>>> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>>>>>>>> <norman.jor...@improving.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Problem Description
>>>>>>>>>
>>>>>>>>> Currently, Arrow schemas can only contain columns of types supported
>>>>>>>>> by Arrow. In some cases, an Arrow schema maps to an external schema.
>>>>>>>>> This can result in the Arrow schema not being able to support all the
>>>>>>>>> columns from the external schema.
>>>>>>>>>
>>>>>>>>> Consider an external system that contains a column of type UUID. To
>>>>>>>>> model the schema in Arrow, the user has two choices:
>>>>>>>>>
>>>>>>>>> 1. Do not include the UUID column in the Arrow schema.
>>>>>>>>>
>>>>>>>>> 2. Map the column to an existing Arrow type. This will not include
>>>>>>>>> the original type information. A UUID can be mapped to a
>>>>>>>>> FixedSizeBinary(16), but consumers of the Arrow schema will be unable
>>>>>>>>> to distinguish a FixedSizeBinary field from a UUID field.
>>>>>>>>>
>>>>>>>>> Possible Solution
>>>>>>>>>
>>>>>>>>> * Add a new type code that represents unsupported types.
>>>>>>>>>
>>>>>>>>> * Represent values of the new type as variable-length binary.
>>>>>>>>>
>>>>>>>>> Some drivers can expose data even when they don’t understand the data
>>>>>>>>> type. For example, the PostgreSQL driver will return the raw bytes for
>>>>>>>>> fields of an unknown type. Using an explicit type lets clients know
>>>>>>>>> that they should convert the values if they are able to determine the
>>>>>>>>> actual data type.
>>>>>>>>>
>>>>>>>>> Questions
>>>>>>>>>
>>>>>>>>> * What is the impact on existing clients when they encounter fields
>>>>>>>>> of the unsupported type?
>>>>>>>>>
>>>>>>>>> * Is it safe to assume that all unsupported values can be converted
>>>>>>>>> to variable-length binary?
>>>>>>>>>
>>>>>>>>> * How can we preserve information about the original type?
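Below is a minimal sketch of the UUID case from the original message, showing how an extension tag keeps the original type information. It uses plain Python stand-ins rather than a real Arrow library. The 16-byte values use FixedSizeBinary(16) storage, and the `ARROW:extension:name` metadata key tags the field so a consumer can tell it apart from any other 16-byte binary column. The name `example.uuid` is a hypothetical placeholder, not a registered canonical extension name. A consumer that recognizes the tag can box values into Python's `uuid.UUID`, along the lines Wes suggests. A consumer that does not recognize it falls back to the raw bytes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union

# Reserved Arrow field-metadata key naming an extension type.
EXT_NAME_KEY = "ARROW:extension:name"
# Hypothetical placeholder; not a registered canonical extension name.
UUID_EXT_NAME = "example.uuid"


@dataclass
class Column:
    """Simplified stand-in for an Arrow field plus its values
    (not a real Arrow API)."""
    name: str
    storage_type: str
    values: List[bytes]
    metadata: Dict[str, str] = field(default_factory=dict)


def uuid_column(name: str, ids: List[uuid.UUID]) -> Column:
    """Producer: store each UUID as its 16 raw bytes and tag the field so it
    is distinguishable from any other 16-byte binary column."""
    return Column(
        name=name,
        storage_type="fixed_size_binary[16]",
        values=[u.bytes for u in ids],
        metadata={EXT_NAME_KEY: UUID_EXT_NAME},
    )


def read_values(
    col: Column, known_extensions: Set[str]
) -> List[Union[uuid.UUID, bytes]]:
    """Consumer: box into uuid.UUID only when the tag is present and
    understood; otherwise return the raw storage bytes unchanged."""
    ext_name = col.metadata.get(EXT_NAME_KEY)
    if ext_name == UUID_EXT_NAME and ext_name in known_extensions:
        return [uuid.UUID(bytes=b) for b in col.values]
    return list(col.values)
```

For a UUID `u`, `read_values(uuid_column("id", [u]), {UUID_EXT_NAME})` yields `[u]`, while passing an empty set yields `[u.bytes]`. An untagged 16-byte column is never mistaken for UUIDs.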