For an unsupported/other extension type.

On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
> What is "this proposal"?
>
>
> Le 17/04/2024 à 10:38, David Li a écrit :
>> Should I take it that this proposal is dead in the water? While we could
>> define our own Unknown/Other type for, say, the ADBC PostgreSQL driver, it
>> might be useful to have a single type for consumers to latch on to.
>> 
>> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
>>> I think an "Other" extension type is slightly different than an
>>> arbitrary extension type, though: the latter may be understood
>>> downstream but the former represents a point at which a component
>>> explicitly declares it does not know how to handle a field. In this
>>> example, the PostgreSQL ADBC driver might be able to provide a
>>> representation regardless, but a different driver (or say, the JDBC
>>> adapter, which cannot necessarily get a bytestring for an arbitrary
>>> JDBC type) may want an Other type to signal that it would fail if asked
>>> to provide particular columns.
>>>
>>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>>>> Depending where your Arrow-encoded data is used, either extension
>>>> types or generic field metadata are options. We have this problem in
>>>> the ADBC Postgres driver, where we can convert *most* Postgres types
>>>> to an Arrow type but there are some others where we can't or don't
>>>> know or don't implement a conversion. Currently for these we return
>>>> opaque binary (the Postgres COPY representation of the value) but put
>>>> field metadata so that a consumer can implement a workaround for an
>>>> unsupported type. It would be arguably better to have implemented this
>>>> as an extension type; however, field metadata felt like less of a
>>>> commitment when I first worked on this.
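
A minimal sketch of the consumer-side workaround described above, in plain Python rather than against the ADBC API. The metadata key `example:source_type`, the `decode_point` helper, and the assumed byte layout (two big-endian float64 values) are all hypothetical, chosen only for illustration; the thread does not state the key the driver actually uses.

```python
import struct

# Hypothetical metadata key; the real driver's key name is not given in this thread.
SOURCE_TYPE_KEY = "example:source_type"


def decode_point(raw: bytes) -> tuple:
    """Decode an opaque value, assuming two big-endian float64 values."""
    return struct.unpack(">dd", raw)


# Workarounds the consumer knows about, keyed by the source type name.
DECODERS = {"point": decode_point}


def decode_column(metadata: dict, values: list) -> list:
    """Apply a workaround if the field metadata names a type we can decode."""
    decoder = DECODERS.get(metadata.get(SOURCE_TYPE_KEY))
    if decoder is None:
        # No workaround known: pass the opaque bytes through unchanged.
        return values
    return [None if v is None else decoder(v) for v in values]


raw = struct.pack(">dd", 1.0, 2.0)
decoded = decode_column({SOURCE_TYPE_KEY: "point"}, [raw, None])
untouched = decode_column({}, [raw])
```

Without the metadata entry the consumer has no basis for choosing a decoder, so the bytes stay opaque.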
>>>>
>>>> Cheers,
>>>>
>>>> -dewey
>>>>
>>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>>>> <norman.jor...@improving.com.invalid> wrote:
>>>>>
>>>>> I was using UUID as an example. It looks like extension types cover my
>>>>> original request.
>>>>> ________________________________
>>>>> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
>>>>> Sent: Thursday, April 11, 2024 7:15 AM
>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>>>>> Subject: Re: Unsupported/Other Type
>>>>>
>>>>> The OP used UUID as an example. Would that be enough, or is the request for
>>>>> a flexible mechanism that allows the creation of one-off nominal types for
>>>>> very specific use cases?
>>>>>
>>>>> —
>>>>> Felipe
>>>>>
>>>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
>>>>>
>>>>>>
>>>>>> Yes, JSON and UUID are obvious candidates for new canonical extension
>>>>>> types. XML also comes to mind, but I'm not sure there's much of a use
>>>>>> case for it.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>>
>>>>>> Le 10/04/2024 à 22:55, Wes McKinney a écrit :
>>>>>>> In the past we have discussed adding a canonical type for UUID and JSON. I
>>>>>>> still think this is a good idea and could improve ergonomics in downstream
>>>>>>> language bindings (e.g. by exposing JSON querying functions or automatically
>>>>>>> boxing UUIDs in built-in UUID types, like the Python uuid library). Has
>>>>>>> anyone done any work on this, to anyone's knowledge?
>>>>>>>
>>>>>>> On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Norman,
>>>>>>>> Arrow has a concept of extension types [1] along with the possibility of
>>>>>>>> proposing new canonical extension types [2].  This seems to cover the
>>>>>>>> use-cases you mention but I might be misunderstanding?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>>>>>>>> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>>>>>>>>
>>>>>>>> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>>>>>>>> <norman.jor...@improving.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Problem Description
>>>>>>>>>
>>>>>>>>> Currently Arrow schemas can only contain columns of types supported by
>>>>>>>>> Arrow. In some cases an Arrow schema maps to an external schema. This can
>>>>>>>>> result in the Arrow schema not being able to support all the columns from
>>>>>>>>> the external schema.
>>>>>>>>>
>>>>>>>>> Consider an external system that contains a column of type UUID. To model
>>>>>>>>> the schema in Arrow, the user has two choices:
>>>>>>>>>
>>>>>>>>>     1.  Do not include the UUID column in the Arrow schema
>>>>>>>>>
>>>>>>>>>     2.  Map the column to an existing Arrow type. This will not include the
>>>>>>>>> original type information. A UUID can be mapped to a FixedSizeBinary, but
>>>>>>>>> consumers of the Arrow schema will be unable to distinguish a
>>>>>>>>> FixedSizeBinary field from a UUID field.
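
To make the second choice concrete, here is a minimal plain-Python sketch (not pyarrow) of why a bare 16-byte fixed-size binary field loses the UUID meaning, and how tagging the field with the `ARROW:extension:name` metadata key, which the Arrow format reserves for extension types, keeps it. The `Field` class, the `is_uuid` helper, and the extension name `example.uuid` are hypothetical names used only for illustration.

```python
import uuid
from dataclasses import dataclass, field

# Field-metadata key the Arrow columnar format reserves for extension type names.
EXTENSION_NAME_KEY = "ARROW:extension:name"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for an Arrow field: name, storage type, metadata."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def is_uuid(f: Field) -> bool:
    """A consumer can only recognize a UUID column if the field says so."""
    return f.metadata.get(EXTENSION_NAME_KEY) == "example.uuid"


# Choice 2 as described: the UUID is stored as 16 raw bytes, nothing more.
plain = Field("id", "fixed_size_binary[16]")
# Same storage type, but annotated so the original type information survives.
tagged = Field("id", "fixed_size_binary[16]", {EXTENSION_NAME_KEY: "example.uuid"})

value = uuid.UUID("12345678-1234-5678-1234-567812345678")
storage = value.bytes  # the bytes are identical in both columns
```

Both fields carry the same bytes; only the tagged one lets a consumer rebuild the UUID with confidence.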
>>>>>>>>>
>>>>>>>>> Possible Solution
>>>>>>>>>
>>>>>>>>>     *   Add a new type code that represents unsupported types
>>>>>>>>>
>>>>>>>>>     *   Values for the new type are represented as variable length binary
>>>>>>>>>
>>>>>>>>> Some drivers can expose data even when they don’t understand the data
>>>>>>>>> type. For example, the PostgreSQL driver will return the raw bytes for
>>>>>>>>> fields of an unknown type. Using an explicit type lets clients know that
>>>>>>>>> they should convert values if they are able to determine the actual data
>>>>>>>>> type.
>>>>>>>>>
>>>>>>>>> Questions
>>>>>>>>>
>>>>>>>>>     *   What is the impact on existing clients when they encounter fields of
>>>>>>>>> the unsupported type?
>>>>>>>>>
>>>>>>>>>     *   Is it safe to assume that all unsupported values can safely be
>>>>>>>>> converted to a variable length binary?
>>>>>>>>>
>>>>>>>>>     *   How can we preserve information about the original type?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
