> may want an Other type to signal that it would fail if asked to provide
> particular columns.

I interpret "would fail" to mean we are still in some kind of "planning
stage" and not yet actually creating arrays, so I'm not sure this needs
to be a data type.  In other words, I see this as
`std::optional<DataType>` and not a unique instance of `DataType`.
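
A minimal C++ sketch of that planning-stage idea follows. Everything in it is
hypothetical: the `DataType` enum is a stand-in rather than `arrow::DataType`,
and `PlanColumnType` and its table of type names are invented for illustration.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an Arrow data type; not the real arrow::DataType.
enum class DataType { Int64, Utf8, FixedSizeBinary16 };

// Planning step (hypothetical): map an external type name to a DataType, or
// return std::nullopt when the driver would fail if asked for that column.
std::optional<DataType> PlanColumnType(const std::string& external_type) {
  static const std::unordered_map<std::string, DataType> kKnown = {
      {"int8", DataType::Int64},
      {"text", DataType::Utf8},
      {"uuid", DataType::FixedSizeBinary16},
  };
  auto it = kKnown.find(external_type);
  if (it == kKnown.end()) return std::nullopt;  // no array is ever created
  return it->second;
}
```

A caller can check `has_value()` before requesting the column, so the failure
surfaces during planning rather than when arrays are built.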

However, if you did need to actually create an array, and you wanted some
way of saying "there is no data here because I failed to interpret the
type" then maybe you could create an extension type based on the null type?
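
A rough C++ sketch of that second idea, again using hypothetical stand-in types
rather than the Arrow C++ API. The extension name, the `original_type` field,
and `MakeUninterpreted` are all assumptions made up for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are real Arrow classes.
enum class StorageKind { Null };

// An "other" extension type layered on null storage. The extension name and
// the original_type field are illustrative assumptions.
struct OtherType {
  std::string extension_name = "example.other";
  std::string original_type;  // e.g. the external system's type name
  StorageKind storage = StorageKind::Null;
};

// With null storage there are no data buffers: only a length is recorded,
// and every slot is null.
struct OtherArray {
  OtherType type;
  std::int64_t length = 0;
  std::int64_t null_count() const { return length; }
};

// Build a placeholder array for a column whose type could not be interpreted.
OtherArray MakeUninterpreted(std::string original_type, std::int64_t length) {
  OtherArray array;
  array.type.original_type = std::move(original_type);
  array.length = length;
  return array;
}
```

The point of the sketch is that the array carries no values at all; only the
length and the record of which external type could not be interpreted survive.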

On Wed, Apr 17, 2024 at 2:57 AM David Li <lidav...@apache.org> wrote:

> For an unsupported/other extension type.
>
> On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
> > What is "this proposal"?
> >
> >
> > Le 17/04/2024 à 10:38, David Li a écrit :
> >> Should I take it that this proposal is dead in the water? While we
> >> could define our own Unknown/Other type for say the ADBC PostgreSQL driver
> >> it might be useful to have a singular type for consumers to latch on to.
> >>
> >> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
> >>> I think an "Other" extension type is slightly different than an
> >>> arbitrary extension type, though: the latter may be understood
> >>> downstream but the former represents a point at which a component
> >>> explicitly declares it does not know how to handle a field. In this
> >>> example, the PostgreSQL ADBC driver might be able to provide a
> >>> representation regardless, but a different driver (or say, the JDBC
> >>> adapter, which cannot necessarily get a bytestring for an arbitrary
> >>> JDBC type) may want an Other type to signal that it would fail if asked
> >>> to provide particular columns.
> >>>
> >>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
> >>>> Depending where your Arrow-encoded data is used, either extension
> >>>> types or generic field metadata are options. We have this problem in
> >>>> the ADBC Postgres driver, where we can convert *most* Postgres types
> >>>> to an Arrow type but there are some others where we can't or don't
> >>>> know or don't implement a conversion. Currently for these we return
> >>>> opaque binary (the Postgres COPY representation of the value) but put
> >>>> field metadata so that a consumer can implement a workaround for an
> >>>> unsupported type. It would be arguably better to have implemented this
> >>>> as an extension type; however, field metadata felt like less of a
> >>>> commitment when I first worked on this.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -dewey
> >>>>
> >>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
> >>>> <norman.jor...@improving.com.invalid> wrote:
> >>>>>
> >>>>> I was using UUID as an example. It looks like extension types cover
> >>>>> my original request.
> >>>>> ________________________________
> >>>>> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
> >>>>> Sent: Thursday, April 11, 2024 7:15 AM
> >>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
> >>>>> Subject: Re: Unsupported/Other Type
> >>>>>
> >>>>> The OP used UUID as an example. Would that be enough, or is the request
> >>>>> for a flexible mechanism that allows the creation of one-off nominal
> >>>>> types for very specific use-cases?
> >>>>>
> >>>>> —
> >>>>> Felipe
> >>>>>
> >>>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
> >>>>>
> >>>>>>
> >>>>>> Yes, JSON and UUID are obvious candidates for new canonical extension
> >>>>>> types. XML also comes to mind, but I'm not sure there's much of a use
> >>>>>> case for it.
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Antoine.
> >>>>>>
> >>>>>>
> >>>>>> Le 10/04/2024 à 22:55, Wes McKinney a écrit :
> >>>>>>> In the past we have discussed adding a canonical type for UUID and JSON.
> >>>>>>> I still think this is a good idea and could improve ergonomics in
> >>>>>>> downstream language bindings (e.g. by exposing JSON querying functions or
> >>>>>>> automatically boxing UUIDs in built-in UUID types, like the Python uuid
> >>>>>>> library). Has anyone done any work on this to anyone's knowledge?
> >>>>>>>
> >>>>>>> On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Norman,
> >>>>>>>> Arrow has a concept of extension types [1] along with the possibility of
> >>>>>>>> proposing new canonical extension types [2].  This seems to cover the
> >>>>>>>> use-cases you mention but I might be misunderstanding?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Micah
> >>>>>>>>
> >>>>>>>> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> >>>>>>>> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >>>>>>>>
> >>>>>>>> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> >>>>>>>> <norman.jor...@improving.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Problem Description
> >>>>>>>>>
> >>>>>>>>> Currently Arrow schemas can only contain columns of types supported by
> >>>>>>>>> Arrow. In some cases an Arrow schema maps to an external schema. This can
> >>>>>>>>> result in the Arrow schema not being able to support all the columns from
> >>>>>>>>> the external schema.
> >>>>>>>>>
> >>>>>>>>> Consider an external system that contains a column of type UUID. To model
> >>>>>>>>> the schema in Arrow, the user has two choices:
> >>>>>>>>>
> >>>>>>>>>     1.  Do not include the UUID column in the Arrow schema
> >>>>>>>>>
> >>>>>>>>>     2.  Map the column to an existing Arrow type. This will not include the
> >>>>>>>>> original type information. A UUID can be mapped to a FixedSizeBinary, but
> >>>>>>>>> consumers of the Arrow schema will be unable to distinguish a
> >>>>>>>>> FixedSizeBinary field from a UUID field.
> >>>>>>>>>
> >>>>>>>>> Possible Solution
> >>>>>>>>>
> >>>>>>>>>     *   Add a new type code that represents unsupported types
> >>>>>>>>>
> >>>>>>>>>     *   Values for the new type are represented as variable length binary
> >>>>>>>>>
> >>>>>>>>> Some drivers can expose data even when they don’t understand the data
> >>>>>>>>> type. For example, the PostgreSQL driver will return the raw bytes for
> >>>>>>>>> fields of an unknown type. Using an explicit type lets clients know that
> >>>>>>>>> they should convert values if they were able to determine the actual data
> >>>>>>>>> type.
> >>>>>>>>>
> >>>>>>>>> Questions
> >>>>>>>>>
> >>>>>>>>>     *   What is the impact on existing clients when they encounter fields of
> >>>>>>>>> the unsupported type?
> >>>>>>>>>
> >>>>>>>>>     *   Is it safe to assume that all unsupported values can safely be
> >>>>>>>>> converted to a variable length binary?
> >>>>>>>>>
> >>>>>>>>>     *   How can we preserve information about the original type?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
>
