> may want an Other type to signal that it would fail if asked to provide particular columns.
I interpret "would fail" to mean we are still in some kind of "planning stage" and not yet actually creating arrays, so I'm not sure this needs to be a data type. In other words, I see this as `std::optional<DataType>` and not a unique instance of `DataType`. However, if you did need to actually create an array, and you wanted some way of saying "there is no data here because I failed to interpret the type", then maybe you could create an extension type based on the null type?

On Wed, Apr 17, 2024 at 2:57 AM David Li <lidav...@apache.org> wrote:
> For an unsupported/other extension type.
>
> On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
> > What is "this proposal"?
> >
> > On 17/04/2024 at 10:38, David Li wrote:
> >> Should I take it that this proposal is dead in the water? While we could define our own Unknown/Other type for, say, the ADBC PostgreSQL driver, it might be useful to have a single type for consumers to latch on to.
> >>
> >> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
> >>> I think an "Other" extension type is slightly different from an arbitrary extension type, though: the latter may be understood downstream, but the former represents a point at which a component explicitly declares it does not know how to handle a field. In this example, the PostgreSQL ADBC driver might be able to provide a representation regardless, but a different driver (or, say, the JDBC adapter, which cannot necessarily get a bytestring for an arbitrary JDBC type) may want an Other type to signal that it would fail if asked to provide particular columns.
> >>>
> >>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
> >>>> Depending on where your Arrow-encoded data is used, either extension types or generic field metadata are options.
> >>>> We have this problem in the ADBC Postgres driver, where we can convert *most* Postgres types to an Arrow type, but there are some others where we can't, don't know how, or haven't implemented a conversion. Currently, for these we return opaque binary (the Postgres COPY representation of the value) but attach field metadata so that a consumer can implement a workaround for an unsupported type. It would arguably have been better to implement this as an extension type; however, field metadata felt like less of a commitment when I first worked on this.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -dewey
> >>>>
> >>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan <norman.jor...@improving.com.invalid> wrote:
> >>>>>
> >>>>> I was using UUID as an example. It looks like extension types cover my original request.
> >>>>> ________________________________
> >>>>> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
> >>>>> Sent: Thursday, April 11, 2024 7:15 AM
> >>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
> >>>>> Subject: Re: Unsupported/Other Type
> >>>>>
> >>>>> The OP used UUID as an example. Would that be enough, or is the request for a flexible mechanism that allows the creation of one-off nominal types for very specific use cases?
> >>>>>
> >>>>> —
> >>>>> Felipe
> >>>>>
> >>>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
> >>>>>>
> >>>>>> Yes, JSON and UUID are obvious candidates for new canonical extension types. XML also comes to mind, but I'm not sure there's much of a use case for it.
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Antoine.
> >>>>>>
> >>>>>> On 10/04/2024 at 22:55, Wes McKinney wrote:
> >>>>>>> In the past we have discussed adding a canonical type for UUID and JSON. I still think this is a good idea and could improve ergonomics in downstream language bindings (e.g.
> >>>>>>> by exposing JSON querying functions or automatically boxing UUIDs in built-in UUID types, like the Python uuid library). Has anyone done any work on this, to anyone's knowledge?
> >>>>>>>
> >>>>>>> On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Norman,
> >>>>>>>> Arrow has a concept of extension types [1] along with the possibility of proposing new canonical extension types [2]. This seems to cover the use cases you mention, but I might be misunderstanding?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Micah
> >>>>>>>>
> >>>>>>>> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> >>>>>>>> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >>>>>>>>
> >>>>>>>> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan <norman.jor...@improving.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Problem Description
> >>>>>>>>>
> >>>>>>>>> Currently, Arrow schemas can only contain columns of types supported by Arrow. In some cases an Arrow schema maps to an external schema, which can leave the Arrow schema unable to represent all the columns from the external schema.
> >>>>>>>>>
> >>>>>>>>> Consider an external system that contains a column of type UUID. To model the schema in Arrow, the user has two choices:
> >>>>>>>>>
> >>>>>>>>> 1. Do not include the UUID column in the Arrow schema.
> >>>>>>>>>
> >>>>>>>>> 2. Map the column to an existing Arrow type. This will not include the original type information. A UUID can be mapped to a FixedSizeBinary, but consumers of the Arrow schema will be unable to distinguish a FixedSizeBinary field from a UUID field.
> >>>>>>>>>
> >>>>>>>>> Possible Solution
> >>>>>>>>>
> >>>>>>>>> * Add a new type code that represents unsupported types.
> >>>>>>>>>
> >>>>>>>>> * Values for the new type are represented as variable-length binary.
> >>>>>>>>>
> >>>>>>>>> Some drivers can expose data even when they don’t understand the data type. For example, the PostgreSQL driver will return the raw bytes for fields of an unknown type. Using an explicit type lets clients know that they should convert the values if they are able to determine the actual data type.
> >>>>>>>>>
> >>>>>>>>> Questions
> >>>>>>>>>
> >>>>>>>>> * What is the impact on existing clients when they encounter fields of the unsupported type?
> >>>>>>>>>
> >>>>>>>>> * Is it safe to assume that all unsupported values can be converted to variable-length binary?
> >>>>>>>>>
> >>>>>>>>> * How can we preserve information about the original type?
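A minimal C++ sketch of the planning-stage alternative suggested in the reply at the top of this message: instead of adding an Other data type, the schema-mapping step returns `std::optional<DataType>`, and `std::nullopt` marks a column the component cannot provide. The `DataType` enum, `MapExternalType`, `UnmappableColumns`, and the external type names are all hypothetical stand-ins; the real Arrow C++ API represents types as `std::shared_ptr<arrow::DataType>`.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Arrow data type (not the Arrow C++ API).
enum class DataType { Int64, Utf8, Binary };

// Hypothetical planning-stage mapping from an external (e.g. JDBC or
// PostgreSQL) type name to an Arrow type. std::nullopt means "this component
// cannot provide this column"; no array is created for it.
std::optional<DataType> MapExternalType(const std::string& external_type) {
  if (external_type == "int8") return DataType::Int64;
  if (external_type == "text") return DataType::Utf8;
  if (external_type == "bytea") return DataType::Binary;
  return std::nullopt;  // unknown type: report it instead of inventing a DataType
}

// Collect the names of columns that could not be mapped, so a consumer can
// decide up front whether to drop them or fail the request.
std::vector<std::string> UnmappableColumns(
    const std::vector<std::pair<std::string, std::string>>& columns) {
  std::vector<std::string> result;
  for (const auto& [name, external_type] : columns) {
    if (!MapExternalType(external_type).has_value()) result.push_back(name);
  }
  return result;
}
```

With this shape, a consumer can inspect `UnmappableColumns` before requesting any data and either drop those columns or fail early, which gives the "would fail if asked" signal without a new type id.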
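The extension-type route raised by Micah and Dewey can also address the last question in Norman's proposal (preserving the original type). The sketch below assumes a simplified, hypothetical `Field` struct rather than any real Arrow class. It stores unsupported values as binary and records the original type name under the `ARROW:extension:name` and `ARROW:extension:metadata` field-metadata keys that the Arrow format uses for extension types. The extension name `example.opaque` is hypothetical, and storing the bare type name as the extension metadata is a simplifying assumption.

```cpp
#include <map>
#include <optional>
#include <string>

// Field-metadata keys used by the Arrow columnar format for extension types.
const std::string kExtensionNameKey = "ARROW:extension:name";
const std::string kExtensionMetadataKey = "ARROW:extension:metadata";

// Hypothetical extension name for "opaque bytes of a type this producer does
// not understand"; this is not a canonical Arrow extension type.
const std::string kOpaqueExtensionName = "example.opaque";

// Hypothetical, simplified field: a storage type name plus key/value metadata.
struct Field {
  std::string name;
  std::string storage_type;  // e.g. "binary" or "fixed_size_binary[16]"
  std::map<std::string, std::string> metadata;
};

// Wrap an unsupported external column as binary storage, recording the
// original type name so downstream consumers can convert it themselves.
Field MakeOpaqueField(const std::string& name, const std::string& original_type) {
  return Field{name, "binary",
               {{kExtensionNameKey, kOpaqueExtensionName},
                {kExtensionMetadataKey, original_type}}};
}

// Returns the original type name if the field carries the (hypothetical)
// opaque extension annotation, and std::nullopt otherwise.
std::optional<std::string> OriginalTypeOf(const Field& field) {
  auto name_it = field.metadata.find(kExtensionNameKey);
  if (name_it == field.metadata.end() || name_it->second != kOpaqueExtensionName) {
    return std::nullopt;
  }
  auto meta_it = field.metadata.find(kExtensionMetadataKey);
  if (meta_it == field.metadata.end()) return std::nullopt;
  return meta_it->second;
}
```

A consumer that does not recognize `example.opaque` would simply see a binary column, because Arrow implementations treat unknown extension types as their storage type. The same pattern with `fixed_size_binary[16]` storage would let a UUID column be distinguished from a plain FixedSizeBinary one.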