A further advantage of third-party extension types is that they give you a way to experiment without as much concern for compatibility.
I think writing an extension type if possible, and promoting it to an official type (extension or otherwise) only if necessary, is a good general approach.

On Tue, May 23, 2023 at 2:48 PM Will Jones <will.jones...@gmail.com> wrote:

> Hello Arrow devs,
>
> I actually have a use case where we'd like to support a new number type in
> Arrow, but instead of larger numbers, smaller ones. :) For machine learning
> use cases, we at Lance would like to support bfloat16 [1]. These are 16-bit
> floating point numbers that trade significant digits to exponent, so they
> have the same range as float 32 but less precision than float 16. They are
> natively supported on newer AI-focused silicon [1]
>
> I'm just starting to look at this, so not yet sure what the pros and cons
> are of implementing it as an extension type versus a native Arrow type. My
> initial ideas:
>
> Pros of an extension type:
> * It can be moved through Arrow-native systems that don't implement it, as
>   long as they preserve extension type information.
>
> Pros of a native type:
> * We have established patterns for writing compute kernels for natively
>   supported types.
>
> If we were to implement these as extension types, I think bfloat16 and the
> number types Ian Joiner mentions would be best implemented as extension
> types based on fixed-size binary. We have a native float16 type already,
> but I think making bfloat16 an extension type based on that it could get
> accidentally manipulated as a float16, which IIUC would be invalid.
>
> If anyone has any advice from our work thus far on extension types, I'd
> welcome your input.
>
> Best,
>
> Will Jones
>
> [1]
> https://urldefense.com/v3/__https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus__;!!K-Hz7m0Vt54!lyl3ZVe7uNEaUQrW2uJ8yJyzVJzONy9SZu0zkJLWN0WfDdu9V2ZpEN6ElavNaRrJUn8SjSMJ80Wp_UPoUq44vQ$
> [2]
> https://urldefense.com/v3/__https://en.wikipedia.org/wiki/Bfloat16_floating-point_format__;!!K-Hz7m0Vt54!lyl3ZVe7uNEaUQrW2uJ8yJyzVJzONy9SZu0zkJLWN0WfDdu9V2ZpEN6ElavNaRrJUn8SjSMJ80Wp_UMhJnNRZQ$
>
> On Tue, May 23, 2023 at 10:49 AM Antoine Pitrou <anto...@python.org>
> wrote:
>
> >
> > Your question seems unspecific, but we now have the possibility of
> > standardizing canonical extension types (which are, of course, optional
> > to implement and support):
> >
> > https://urldefense.com/v3/__https://arrow.apache.org/docs/format/CanonicalExtensions.html__;!!K-Hz7m0Vt54!lyl3ZVe7uNEaUQrW2uJ8yJyzVJzONy9SZu0zkJLWN0WfDdu9V2ZpEN6ElavNaRrJUn8SjSMJ80Wp_UPRLGl1Gg$
> >
> >
> > On 23/05/2023 at 19:45, Ian Joiner wrote:
> > > That’s a possibility. Do we consider officially support them?
> > >
> > >
> > > On Tuesday, May 23, 2023, Antoine Pitrou <anto...@python.org> wrote:
> > >
> > >>
> > >> I'm not sure what you're actually proposing here. A new extension type
> > >> perhaps?
> > >>
> > >>
> > >> On 23/05/2023 at 19:13, Ian Joiner wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We need to have really large integers (with 128, 256 and 512 bits) as well
> > >>> as decimals (up to at least decimal1024) because they do actually exist in
> > >>> crypto / web3 space.
> > >>>
> > >>> See
> > >>> https://urldefense.com/v3/__https://docs.rs/primitive-types/latest/primitive_types/__;!!K-Hz7m0Vt54!lyl3ZVe7uNEaUQrW2uJ8yJyzVJzONy9SZu0zkJLWN0WfDdu9V2ZpEN6ElavNaRrJUn8SjSMJ80Wp_UN9rRd91w$
> > >>> for an example of what needs to be supported.
> > >>>
> > >>> If accepted we can implement the types for C++/Python and Rust.
> > >>>
> > >>> Thanks,
> > >>> Ian
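
To make the bfloat16 idea in the quoted message concrete, here is a minimal sketch of what the 2-byte values in a fixed-size-binary storage could hold. It is only an illustration under stated assumptions, not Lance's or Arrow's implementation: the function names are hypothetical, little-endian byte order is assumed, and it uses only the Python standard library. A bfloat16 value is taken to be the top 16 bits of the float32 bit pattern, which is why the exponent range matches float32 while the significand is shorter than float16's.

```python
import struct


def float32_to_bfloat16_bytes(x: float) -> bytes:
    """Encode x as 2 little-endian bytes holding a bfloat16 bit pattern.

    Hypothetical helper: x must be representable as a float32.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN: force a significand bit that survives truncation.
        top = (bits >> 16) | 0x0040
    else:
        # Round to nearest, ties to even, on the 16 bits being dropped.
        rounding = 0x7FFF + ((bits >> 16) & 1)
        top = ((bits + rounding) >> 16) & 0xFFFF
    return struct.pack("<H", top)


def bfloat16_bytes_to_float32(b: bytes) -> float:
    """Decode 2 little-endian bfloat16 bytes back to a Python float."""
    (top,) = struct.unpack("<H", b)
    # Widening is exact: the dropped low 16 bits are restored as zeros.
    (x,) = struct.unpack("<f", struct.pack("<I", top << 16))
    return x
```

Under these assumptions, `float32_to_bfloat16_bytes(1.0)` gives the bytes `80 3f`, and a value such as `1e38`, which is far outside float16's range, round-trips to a finite number close to the original. Reading those same two bytes as a float16 would produce an unrelated value, which is the accidental-manipulation risk the message describes for a float16-backed extension type.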