A further advantage of third-party extension types is that they give you a
way to experiment without as much concern for compatibility.

I think a good general approach is to write an extension type where
possible, and to promote it to an official type (extension or otherwise)
only if necessary.

On Tue, May 23, 2023 at 2:48 PM Will Jones <will.jones...@gmail.com> wrote:

> Hello Arrow devs,
>
> I actually have a use case where we'd like to support a new number type in
> Arrow, but instead of larger numbers, smaller ones. :) For machine learning
> use cases, we at Lance would like to support bfloat16 [1]. These are 16-bit
> floating point numbers that trade significand bits for exponent bits, so
> they have the same range as float32 but less precision than float16. They
> are natively supported on newer AI-focused silicon [2].
>
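
To make the range-versus-precision trade-off described above concrete, here
is a minimal sketch in plain Python (standard library only, no Arrow APIs).
It assumes bfloat16 is simply the top 16 bits of an IEEE float32 and converts
by truncation; real hardware typically rounds to nearest-even instead. The
function names are made up for illustration.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern by truncating the float32 encoding.

    Assumption: plain truncation (round toward zero), chosen for brevity.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 significand bits


def bfloat16_bits_to_float(bits16: int) -> float:
    """Widen a bfloat16 bit pattern back to a float by zero-filling."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def roundtrip_float16(x: float) -> float:
    """Round-trip x through IEEE float16 using struct's 'e' format."""
    (value,) = struct.unpack("<e", struct.pack("<e", x))
    return value


if __name__ == "__main__":
    # Precision: bfloat16 loses the small fraction, float16 keeps some of it.
    print(bfloat16_bits_to_float(float32_to_bfloat16_bits(1.001)))  # 1.0
    print(roundtrip_float16(1.001))  # 1.0009765625
    # Range: 1e38 is far beyond float16's maximum (65504) but fits bfloat16.
    print(bfloat16_bits_to_float(float32_to_bfloat16_bits(1e38)))
```

The last line prints a value within about 1% of 1e38, whereas float16 cannot
represent anything of that magnitude.
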
> I'm just starting to look at this, so not yet sure what the pros and cons
> are of implementing it as an extension type versus a native Arrow type. My
> initial ideas:
>
> Pros of an extension type:
> * It can be moved through Arrow-native systems that don't implement it, as
> long as they preserve extension type information.
>
> Pros of a native type:
> * We have established patterns for writing compute kernels for natively
> supported types.
>
> If we were to go the extension type route, I think bfloat16 and the number
> types Ian Joiner mentions would be best implemented as extension types
> based on fixed-size binary. We have a native float16 type already, but if
> bfloat16 were an extension type based on that, it could get accidentally
> manipulated as a float16, which IIUC would be invalid.
>
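
The float16 concern above can be illustrated with a small sketch in plain
Python (standard library only; it does not use any Arrow API). It stores a
bfloat16 value as two opaque little-endian bytes, standing in for a 2-byte
fixed-size binary slot, and then shows what happens if those bytes are
decoded as an IEEE float16 instead. The helper names and the little-endian
layout are assumptions made for the example.

```python
import struct


def bfloat16_bits(x: float) -> int:
    """Top 16 bits of the float32 encoding (truncating conversion, assumed)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def to_two_bytes(bits16: int) -> bytes:
    """Pack the pattern as 2 opaque bytes, like one fixed-size binary value."""
    return struct.pack("<H", bits16)


def read_as_bfloat16(raw: bytes) -> float:
    """Correct decoding: widen the 16 bits back into a float32."""
    (bits16,) = struct.unpack("<H", raw)
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


def misread_as_float16(raw: bytes) -> float:
    """Incorrect decoding: treat the same 2 bytes as an IEEE float16."""
    (value,) = struct.unpack("<e", raw)
    return value


if __name__ == "__main__":
    raw = to_two_bytes(bfloat16_bits(1.0))
    print(read_as_bfloat16(raw))    # 1.0
    print(misread_as_float16(raw))  # 1.875
```

With opaque bytes as storage, a system that does not know the extension type
has no arithmetic it can apply by mistake; with float16 as storage, the
second decoding is the one it would silently use.
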
> If anyone has any advice from our work thus far on extension types, I'd
> welcome your input.
>
> Best,
>
> Will Jones
>
> [1]
> https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
> [2]
> https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
>
> On Tue, May 23, 2023 at 10:49 AM Antoine Pitrou <anto...@python.org>
> wrote:
>
> >
> > Your question seems unspecific, but we now have the possibility of
> > standardizing canonical extension types (which are, of course, optional
> > to implement and support):
> >
> >
> > https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >
> >
> > On 23/05/2023 at 19:45, Ian Joiner wrote:
> > > That’s a possibility. Would we consider officially supporting them?
> > >
> > >
> > > On Tuesday, May 23, 2023, Antoine Pitrou <anto...@python.org> wrote:
> > >
> > >>
> > >> I'm not sure what you're actually proposing here. A new extension type
> > >> perhaps?
> > >>
> > >>
> > >> On 23/05/2023 at 19:13, Ian Joiner wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We need to have really large integers (with 128, 256 and 512 bits) as
> > >>> well as decimals (up to at least decimal1024) because they do actually
> > >>> exist in crypto / web3 space.
> > >>>
> > >>> See https://docs.rs/primitive-types/latest/primitive_types/ for an
> > >>> example of what needs to be supported.
> > >>>
> > >>> If accepted we can implement the types for C++/Python and Rust.
> > >>>
> > >>> Thanks,
> > >>> Ian
> > >>>
> > >>>
> > >
> >
>
