Hello Arrow devs,

I actually have a use case where we'd like to support a new number type in
Arrow, but instead of larger numbers, smaller ones. :) For machine learning
use cases, we at Lance would like to support bfloat16 [2]. These are 16-bit
floating point numbers that trade significand bits for exponent bits, so they
have the same range as float32 but less precision than float16. They are
natively supported on newer AI-focused silicon [1].
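
To make the layout concrete, here is a minimal sketch of the relationship
between float32 and bfloat16, using only the Python standard library. The
helper names are hypothetical, and the conversion simply truncates the low 16
bits of the float32 encoding; real hardware typically rounds to nearest
instead, so treat this as an illustration rather than a reference conversion.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumption: plain truncation of the float32 encoding, no rounding.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep the sign bit, all 8 exponent bits, and the top 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Widen a bfloat16 bit pattern back to a float (hypothetical helper)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x


# Same exponent range as float32: 1e38 survives, far above float16's max (65504).
big = bfloat16_bits_to_float(float32_to_bfloat16_bits(1e38))

# Only 7 explicit mantissa bits remain, so pi keeps roughly 2-3 decimal digits.
pi_bf16 = bfloat16_bits_to_float(float32_to_bfloat16_bits(3.14159265))
```

With truncation, `pi_bf16` comes out as 3.140625, while `big` stays within
about 1% of 1e38.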

I'm just starting to look at this, so not yet sure what the pros and cons
are of implementing it as an extension type versus a native Arrow type. My
initial ideas:

Pros of an extension type:
* It can be moved through Arrow-native systems that don't implement it, as
long as they preserve extension type information.

Pros of a native type:
* We have established patterns for writing compute kernels for natively
supported types.

If we were to implement these as extension types, I think bfloat16 and the
number types Ian Joiner mentions would be best built on fixed-size binary
storage. We have a native float16 type already, but if bfloat16 were an
extension type with float16 storage, its values could get accidentally
manipulated as float16, which IIUC would be invalid.
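
As a rough illustration of that concern, the sketch below stores a bfloat16
value as two opaque bytes (the kind of value a 2-byte fixed-size binary
storage would hold) and then decodes the same bytes both ways. It uses only
the Python standard library; the helper names are hypothetical, the byte order
is assumed to be little-endian, and the encoder truncates rather than rounds.

```python
import struct


def float32_to_bfloat16_bytes(x: float) -> bytes:
    """Encode x as 2 little-endian bfloat16 bytes (hypothetical helper).

    Assumption: truncation of the float32 encoding, no rounding.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.pack("<H", bits32 >> 16)


def read_as_bfloat16(raw: bytes) -> float:
    """Decode 2 bytes with the bfloat16 layout (1 sign, 8 exponent, 7 mantissa)."""
    (bits16,) = struct.unpack("<H", raw)
    (x,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return x


def read_as_float16(raw: bytes) -> float:
    """Decode the same 2 bytes as IEEE float16 (1 sign, 5 exponent, 10 mantissa)."""
    (x,) = struct.unpack("<e", raw)
    return x


raw = float32_to_bfloat16_bytes(1.0)
correct = read_as_bfloat16(raw)  # decoded with the intended layout
wrong = read_as_float16(raw)     # same bits, misread as float16
```

Here `correct` is 1.0 but `wrong` is 1.875, so a float16 kernel applied to
bfloat16 storage would silently compute on different numbers. Opaque binary
storage avoids that because no arithmetic kernel applies to it by default.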

If anyone has any advice from our work thus far on extension types, I'd
welcome your input.

Best,

Will Jones

[1]
https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
[2] https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

On Tue, May 23, 2023 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Your question seems unspecific, but we now have the possibility of
> standardizing canonical extension types (which are, of course, optional
> to implement and support):
>
> https://arrow.apache.org/docs/format/CanonicalExtensions.html
>
>
> On 23/05/2023 at 19:45, Ian Joiner wrote:
> > That’s a possibility. Would we consider officially supporting them?
> >
> >
> > On Tuesday, May 23, 2023, Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> I'm not sure what you're actually proposing here. A new extension type
> >> perhaps?
> >>
> >>
> >> On 23/05/2023 at 19:13, Ian Joiner wrote:
> >>
> >>> Hi,
> >>>
> >>> We need to have really large integers (with 128, 256 and 512 bits) as well
> >>> as decimals (up to at least decimal1024) because they do actually exist in
> >>> crypto / web3 space.
> >>>
> >>> See https://docs.rs/primitive-types/latest/primitive_types/ for an
> >>> example of what needs to be supported.
> >>>
> >>> If accepted we can implement the types for C++/Python and Rust.
> >>>
> >>> Thanks,
> >>> Ian
> >>>
> >>>
> >
>
