Hello Arrow devs,

I actually have a use case where we'd like to support a new number type in Arrow, but instead of larger numbers, smaller ones. :) For machine learning use cases, we at Lance would like to support bfloat16 [2]. These are 16-bit floating point numbers that trade significand bits for exponent bits, so they have the same range as float32 but less precision than float16. They are natively supported on newer AI-focused silicon [1].
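To make that trade-off concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not anything from Arrow or Lance: the helper names are hypothetical, and round-to-nearest-even is an assumed conversion rule (hardware or other libraries may round differently). It treats a bfloat16 value as the top 16 bits of a float32 and compares the result with float16 as packed by the `struct` module's `e` format.

```python
import math
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (round-to-nearest-even, assumed)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if math.isnan(x):
        # Keep a NaN a NaN: truncate and force one mantissa bit on.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # bfloat16 keeps the top 16 bits of a float32:
    # 1 sign bit, 8 exponent bits, 7 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float via float32."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def round_to_float16(x: float) -> float:
    """Round x through IEEE float16; values too large become infinity."""
    try:
        (y,) = struct.unpack("<e", struct.pack("<e", x))
    except OverflowError:
        return math.copysign(math.inf, x)
    return y


# Precision: float16 keeps 10 mantissa bits, bfloat16 only 7.
x = 1.001
print(bfloat16_bits_to_float(float32_to_bfloat16_bits(x)))  # 1.0
print(round_to_float16(x))                                  # 1.0009765625

# Range: bfloat16 shares float32's 8-bit exponent, float16 has only 5 bits.
big = 1e38
print(bfloat16_bits_to_float(float32_to_bfloat16_bits(big)))  # finite, near 1e38
print(round_to_float16(big))                                  # inf
```

The same 16 bits mean different things in the two formats, so a bfloat16 buffer read as float16 would yield unrelated values.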

I'm just starting to look at this, so I'm not yet sure what the pros and cons are of implementing it as an extension type versus a native Arrow type. My initial ideas:

Pros of an extension type:

 * It can be moved through Arrow-native systems that don't implement it, as long as they preserve extension type information.

Pros of a native type:

 * We have established patterns for writing compute kernels for natively supported types.

If we were to implement these as extension types, I think bfloat16 and the number types Ian Joiner mentions would be best implemented as extension types based on fixed-size binary. We have a native float16 type already, but I think that if we made bfloat16 an extension type based on it, the data could get accidentally manipulated as float16, which IIUC would be invalid.

If anyone has any advice from our work thus far on extension types, I'd welcome your input.

Best,

Will Jones

[1] https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
[2] https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

On Tue, May 23, 2023 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:
>
> Your question seems unspecific, but we now have the possibility of
> standardizing canonical extension types (which are, of course, optional
> to implement and support):
>
> https://arrow.apache.org/docs/format/CanonicalExtensions.html
>
>
> On 23/05/2023 at 19:45, Ian Joiner wrote:
> > That’s a possibility. Do we consider officially support them?
> >
> >
> > On Tuesday, May 23, 2023, Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> I'm not sure what you're actually proposing here. A new extension type
> >> perhaps?
> >>
> >>
> >> On 23/05/2023 at 19:13, Ian Joiner wrote:
> >>
> >>> Hi,
> >>>
> >>> We need to have really large integers (with 128, 256 and 512 bits) as well
> >>> as decimals (up to at least decimal1024) because they do actually exist in
> >>> crypto / web3 space.
> >>>
> >>> See https://docs.rs/primitive-types/latest/primitive_types/ for an example
> >>> of what needs to be supported.
> >>>
> >>> If accepted we can implement the types for C++/Python and Rust.
> >>>
> >>> Thanks,
> >>> Ian