Hi everyone, I would like to work on this JIRA ticket:
https://issues.apache.org/jira/browse/ARROW-9404
([C++] Add support for Decimal16, Decimal32 and Decimal64)

This will be my first contribution to Arrow, so I would like to ask for advice 
on which approach to take.
As far as I know, Arrow currently supports only Decimal128, whose basic and 
primary implementations live in `cpp/src/arrow/util/basic_decimal.h` and 
`.../util/decimal.h`. In the current implementation a 128-bit decimal is 
represented by two 64-bit integers. There are several approaches that could be 
applied:

1. Turn the current BasicDecimal128 class into a template class 
BasicDecimal<bit_width>, where the types of the `low` and `high` members 
depend on the `bit_width` template parameter and the method implementations 
are rewritten in terms of `bit_width` as well. As a result, every 
instantiation works with `bit_width / 2`-bit integers.
The disadvantage of this approach is that even when a decimal fits into a 
single integer, we would still split it into two variables and carry the 
unnecessary extra logic of handling both halves. On the other hand, all 
decimals would be instances of one template class and therefore consistent 
with each other.
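
A rough, purely illustrative sketch of what option 1 could look like (the 
`HalfWidthInt` trait and the member names are hypothetical, not Arrow's actual 
API):
```
#include <cstdint>
#include <type_traits>

// Hypothetical trait: maps a decimal bit width to the signed integer type of
// half that width, e.g. HalfWidthInt<128>::type == int64_t.
template <int bit_width> struct HalfWidthInt;
template <> struct HalfWidthInt<128> { using type = int64_t; };
template <> struct HalfWidthInt<64>  { using type = int32_t; };
template <> struct HalfWidthInt<32>  { using type = int16_t; };

template <int bit_width>
class BasicDecimal {
    using high_type = typename HalfWidthInt<bit_width>::type;  // signed half
    using low_type = std::make_unsigned_t<high_type>;          // unsigned half

    high_type high_;  // int64_t for BasicDecimal<128>, int32_t for BasicDecimal<64>
    low_type low_;    // uint64_t for BasicDecimal<128>, uint32_t for BasicDecimal<64>
    // arithmetic methods would be written in terms of high_type / low_type ...
};

using BasicDecimal128 = BasicDecimal<128>;
using BasicDecimal64 = BasicDecimal<64>;
```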

2. Implement new template classes BasicDecimal<bit_width> and 
Decimal<bit_width> that work only for bit_width <= 64 (where the decimal can 
be represented by a single `int##bit_width##_t` variable), and reimplement all 
of the Decimal methods in these new classes.
This approach, however, makes it ambiguous what a Decimal is, because 
Decimal64 and Decimal128 would technically be completely different classes, 
which could create some inconsistency between them.
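
For comparison, a minimal sketch of option 2 (again only illustrative; the 
class name and the type selection are made up), where the whole value is a 
single native integer and there is no high/low splitting at all:
```
#include <cstdint>
#include <type_traits>

template <int bit_width>
class SmallBasicDecimal {
    static_assert(bit_width <= 64, "must fit into a single native integer");

    // Pick the matching int##bit_width##_t type directly.
    using value_type =
        std::conditional_t<bit_width == 64, int64_t,
        std::conditional_t<bit_width == 32, int32_t, int16_t>>;

    value_type value_;  // int32_t for Decimal32, int64_t for Decimal64, ...
    // all arithmetic operates directly on value_
};

using BasicDecimal32 = SmallBasicDecimal<32>;
using BasicDecimal64 = SmallBasicDecimal<64>;
```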

3. If we have a constant that indicates the maximum native integer width, we 
can try the following approach: define a template class 
BasicDecimal<bit_width> whose value is represented not by individual integer 
members but by an array of integers:
```
// Pseudo-code sketch, names are illustrative
#include <algorithm>
#include <cstdint>

constexpr int MAX_INT_WIDTH = 64;  // widest native integer we build on

// Trait mapping a bit width to the matching unsigned integer type.
template <int w> struct IntBitWidthTypes;
template <> struct IntBitWidthTypes<32> { using type = uint32_t; };
template <> struct IntBitWidthTypes<64> { using type = uint64_t; };
// ... specializations for 8 and 16 as well

template <int width>
class BasicDecimal {
    // Small decimals use the matching integer, wider ones are chunked into 64-bit words.
    using int_type = typename IntBitWidthTypes<std::min(width, MAX_INT_WIDTH)>::type;
    // Number of words needed to hold `width` bits (at least one).
    int_type values[std::max(width / MAX_INT_WIDTH, 1)];
    // all of these can be computed at compile time
    // ....
};

using BasicDecimal128 = BasicDecimal<128>;
using BasicDecimal64 = BasicDecimal<64>;
// ....
```
As a result, Decimal128 will hold a uint64_t array of 2 elements, Decimal64 a 
uint64_t array of 1 element, Decimal32 a uint32_t array of 1 element, and so 
on.
This also allows us to define decimals of arbitrary bitness; for example, 
Decimal256 would be represented as an array of 4 uint64_t elements.
The downside of this approach is its complexity: the whole BasicDecimal and 
Decimal classes would have to be rewritten.
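
If the array-based sketch above works as intended, the claimed layouts can 
even be checked at compile time, e.g.:
```
// Assuming the BasicDecimal<width> sketch from option 3 above:
static_assert(sizeof(BasicDecimal<32>)  == sizeof(uint32_t),     "1 x uint32_t");
static_assert(sizeof(BasicDecimal<64>)  == sizeof(uint64_t),     "1 x uint64_t");
static_assert(sizeof(BasicDecimal<128>) == 2 * sizeof(uint64_t), "2 x uint64_t");
static_assert(sizeof(BasicDecimal<256>) == 4 * sizeof(uint64_t), "4 x uint64_t");
```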

Which of these approaches would be the right one to take?

P.S. I've just noticed that I'm not able to assign JIRA tickets to myself. How 
can I do this?

