Actually, these options still make some sense, but not as much as before.

The use case: unit conversion

Price data is exported from SQL as Decimal(38,10), which uses 128 bits
per value, but the numbers are actually prices which, expressed in
cents, fit perfectly in uint32.

Having scaling would reduce bandwidth/disk usage by a factor of 4.
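
The factor of 4 follows from the storage widths: a 128-bit decimal occupies 16 bytes per value, a uint32 occupies 4. Here is a minimal sketch in plain Python (standard library only). The sample prices and the helper name `to_cents` are made up for illustration and are not part of any real export or of Arrow:

```python
import struct
from decimal import Decimal

DECIMAL128_BYTES = 16  # fixed width of a 128-bit decimal value
UINT32_MAX = 2**32 - 1

# Hypothetical sample prices, as they might come out of a Decimal(38,10) column.
prices = [
    Decimal("19.9900000000"),
    Decimal("0.0500000000"),
    Decimal("1234.5000000000"),
]


def to_cents(price):
    """Convert a price to whole cents, refusing lossy or out-of-range values."""
    scaled = price * 100
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{price} is not a whole number of cents")
    cents = int(scaled)
    if not 0 <= cents <= UINT32_MAX:
        raise OverflowError(f"{price} does not fit in uint32 cents")
    return cents


cents = [to_cents(p) for p in prices]

# Pack as little-endian uint32: exactly 4 bytes per value.
packed = struct.pack(f"<{len(cents)}I", *cents)

# Bytes needed as 128-bit decimals divided by bytes needed as uint32.
ratio = (DECIMAL128_BYTES * len(prices)) / len(packed)
```

The saving only holds if every value really is a non-negative whole number of cents below 2**32, which is why the sketch raises instead of silently truncating.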

What would be the best approach to such a use case?

Would a decimal_scale CastOption be OK, or should it rather be a compute
'multiply' kernel?
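
To make the question concrete, here is a rough sketch of the semantics I have in mind for the decimal_scale variant, written in plain Python on the unscaled integer behind a decimal value. The function name `cast_decimal_to_int64` and the `allow_truncate` flag are hypothetical, and `decimal_scale` is only the option proposed in this thread, not an existing Arrow API:

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def cast_decimal_to_int64(unscaled, decimal_scale=0, allow_truncate=False):
    """Return unscaled / 10**decimal_scale as an int64 (hypothetical semantics).

    Raises ValueError if non-zero digits would be discarded and
    allow_truncate is False, and OverflowError if the result does not
    fit in int64.
    """
    divisor = 10 ** decimal_scale
    quotient, remainder = divmod(abs(unscaled), divisor)
    if remainder and not allow_truncate:
        raise ValueError("fractional part would be discarded")
    result = quotient if unscaled >= 0 else -quotient  # truncate toward zero
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("value does not fit in int64")
    return result


# 19.99 stored as Decimal(38,10) has the unscaled integer 199900000000.
unscaled_price = 199_900_000_000

# Dropping 8 of the 10 decimal digits leaves the price in cents.
cents = cast_decimal_to_int64(unscaled_price, decimal_scale=8)
```

A 'multiply' kernel followed by a plain cast would compute the same number; the difference is only whether the scaling lives in the cast options or in a separate compute step.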

BR,

Jacek


Wed, 12 Feb 2020 at 19:32 Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>
> OK, then what I proposed does not make sense and I can just copy the
> solution you pointed out.
>
> Thank you,
>
> Jacek
>
> Wed, 12 Feb 2020 at 19:27 Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka <jacek.plis...@gmail.com> 
> > wrote:
> > >
> > > Hi!
> > >
> > > ARROW-3329 - we can discuss there.
> > >
> > > > It seems like it makes sense to implement both lossless safe casts
> > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > part discarded) from decimal to integer, do I have that right?
> > >
> > > Yes, though if I understood correctly, your examples are the same
> > > case: in both cases the fractional part is discarded, it is just
> > > all 0s in the first case.
> > >
> > > The key question is whether CastFunctor in cast.cc has access to the
> > > scale of the decimal. If yes, how?
> >
> > Yes, it's in the type of the input array. Here's a kernel
> > implementation that uses the TimestampType metadata of the input
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> >
> > >
> > > If not, these are the options I've come up with:
> > >
> > > Let's assume the Decimal128Type value is n.
> > >
> > > Then I expect that the base call
> > > .cast('int64') will return an overflow error for n outside the int64
> > > range, and the value otherwise.
> > >
> > > Option 1:
> > >
> > > .cast('int64', decimal_scale=s) would calculate n/10**s and return an
> > > overflow error if the result is outside the int64 range, and the value
> > > otherwise
> > >
> > > Option 2:
> > >
> > > .cast('int64', bytes_group=0) would return n & 0xFFFFFFFFFFFFFFFF
> > > .cast('int64', bytes_group=1) would return (n >> 64) & 0xFFFFFFFFFFFFFFFF
> > > .cast('int64') would have default value bytes_group=0
> > >
> > > Option 3:
> > >
> > > cast has no CastOptions, but we add a 'multiply' compute kernel and
> > > have something like this instead:
> > >
> > > .compute('multiply', 10**-s).cast('int64')
> > >
> > > BR,
> > >
> > > Jacek
