Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Wes McKinney
On Wed, Feb 12, 2020 at 2:37 PM Jacek Pliszka  wrote:
>
> Actually these options still make some sense - but not as much as before.
>
> The use case: unit conversion
>
> The data are prices exported from SQL as Decimal(38,10), which takes 128
> bits per value, but expressed in cents the prices fit in uint32.
>
> Scaling would cut bandwidth and disk usage by a factor of 4.

You'd need to implement a separate function for this since you're
changing the semantics of the cast. I don't think it makes sense to
convert from 123.45 (decimal) to 12345 (uint32) in Cast.
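
One way to read the suggestion above is a standalone rescaling function
kept apart from Cast. The sketch below uses only Python's decimal module;
rescale_to_uint32 and its shift parameter are hypothetical names chosen for
illustration, not an existing Arrow function.

```python
from decimal import Decimal

UINT32_MAX = 2**32 - 1


def rescale_to_uint32(value: Decimal, shift: int) -> int:
    """Hypothetical helper: return value * 10**shift as an exact uint32.

    Assumes shift >= 0. Raises ValueError if digits would be lost and
    OverflowError if the result does not fit in 32 unsigned bits.
    """
    # Work with exact integers so that no context rounding can occur.
    numerator, denominator = value.as_integer_ratio()
    scaled = numerator * 10**shift
    if scaled % denominator != 0:
        raise ValueError(f"{value} is not a whole number after scaling")
    result = scaled // denominator
    if not 0 <= result <= UINT32_MAX:
        raise OverflowError(f"{result} does not fit in uint32")
    return result


print(rescale_to_uint32(Decimal("123.45"), 2))  # 12345
```

Keeping this separate from Cast means a plain cast never changes the
numeric value, while the unit conversion is an explicit, named step.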

> What would be the best approach to such a use case?
>
> Would a decimal_scale CastOption be OK, or should it rather be a compute
> 'multiply' kernel?
>
> BR,
>
> Jacek
>
>
> On Wed, Feb 12, 2020 at 7:32 PM Jacek Pliszka wrote:
> >
> > OK, then what I proposed does not make sense and I can just copy the
> > solution you pointed out.
> >
> > Thank you,
> >
> > Jacek
> >
> > On Wed, Feb 12, 2020 at 7:27 PM Wes McKinney wrote:
> > >
> > > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  
> > > wrote:
> > > >
> > > > Hi!
> > > >
> > > > ARROW-3329 - we can discuss there.
> > > >
> > > > > It seems like it makes sense to implement both lossless safe casts
> > > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > > part discarded) from decimal to integer, do I have that right?
> > > >
> > > > Yes, though if I understood correctly, your examples are the same
> > > > case - in both cases the fractional part is discarded - it is just
> > > > all 0s in the first case.
> > > >
> > > > The key question is whether CastFunctor in cast.cc has access to the
> > > > scale of the decimal. If yes, how?
> > >
> > > Yes, it's in the type of the input array. Here's a kernel
> > > implementation that uses the TimestampType metadata of the input
> > >
> > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> > >
> > > >
> > > > If not - these are the options I've come up with:
> > > >
> > > > Let's assume the Decimal128Type value is n.
> > > >
> > > > Then I expect that the base call
> > > > .cast('int64') will return overflow for n outside the int64 range,
> > > > and the value otherwise.
> > > >
> > > > Option 1:
> > > >
> > > > .cast('int64', decimal_scale=s) would calculate n/10**s and return
> > > > overflow if the result is outside the int64 range, the value otherwise.
> > > >
> > > > Option 2:
> > > >
> > > > .cast('int64', bytes_group=0) would return n & 0x
> > > > .cast('int64', bytes_group=1) would return (n >> 64) & 0x
> > > > .cast('int64') would have default value bytes_group=0
> > > >
> > > > Option 3:
> > > >
> > > > cast has no CastOptions, but we add a multiply compute kernel and have
> > > > something like this instead:
> > > >
> > > > .compute('multiply', 10**-s).cast('int64')
> > > >
> > > > BR,
> > > >
> > > > Jacek


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Actually these options still make some sense - but not as much as before.

The use case: unit conversion

The data are prices exported from SQL as Decimal(38,10), which takes 128
bits per value, but expressed in cents the prices fit in uint32.

Scaling would cut bandwidth and disk usage by a factor of 4.
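
The factor of 4 follows from the value widths alone. As a rough check, the
snippet below compares dense value buffers and also shows the largest price
that still fits as uint32 cents. The row count is a made-up figure, and
validity bitmaps and any compression are ignored.

```python
# Bytes per value, fixed by the two formats.
DECIMAL128_BYTES = 128 // 8  # 16
UINT32_BYTES = 32 // 8       # 4


def value_buffer_bytes(num_rows: int, bytes_per_value: int) -> int:
    """Size of a dense value buffer (no validity bitmap, no compression)."""
    return num_rows * bytes_per_value


rows = 1_000_000  # hypothetical row count, for illustration only
ratio = (value_buffer_bytes(rows, DECIMAL128_BYTES)
         / value_buffer_bytes(rows, UINT32_BYTES))
print(ratio)  # 4.0

# Largest price representable as uint32 cents: (whole units, cents).
max_cents = 2**32 - 1
max_price = divmod(max_cents, 100)
print(max_price)  # (42949672, 95)
```

So the saving holds only while every price stays below roughly 42.9
million currency units.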

What would be the best approach to such a use case?

Would a decimal_scale CastOption be OK, or should it rather be a compute
'multiply' kernel?

BR,

Jacek


On Wed, Feb 12, 2020 at 7:32 PM Jacek Pliszka wrote:
>
> OK, then what I proposed does not make sense and I can just copy the
> solution you pointed out.
>
> Thank you,
>
> Jacek
>
> On Wed, Feb 12, 2020 at 7:27 PM Wes McKinney wrote:
> >
> > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  
> > wrote:
> > >
> > > Hi!
> > >
> > > ARROW-3329 - we can discuss there.
> > >
> > > > It seems like it makes sense to implement both lossless safe casts
> > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > part discarded) from decimal to integer, do I have that right?
> > >
> > > Yes, though if I understood correctly, your examples are the same
> > > case - in both cases the fractional part is discarded - it is just
> > > all 0s in the first case.
> > >
> > > The key question is whether CastFunctor in cast.cc has access to the
> > > scale of the decimal. If yes, how?
> >
> > Yes, it's in the type of the input array. Here's a kernel
> > implementation that uses the TimestampType metadata of the input
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> >
> > >
> > > If not - these are the options I've come up with:
> > >
> > > Let's assume the Decimal128Type value is n.
> > >
> > > Then I expect that the base call
> > > .cast('int64') will return overflow for n outside the int64 range,
> > > and the value otherwise.
> > >
> > > Option 1:
> > >
> > > .cast('int64', decimal_scale=s) would calculate n/10**s and return
> > > overflow if the result is outside the int64 range, the value otherwise.
> > >
> > > Option 2:
> > >
> > > .cast('int64', bytes_group=0) would return n & 0x
> > > .cast('int64', bytes_group=1) would return (n >> 64) & 0x
> > > .cast('int64') would have default value bytes_group=0
> > >
> > > Option 3:
> > >
> > > cast has no CastOptions, but we add a multiply compute kernel and have
> > > something like this instead:
> > >
> > > .compute('multiply', 10**-s).cast('int64')
> > >
> > > BR,
> > >
> > > Jacek


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
OK, then what I proposed does not make sense and I can just copy the
solution you pointed out.

Thank you,

Jacek

On Wed, Feb 12, 2020 at 7:27 PM Wes McKinney wrote:
>
> On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  
> wrote:
> >
> > Hi!
> >
> > ARROW-3329 - we can discuss there.
> >
> > > It seems like it makes sense to implement both lossless safe casts
> > > (when all zeros after the decimal point) and lossy casts (fractional
> > > part discarded) from decimal to integer, do I have that right?
> >
> > Yes, though if I understood correctly, your examples are the same
> > case - in both cases the fractional part is discarded - it is just
> > all 0s in the first case.
> >
> > The key question is whether CastFunctor in cast.cc has access to the
> > scale of the decimal. If yes, how?
>
> Yes, it's in the type of the input array. Here's a kernel
> implementation that uses the TimestampType metadata of the input
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
>
> >
> > If not - these are the options I've come up with:
> >
> > Let's assume the Decimal128Type value is n.
> >
> > Then I expect that the base call
> > .cast('int64') will return overflow for n outside the int64 range,
> > and the value otherwise.
> >
> > Option 1:
> >
> > .cast('int64', decimal_scale=s) would calculate n/10**s and return
> > overflow if the result is outside the int64 range, the value otherwise.
> >
> > Option 2:
> >
> > .cast('int64', bytes_group=0) would return n & 0x
> > .cast('int64', bytes_group=1) would return (n >> 64) & 0x
> > .cast('int64') would have default value bytes_group=0
> >
> > Option 3:
> >
> > cast has no CastOptions, but we add a multiply compute kernel and have
> > something like this instead:
> >
> > .compute('multiply', 10**-s).cast('int64')
> >
> > BR,
> >
> > Jacek


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Wes McKinney
On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  wrote:
>
> Hi!
>
> ARROW-3329 - we can discuss there.
>
> > It seems like it makes sense to implement both lossless safe casts
> > (when all zeros after the decimal point) and lossy casts (fractional
> > part discarded) from decimal to integer, do I have that right?
>
> Yes, though if I understood correctly, your examples are the same
> case - in both cases the fractional part is discarded - it is just
> all 0s in the first case.
>
> The key question is whether CastFunctor in cast.cc has access to the
> scale of the decimal. If yes, how?

Yes, it's in the type of the input array. Here's a kernel
implementation that uses the TimestampType metadata of the input

https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
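
To illustrate reading the scale from the input type, here is a pure-Python
toy model. ToyDecimalType, ToyDecimalArray and cast_to_int64 are made-up
names, and none of this is the Arrow C++ kernel code at the link above. It
assumes each decimal is stored as an unscaled integer with the scale held
once on the type, and it covers both the safe (lossless) and the lossy
(truncating) cast discussed earlier in the thread.

```python
from dataclasses import dataclass
from typing import List

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class ToyDecimalType:
    precision: int
    scale: int  # number of digits after the decimal point


@dataclass(frozen=True)
class ToyDecimalArray:
    type: ToyDecimalType
    unscaled: List[int]  # 123.45 at scale 2 is stored as 12345


def cast_to_int64(arr: ToyDecimalArray, allow_truncate: bool = False) -> List[int]:
    """Toy cast kernel: the scale comes from arr.type, not from an option."""
    divisor = 10 ** arr.type.scale
    out = []
    for raw in arr.unscaled:
        quotient, remainder = divmod(abs(raw), divisor)
        if remainder and not allow_truncate:
            raise ValueError("fractional part would be discarded")
        value = quotient if raw >= 0 else -quotient  # truncate toward zero
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError("value does not fit in int64")
        out.append(value)
    return out


prices = ToyDecimalArray(ToyDecimalType(38, 2), [12300, -4500])
print(cast_to_int64(prices))  # [123, -45]
```

Because the scale travels with the type, the caller never has to pass it
in, which is the reason a decimal_scale option is not needed for a plain
cast.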

>
> If not - these are the options I've come up with:
>
> Let's assume the Decimal128Type value is n.
>
> Then I expect that the base call
> .cast('int64') will return overflow for n outside the int64 range,
> and the value otherwise.
>
> Option 1:
>
> .cast('int64', decimal_scale=s) would calculate n/10**s and return
> overflow if the result is outside the int64 range, the value otherwise.
>
> Option 2:
>
> .cast('int64', bytes_group=0) would return n & 0x
> .cast('int64', bytes_group=1) would return (n >> 64) & 0x
> .cast('int64') would have default value bytes_group=0
>
> Option 3:
>
> cast has no CastOptions, but we add a multiply compute kernel and have
> something like this instead:
>
> .compute('multiply', 10**-s).cast('int64')
>
> BR,
>
> Jacek


[ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Hi!

ARROW-3329 - we can discuss there.

> It seems like it makes sense to implement both lossless safe casts
> (when all zeros after the decimal point) and lossy casts (fractional
> part discarded) from decimal to integer, do I have that right?

Yes, though if I understood correctly, your examples are the same
case - in both cases the fractional part is discarded - it is just
all 0s in the first case.

The key question is whether CastFunctor in cast.cc has access to the
scale of the decimal. If yes, how?

If not - these are the options I've come up with:

Let's assume the Decimal128Type value is n.

Then I expect that the base call
.cast('int64') will return overflow for n outside the int64 range,
and the value otherwise.

Option 1:

.cast('int64', decimal_scale=s) would calculate n/10**s and return
overflow if the result is outside the int64 range, the value otherwise.

Option 2:

.cast('int64', bytes_group=0) would return n & 0x
.cast('int64', bytes_group=1) would return (n >> 64) & 0x
.cast('int64') would have default value bytes_group=0

Option 3:

cast has no CastOptions, but we add a multiply compute kernel and have
something like this instead:

.compute('multiply', 10**-s).cast('int64')
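
For concreteness, Options 1 and 2 can be sketched on plain Python
integers. The function names are hypothetical. Two things are assumptions
rather than part of the proposal: the hex constant after "0x" is not
spelled out above, so MASK64 assumes a 64-bit all-ones mask, and the
rounding rule for n/10**s is not stated, so this sketch truncates toward
zero.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
MASK64 = (1 << 64) - 1  # assumption: the elided constant is a 64-bit mask


def cast_with_decimal_scale(n: int, s: int) -> int:
    """Option 1 sketch: n / 10**s, overflow if outside the int64 range."""
    quotient = abs(n) // 10**s
    value = quotient if n >= 0 else -quotient  # assumed: truncate toward zero
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("result does not fit in int64")
    return value


def cast_bytes_group(n: int, bytes_group: int = 0) -> int:
    """Option 2 sketch: pick one 64-bit word of the 128-bit value.

    Returns the word as an unsigned integer; reinterpreting it as a
    signed int64 is left out of this sketch.
    """
    return (n >> (64 * bytes_group)) & MASK64


print(cast_with_decimal_scale(1234500, 2))  # 12345
n = (5 << 64) | 7
print(cast_bytes_group(n, 0), cast_bytes_group(n, 1))  # 7 5
```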

BR,

Jacek