Re: SPIP: support decimals with negative scale in decimal operation

2018-09-23 Thread Felix Cheung
DISCUSS thread is good to have...


Re: SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
Hi Wenchen,
Thank you for the clarification. I agree that this is more a bug fix than
an improvement. I apologize for the error. Please consider this a design
doc.

Thanks,
Marco

Il giorno ven 21 set 2018 alle ore 12:04 Wenchen Fan 
ha scritto:

> Hi Marco,
>
> Thanks for sending it! The problem is clearly explained in this email, but
> I would not treat it as a SPIP. It proposes a fix for a very tricky bug,
> and SPIP is usually for new features. Others please correct me if I was
> wrong.
>
> Thanks,
> Wenchen
>
> On Fri, Sep 21, 2018 at 5:47 PM Marco Gaido 
> wrote:
>
>> Hi all,
>>
>> I am writing this e-mail in order to discuss the issue which is reported
>> in SPARK-25454 and according to Wenchen's suggestion I prepared a design
>> doc for it.
>>
>> The problem we are facing here is that our rules for decimals operations
>> are taken from Hive and MS SQL server and they explicitly don't support
>> decimals with negative scales. So the rules we have currently are not meant
>> to deal with negative scales. The issue is that Spark, instead, doesn't
>> forbid negative scales and - indeed - there are cases in which we are
>> producing them (eg. a SQL constant like 1e8 would be turned to a decimal(1,
>> -8)).
>>
>> Having negative scales most likely wasn't really intended. But
>> unfortunately getting rid of them would be a breaking change as many
>> operations working fine currently would not be allowed anymore and would
>> overflow (eg. select 1e36 * 1). As such, this is something I'd
>> definitely agree on doing, but I think we can target only for 3.0.
>>
>> What we can start doing now, instead, is updating our rules in order to
>> handle properly also the case when decimal scales are negative. From my
>> investigation, it turns out that the only operations which has problems
>> with them is Divide.
>>
>> Here you can find the design doc with all the details:
>> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
>> The document is also linked in SPARK-25454. There is also already a PR with
>> the change: https://github.com/apache/spark/pull/22450.
>>
>> Looking forward to hear your feedback,
>> Thanks.
>> Marco
>>
>


Re: SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Wenchen Fan
Hi Marco,

Thanks for sending it! The problem is clearly explained in this email, but
I would not treat it as a SPIP. It proposes a fix for a very tricky bug,
and a SPIP is usually for new features. Others, please correct me if I'm
wrong.

Thanks,
Wenchen


SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
Hi all,

I am writing this e-mail to discuss the issue reported in SPARK-25454;
following Wenchen's suggestion, I have prepared a design doc for it.

The problem we are facing is that our rules for decimal operations are
taken from Hive and MS SQL Server, both of which explicitly disallow
decimals with negative scales, so our current rules were never meant to
deal with them. Spark, however, doesn't forbid negative scales, and there
are indeed cases in which we produce them (e.g. a SQL constant like 1e8 is
turned into a decimal(1, -8)).
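
For illustration, here is a minimal snippet (a sketch, assuming a Spark
2.x spark-shell session with the usual spark variable in scope) showing
the negative scale such a literal produces:

    // The exponent literal 1e8 is parsed as a decimal with a negative scale.
    spark.sql("SELECT 1e8 AS c").printSchema()
    // root
    //  |-- c: decimal(1,-8) (nullable = false)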

Supporting negative scales most likely was never intended. Unfortunately,
getting rid of them would be a breaking change: many operations that
currently work fine would no longer be allowed and would overflow (e.g.
select 1e36 * 1). So while I'd definitely agree with removing them, I
think we can target that only for 3.0.
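
To see why, here is a rough sketch of the arithmetic, assuming the usual
multiplication rule (result precision = p1 + p2 + 1, result scale =
s1 + s2) and Spark's maximum decimal precision of 38; the helper below is
only illustrative:

    // Result type of decimal(p1,s1) * decimal(p2,s2) under the usual rule.
    def multiplyResult(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) =
      (p1 + p2 + 1, s1 + s2)

    // Today 1e36 is a decimal(1, -36), so 1e36 * 1 fits comfortably:
    multiplyResult(1, -36, 1, 0)  // (3, -36)

    // Without negative scales, 1e36 would need decimal(37, 0), and the
    // result precision would exceed the maximum of 38:
    multiplyResult(37, 0, 1, 0)   // (39, 0) -> overflow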

What we can start doing now, instead, is updating our rules so that they
also properly handle the case in which decimal scales are negative. From
my investigation, it turns out that the only operation which has problems
with them is Divide.
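
To make the problem with Divide concrete, here is a rough sketch of the
current rule (the one we take from Hive/MS SQL Server) and what it yields
when the divisor has a negative scale; again, the helper is only
illustrative:

    // Result type of decimal(p1,s1) / decimal(p2,s2) under the current rule:
    //   scale     = max(6, s1 + p2 + 1)
    //   precision = p1 - s1 + s2 + scale
    def divideResult(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
      val scale = math.max(6, s1 + p2 + 1)
      (p1 - s1 + s2 + scale, scale)
    }

    // Dividing 1 by 1e8, i.e. by a decimal(1, -8), yields a nonsensical
    // negative precision:
    divideResult(1, 0, 1, -8)  // (-1, 6)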

Here you can find the design doc with all the details:
https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
The document is also linked in SPARK-25454, and there is already a PR with
the change: https://github.com/apache/spark/pull/22450.

Looking forward to hearing your feedback.
Thanks,
Marco