DISCUSS thread is good to have...

________________________________
From: Marco Gaido <marcogaid...@gmail.com>
Sent: Friday, September 21, 2018 3:31 AM
To: Wenchen Fan
Cc: dev
Subject: Re: SPIP: support decimals with negative scale in decimal operation

Hi Wenchen,
Thank you for the clarification. I agree that this is more a bug fix rather 
than an improvement. I apologize for the error. Please consider this as a 
design doc.

Thanks,
Marco

Il giorno ven 21 set 2018 alle ore 12:04 Wenchen Fan 
<cloud0...@gmail.com<mailto:cloud0...@gmail.com>> ha scritto:
Hi Marco,

Thanks for sending it! The problem is clearly explained in this email, but I 
would not treat it as a SPIP. It proposes a fix for a very tricky bug, and SPIP 
is usually for new features. Others please correct me if I was wrong.

Thanks,
Wenchen

On Fri, Sep 21, 2018 at 5:47 PM Marco Gaido 
<marcogaid...@gmail.com<mailto:marcogaid...@gmail.com>> wrote:
Hi all,

I am writing this e-mail in order to discuss the issue which is reported in 
SPARK-25454 and according to Wenchen's suggestion I prepared a design doc for 
it.

The problem we are facing here is that our rules for decimals operations are 
taken from Hive and MS SQL server and they explicitly don't support decimals 
with negative scales. So the rules we have currently are not meant to deal with 
negative scales. The issue is that Spark, instead, doesn't forbid negative 
scales and - indeed - there are cases in which we are producing them (eg. a SQL 
constant like 1e8 would be turned to a decimal(1, -8)).

Having negative scales most likely wasn't really intended. But unfortunately 
getting rid of them would be a breaking change as many operations working fine 
currently would not be allowed anymore and would overflow (eg. select 1e36 * 
10000). As such, this is something I'd definitely agree on doing, but I think 
we can target only for 3.0.

What we can start doing now, instead, is updating our rules in order to handle 
properly also the case when decimal scales are negative. From my investigation, 
it turns out that the only operations which has problems with them is Divide.

Here you can find the design doc with all the details: 
https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing.
 The document is also linked in SPARK-25454. There is also already a PR with 
the change: https://github.com/apache/spark/pull/22450.

Looking forward to hear your feedback,
Thanks.
Marco

Reply via email to