Re: SPIP: support decimals with negative scale in decimal operation
DISCUSS thread is good to have...

From: Marco Gaido
Sent: Friday, September 21, 2018 3:31 AM
To: Wenchen Fan
Cc: dev
Subject: Re: SPIP: support decimals with negative scale in decimal operation

> [...]
Re: SPIP: support decimals with negative scale in decimal operation
Hi Wenchen,

Thank you for the clarification. I agree that this is more a bug fix than an improvement. I apologize for the error. Please consider this as a design doc.

Thanks,
Marco

On Fri, Sep 21, 2018 at 12:04, Wenchen Fan wrote:
> [...]
Re: SPIP: support decimals with negative scale in decimal operation
Hi Marco,

Thanks for sending it! The problem is clearly explained in this email, but I would not treat it as a SPIP. It proposes a fix for a very tricky bug, and SPIP is usually for new features. Others please correct me if I am wrong.

Thanks,
Wenchen

On Fri, Sep 21, 2018 at 5:47 PM Marco Gaido wrote:
> [...]
SPIP: support decimals with negative scale in decimal operation
Hi all,

I am writing this e-mail to discuss the issue reported in SPARK-25454; following Wenchen's suggestion, I prepared a design doc for it.

The problem we are facing is that our rules for decimal operations are taken from Hive and MS SQL Server, which explicitly don't support decimals with negative scales. So the rules we currently have are not meant to deal with negative scales. The issue is that Spark, instead, doesn't forbid negative scales and, indeed, there are cases in which we produce them (e.g. a SQL constant like 1e8 is turned into a decimal(1, -8)).

Having negative scales most likely wasn't really intended. But unfortunately getting rid of them would be a breaking change, as many operations that currently work fine would no longer be allowed and would overflow (e.g. select 1e36 * 1). As such, this is something I'd definitely agree on doing, but I think we can target it only for 3.0.

What we can start doing now, instead, is updating our rules so that they also properly handle the case when decimal scales are negative. From my investigation, it turns out that the only operation which has problems with them is Divide.

Here you can find the design doc with all the details: https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit?usp=sharing. The document is also linked in SPARK-25454. There is also already a PR with the change: https://github.com/apache/spark/pull/22450.

Looking forward to hearing your feedback,
Thanks,
Marco
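[Not part of the original email: a small sketch of the arithmetic behind the two examples above. It uses Python's decimal module to show what a negative scale means for a literal like 1e8, and a hypothetical helper multiply_result_type that assumes Spark's usual multiplication result rule (precision = p1 + p2 + 1, scale = s1 + s2, with a maximum precision of 38) to show why forbidding negative scales would make select 1e36 * 1 overflow.]

```python
from decimal import Decimal

# A negative scale stores only the significant digits and shifts the
# decimal point to the right: 1e8 is the single digit 1 with exponent 8,
# which in SQL terms is decimal(precision=1, scale=-8).
sign, digits, exponent = Decimal("1e8").as_tuple()
print(digits, exponent)  # (1,) 8  -> one significant digit, scale -8

MAX_PRECISION = 38  # assumed cap, as in Spark's DecimalType

def multiply_result_type(p1, s1, p2, s2):
    """Hypothetical sketch of the multiply result-type rule:
    precision = p1 + p2 + 1, scale = s1 + s2."""
    return p1 + p2 + 1, s1 + s2

# With negative scales allowed, 1e36 is decimal(1, -36), so
# 1e36 * 1 needs only decimal(3, -36): well within precision 38.
print(multiply_result_type(1, -36, 1, 0))  # (3, -36)

# If negative scales were forbidden, 1e36 would have to be stored as
# decimal(37, 0), and the result precision 37 + 1 + 1 = 39 would exceed
# the cap of 38: the operation would overflow.
print(multiply_result_type(37, 0, 1, 0))  # (39, 0)
```

This is only an illustration of the precision bookkeeping, not of Spark's actual type-coercion code; the design doc linked above has the real rules.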