So, this thread somewhat petered out. There are still a number of unresolved issues, but to make progress I wonder if it would first be helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic. This seems like a sensible baseline, since it should minimise surprise for operators.

If people largely agree, I will call a vote, and we can pick up a couple of more focused discussions afterwards on how we interpret the leeway the spec gives us.
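To make "ANSI SQL 92 compliant arithmetic" concrete: the spec boils down to two rules for dyadic operators, namely that exact op exact yields an exact type, and that any approximate operand yields an approximate result, with precision left implementation-defined. A minimal sketch of what that looks like in practice, using Postgres purely as an illustration (pg_typeof and the result types shown are Postgres-specific, not anything CQL does today):

    SELECT pg_typeof(1::int + 1::bigint);     -- bigint            (exact + exact => exact)
    SELECT pg_typeof(1::int * 1.5::numeric);  -- numeric           (exact * exact => exact)
    SELECT pg_typeof(1::int * 1.5::float8);   -- double precision  (approximate operand => approximate result)

Which approximate type to return, and what to do on overflow, are the implementation-defined parts, and those are the focused discussions still to be had.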
> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> Hi,
>
> From reading the spec, precision is always implementation-defined. The spec specifies scale in several cases, but never precision, for any type or operation (addition/subtraction, multiplication, division).
>
> So we don't implement anything remotely approaching precision and scale in CQL when it comes to numbers, I think? So we aren't going to follow the spec for scale. We are already pretty far down that road, so I would leave it alone.
>
> I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation-defined. We could return either float or double. I think if one of the operands is a double we should return a double, because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time, so that people can expect a consistent type from expressions involving approximate numbers.
>
> I am a big fan of widening for arithmetic expressions in a database, to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.
>
> Definitely, if we can make overflow not occur we should, and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
>
> Ariel
>
> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>> If it’s in the SQL spec, I’m fairly convinced. Thanks for digging this out (and Mike for getting some empirical examples).
>>
>> We still have to decide on the approximate data type to return; right now, we have float+bigint=double, but float+int=float. I think this is fairly inconsistent, and either the approximate type should always win, or we should always upgrade to double for mixed operands.
>>
>> The quoted spec also suggests that decimal+float=float and decimal+double=double, whereas we currently have decimal+float=decimal and decimal+double=decimal.
>>
>> If we’re going to go with an approximate operand implying an approximate result, I think we should do it consistently (and consistent with the SQL 92 spec), and have the type of the approximate operand always be the return type.
>>
>> This would still leave a decision for float+double, though. The most consistent behaviour with that stated above would be to always return the most approximate type (i.e. float), but this would seem to me to be fairly unexpected for the user.
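For reference, Postgres resolves every one of these mixed cases to double precision: the approximate side always wins, and float+int widens straight past float. A hypothetical psql session (the result types are inferred from Postgres' numeric promotion rules rather than a live transcript, so treat them as illustrative):

    SELECT pg_typeof(1.5::real + 1::int);        -- double precision
    SELECT pg_typeof(1.5::real + 1::bigint);     -- double precision
    SELECT pg_typeof(1.5::numeric + 1.5::real);  -- double precision (the approximate operand wins over decimal)
    SELECT pg_typeof(1.5::real + 1.5::float8);   -- double precision

So "always upgrade to double for mixed operands" would at least match one widely deployed precedent.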
>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>
>>> Hi,
>>>
>>> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate, then the result should be approximate.
>>>
>>> One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically, necessarily.
>>>
>>> From the SQL 92 spec regarding assignment, http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>> "
>>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>> FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>> comparable and mutually assignable. If an assignment would result
>>> in a loss of the most significant digits, an exception condition
>>> is raised. If least significant digits are lost, implementation-
>>> defined rounding or truncating occurs with no exception condition
>>> being raised. The rules for arithmetic are generally governed by
>>> Subclause 6.12, "<numeric value expression>".
>>> "
>>>
>>> Section 6.12, numeric value expressions:
>>> "
>>> 1) If the data type of both operands of a dyadic arithmetic opera-
>>>    tor is exact numeric, then the data type of the result is exact
>>>    numeric, with precision and scale determined as follows:
>>>    ...
>>> 2) If the data type of either operand of a dyadic arithmetic op-
>>>    erator is approximate numeric, then the data type of the re-
>>>    sult is approximate numeric. The precision of the result is
>>>    implementation-defined.
>>> "
>>>
>>> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
>>>
>>> I think we might want to look closely at the SQL spec, and especially at when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec, we should be asking why it is in the spec and why we are deviating.
>>>
>>> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
>>>
>>> Ariel
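The spec's "exception on loss of the most significant digits" is, in effect, mandatory integer overflow checking, and widening is what sidesteps it. A minimal sketch of both behaviours, assuming Postgres semantics:

    SELECT 2147483647::int + 1;               -- ERROR:  integer out of range
    SELECT 2147483647::int::bigint + 1;       -- 2147483648 (widening the operand avoids the error)
    SELECT 9223372036854775807::bigint + 1;   -- ERROR:  bigint out of range
    SELECT 9223372036854775807::numeric + 1;  -- 9223372036854775808 (arbitrary precision never overflows)

This is the trade-off Ariel describes: widen to bigint and double up front, or keep narrow result types and accept that some operations must raise.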
>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>> Hi,
>>>>
>>>> I'm not sure I would prefer the Postgres way of doing things, which is returning just about any type depending on the operand types, especially considering its docs say that numeric/decimal is slow and note, multiple times, that floating points are inexact. So, doing some math with Postgres (9.6.5):
>>>>
>>>> SELECT 2147483647::bigint*1.0::double precision returns double precision 2147483647
>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
>>>>
>>>> With + and - we can get the same mixture of returned types. There's no difference in those calculations, just some casting. To me, floating-point math indicates inexactness and has errors, and whoever mixes two different types should understand that. If one didn't want an exact numeric type, why would the server return one? The floating point value itself could be wrong before the calculation even starts; claiming we do it losslessly is just wrong.
>>>>
>>>> Fun with 2.65:
>>>>
>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>
>>>> SELECT round(2.65) returns numeric 3
>>>> SELECT round(2.65::double precision) returns double 3
>>>>
>>>> SELECT 2.65 * 1 returns double 2.65
>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>
>>>> SELECT round(2.65) * 1 returns numeric 3
>>>> SELECT round(2.65) * round(1) returns double 3
>>>>
>>>> So, as we're going to have silly values in any case, why pretend otherwise? Also, exact calculations are slow if we crunch large amounts of numbers. I realise I have digressed into Postgres' implementation here, but I wish it weren't used as the benchmark. Most importantly, I would definitely want the exact same type returned each time I do a calculation.
>>>>
>>>> - Micke
>>>>
>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <bened...@apache.org> wrote:
>>>>>
>>>>> As far as I can tell, we reached a relatively strong consensus that we should implement lossless casts by default. Does anyone have anything more to add?
>>>>>
>>>>> Looking at the emails, everyone who participated and expressed a preference was in favour of the “Postgres approach” of upcasting to decimal for mixed float/int operands.
>>>>>
>>>>> I’d like to get a clear-cut decision on this, so we know what we’re doing for 4.0. Then hopefully we can move on to a collective decision on Ariel’s concerns about overflow, which I think are also pressing, particularly for tinyint and smallint. This does also impact implicit casts for mixed integer type operations, but an approach for these will probably fall out of any decision on overflow.
>>>>>
>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.moha...@gmail.com> wrote:
>>>>>>
>>>>>> I think you're conflating two things here. There's the loss resulting from using some operators, and the loss involved in casting. Dividing an integer by another integer to obtain an integer result can result in loss, but there's no implicit casting there and no loss due to casting. Casting an integer to a float can also result in loss. So dividing an integer by a float, for example, with an implicit cast, has an additional avenue for loss: the implicit cast of the operands so that they're of the same type. I believe this discussion so far has been about the latter, not the loss from the operations themselves.
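Both kinds of loss are easy to demonstrate, again leaning on Postgres purely for illustration (the exact float display varies by server version):

    SELECT 7 / 2;                             -- 3: loss from the operator itself, no casting involved
    SELECT 7 / 2.0;                           -- 3.5: an exact numeric operand keeps the division exact
    SELECT 9007199254740993::bigint::float8;  -- 9007199254740992: loss from the cast alone, since
                                              -- 2^53 + 1 does not fit a double's 53-bit mantissa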
>>>>>> On Wed, 3 Oct 2018 at 18:35, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I would like to try to clarify things a bit, to help people understand the true complexity of the problem.
>>>>>>>
>>>>>>> The *float* and *double* types are inexact numeric types, and not only at the operation level. If you insert 676543.21 into a *float* column and then read it back, you will find that the value has been truncated to 676543.2.
>>>>>>>
>>>>>>> If you want accuracy, the only way is to avoid those inexact types. Using *decimal* during operations will mitigate the problem, but will not remove it.
>>>>>>>
>>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL Server and Oracle do. So all those databases will lose precision if you are not careful.
>>>>>>>
>>>>>>> If you truly need precision, you can have it by using exact numeric types for your data. Of course, that has a cost in performance, memory and disk usage.
>>>>>>>
>>>>>>> The advantage of the current approach is that it gives you the choice. It is up to you to decide what you need for your application. It is also in line with the way CQL behaves everywhere else.
>>>>>>
>>>>>> --
>>>>>> Muru
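To close the loop on those examples, a sketch of both behaviours, assuming Postgres (the real output shown assumes shortest-round-trip float display, as in recent versions; older versions print fewer digits):

    SELECT 3 / 2;               -- 1   (integer division, as in Oracle and SQL Server)
    SELECT 3 / 2.0;             -- 1.5 (an exact decimal operand keeps the result exact)
    SELECT 676543.21::real;     -- 676543.2  (float4 cannot represent the value exactly)
    SELECT 676543.21::numeric;  -- 676543.21 (exact, at a cost in performance and storage)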