So, this thread somewhat petered out. There are still a number of unresolved issues, but to make progress I wonder if it would first be helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic. This seems like a sensible baseline, since it should minimise surprise for operators.

If people largely agree, I will call a vote, and we can pick up a couple of more focused discussions afterwards on how we interpret the leeway the spec gives us.
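To make "ANSI SQL 92 compliant arithmetic" concrete: the spec boils down to two rules for dyadic operators, namely that exact op exact yields an exact type, and that any approximate operand yields an approximate result, with precision left implementation-defined. A minimal sketch of what that looks like in practice, using Postgres purely as an illustration (pg_typeof and the result types shown are Postgres-specific, not anything CQL does today):

    SELECT pg_typeof(1::int + 1::bigint);     -- bigint            (exact + exact => exact)
    SELECT pg_typeof(1::int * 1.5::numeric);  -- numeric           (exact * exact => exact)
    SELECT pg_typeof(1::int * 1.5::float8);   -- double precision  (approximate operand => approximate result)

Which approximate type to return, and what to do on overflow, are the implementation-defined parts, and those are the focused discussions still to be had.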
> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> Hi,
>
> From reading the spec, precision is always implementation-defined. The spec specifies scale in several cases, but never precision, for any type or operation (addition/subtraction, multiplication, division).
>
> So we don't implement anything remotely approaching precision and scale in CQL when it comes to numbers, I think? So we aren't going to follow the spec for scale. We are already pretty far down that road, so I would leave it alone.
>
> I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation-defined. We could return either float or double. I think if one of the operands is a double we should return a double, because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time, so that people can expect a consistent type from expressions involving approximate numbers.
>
> I am a big fan of widening for arithmetic expressions in a database, to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.
>
> Definitely, if we can make overflow not occur we should, and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
>
> Ariel
>
> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>> If it’s in the SQL spec, I’m fairly convinced. Thanks for digging this out (and Mike for getting some empirical examples).
>>
>> We still have to decide on the approximate data type to return; right now, we have float+bigint=double, but float+int=float. I think this is fairly inconsistent, and either the approximate type should always win, or we should always upgrade to double for mixed operands.
>>
>> The quoted spec also suggests that decimal+float=float and decimal+double=double, whereas we currently have decimal+float=decimal and decimal+double=decimal.
>>
>> If we’re going to go with an approximate operand implying an approximate result, I think we should do it consistently (and consistent with the SQL 92 spec), and have the type of the approximate operand always be the return type.
>>
>> This would still leave a decision for float+double, though. The most consistent behaviour with that stated above would be to always return the most approximate type (i.e. float), but this would seem to me to be fairly unexpected for the user.
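For reference, Postgres resolves every one of these mixed cases to double precision: the approximate side always wins, and float+int widens straight past float. A hypothetical psql session (the result types are inferred from Postgres' numeric promotion rules rather than a live transcript, so treat them as illustrative):

    SELECT pg_typeof(1.5::real + 1::int);        -- double precision
    SELECT pg_typeof(1.5::real + 1::bigint);     -- double precision
    SELECT pg_typeof(1.5::numeric + 1.5::real);  -- double precision (the approximate operand wins over decimal)
    SELECT pg_typeof(1.5::real + 1.5::float8);   -- double precision

So "always upgrade to double for mixed operands" would at least match one widely deployed precedent.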
>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>
>>> Hi,
>>>
>>> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate, then the result should be approximate.
>>>
>>> One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically, necessarily.
>>>
>>> From the SQL 92 spec regarding assignment, http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>> "
>>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>> FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>> comparable and mutually assignable. If an assignment would result
>>> in a loss of the most significant digits, an exception condition
>>> is raised. If least significant digits are lost, implementation-
>>> defined rounding or truncating occurs with no exception condition
>>> being raised. The rules for arithmetic are generally governed by
>>> Subclause 6.12, "<numeric value expression>".
>>> "
>>>
>>> Section 6.12, numeric value expressions:
>>> "
>>> 1) If the data type of both operands of a dyadic arithmetic opera-
>>>    tor is exact numeric, then the data type of the result is exact
>>>    numeric, with precision and scale determined as follows:
>>>    ...
>>> 2) If the data type of either operand of a dyadic arithmetic op-
>>>    erator is approximate numeric, then the data type of the re-
>>>    sult is approximate numeric. The precision of the result is
>>>    implementation-defined.
>>> "
>>>
>>> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
>>>
>>> I think we might want to look closely at the SQL spec, and especially at when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec, we should be asking why it is in the spec and why we are deviating.
>>>
>>> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
>>>
>>> Ariel
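The spec's "exception on loss of the most significant digits" is, in effect, mandatory integer overflow checking, and widening is what sidesteps it. A minimal sketch of both behaviours, assuming Postgres semantics:

    SELECT 2147483647::int + 1;               -- ERROR:  integer out of range
    SELECT 2147483647::int::bigint + 1;       -- 2147483648 (widening the operand avoids the error)
    SELECT 9223372036854775807::bigint + 1;   -- ERROR:  bigint out of range
    SELECT 9223372036854775807::numeric + 1;  -- 9223372036854775808 (arbitrary precision never overflows)

This is the trade-off Ariel describes: widen to bigint and double up front, or keep narrow result types and accept that some operations must raise.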
>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>> Hi,
>>>>
>>>> I'm not sure I would prefer the Postgres way of doing things, which is returning just about any type depending on the operand types, especially considering its docs say that numeric/decimal is slow and note, multiple times, that floating points are inexact. So, doing some math with Postgres (9.6.5):
>>>>
>>>> SELECT 2147483647::bigint*1.0::double precision returns double precision 2147483647
>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
>>>>
>>>> With + and - we can get the same mixture of returned types. There's no difference in those calculations, just some casting. To me, floating-point math indicates inexactness and has errors, and whoever mixes two different types should understand that. If one didn't want an exact numeric type, why would the server return one? The floating point value itself could be wrong before the calculation even starts; claiming we do it losslessly is just wrong.
>>>>
>>>> Fun with 2.65:
>>>>
>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>
>>>> SELECT round(2.65) returns numeric 3
>>>> SELECT round(2.65::double precision) returns double 3
>>>>
>>>> SELECT 2.65 * 1 returns double 2.65
>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>
>>>> SELECT round(2.65) * 1 returns numeric 3
>>>> SELECT round(2.65) * round(1) returns double 3
>>>>
>>>> So, as we're going to have silly values in any case, why pretend otherwise? Also, exact calculations are slow if we crunch large amounts of numbers. I realise I have digressed into Postgres' implementation here, but I wish it weren't used as the benchmark. Most importantly, I would definitely want the exact same type returned each time I do a calculation.
>>>>
>>>> - Micke
>>>>
>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <bened...@apache.org> wrote:
>>>>>
>>>>> As far as I can tell, we reached a relatively strong consensus that we should implement lossless casts by default. Does anyone have anything more to add?
>>>>>
>>>>> Looking at the emails, everyone who participated and expressed a preference was in favour of the “Postgres approach” of upcasting to decimal for mixed float/int operands.
>>>>>
>>>>> I’d like to get a clear-cut decision on this, so we know what we’re doing for 4.0. Then hopefully we can move on to a collective decision on Ariel’s concerns about overflow, which I think are also pressing, particularly for tinyint and smallint. This does also impact implicit casts for mixed integer type operations, but an approach for these will probably fall out of any decision on overflow.
>>>>>
>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.moha...@gmail.com> wrote:
>>>>>>
>>>>>> I think you're conflating two things here. There's the loss resulting from using some operators, and the loss involved in casting. Dividing an integer by another integer to obtain an integer result can result in loss, but there's no implicit casting there and no loss due to casting. Casting an integer to a float can also result in loss. So dividing an integer by a float, for example, with an implicit cast, has an additional avenue for loss: the implicit cast of the operands so that they're of the same type. I believe this discussion so far has been about the latter, not the loss from the operations themselves.
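Both kinds of loss are easy to demonstrate, again leaning on Postgres purely for illustration (the exact float display varies by server version):

    SELECT 7 / 2;                             -- 3: loss from the operator itself, no casting involved
    SELECT 7 / 2.0;                           -- 3.5: an exact numeric operand keeps the division exact
    SELECT 9007199254740993::bigint::float8;  -- 9007199254740992: loss from the cast alone, since
                                              -- 2^53 + 1 does not fit a double's 53-bit mantissa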
>>>>>> On Wed, 3 Oct 2018 at 18:35, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I would like to try to clarify things a bit, to help people understand the true complexity of the problem.
>>>>>>>
>>>>>>> The *float* and *double* types are inexact numeric types, and not only at the operation level. If you insert 676543.21 into a *float* column and then read it back, you will find that the value has been truncated to 676543.2.
>>>>>>>
>>>>>>> If you want accuracy, the only way is to avoid those inexact types. Using *decimal* during operations will mitigate the problem, but will not remove it.
>>>>>>>
>>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL Server and Oracle do. So all those databases will lose precision if you are not careful.
>>>>>>>
>>>>>>> If you truly need precision, you can have it by using exact numeric types for your data. Of course, that has a cost in performance, memory and disk usage.
>>>>>>>
>>>>>>> The advantage of the current approach is that it gives you the choice. It is up to you to decide what you need for your application. It is also in line with the way CQL behaves everywhere else.
>>>>>>
>>>>>> --
>>>>>> Muru
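To close the loop on those examples, a sketch of both behaviours, assuming Postgres (the real output shown assumes shortest-round-trip float display, as in recent versions; older versions print fewer digits):

    SELECT 3 / 2;               -- 1   (integer division, as in Oracle and SQL Server)
    SELECT 3 / 2.0;             -- 1.5 (an exact decimal operand keeps the result exact)
    SELECT 676543.21::real;     -- 676543.2  (float4 cannot represent the value exactly)
    SELECT 676543.21::numeric;  -- 676543.21 (exact, at a cost in performance and storage)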