I can’t agree more. We should be able to make changes in a manner that improves the DB In the long term, rather than live with the technical debt of arbitrary decisions made by a handful of people.
I also agree that putting a knob in place to let people migrate over is a reasonable decision. Jon On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <bened...@apache.org> wrote: > The goal is simply to agree on a set of well-defined principles for how we > should behave. If we don’t like the implications that arise, we’ll have > another vote? A democracy cannot bind itself, so I never understood this > fear of a decision. > > A database also has a thousand toggles. If we absolutely need to, we can > introduce one more. > > We should be doing this upfront a great deal more often. Doing it > retrospectively sucks, but in my opinion it's a bad reason to bind > ourselves to whatever made it in. > > Do we anywhere define the principles of our current behaviour? I couldn’t > find it. > > > > On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebre...@gmail.com> wrote: > > > > On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith < > bened...@apache.org> > > wrote: > > > >> FWIW, my meaning of arithmetic in this context extends to any features > we > >> have already released (such as aggregates, and perhaps other built-in > >> functions) that operate on the same domain. We should be consistent, > after > >> all. > >> > >> Whether or not we need to revisit any existing functionality we can > figure > >> out after the fact, once we have agreed what our behaviour should be. > >> > > > > I'm not sure I correctly understand the process suggested, but I don't > > particularly like/agree with what I understand. What I understand is a > > suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no > real > > evaluation of what that entails (at least I haven't seen one), and that > > this vote, if passed, would imply we'd then make any backward > incompatible > > change necessary to achieve compliance ("my meaning of arithmetic in this > > context extends to any features we have already released" and "Whether or > > not we need to revisit any existing functionality we can figure out after > > the fact, once we have agreed what our behaviour should be"). > > > > This might make sense of a new product, but at our stage that seems > > backward to me. I think we owe our users to first make the effort of > > identifying what "inconsistencies" our existing arithmetic has[1] and > > _then_ consider what options we have to fix those, with their pros and > cons > > (including how bad they break backward compatibility). And if _then_ > > getting ANSI SQL 92 compliant proves to not be disruptive (or at least > > acceptably so), then sure, that's great. > > > > [1]: one possibly efficient way to do that could actually be to compare > our > > arithmetic to ANSI SQL 92. Not that all differences found would imply > > inconsistencies/wrongness of our arithmetic, but still, it should be > > helpful. And I guess my whole point is that we should that analysis > first, > > and then maybe decide that being ANSI SQL 92 is a reasonable option, not > > decide first and live with the consequences no matter what they are. > > > > -- > > Sylvain > > > > > >> I will make this more explicit for the vote, but just to clarify the > >> intention so that we are all discussing the same thing. > >> > >> > >>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adwei...@fastmail.fm> wrote: > >>> > >>> Hi, > >>> > >>> +1 > >>> > >>> This is a public API so we will be much better off if we get it right > >> the first time. > >>> > >>> Ariel > >>> > >>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> > >> wrote: > >>>> > >>>> Sounds good to me. > >>>> > >>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith < > >> bened...@apache.org> > >>>> wrote: > >>>> > >>>>> So, this thread somewhat petered out. > >>>>> > >>>>> There are still a number of unresolved issues, but to make progress I > >>>>> wonder if it would first be helpful to have a vote on ensuring we are > >> ANSI > >>>>> SQL 92 compliant for our arithmetic? This seems like a sensible > >> baseline, > >>>>> since we will hopefully minimise surprise to operators this way. > >>>>> > >>>>> If people largely agree, I will call a vote, and we can pick up a > >> couple > >>>>> of more focused discussions afterwards on how we interpret the leeway > >> it > >>>>> gives. > >>>>> > >>>>> > >>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> From reading the spec. Precision is always implementation defined. > The > >>>>> spec specifies scale in several cases, but never precision for any > >> type or > >>>>> operation (addition/subtraction, multiplication, division). > >>>>>> > >>>>>> So we don't implement anything remotely approaching precision and > >> scale > >>>>> in CQL when it comes to numbers I think? So we aren't going to follow > >> the > >>>>> spec for scale. We are already pretty far down that road so I would > >> leave > >>>>> it alone. > >>>>>> > >>>>>> I don't think the spec is asking for the most approximate type. It's > >>>>> just saying the result is approximate, and the precision is > >> implementation > >>>>> defined. We could return either float or double. I think if one of > the > >>>>> operands is a double we should return a double because clearly the > >> schema > >>>>> thought a double was required to represent that number. I would also > >> be in > >>>>> favor of returning a double all the time so that people can expect a > >>>>> consistent type from expressions involving approximate numbers. > >>>>>> > >>>>>> I am a big fan of widening for arithmetic expressions in a database > to > >>>>> avoid having to error on overflow. You can go to the trouble of only > >>>>> widening the minimum amount, but I think it's simpler if we always > >> widen to > >>>>> bigint and double. This would be something the spec allows. > >>>>>> > >>>>>> Definitely if we can make overflow not occur we should and the spec > >>>>> allows that. We should also not return different types for the same > >> operand > >>>>> types just to work around overflow if we detect we need more > precision. > >>>>>> > >>>>>> Ariel > >>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote: > >>>>>>> If it’s in the SQL spec, I’m fairly convinced. Thanks for digging > >> this > >>>>>>> out (and Mike for getting some empirical examples). > >>>>>>> > >>>>>>> We still have to decide on the approximate data type to return; > right > >>>>>>> now, we have float+bigint=double, but float+int=float. I think > this > >> is > >>>>>>> fairly inconsistent, and either the approximate type should always > >> win, > >>>>>>> or we should always upgrade to double for mixed operands. > >>>>>>> > >>>>>>> The quoted spec also suggests that decimal+float=float, and decimal > >>>>>>> +double=double, whereas we currently have decimal+float=decimal, > and > >>>>>>> decimal+double=decimal > >>>>>>> > >>>>>>> If we’re going to go with an approximate operand implying an > >>>>> approximate > >>>>>>> result, I think we should do it consistently (and consistent with > the > >>>>>>> SQL92 spec), and have the type of the approximate operand always be > >> the > >>>>>>> return type. > >>>>>>> > >>>>>>> This would still leave a decision for float+double, though. The > most > >>>>>>> consistent behaviour with that stated above would be to always take > >> the > >>>>>>> most approximate type to return (i.e. float), but this would seem > to > >> me > >>>>>>> to be fairly unexpected for the user. > >>>>>>> > >>>>>>> > >>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> > wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I agree with what's been said about expectations regarding > >> expressions > >>>>> involving floating point numbers. I think that if one of the inputs > is > >>>>> approximate then the result should be approximate. > >>>>>>>> > >>>>>>>> One thing we could look at for inspiration is the SQL spec. Not to > >>>>> follow dogmatically necessarily. > >>>>>>>> > >>>>>>>> From the SQL 92 spec regarding assignment > >>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section > 4.6: > >>>>>>>> " > >>>>>>>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT, > >>>>>>>> FLOAT, REAL, and DOUBLE PRECISION are numbers and are all > >>>>> mutually > >>>>>>>> comparable and mutually assignable. If an assignment would > >>>>> result > >>>>>>>> in a loss of the most significant digits, an exception > >> condition > >>>>>>>> is raised. If least significant digits are lost, > >> implementation- > >>>>>>>> defined rounding or truncating occurs with no exception > >>>>> condition > >>>>>>>> being raised. The rules for arithmetic are generally governed > >> by > >>>>>>>> Subclause 6.12, "<numeric value expression>". > >>>>>>>> " > >>>>>>>> > >>>>>>>> Section 6.12 numeric value expressions: > >>>>>>>> " > >>>>>>>> 1) If the data type of both operands of a dyadic arithmetic > >>>>> opera- > >>>>>>>> tor is exact numeric, then the data type of the result is > >>>>> exact > >>>>>>>> numeric, with precision and scale determined as follows: > >>>>>>>> ... > >>>>>>>> 2) If the data type of either operand of a dyadic arithmetic > >> op- > >>>>>>>> erator is approximate numeric, then the data type of the > re- > >>>>>>>> sult is approximate numeric. The precision of the result is > >>>>>>>> implementation-defined. > >>>>>>>> " > >>>>>>>> > >>>>>>>> And this makes sense to me. I think we should only return an exact > >>>>> result if both of the inputs are exact. > >>>>>>>> > >>>>>>>> I think we might want to look closely at the SQL spec and > especially > >>>>> when the spec requires an error to be generated. Those are sometimes > >> in the > >>>>> spec to prevent subtle paths to wrong answers. Any time we deviate > >> from the > >>>>> spec we should be asking why is it in the spec and why are we > >> deviating. > >>>>>>>> > >>>>>>>> Another issue besides overflow handling is how we determine > >> precision > >>>>> and scale for expressions involving two exact types. > >>>>>>>> > >>>>>>>> Ariel > >>>>>>>> > >>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> I'm not sure if I would prefer the Postgres way of doing things, > >>>>> which is > >>>>>>>>> returning just about any type depending on the order of > operators. > >>>>>>>>> Considering it actually mentions in the docs that using > >>>>> numeric/decimal is > >>>>>>>>> slow and also multiple times that floating points are inexact. So > >>>>> doing > >>>>>>>>> some math with Postgres (9.6.5): > >>>>>>>>> > >>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double > >>>>>>>>> precision 2147483647 > >>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0 > >>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double > >>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double > >>>>> 2147483647 > >>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double > >>>>> 2147483647 > >>>>>>>>> > >>>>>>>>> With + - we can get the same amount of mixture of returned types. > >>>>> There's > >>>>>>>>> no difference in those calculations, just some casting. To me > >>>>>>>>> floating-point math indicates inexactness and has errors and > >> whoever > >>>>> mixes > >>>>>>>>> up two different types should understand that. If one didn't want > >>>>> exact > >>>>>>>>> numeric type, why would the server return such? The floating > point > >>>>> value > >>>>>>>>> itself could be wrong already before the calculation - trying to > >> say > >>>>> we do > >>>>>>>>> it lossless is just wrong. > >>>>>>>>> > >>>>>>>>> Fun with 2.65: > >>>>>>>>> > >>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743 > >>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65 > >>>>>>>>> > >>>>>>>>> SELECT round(2.65) returns numeric 4 > >>>>>>>>> SELECT round(2.65::double precision) returns double 4 > >>>>>>>>> > >>>>>>>>> SELECT 2.65 * 1 returns double 2.65 > >>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65 > >>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650 > >>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65 > >>>>>>>>> > >>>>>>>>> SELECT round(2.65) * 1 returns numeric 3 > >>>>>>>>> SELECT round(2.65) * round(1) returns double 3 > >>>>>>>>> > >>>>>>>>> So as we're going to have silly values in any case, why pretend > >>>>> something > >>>>>>>>> else? Also, exact calculations are slow if we crunch large amount > >> of > >>>>>>>>> numbers. I guess I slightly deviated towards Postgres' > implemention > >>>>> in this > >>>>>>>>> case, but I wish it wasn't used as a benchmark in this case. And > >> most > >>>>>>>>> importantly, I would definitely want the exact same type returned > >>>>> each time > >>>>>>>>> I do a calculation. > >>>>>>>>> > >>>>>>>>> - Micke > >>>>>>>>> > >>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith < > >>>>> bened...@apache.org> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> As far as I can tell we reached a relatively strong consensus > >> that we > >>>>>>>>>> should implement lossless casts by default? Does anyone have > >>>>> anything more > >>>>>>>>>> to add? > >>>>>>>>>> > >>>>>>>>>> Looking at the emails, everyone who participated and expressed a > >>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting > >> to > >>>>> decimal > >>>>>>>>>> for mixed float/int operands? > >>>>>>>>>> > >>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what > >> we’re > >>>>> doing > >>>>>>>>>> for 4.0. Then hopefully we can move on to a collective decision > >> on > >>>>> Ariel’s > >>>>>>>>>> concerns about overflow, which I think are also pressing - > >>>>> particularly for > >>>>>>>>>> tinyint and smallint. This does also impact implicit casts for > >> mixed > >>>>>>>>>> integer type operations, but an approach for these will probably > >>>>> fall out > >>>>>>>>>> of any decision on overflow. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan < > >>>>> murukesh.moha...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> I think you're conflating two things here. There's the loss > >>>>> resulting > >>>>>>>>>> from > >>>>>>>>>>> using some operators, and loss involved in casting. Dividing an > >>>>> integer > >>>>>>>>>> by > >>>>>>>>>>> another integer to obtain an integer result can result in loss, > >> but > >>>>>>>>>> there's > >>>>>>>>>>> no implicit casting there and no loss due to casting. Casting > an > >>>>> integer > >>>>>>>>>>> to a float can also result in loss. So dividing an integer by a > >>>>> float, > >>>>>>>>>> for > >>>>>>>>>>> example, with an implicit cast has an additional avenue for > loss: > >>>>> the > >>>>>>>>>>> implicit cast for the operands so that they're of the same > type. > >> I > >>>>>>>>>> believe > >>>>>>>>>>> this discussion so far has been about the latter, not the loss > >> from > >>>>> the > >>>>>>>>>>> operations themselves. > >>>>>>>>>>> > >>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer < > >>>>> benjamin.le...@datastax.com> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> I would like to try to clarify things a bit to help people to > >>>>> understand > >>>>>>>>>>>> the true complexity of the problem. > >>>>>>>>>>>> > >>>>>>>>>>>> The *float *and *double *types are inexact numeric types. Not > >> only > >>>>> at > >>>>>>>>>> the > >>>>>>>>>>>> operation level. > >>>>>>>>>>>> > >>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, > >> you > >>>>> will > >>>>>>>>>>>> realize that the value has been truncated to 676543.2. > >>>>>>>>>>>> > >>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact > >> types. > >>>>>>>>>>>> Using *decimals > >>>>>>>>>>>> *during operations will mitigate the problem but will not > remove > >>>>> it. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am not > >>>>> mistaken > >>>>>>>>>> in > >>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to > >> what > >>>>> MS SQL > >>>>>>>>>>>> server and Oracle do. So all thoses databases will lose > >> precision > >>>>> if you > >>>>>>>>>>>> are not carefull. > >>>>>>>>>>>> > >>>>>>>>>>>> If you truly need precision you can have it by using exact > >> numeric > >>>>> types > >>>>>>>>>>>> for your data types. Of course it has a cost on performance, > >>>>> memory and > >>>>>>>>>>>> disk usage. > >>>>>>>>>>>> > >>>>>>>>>>>> The advantage of the current approach is that it give you the > >>>>> choice. > >>>>>>>>>> It is > >>>>>>>>>>>> up to you to decide what you need for your application. It is > >> also > >>>>> in > >>>>>>>>>> line > >>>>>>>>>>>> with the way CQL behave everywhere else. > >>>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> > >>>>>>>>>>> Muru > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >> --------------------------------------------------------------------- > >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>>> > >> --------------------------------------------------------------------- > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > --------------------------------------------------------------------- > >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>>>> > >>>>>> > >>>>>> > --------------------------------------------------------------------- > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>>> > >>>>> > >>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>> > >>>>> -- > >>>> Jon Haddad > >>>> http://www.rustyrazorblade.com > >>>> twitter: rustyrazorblade > >>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>> > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >> For additional commands, e-mail: dev-h...@cassandra.apache.org > >> > >> > > -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade