Re: Implicit Casts for Arithmetic Operators

Jonathan Haddad Wed, 21 Nov 2018 16:25:30 -0800

I can’t agree more. We should be able to make changes in a manner that
improves the DB In the long term, rather than live with the technical debt
of arbitrary decisions made by a handful of people.


I also agree that putting a knob in place to let people migrate over is a
reasonable decision.

Jon

On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <bened...@apache.org>
wrote:

> The goal is simply to agree on a set of well-defined principles for how we
> should behave.  If we don’t like the implications that arise, we’ll have
> another vote?  A democracy cannot bind itself, so I never understood this
> fear of a decision.
>
> A database also has a thousand toggles.  If we absolutely need to, we can
> introduce one more.
>
> We should be doing this upfront a great deal more often.  Doing it
> retrospectively sucks, but in my opinion it's a bad reason to bind
> ourselves to whatever made it in.
>
> Do we anywhere define the principles of our current behaviour?  I couldn’t
> find it.
>
>
> > On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebre...@gmail.com> wrote:
> >
> > On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> bened...@apache.org>
> > wrote:
> >
> >> FWIW, my meaning of arithmetic in this context extends to any features
> we
> >> have already released (such as aggregates, and perhaps other built-in
> >> functions) that operate on the same domain.  We should be consistent,
> after
> >> all.
> >>
> >> Whether or not we need to revisit any existing functionality we can
> figure
> >> out after the fact, once we have agreed what our behaviour should be.
> >>
> >
> > I'm not sure I correctly understand the process suggested, but I don't
> > particularly like/agree with what I understand. What I understand is a
> > suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
> real
> > evaluation of what that entails (at least I haven't seen one), and that
> > this vote, if passed, would imply we'd then make any backward
> incompatible
> > change necessary to achieve compliance ("my meaning of arithmetic in this
> > context extends to any features we have already released" and "Whether or
> > not we need to revisit any existing functionality we can figure out after
> > the fact, once we have agreed what our behaviour should be").
> >
> > This might make sense of a new product, but at our stage that seems
> > backward to me. I think we owe our users to first make the effort of
> > identifying what "inconsistencies" our existing arithmetic has[1] and
> > _then_ consider what options we have to fix those, with their pros and
> cons
> > (including how bad they break backward compatibility). And if _then_
> > getting ANSI SQL 92 compliant proves to not be disruptive (or at least
> > acceptably so), then sure, that's great.
> >
> > [1]: one possibly efficient way to do that could actually be to compare
> our
> > arithmetic to ANSI SQL 92. Not that all differences found would imply
> > inconsistencies/wrongness of our arithmetic, but still, it should be
> > helpful. And I guess my whole point is that we should that analysis
> first,
> > and then maybe decide that being ANSI SQL 92 is a reasonable option, not
> > decide first and live with the consequences no matter what they are.
> >
> > --
> > Sylvain
> >
> >
> >> I will make this more explicit for the vote, but just to clarify the
> >> intention so that we are all discussing the same thing.
> >>
> >>
> >>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adwei...@fastmail.fm> wrote:
> >>>
> >>> Hi,
> >>>
> >>> +1
> >>>
> >>> This is a public API so we will be much better off if we get it right
> >> the first time.
> >>>
> >>> Ariel
> >>>
> >>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com>
> >> wrote:
> >>>>
> >>>> Sounds good to me.
> >>>>
> >>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> >> bened...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> So, this thread somewhat petered out.
> >>>>>
> >>>>> There are still a number of unresolved issues, but to make progress I
> >>>>> wonder if it would first be helpful to have a vote on ensuring we are
> >> ANSI
> >>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> >> baseline,
> >>>>> since we will hopefully minimise surprise to operators this way.
> >>>>>
> >>>>> If people largely agree, I will call a vote, and we can pick up a
> >> couple
> >>>>> of more focused discussions afterwards on how we interpret the leeway
> >> it
> >>>>> gives.
> >>>>>
> >>>>>
> >>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> From reading the spec. Precision is always implementation defined.
> The
> >>>>> spec specifies scale in several cases, but never precision for any
> >> type or
> >>>>> operation (addition/subtraction, multiplication, division).
> >>>>>>
> >>>>>> So we don't implement anything remotely approaching precision and
> >> scale
> >>>>> in CQL when it comes to numbers I think? So we aren't going to follow
> >> the
> >>>>> spec for scale. We are already pretty far down that road so I would
> >> leave
> >>>>> it alone.
> >>>>>>
> >>>>>> I don't think the spec is asking for the most approximate type. It's
> >>>>> just saying the result is approximate, and the precision is
> >> implementation
> >>>>> defined. We could return either float or double. I think if one of
> the
> >>>>> operands is a double we should return a double because clearly the
> >> schema
> >>>>> thought a double was required to represent that number. I would also
> >> be in
> >>>>> favor of returning a double all the time so that people can expect a
> >>>>> consistent type from expressions involving approximate numbers.
> >>>>>>
> >>>>>> I am a big fan of widening for arithmetic expressions in a database
> to
> >>>>> avoid having to error on overflow. You can go to the trouble of only
> >>>>> widening the minimum amount, but I think it's simpler if we always
> >> widen to
> >>>>> bigint and double. This would be something the spec allows.
> >>>>>>
> >>>>>> Definitely if we can make overflow not occur we should and the spec
> >>>>> allows that. We should also not return different types for the same
> >> operand
> >>>>> types just to work around overflow if we detect we need more
> precision.
> >>>>>>
> >>>>>> Ariel
> >>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging
> >> this
> >>>>>>> out (and Mike for getting some empirical examples).
> >>>>>>>
> >>>>>>> We still have to decide on the approximate data type to return;
> right
> >>>>>>> now, we have float+bigint=double, but float+int=float.  I think
> this
> >> is
> >>>>>>> fairly inconsistent, and either the approximate type should always
> >> win,
> >>>>>>> or we should always upgrade to double for mixed operands.
> >>>>>>>
> >>>>>>> The quoted spec also suggests that decimal+float=float, and decimal
> >>>>>>> +double=double, whereas we currently have decimal+float=decimal,
> and
> >>>>>>> decimal+double=decimal
> >>>>>>>
> >>>>>>> If we’re going to go with an approximate operand implying an
> >>>>> approximate
> >>>>>>> result, I think we should do it consistently (and consistent with
> the
> >>>>>>> SQL92 spec), and have the type of the approximate operand always be
> >> the
> >>>>>>> return type.
> >>>>>>>
> >>>>>>> This would still leave a decision for float+double, though.  The
> most
> >>>>>>> consistent behaviour with that stated above would be to always take
> >> the
> >>>>>>> most approximate type to return (i.e. float), but this would seem
> to
> >> me
> >>>>>>> to be fairly unexpected for the user.
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I agree with what's been said about expectations regarding
> >> expressions
> >>>>> involving floating point numbers. I think that if one of the inputs
> is
> >>>>> approximate then the result should be approximate.
> >>>>>>>>
> >>>>>>>> One thing we could look at for inspiration is the SQL spec. Not to
> >>>>> follow dogmatically necessarily.
> >>>>>>>>
> >>>>>>>> From the SQL 92 spec regarding assignment
> >>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
> 4.6:
> >>>>>>>> "
> >>>>>>>>     Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>>>>>>     FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>>>> mutually
> >>>>>>>>     comparable and mutually assignable. If an assignment would
> >>>>> result
> >>>>>>>>     in a loss of the most significant digits, an exception
> >> condition
> >>>>>>>>     is raised. If least significant digits are lost,
> >> implementation-
> >>>>>>>>     defined rounding or truncating occurs with no exception
> >>>>> condition
> >>>>>>>>     being raised. The rules for arithmetic are generally governed
> >> by
> >>>>>>>>     Subclause 6.12, "<numeric value expression>".
> >>>>>>>> "
> >>>>>>>>
> >>>>>>>> Section 6.12 numeric value expressions:
> >>>>>>>> "
> >>>>>>>>     1) If the data type of both operands of a dyadic arithmetic
> >>>>> opera-
> >>>>>>>>        tor is exact numeric, then the data type of the result is
> >>>>> exact
> >>>>>>>>        numeric, with precision and scale determined as follows:
> >>>>>>>> ...
> >>>>>>>>     2) If the data type of either operand of a dyadic arithmetic
> >> op-
> >>>>>>>>        erator is approximate numeric, then the data type of the
> re-
> >>>>>>>>        sult is approximate numeric. The precision of the result is
> >>>>>>>>        implementation-defined.
> >>>>>>>> "
> >>>>>>>>
> >>>>>>>> And this makes sense to me. I think we should only return an exact
> >>>>> result if both of the inputs are exact.
> >>>>>>>>
> >>>>>>>> I think we might want to look closely at the SQL spec and
> especially
> >>>>> when the spec requires an error to be generated. Those are sometimes
> >> in the
> >>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
> >> from the
> >>>>> spec we should be asking why is it in the spec and why are we
> >> deviating.
> >>>>>>>>
> >>>>>>>> Another issue besides overflow handling is how we determine
> >> precision
> >>>>> and scale for expressions involving two exact types.
> >>>>>>>>
> >>>>>>>> Ariel
> >>>>>>>>
> >>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
> >>>>> which is
> >>>>>>>>> returning just about any type depending on the order of
> operators.
> >>>>>>>>> Considering it actually mentions in the docs that using
> >>>>> numeric/decimal is
> >>>>>>>>> slow and also multiple times that floating points are inexact. So
> >>>>> doing
> >>>>>>>>> some math with Postgres (9.6.5):
> >>>>>>>>>
> >>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>>>>>>> precision 2147483647
> >>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
> >>>>> 2147483647
> >>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
> >>>>> 2147483647
> >>>>>>>>>
> >>>>>>>>> With + - we can get the same amount of mixture of returned types.
> >>>>> There's
> >>>>>>>>> no difference in those calculations, just some casting. To me
> >>>>>>>>> floating-point math indicates inexactness and has errors and
> >> whoever
> >>>>> mixes
> >>>>>>>>> up two different types should understand that. If one didn't want
> >>>>> exact
> >>>>>>>>> numeric type, why would the server return such? The floating
> point
> >>>>> value
> >>>>>>>>> itself could be wrong already before the calculation - trying to
> >> say
> >>>>> we do
> >>>>>>>>> it lossless is just wrong.
> >>>>>>>>>
> >>>>>>>>> Fun with 2.65:
> >>>>>>>>>
> >>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>>>
> >>>>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>>>
> >>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>>>
> >>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>>>
> >>>>>>>>> So as we're going to have silly values in any case, why pretend
> >>>>> something
> >>>>>>>>> else? Also, exact calculations are slow if we crunch large amount
> >> of
> >>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> implemention
> >>>>> in this
> >>>>>>>>> case, but I wish it wasn't used as a benchmark in this case. And
> >> most
> >>>>>>>>> importantly, I would definitely want the exact same type returned
> >>>>> each time
> >>>>>>>>> I do a calculation.
> >>>>>>>>>
> >>>>>>>>> - Micke
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>>>> bened...@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> As far as I can tell we reached a relatively strong consensus
> >> that we
> >>>>>>>>>> should implement lossless casts by default?  Does anyone have
> >>>>> anything more
> >>>>>>>>>> to add?
> >>>>>>>>>>
> >>>>>>>>>> Looking at the emails, everyone who participated and expressed a
> >>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting
> >> to
> >>>>> decimal
> >>>>>>>>>> for mixed float/int operands?
> >>>>>>>>>>
> >>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
> >> we’re
> >>>>> doing
> >>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision
> >> on
> >>>>> Ariel’s
> >>>>>>>>>> concerns about overflow, which I think are also pressing -
> >>>>> particularly for
> >>>>>>>>>> tinyint and smallint.  This does also impact implicit casts for
> >> mixed
> >>>>>>>>>> integer type operations, but an approach for these will probably
> >>>>> fall out
> >>>>>>>>>> of any decision on overflow.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>>>> murukesh.moha...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I think you're conflating two things here. There's the loss
> >>>>> resulting
> >>>>>>>>>> from
> >>>>>>>>>>> using some operators, and loss involved in casting. Dividing an
> >>>>> integer
> >>>>>>>>>> by
> >>>>>>>>>>> another integer to obtain an integer result can result in loss,
> >> but
> >>>>>>>>>> there's
> >>>>>>>>>>> no implicit casting there and no loss due to casting.  Casting
> an
> >>>>> integer
> >>>>>>>>>>> to a float can also result in loss. So dividing an integer by a
> >>>>> float,
> >>>>>>>>>> for
> >>>>>>>>>>> example, with an implicit cast has an additional avenue for
> loss:
> >>>>> the
> >>>>>>>>>>> implicit cast for the operands so that they're of the same
> type.
> >> I
> >>>>>>>>>> believe
> >>>>>>>>>>> this discussion so far has been about the latter, not the loss
> >> from
> >>>>> the
> >>>>>>>>>>> operations themselves.
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>>>> benjamin.le...@datastax.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would like to try to clarify things a bit to help people to
> >>>>> understand
> >>>>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The *float *and *double *types are inexact numeric types. Not
> >> only
> >>>>> at
> >>>>>>>>>> the
> >>>>>>>>>>>> operation level.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it,
> >> you
> >>>>> will
> >>>>>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
> >> types.
> >>>>>>>>>>>> Using *decimals
> >>>>>>>>>>>> *during operations will mitigate the problem but will not
> remove
> >>>>> it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am not
> >>>>> mistaken
> >>>>>>>>>> in
> >>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to
> >> what
> >>>>> MS SQL
> >>>>>>>>>>>> server and Oracle do. So all thoses databases will lose
> >> precision
> >>>>> if you
> >>>>>>>>>>>> are not carefull.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you truly need precision you can have it by using exact
> >> numeric
> >>>>> types
> >>>>>>>>>>>> for your data types. Of course it has a cost on performance,
> >>>>> memory and
> >>>>>>>>>>>> disk usage.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The advantage of the current approach is that it give you the
> >>>>> choice.
> >>>>>>>>>> It is
> >>>>>>>>>>>> up to you to decide what you need for your application. It is
> >> also
> >>>>> in
> >>>>>>>>>> line
> >>>>>>>>>>>> with the way CQL behave everywhere else.
> >>>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> Muru
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>
> >>>>> --
> >>>> Jon Haddad
> >>>> http://www.rustyrazorblade.com
> >>>> twitter: rustyrazorblade
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Implicit Casts for Arithmetic Operators

Reply via email to