PLEASE TAKE MY EMAIL FROM THIS SHIT !!
-----Mensagem original----- De: Michael Burman [mailto:y...@iki.fi] Enviada em: terça-feira, 20 de novembro de 2018 13:51 Para: dev@cassandra.apache.org Assunto: Re: Implicit Casts for Arithmetic Operators Yep, that's a good approach. - Micke On Tue, Nov 20, 2018 at 5:12 PM Ariel Weisberg <adwei...@fastmail.fm> wrote: > Hi, > > +1 > > This is a public API so we will be much better off if we get it right > the first time. > > Ariel > > > On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> wrote: > > > > Sounds good to me. > > > > On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith < > bened...@apache.org> > > wrote: > > > >> So, this thread somewhat petered out. > >> > >> There are still a number of unresolved issues, but to make progress > >> I wonder if it would first be helpful to have a vote on ensuring we > >> are > ANSI > >> SQL 92 compliant for our arithmetic? This seems like a sensible > baseline, > >> since we will hopefully minimise surprise to operators this way. > >> > >> If people largely agree, I will call a vote, and we can pick up a > >> couple of more focused discussions afterwards on how we interpret > >> the leeway it gives. > >> > >> > >>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote: > >>> > >>> Hi, > >>> > >>> From reading the spec. Precision is always implementation defined. > >>> The > >> spec specifies scale in several cases, but never precision for any > >> type > or > >> operation (addition/subtraction, multiplication, division). > >>> > >>> So we don't implement anything remotely approaching precision and > >>> scale > >> in CQL when it comes to numbers I think? So we aren't going to > >> follow > the > >> spec for scale. We are already pretty far down that road so I would > leave > >> it alone. > >>> > >>> I don't think the spec is asking for the most approximate type. > >>> It's > >> just saying the result is approximate, and the precision is > implementation > >> defined. We could return either float or double. I think if one of > >> the operands is a double we should return a double because clearly > >> the > schema > >> thought a double was required to represent that number. I would > >> also be > in > >> favor of returning a double all the time so that people can expect > >> a consistent type from expressions involving approximate numbers. > >>> > >>> I am a big fan of widening for arithmetic expressions in a > >>> database to > >> avoid having to error on overflow. You can go to the trouble of > >> only widening the minimum amount, but I think it's simpler if we > >> always > widen to > >> bigint and double. This would be something the spec allows. > >>> > >>> Definitely if we can make overflow not occur we should and the > >>> spec > >> allows that. We should also not return different types for the same > operand > >> types just to work around overflow if we detect we need more precision. > >>> > >>> Ariel > >>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote: > >>>> If it’s in the SQL spec, I’m fairly convinced. Thanks for > >>>> digging > this > >>>> out (and Mike for getting some empirical examples). > >>>> > >>>> We still have to decide on the approximate data type to return; > >>>> right now, we have float+bigint=double, but float+int=float. I > >>>> think this > is > >>>> fairly inconsistent, and either the approximate type should > >>>> always > win, > >>>> or we should always upgrade to double for mixed operands. > >>>> > >>>> The quoted spec also suggests that decimal+float=float, and > >>>> decimal > >>>> +double=double, whereas we currently have decimal+float=decimal, > >>>> +and > >>>> decimal+double=decimal > >>>> > >>>> If we’re going to go with an approximate operand implying an > >> approximate > >>>> result, I think we should do it consistently (and consistent with > >>>> the > >>>> SQL92 spec), and have the type of the approximate operand always > >>>> be > the > >>>> return type. > >>>> > >>>> This would still leave a decision for float+double, though. The > >>>> most consistent behaviour with that stated above would be to > >>>> always take > the > >>>> most approximate type to return (i.e. float), but this would seem > >>>> to > me > >>>> to be fairly unexpected for the user. > >>>> > >>>> > >>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> I agree with what's been said about expectations regarding > expressions > >> involving floating point numbers. I think that if one of the inputs > >> is approximate then the result should be approximate. > >>>>> > >>>>> One thing we could look at for inspiration is the SQL spec. Not > >>>>> to > >> follow dogmatically necessarily. > >>>>> > >>>>> From the SQL 92 spec regarding assignment > >> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6: > >>>>> " > >>>>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT, > >>>>> FLOAT, REAL, and DOUBLE PRECISION are numbers and are all > >> mutually > >>>>> comparable and mutually assignable. If an assignment would > >> result > >>>>> in a loss of the most significant digits, an exception > condition > >>>>> is raised. If least significant digits are lost, > implementation- > >>>>> defined rounding or truncating occurs with no exception > >> condition > >>>>> being raised. The rules for arithmetic are generally > >>>>> governed > by > >>>>> Subclause 6.12, "<numeric value expression>". > >>>>> " > >>>>> > >>>>> Section 6.12 numeric value expressions: > >>>>> " > >>>>> 1) If the data type of both operands of a dyadic > >>>>> arithmetic > >> opera- > >>>>> tor is exact numeric, then the data type of the result > >>>>> is > >> exact > >>>>> numeric, with precision and scale determined as follows: > >>>>> ... > >>>>> 2) If the data type of either operand of a dyadic > >>>>> arithmetic > op- > >>>>> erator is approximate numeric, then the data type of the re- > >>>>> sult is approximate numeric. The precision of the result is > >>>>> implementation-defined. > >>>>> " > >>>>> > >>>>> And this makes sense to me. I think we should only return an > >>>>> exact > >> result if both of the inputs are exact. > >>>>> > >>>>> I think we might want to look closely at the SQL spec and > >>>>> especially > >> when the spec requires an error to be generated. Those are > >> sometimes in > the > >> spec to prevent subtle paths to wrong answers. Any time we deviate > >> from > the > >> spec we should be asking why is it in the spec and why are we deviating. > >>>>> > >>>>> Another issue besides overflow handling is how we determine > >>>>> precision > >> and scale for expressions involving two exact types. > >>>>> > >>>>> Ariel > >>>>> > >>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote: > >>>>>> Hi, > >>>>>> > >>>>>> I'm not sure if I would prefer the Postgres way of doing > >>>>>> things, > >> which is > >>>>>> returning just about any type depending on the order of operators. > >>>>>> Considering it actually mentions in the docs that using > >> numeric/decimal is > >>>>>> slow and also multiple times that floating points are inexact. > >>>>>> So > >> doing > >>>>>> some math with Postgres (9.6.5): > >>>>>> > >>>>>> SELECT 2147483647 <(214)%20748-3647>::bigint*1.0::double > >>>>>> precision > returns double > >>>>>> precision 2147483647 <(214)%20748-3647> SELECT 2147483647 > >>>>>> <(214)%20748-3647>::bigint*1.0 returns numeric > 2147483647 <(214)%20748-3647>.0 > >>>>>> SELECT 2147483647 <(214)%20748-3647>::bigint*1.0::real returns > double > >>>>>> SELECT 2147483647 <(214)%20748-3647>::double > >>>>>> precision*1::bigint > returns double > >> 2147483647 <(214)%20748-3647> > >>>>>> SELECT 2147483647 <(214)%20748-3647>::double > >>>>>> precision*1.0::bigint > returns double > >> 2147483647 <(214)%20748-3647> > >>>>>> > >>>>>> With + - we can get the same amount of mixture of returned types. > >> There's > >>>>>> no difference in those calculations, just some casting. To me > >>>>>> floating-point math indicates inexactness and has errors and > >>>>>> whoever > >> mixes > >>>>>> up two different types should understand that. If one didn't > >>>>>> want > >> exact > >>>>>> numeric type, why would the server return such? The floating > >>>>>> point > >> value > >>>>>> itself could be wrong already before the calculation - trying > >>>>>> to say > >> we do > >>>>>> it lossless is just wrong. > >>>>>> > >>>>>> Fun with 2.65: > >>>>>> > >>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743 > >>>>>> SELECT 2.65::double precision * 1::int returns double 2.65 > >>>>>> > >>>>>> SELECT round(2.65) returns numeric 4 SELECT round(2.65::double > >>>>>> precision) returns double 4 > >>>>>> > >>>>>> SELECT 2.65 * 1 returns double 2.65 SELECT 2.65 * 1::bigint > >>>>>> returns numeric 2.65 SELECT 2.65 * 1.0 returns numeric 2.650 > >>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65 > >>>>>> > >>>>>> SELECT round(2.65) * 1 returns numeric 3 SELECT round(2.65) * > >>>>>> round(1) returns double 3 > >>>>>> > >>>>>> So as we're going to have silly values in any case, why pretend > >> something > >>>>>> else? Also, exact calculations are slow if we crunch large > >>>>>> amount of numbers. I guess I slightly deviated towards > >>>>>> Postgres' implemention > >> in this > >>>>>> case, but I wish it wasn't used as a benchmark in this case. > >>>>>> And > most > >>>>>> importantly, I would definitely want the exact same type > >>>>>> returned > >> each time > >>>>>> I do a calculation. > >>>>>> > >>>>>> - Micke > >>>>>> > >>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith < > >> bened...@apache.org> > >>>>>> wrote: > >>>>>> > >>>>>>> As far as I can tell we reached a relatively strong consensus > >>>>>>> that > we > >>>>>>> should implement lossless casts by default? Does anyone have > >> anything more > >>>>>>> to add? > >>>>>>> > >>>>>>> Looking at the emails, everyone who participated and expressed > >>>>>>> a preference was in favour of the “Postgres approach” of > >>>>>>> upcasting to > >> decimal > >>>>>>> for mixed float/int operands? > >>>>>>> > >>>>>>> I’d like to get a clear-cut decision on this, so we know what > >>>>>>> we’re > >> doing > >>>>>>> for 4.0. Then hopefully we can move on to a collective > >>>>>>> decision on > >> Ariel’s > >>>>>>> concerns about overflow, which I think are also pressing - > >> particularly for > >>>>>>> tinyint and smallint. This does also impact implicit casts > >>>>>>> for > mixed > >>>>>>> integer type operations, but an approach for these will > >>>>>>> probably > >> fall out > >>>>>>> of any decision on overflow. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan < > >> murukesh.moha...@gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> I think you're conflating two things here. There's the loss > >> resulting > >>>>>>> from > >>>>>>>> using some operators, and loss involved in casting. Dividing > >>>>>>>> an > >> integer > >>>>>>> by > >>>>>>>> another integer to obtain an integer result can result in > >>>>>>>> loss, > but > >>>>>>> there's > >>>>>>>> no implicit casting there and no loss due to casting. > >>>>>>>> Casting an > >> integer > >>>>>>>> to a float can also result in loss. So dividing an integer by > >>>>>>>> a > >> float, > >>>>>>> for > >>>>>>>> example, with an implicit cast has an additional avenue for loss: > >> the > >>>>>>>> implicit cast for the operands so that they're of the same > >>>>>>>> type. I > >>>>>>> believe > >>>>>>>> this discussion so far has been about the latter, not the > >>>>>>>> loss > from > >> the > >>>>>>>> operations themselves. > >>>>>>>> > >>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer < > >> benjamin.le...@datastax.com> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> I would like to try to clarify things a bit to help people > >>>>>>>>> to > >> understand > >>>>>>>>> the true complexity of the problem. > >>>>>>>>> > >>>>>>>>> The *float *and *double *types are inexact numeric types. > >>>>>>>>> Not > only > >> at > >>>>>>> the > >>>>>>>>> operation level. > >>>>>>>>> > >>>>>>>>> If you insert 676543.21 in a *float* column and then read > >>>>>>>>> it, you > >> will > >>>>>>>>> realize that the value has been truncated to 676543.2. > >>>>>>>>> > >>>>>>>>> If you want accuracy the only way is to avoid those inexact > types. > >>>>>>>>> Using *decimals > >>>>>>>>> *during operations will mitigate the problem but will not > >>>>>>>>> remove > >> it. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I do not recall PostgreSQL behaving has described. If I am > >>>>>>>>> not > >> mistaken > >>>>>>> in > >>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to > >>>>>>>>> what > >> MS SQL > >>>>>>>>> server and Oracle do. So all thoses databases will lose > >>>>>>>>> precision > >> if you > >>>>>>>>> are not carefull. > >>>>>>>>> > >>>>>>>>> If you truly need precision you can have it by using exact > numeric > >> types > >>>>>>>>> for your data types. Of course it has a cost on performance, > >> memory and > >>>>>>>>> disk usage. > >>>>>>>>> > >>>>>>>>> The advantage of the current approach is that it give you > >>>>>>>>> the > >> choice. > >>>>>>> It is > >>>>>>>>> up to you to decide what you need for your application. It > >>>>>>>>> is > also > >> in > >>>>>>> line > >>>>>>>>> with the way CQL behave everywhere else. > >>>>>>>>> > >>>>>>>> -- > >>>>>>>> > >>>>>>>> Muru > >>>>>>> > >>>>>>> > >>>>>>> > --------------------------------------------------------------------- > >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>>>> > >>>>>>> > >>>>> > >>>>> ---------------------------------------------------------------- > >>>>> ----- To unsubscribe, e-mail: > >>>>> dev-unsubscr...@cassandra.apache.org > >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>> > >>>> > >>>> > >>>> ----------------------------------------------------------------- > >>>> ---- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>> > >>> > >>> ------------------------------------------------------------------ > >>> --- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>> > >> > >> > >> ------------------------------------------------------------------- > >> -- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >> For additional commands, e-mail: dev-h...@cassandra.apache.org > >> > >> -- > > Jon Haddad > > http://www.rustyrazorblade.com > > twitter: rustyrazorblade > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org