+1 on Postgres approach. In the last 5 years I’ve seen people move from Oracle and SQL server to some variant of Cassandra or Postgres and other new tech is also more likely to support Postgres (Cockroach..)
I don’t care either way. It really depends on what you are storing. Rahul Singh Chief Executive Officer m 202.905.2818 Anant Corporation 1010 Wisconsin Ave NW, Suite 250 Washington, D.C. 20007 We build and manage digital business technology platforms. On Oct 2, 2018, 11:11 AM -0700, Jonathan Haddad <j...@jonhaddad.com>, wrote: > Thanks for bringing this up, it definitely needs to be discussed. > > Last surprise is difficult here, since all major databases have their own > way of doing things and people will just assume that their way is the right > way. On that note, some people will be surprised no matter what we do. > > I'd rather avoid the pitfalls of returning incorrect results, so either > option 2 or 3 sound reasonable, but leaning towards the Postgres approach > of always returning a decimal for those cases. > > Jon > > > > On Tue, Oct 2, 2018 at 10:54 AM Benedict Elliott Smith <bened...@apache.org> > wrote: > > > I agree, in broad strokes at least. Interested to hear others’ positions. > > > > > > > > > On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote: > > > > > > Hi, > > > > > > I think overflow and the role of widening conversions are pretty linked > > so I'll continue to inject that into this discussion. Also overflow is much > > worse since most applications won't be impacted by a loss of precision when > > an expression involves an int and float, but will care quite a bit if they > > get some nonsense wrapped number in an integer only expression. > > > > > > For VoltDB in practice we didn't run into issues with applications not > > making progress due to exceptions with real data due to the widening > > conversions. The range of double and long are pretty big and that hides > > wrap around/infinity. > > > > > > I think the proposal of having all operations return a decimal is > > attractive in that these expressions always result in a consistent type. > > Two pain points might be whether client languages have decimal support and > > whether there is a performance issue? The nice thing about always returning > > decimal is we can sidestep the issue of overflow. > > > > > > I would start with seeing if that's acceptable, and if it isn't then > > look at other approaches like returning a variety of types such when doing > > int + int return a bigint or int + float return a double. > > > > > > If we take an approach that allows overflow the ideal end state IMO > > would be to get all users to run Cassandra in way that overflow results in > > an error even in the context of aggregation. The road to get there is > > tricky, but maybe start by having it as an opt in tunable in > > cassandra.yaml. I don't know how/when we could ever change that as a > > default and it's unfortunate having an option like this that 99% won't know > > they should flip. > > > > > > It seems like having the default throw on overflow is not as bad as it > > sounds if you do the widening conversions since most people won't run into > > them. The change in the column types of results sets actually sounds worse > > if we want to also improve aggregrations. Many applications won't notice if > > the client library abstracts that away, but I think there are still cases > > where people would notice the type changing. > > > > > > Ariel > > > > > > > On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote: > > > > This (overflow) is an excellent point, but this also affects > > > > aggregations which were introduced a long time ago. They already > > > > inherit Java semantics for all of the relevant types (silent wrap > > > > around). We probably want to be consistent, meaning either changing > > > > aggregations (which incurs a cost for changing API) or continuing the > > > > java semantics here. > > > > > > > > This is why having these discussions explicitly in the community before > > > > a release is so critical, in my view. It’s very easy for these > > semantic > > > > changes to go unnoticed on a JIRA, and then ossify. > > > > > > > > > > > > > On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote: > > > > > > > > > > Hi, > > > > > > > > > > I think we should decide based on what is least surprising as you > > mention, but isn't overridden by some other concern. > > > > > > > > > > It seems to me the priorities are > > > > > > > > > > * Correctness > > > > > * Performance > > > > > * User visible complexity > > > > > * Developer visible complexity > > > > > > > > > > Defaulting to silent implicit data loss is not ideal from a > > correctness standpoint. > > > > > > > > > > Doing something better like using wider types doesn't seem like a > > performance issue. > > > > > > > > > > From a user standpoint doing something less lossy doesn't look more > > complex as long as it's consistent, and documented and doesn't change from > > version to version. > > > > > > > > > > There is some developer complexity, but this is a public API and we > > only get one shot at this. > > > > > > > > > > I wonder about how overflow is handled as well. In VoltDB I think we > > threw on overflow and tended to just do widening conversions to make that > > less common. We didn't imitate another database (as far as I know) we just > > went with what least likely to silently corrupt data. > > > > > > > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 > > < > > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 > > > > > > > > > > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 > > < > > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 > > > > > > > > > > > > > Ariel > > > > > > > > > > > On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote: > > > > > > ç introduced arithmetic operators, and alongside these > > > > > > came implicit casts for their operands. There is a semantic decision > > to > > > > > > be made, and I think the project would do well to explicitly raise > > this > > > > > > kind of question for wider input before release, since the project > > > > > > is > > > > > > bound by them forever more. > > > > > > > > > > > > In this case, the choice is between lossy and lossless casts for > > > > > > operations involving integers and floating point numbers. In > > essence, > > > > > > should: > > > > > > > > > > > > (1) float + int = float, double + bigint = double; or > > > > > > (2) float + int = double, double + bigint = decimal; or > > > > > > (3) float + int = decimal, double + bigint = decimal > > > > > > > > > > > > Option 1 performs a lossy implicit cast from int -> float, or bigint > > -> > > > > > > double. Simply casting between these types changes the value. This > > is > > > > > > what MS SQL Server does. > > > > > > Options 2 and 3 cast without loss of precision, and 3 (or > > thereabouts) > > > > > > is what PostgreSQL does. > > > > > > > > > > > > The question I’m interested in is not just which is the right > > decision, > > > > > > but how the right decision should be arrived at. My view is that we > > > > > > should primarily aim for least surprise to the user, but I’m keen to > > > > > > hear from others. > > > > > > --------------------------------------------------------------------- > > > > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > > > > > <mailto: > > dev-unsubscr...@cassandra.apache.org> > > > > > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > <mailto:dev-h...@cassandra.apache.org> > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org <mailto: > > dev-unsubscr...@cassandra.apache.org> > > > > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > <mailto:dev-h...@cassandra.apache.org> > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > > > -- > Jon Haddad > http://www.rustyrazorblade.com > twitter: rustyrazorblade