[ https://issues.apache.org/jira/browse/CASSANDRA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545433#comment-15545433 ]
Sylvain Lebresne commented on CASSANDRA-12417:
----------------------------------------------

bq. If my understanding is right, both functions (sum and avg) can end up returning different results, even in normal situation, due to the fact that both use now the Kahan's algorithm

I was really thinking about average on integers, which, as far as I can tell from the patch, will not return a different result. You're probably right on floats, but to some extent that doesn't change the overall point, which is unrelated to the actual patch: having averages break even though neither the input nor the output overflows is a bug which we should fix on 3.0.

If we prefer having separate patches for 3.0 and 3.X/trunk, so we preserve the float summation (which is more arguably an improvement) _but_ fix the average, that's totally fine by me, but I disagree with not fixing a known and easy-to-fix bug in 3.0.

> Built-in AVG aggregate is much less useful than it should be
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-12417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12417
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Branimir Lambov
>            Assignee: Alex Petrov
>
> For fixed-size integer types overflow is all but guaranteed to happen,
> yielding an incorrect result. While for sum it is somewhat acceptable as the
> result cannot fit the type, this is not the case for average.
> As the result of average is always within the scope of the source type,
> failing to produce it only signifies a bad implementation. Yes, one can solve
> this by type-casting, but do we really want to always have to be telling
> people that the correct spelling of the average function is
> {{cast(avg(cast(value as bigint)) as int)}}, especially if this is so
> trivial to fix?
> Additionally, the straightforward addition we use for floating point versions
> is not a good choice numerically for larger numbers of values. We should
> switch to a more stable version, e.g. iterative mean using {{avg = avg +
> (value - avg) / count}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
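
For illustration, the two problems raised in the description can be sketched in Java roughly as follows. This is a minimal sketch under assumptions, not the code from the patch: the class and method names (AvgSketch, naiveAvg, wideAvg, iterativeMean) are hypothetical, and the integer methods assume a non-empty input.

```java
import java.math.BigInteger;

// Hypothetical illustration only; not taken from the Cassandra code base.
final class AvgSketch {
    // Naive integer average: the running int sum silently wraps on overflow,
    // so the result can be wrong even though the true average fits in an int.
    static int naiveAvg(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Accumulate in an arbitrary-precision sum so it cannot overflow.
    // The average of ints always lies within the int range, so the
    // final narrowing conversion is exact.
    static int wideAvg(int[] values) {
        BigInteger sum = BigInteger.ZERO;
        for (int v : values) {
            sum = sum.add(BigInteger.valueOf(v));
        }
        return sum.divide(BigInteger.valueOf(values.length)).intValueExact();
    }

    // Iterative mean from the description: avg = avg + (value - avg) / count.
    // It never forms the full sum, so it does not overflow to infinity
    // when the individual values are large but their mean is representable.
    static double iterativeMean(double[] values) {
        double avg = 0.0;
        long count = 0;
        for (double v : values) {
            count++;
            avg += (v - avg) / count;
        }
        return avg;
    }
}
```

With the input {Integer.MAX_VALUE, Integer.MAX_VALUE}, naiveAvg returns -1 because the sum wraps to -2, while wideAvg returns Integer.MAX_VALUE. Likewise, iterativeMean over {1e308, 1e308} returns 1e308, whereas a plain sum of the two values would already be infinite.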