[ 
https://issues.apache.org/jira/browse/CASSANDRA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545433#comment-15545433
 ] 

Sylvain Lebresne commented on CASSANDRA-12417:
----------------------------------------------

bq. If my understanding is right, both functions (sum and avg) can end up 
returning different results, even in normal situation, due to the fact that 
both use now the Kahan's algorithm

I was really thinking about averages on integers, which, as far as I can tell 
from the patch, will not return a different result. You're probably right about 
floats, but to some extent that doesn't change the overall point, which is 
unrelated to the actual patch: having averages break even though neither the 
input nor the output overflows is a bug that we should fix in 3.0. If we prefer 
separate patches for 3.0 and 3.X/trunk, so that 3.0 preserves the existing 
float summation (changing it is more arguably an improvement) _but_ fixes the 
average, that's totally fine by me, but I disagree with not fixing a known and 
easy-to-fix bug in 3.0.
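
To make the integer case concrete, here is a minimal Java sketch of the 
failure and of one possible fix. It is not the code from the patch: the class 
and method names are hypothetical, and returning 0 for empty input is an 
assumption made only to keep the example short. It shows that accumulating in 
the source type wraps around even though the true average fits in that type, 
while a wider accumulator does not.

```java
import java.math.BigInteger;

// Hypothetical illustration only; not taken from the Cassandra code base.
final class IntAverageSketch {

    // Accumulates in int, so the running sum silently wraps on overflow.
    static int naiveAvg(int[] values) {
        if (values.length == 0) return 0; // assumption: empty input yields 0
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Accumulates in BigInteger, so the running sum cannot overflow.
    // The final quotient always fits in int because it lies between
    // the smallest and largest input value.
    static int wideAvg(int[] values) {
        if (values.length == 0) return 0; // assumption: empty input yields 0
        BigInteger sum = BigInteger.ZERO;
        for (int v : values) {
            sum = sum.add(BigInteger.valueOf(v));
        }
        // divide() truncates toward zero, like int division.
        return sum.divide(BigInteger.valueOf(values.length)).intValueExact();
    }
}
```

For int inputs a long accumulator would be enough in practice; BigInteger is 
used here because the same approach also covers bigint columns.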

> Built-in AVG aggregate is much less useful than it should be
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-12417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12417
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Branimir Lambov
>            Assignee: Alex Petrov
>
> For fixed-size integer types overflow is all but guaranteed to happen, 
> yielding an incorrect result. While for sum it is somewhat acceptable as the 
> result cannot fit the type, this is not the case for average.
> As the result of average is always within the scope of the source type, 
> failing to produce it only signifies a bad implementation. Yes, one can solve 
> this by type-casting, but do we really want to always have to be telling 
> people that the correct spelling of the average function is 
> {{cast(avg(cast(value as bigint)) as int)}}, especially if this is so 
> trivial to fix?
> Additionally, the straightforward addition we use for floating point versions 
> is not a good choice numerically for larger numbers of values. We should 
> switch to a more stable version, e.g. iterative mean using {{avg = avg + 
> (value - avg) / count}}.
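
The iterative mean quoted in the description can be sketched in Java as 
follows. This is only an illustration of the formula 
{{avg = avg + (value - avg) / count}}, not the implementation in the patch; 
the class and method names are hypothetical, and returning 0.0 for empty input 
is an assumption. The naive version is included for comparison.

```java
// Hypothetical illustration only; not taken from the Cassandra code base.
final class RunningMeanSketch {

    // Straightforward sum-then-divide; the sum can overflow to infinity
    // or lose precision long before the mean itself is out of range.
    static double naiveMean(double[] values) {
        if (values.length == 0) return 0.0; // assumption: empty input yields 0.0
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Iterative mean: avg = avg + (value - avg) / count.
    // The running value stays near the magnitude of the inputs
    // instead of growing with the number of values.
    static double iterativeMean(double[] values) {
        double avg = 0.0;
        long count = 0;
        for (double v : values) {
            count++;
            avg = avg + (v - avg) / count;
        }
        return avg;
    }
}
```

With two inputs equal to Double.MAX_VALUE, the naive sum overflows to 
infinity while the iterative version returns Double.MAX_VALUE. The difference 
{{value - avg}} can itself still overflow for inputs of opposite sign near the 
limits of the type, so this is a sketch of the idea rather than a complete 
solution.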



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
