[ https://issues.apache.org/jira/browse/CASSANDRA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-12417:
------------------------------------
    Status: Patch Available  (was: Open)

To improve summation precision in the {{float}} and {{double}} implementations, 
I've switched to Kahan summation. The {{BigDecimal}} implementation is 
unchanged.
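For reference, Kahan (compensated) summation keeps a running correction term that captures the low-order bits lost in each floating-point addition and feeds them back into the next one. Below is a minimal Java sketch of the technique, not the code from the patch; the class name {{KahanSum}} is hypothetical.

```java
// Hypothetical sketch of Kahan (compensated) summation for doubles.
// Not Cassandra's actual implementation; names are illustrative only.
final class KahanSum {
    private double sum = 0.0;
    // Running compensation: the (negated) low-order bits lost so far.
    private double compensation = 0.0;

    void add(double value) {
        double y = value - compensation;   // correct the incoming term by the lost bits
        double t = sum + y;                // large + small: low-order bits of y may be lost here
        compensation = (t - sum) - y;      // algebraically zero; numerically, what was lost
        sum = t;
    }

    double result() {
        return sum;
    }
}
```

Adding 0.1 ten million times with a plain {{double}} accumulator drifts visibly away from 1,000,000, while the compensated sum stays within a few ulps of it.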

For {{avg}}, all integer types now share a single implementation for computing 
the sum ({{long}} arithmetic, falling back to {{BigInteger}} when needed). This 
avoids overflow when the intermediate sum is too large for the type even though 
the resulting average fits within its range. The floating-point logic is 
similar: the sum is computed with Kahan's algorithm, switching to 
{{BigDecimal}} on infinity/overflow.
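The approach in the paragraph above could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the class names {{IntegerAvg}} and {{DoubleAvg}} are hypothetical, returning 0 for empty input is an assumption, and routing infinite/NaN inputs around {{BigDecimal}} (which cannot represent them) is an addition of this sketch.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical sketch; not Cassandra's actual implementation.
final class IntegerAvg {
    private long sum = 0;
    private BigInteger bigSum = null;  // non-null once the long sum has overflowed
    private long count = 0;

    void add(long value) {
        count++;
        if (bigSum != null) {
            bigSum = bigSum.add(BigInteger.valueOf(value));
            return;
        }
        try {
            sum = Math.addExact(sum, value);   // fast path: plain long arithmetic
        } catch (ArithmeticException overflow) {
            // Switch to BigInteger, carrying over the sum accumulated so far.
            bigSum = BigInteger.valueOf(sum).add(BigInteger.valueOf(value));
        }
    }

    // Truncating average (assumption: 0 for no values).
    long result() {
        if (count == 0) return 0;
        if (bigSum == null) return sum / count;
        // The mean of long values always fits in a long, so this cannot throw.
        return bigSum.divide(BigInteger.valueOf(count)).longValueExact();
    }
}

final class DoubleAvg {
    private double sum = 0.0;
    private double compensation = 0.0;  // Kahan compensation term
    private BigDecimal bigSum = null;   // non-null once the double sum overflowed
    private double nonFinite = 0.0;     // accumulates +/-Infinity and NaN inputs
    private long count = 0;

    void add(double value) {
        count++;
        if (!Double.isFinite(value)) {
            nonFinite += value;             // BigDecimal cannot represent these values
            return;
        }
        if (bigSum != null) {
            bigSum = bigSum.add(new BigDecimal(value));
            return;
        }
        double y = value - compensation;
        double t = sum + y;
        if (Double.isInfinite(t)) {
            // Overflow: continue exactly from the compensated sum (sum - compensation).
            bigSum = new BigDecimal(sum).subtract(new BigDecimal(compensation))
                                        .add(new BigDecimal(value));
            return;
        }
        compensation = (t - sum) - y;
        sum = t;
    }

    double result() {
        if (nonFinite != 0.0) return nonFinite;   // true for NaN as well as +/-Infinity
        if (count == 0) return 0.0;
        if (bigSum == null) return sum / count;
        return bigSum.divide(BigDecimal.valueOf(count), MathContext.DECIMAL128).doubleValue();
    }
}
```

For example, averaging {{Long.MAX_VALUE}} twice, or {{Double.MAX_VALUE}} twice, yields the input value instead of a wrapped or infinite result. The exact types are only paid for once the cheap path actually overflows.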

|[trunk|https://github.com/ifesdjeen/cassandra/tree/12417-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-12417-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-12417-trunk-dtest/]|


> Built-in AVG aggregate is much less useful than it should be
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-12417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12417
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Branimir Lambov
>            Assignee: Alex Petrov
>
> For fixed-size integer types overflow is all but guaranteed to happen, 
> yielding an incorrect result. While for sum it is somewhat acceptable as the 
> result cannot fit the type, this is not the case for average.
> As the result of average is always within the range of the source type, 
> failing to produce it only signifies a bad implementation. Yes, one can solve 
> this by type-casting, but do we really want to always have to be telling 
> people that the correct spelling of the average function is 
> {{cast(avg(cast(value as bigint)) as int)}}, especially if this is so 
> trivial to fix?
> Additionally, the straightforward addition we use for floating point versions 
> is not a good choice numerically for larger numbers of values. We should 
> switch to a more stable version, e.g. iterative mean using {{avg = avg + 
> (value - avg) / count}}.
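Both problems from the report can be illustrated in a few lines of Java: a naive {{int}} average whose sum wraps around, and the iterative mean the reporter proposes. This is a hypothetical sketch (the class {{AvgIllustrations}} and its methods are made up for illustration). Note that the patch described above uses Kahan summation with a {{BigDecimal}} fallback rather than this iterative form.

```java
// Hypothetical sketch illustrating the two problems described in the issue.
final class AvgIllustrations {
    // Naive int average: the int accumulator wraps around on overflow.
    static int naiveIntAvg(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;                   // silently wraps past Integer.MAX_VALUE
        }
        return values.length == 0 ? 0 : sum / values.length;
    }

    // Iterative mean suggested in the issue: avg = avg + (value - avg) / count.
    static double iterativeMean(double[] values) {
        double avg = 0.0;
        long count = 0;
        for (double v : values) {
            count++;
            avg += (v - avg) / count;   // update the mean without forming the full sum
        }
        return avg;
    }
}
```

Averaging {{Integer.MAX_VALUE}} twice with the naive version gives -1 rather than {{Integer.MAX_VALUE}}. The iterative form never builds the full sum, so many large values do not push an intermediate result to infinity. However, {{value - avg}} can still overflow when inputs of opposite sign are both close to {{Double.MAX_VALUE}}.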



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
