[ 
https://issues.apache.org/jira/browse/CALCITE-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-4483:
---------------------------------
    Summary: WITHIN DISTINCT clause for aggregate functions (experimental)  
(was: Add WITHIN DISTINCT clause for aggregate functions)

> WITHIN DISTINCT clause for aggregate functions (experimental)
> -------------------------------------------------------------
>
>                 Key: CALCITE-4483
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4483
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a {{WITHIN DISTINCT}} clause to aggregate functions, allowing duplicate 
> rows to be eliminated before entering the function.
> This feature is non-standard, and in fact does not exist in any database I am 
> aware of.
> It is related to {{DISTINCT}}, and is in fact a generalization of it. 
> {{DISTINCT}} can always be rewritten in terms of {{WITHIN DISTINCT}}. For 
> example, {{SUM(DISTINCT sal)}} is equivalent to {{SUM(sal) WITHIN DISTINCT 
> (sal)}}.
> Consider the query
> {noformat}
> SELECT SUM(age),
>    SUM(DISTINCT age),
>   SUM(age) WITHIN DISTINCT (name)
> FROM Friends{noformat}
> where {{Friends}} has the rows
> {noformat}
> name   age job
> ====== === ==========
> Julian  16 Programmer
> Dick    15
> Anne    13 Car wash
> George  15 Lifeguard
> George  15 Dog walker
> Timmy    4
> {noformat}
> Note that there are two rows for George, because she has two jobs.
> The values of the columns are as follows:
>  * {{SUM(age)}} has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
>  * {{SUM(DISTINCT age)}} has the value (16 + 15 + 13 + 4) = 48;
>  * {{SUM(age) WITHIN DISTINCT (name)}} has the value (16 + 15 + 13 + 15 + 4) 
> = 63.
> {{WITHIN DISTINCT}} has treated the two 15 values for George as one value, 
> but has still counted the 15 for Dick separately.
> The {{WITHIN DISTINCT}} clause can be useful to prevent double-counting when 
> duplicates have been added via a many-to-one join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to