Julian Hyde created CALCITE-4483: ------------------------------------ Summary: Add WITHIN DISTINCT clause for aggregate functions Key: CALCITE-4483 URL: https://issues.apache.org/jira/browse/CALCITE-4483 Project: Calcite Issue Type: Bug Reporter: Julian Hyde
Add a {{WITHIN DISTINCT}} clause to aggregate functions, allowing duplicate rows to be eliminated before entering the function. This feature is non-standard, and in fact does not exist in any database I am aware of. It is related to {{DISTINCT}}, and is in fact a generalization of it. {{DISTINCT}} can always be rewritten in terms of {{WITHIN DISTINCT}}. For example, {{SUM(DISTINCT sal)}} is equivalent to {{SUM(sal) WITHIN DISTINCT (sal)}}. Consider the query {noformat} SELECT SUM(age), SUM(DISTINCT age), SUM(age) WITHIN DISTINCT (name) FROM Friends{noformat} where {{Friends}} has the rows {noformat} name age job ====== === ========== Julian 16 Programmer Dick 15 Anne 13 Car wash George 15 Lifeguard George 15 Dog walker Timmy 4 {noformat} Note that there are two rows for George, because she has two jobs. The values of the columns are as follows: * {{SUM(age)}} has the value (16 + 15 + 13 + 15 + 15 + 4) = 78; * {{SUM(DISTINCT age)}} has the value (16 + 15 + 13 + 4) = 48; * {{SUM(age) WITHIN DISTINCT (name)}} has the value (16 + 15 + 13 + 15 + 4) = 63. {{WITHIN DISTINCT}} has treated the two 15 values for George as one value, but has still counted the 15 for Dick separately. The {{WITHIN DISTINCT}} clause can be useful to prevent double-counting when duplicates have been added via a many-to-one join. -- This message was sent by Atlassian Jira (v8.3.4#803005)