[ https://issues.apache.org/jira/browse/CALCITE-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283432#comment-17283432 ]
Julian Hyde edited comment on CALCITE-4483 at 2/11/21, 11:58 PM: ----------------------------------------------------------------- Based on [~amaliujia]'s feedback, I changed the behavior to throw if the operand value is not functionally dependent on the {{WITHIN DISTINCT}} key within the current group. (The non-throwing behavior is available via another variant of the planner rule.) I implemented {{UNIQUE_VALUE}}, but it was not sufficient to solve the problems. When evaluated in grouping setsĀ \(x) and \(x, y), we needed the result from the second grouping set (where the value was indeed unique) but were not interested in the result from the first grouping set (where the value is not unique). But it was still calculated in the first grouping set, and therefore threw an exception that aborted the query. CALCITE-4484 remains open (and has an implementation that is almost complete) but we will not use it to implement {{WITHIN DISTINCT}}. So, the implementation is based on something like {{FILTER (WHERE MIN(v) IS NOT DISTINCT FROM MAX(v))}}, which works fine as long as the aggregate function does not want to receive NULL values (see further note below). was (Author: julianhyde): Based on [~amaliujia]'s feedback, I changed the behavior to throw if the operand value is not functionally dependent on the {{WITHIN DISTINCT}} key within the current group. (The non-throwing behavior is available via another variant of the planner rule.) I implemented {{UNIQUE_VALUE}}, but it was not sufficient to solve the problems. When evaluated in grouping setsĀ \(x) and \(x, y), we needed the result from the second grouping set (where the value was indeed unique) but were not interested in the result from the first grouping set (where the value is not unique). But it was still calculated in the first grouping set, and therefore threw an exception that aborted the query. CALCITE-4484 remains open (and has an implementation that is almost complete) but we will not use it to implement {{WITHIN DISTINCT}}. So, the implementation is based on something like {{FILTER (WHERE MAX(v) IS NOT DISTINCT FROM MIN(x))}}, which works fine as long as the aggregate function does not want to receive NULL values (see further note below). > WITHIN DISTINCT clause for aggregate functions (experimental) > ------------------------------------------------------------- > > Key: CALCITE-4483 > URL: https://issues.apache.org/jira/browse/CALCITE-4483 > Project: Calcite > Issue Type: Bug > Reporter: Julian Hyde > Assignee: Julian Hyde > Priority: Major > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Add a {{WITHIN DISTINCT}} clause to aggregate functions, allowing duplicate > rows to be eliminated before entering the function. > This feature is non-standard, and in fact does not exist in any database I am > aware of. Therefore this feature is experimental, and may evolve or be > removed without notice. > It is related to {{DISTINCT}}, and is in fact a generalization of it. > {{DISTINCT}} can always be rewritten in terms of {{WITHIN DISTINCT}}. For > example, {{SUM(DISTINCT sal)}} is equivalent to {{SUM(sal) WITHIN DISTINCT > (sal)}}. > Consider the query > {noformat} > SELECT SUM(age), > SUM(DISTINCT age), > SUM(age) WITHIN DISTINCT (name) > FROM Friends{noformat} > where {{Friends}} has the rows > {noformat} > name age job > ====== === ========== > Julian 16 Programmer > Dick 15 > Anne 13 Car wash > George 15 Lifeguard > George 15 Dog walker > Timmy 4 > {noformat} > Note that there are two rows for George, because she has two jobs. > The values of the columns are as follows: > * {{SUM(age)}} has the value (16 + 15 + 13 + 15 + 15 + 4) = 78; > * {{SUM(DISTINCT age)}} has the value (16 + 15 + 13 + 4) = 48; > * {{SUM(age) WITHIN DISTINCT (name)}} has the value (16 + 15 + 13 + 15 + 4) > = 63. > {{WITHIN DISTINCT}} has treated the two 15 values for George as one value, > but has still counted the 15 for Dick separately. > The {{WITHIN DISTINCT}} clause can be useful to prevent double-counting when > duplicates have been added via a many-to-one join. -- This message was sent by Atlassian Jira (v8.3.4#803005)