[ 
https://issues.apache.org/jira/browse/DRILL-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656043#comment-17656043
 ] 

ASF GitHub Bot commented on DRILL-8381:
---------------------------------------

vvysotskyi opened a new pull request, #2734:
URL: https://github.com/apache/drill/pull/2734

   # [DRILL-8381](https://issues.apache.org/jira/browse/DRILL-8381): Add 
support for filtered aggregate calls
   
   ## Description
   For the case when filtering expression is specified, Drill will generate an 
`if` expression to obtain field value that will be used in aggregate function 
only when the filter predicate is true. Filter expression specified within an 
aggregate function is present in the underlying project, so it is enough to get 
a reference to it to use it as a condition.
   
   ## Documentation
   NA
   
   ## Testing
   Added UT.
   




> Add support for filtered aggregate calls
> ----------------------------------------
>
>                 Key: DRILL-8381
>                 URL: https://issues.apache.org/jira/browse/DRILL-8381
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Vova Vysotskyi
>            Assignee: Vova Vysotskyi
>            Priority: Major
>
> Currently, Drill ignores filters for filtered aggregate calls and returns 
> incorrect results.
> Here is the example query for which Drill will return incorrect results:
> {code:sql}
> SELECT count(n_name) FILTER(WHERE n_regionkey = 1) AS 
> nations_count_in_1_region,
> count(n_name) FILTER(WHERE n_regionkey = 2) AS nations_count_in_2_region,
> count(n_name) FILTER(WHERE n_regionkey = 3) AS nations_count_in_3_region,
> count(n_name) FILTER(WHERE n_regionkey = 4) AS nations_count_in_4_region,
> count(n_name) FILTER(WHERE n_regionkey = 0) AS nations_count_in_0_region
> FROM cp.`tpch/nation.parquet`
> {code}
> {noformat}
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | nations_count_in_1_region | nations_count_in_2_region | 
> nations_count_in_3_region | nations_count_in_4_region | 
> nations_count_in_0_region |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | 25                        | 25                        | 25                  
>       | 25                        | 25                        |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> {noformat}
> But the correct result is
> {noformat}
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | nations_count_in_1_region | nations_count_in_2_region | 
> nations_count_in_3_region | nations_count_in_4_region | 
> nations_count_in_0_region |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | 5                         | 5                         | 5                   
>       | 5                         | 5                         |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> {noformat}
> Side note:
> The query above could be rewritten using PIVOT:
> {code:sql}
> SELECT `1` nations_count_in_1_region, `2` nations_count_in_2_region, `3` 
> nations_count_in_3_region, `4` nations_count_in_4_region, `0` 
> nations_count_in_0_region
> FROM (SELECT n_name, n_regionkey FROM cp.`tpch/nation.parquet`) 
> PIVOT(count(n_name) FOR n_regionkey IN (0, 1, 2, 3, 4))
> {code}
> And will return correct results when this issue is fixed and Calcite is 
> updated to 1.33.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to