[ 
https://issues.apache.org/jira/browse/DRILL-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291512#comment-14291512
 ] 

Jinfeng Ni commented on DRILL-2063:
-----------------------------------

+1. Looks good to me.

I'm not completely sure that the new change in Calcite's master branch would 
fix all issues about duplicate field names in Aggregate. We may take a further 
look to see if we need port your change to Calcite, once we complete the 
rebasing work. 

> Wrong result for query with aggregate expression
> ------------------------------------------------
>
>                 Key: DRILL-2063
>                 URL: https://issues.apache.org/jira/browse/DRILL-2063
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: 
> 0001-DRILL-2063-Fix-is-in-Calcite-ensure-uniqueness-of-fi.patch, 
> 0001-DRILL-2063-Make-sure-internal-fields-in-the-row-type.patch
>
>
> The following query gives wrong result for avg_price: 
> {code}
> 0: jdbc:drill:zk=local> select l_suppkey, 
> sum(l_extendedprice)/sum(l_quantity) as avg_price from 
> cp.`tpch/lineitem.parquet` where l_orderkey in (select o_orderkey from 
> cp.`tpch/orders.parquet` where o_custkey = 2) and l_suppkey = 4 group by 
> l_suppkey;
> +------------+------------+
> | l_suppkey  | avg_price  |
> +------------+------------+
> | 4          | 0.1111111111111111 |
> +------------+------------+
> {code}
> If I include the aggregate functions explicitly outside of the expression, I 
> get the right result: 
> {code}
> 0: jdbc:drill:zk=local> select l_suppkey, sum(l_extendedprice) as 
> total_price, sum(l_quantity) as total_qty, 
> sum(l_extendedprice)/sum(l_quantity) as avg_price from 
> cp.`tpch/lineitem.parquet` where l_orderkey in (select o_orderkey from 
> cp.`tpch/orders.parquet` where o_custkey = 2) and l_suppkey = 4 group by 
> l_suppkey;
> +------------+-------------+------------+------------+
> | l_suppkey  | total_price | total_qty  | avg_price  |
> +------------+-------------+------------+------------+
> | 4          | 49480.92    | 36.0       | 1374.47    |
> +------------+-------------+------------+------------+
> {code}
> Note that the wrong result  in the first query is because of using the wrong 
> column for the numerator of the division.  It is actually doing 
> l_suppkey/total_qty  (4/36 = 0.11111).   Since this is an egregious error, I 
> am marking this critical. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to