[ https://issues.apache.org/jira/browse/DRILL-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291512#comment-14291512 ]
Jinfeng Ni commented on DRILL-2063: ----------------------------------- +1. Looks good to me. I'm not completely sure that the new change in Calcite's master branch would fix all issues about duplicate field names in Aggregate. We may take a further look to see if we need port your change to Calcite, once we complete the rebasing work. > Wrong result for query with aggregate expression > ------------------------------------------------ > > Key: DRILL-2063 > URL: https://issues.apache.org/jira/browse/DRILL-2063 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Reporter: Aman Sinha > Assignee: Aman Sinha > Priority: Critical > Fix For: 0.8.0 > > Attachments: > 0001-DRILL-2063-Fix-is-in-Calcite-ensure-uniqueness-of-fi.patch, > 0001-DRILL-2063-Make-sure-internal-fields-in-the-row-type.patch > > > The following query gives wrong result for avg_price: > {code} > 0: jdbc:drill:zk=local> select l_suppkey, > sum(l_extendedprice)/sum(l_quantity) as avg_price from > cp.`tpch/lineitem.parquet` where l_orderkey in (select o_orderkey from > cp.`tpch/orders.parquet` where o_custkey = 2) and l_suppkey = 4 group by > l_suppkey; > +------------+------------+ > | l_suppkey | avg_price | > +------------+------------+ > | 4 | 0.1111111111111111 | > +------------+------------+ > {code} > If I include the aggregate functions explicitly outside of the expression, I > get the right result: > {code} > 0: jdbc:drill:zk=local> select l_suppkey, sum(l_extendedprice) as > total_price, sum(l_quantity) as total_qty, > sum(l_extendedprice)/sum(l_quantity) as avg_price from > cp.`tpch/lineitem.parquet` where l_orderkey in (select o_orderkey from > cp.`tpch/orders.parquet` where o_custkey = 2) and l_suppkey = 4 group by > l_suppkey; > +------------+-------------+------------+------------+ > | l_suppkey | total_price | total_qty | avg_price | > +------------+-------------+------------+------------+ > | 4 | 49480.92 | 36.0 | 1374.47 | > +------------+-------------+------------+------------+ > {code} > Note that the wrong result in the first query is because of using the wrong > column for the numerator of the division. It is actually doing > l_suppkey/total_qty (4/36 = 0.11111). Since this is an egregious error, I > am marking this critical. -- This message was sent by Atlassian JIRA (v6.3.4#6332)