Robert Hou created DRILL-7136:
---------------------------------

             Summary: Num_buckets for HashAgg in profile may be inaccurate
                 Key: DRILL-7136
                 URL: https://issues.apache.org/jira/browse/DRILL-7136
             Project: Apache Drill
          Issue Type: Bug
          Components: Tools, Build & Test
    Affects Versions: 1.16.0
            Reporter: Robert Hou
            Assignee: Pritesh Maker
             Fix For: 1.16.0
         Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill

I ran TPCH query 17 with sf 1000. Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
    select
      0.2 * avg(l2.l_quantity)
    from
      lineitem l2
    where
      l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators has resized 6 times. It should have 4M buckets. But the profile shows it has 64K buckets.

I have attached a sample profile. In this profile, the hash agg operator is (04-02).
{noformat}
Operator Metrics
Minor Fragment  NUM_BUCKETS  NUM_ENTRIES  NUM_RESIZING  RESIZING_TIME_MS  NUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB  SPILL_CYCLE  INPUT_BATCH_COUNT  AVG_INPUT_BATCH_BYTES  AVG_INPUT_ROW_BYTES  INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTES  OUTPUT_RECORD_COUNT
04-00-02  65,536  748,746  6  364  1  582  0  813  582,653  18  26,316,456  401  1,631,943  25  26,176,350
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
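
The 4M figure in the report can be checked with a little arithmetic. The sketch below is only illustrative and rests on two assumptions that the report does not state: that the hash table starts at 65,536 buckets, and that each resize doubles the bucket count. The function name `expected_buckets` is hypothetical and is not part of Drill.

```python
# Assumptions (not stated in the report): the hash table starts at
# 65,536 buckets and every resize doubles the bucket count.
INITIAL_BUCKETS = 65_536

# Values taken from the operator metrics in the attached profile.
NUM_RESIZING = 6
REPORTED_BUCKETS = 65_536


def expected_buckets(initial: int, resizes: int) -> int:
    """Bucket count after `resizes` doublings of an `initial`-sized table."""
    return initial << resizes


expected = expected_buckets(INITIAL_BUCKETS, NUM_RESIZING)
ratio = expected // REPORTED_BUCKETS

print(f"expected NUM_BUCKETS: {expected:,}")
print(f"reported NUM_BUCKETS: {REPORTED_BUCKETS:,}")
print(f"expected / reported:  {ratio}")
```

Under those assumptions, six doublings give 4,194,304 buckets (4M), which is 64 times the 65,536 shown in the profile. The reported value matches the assumed starting size, as if the metric were never updated after the resizes.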