[ https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967202#comment-16967202 ]
Boaz Ben-Zvi commented on DRILL-7136:
-------------------------------------

When a Hash-Aggr partition is spilled, its hash-table is reset (i.e., reallocated at the default size of 64K buckets), but the count of prior resizings is left as is, and so is the accumulated resizing time. These stats therefore show totals across possibly multiple iterations of reset-build-spill. So when the stats are reported, they show the *current* hash-table size but the *accumulated* resizing stats. (Hash-Join does not have this issue, because its hash-table is built only when the partition is entirely in memory.)

[~rhou] - should these stats be reported differently?

> Num_buckets for HashAgg in profile may be inaccurate
> ----------------------------------------------------
>
>                 Key: DRILL-7136
>                 URL: https://issues.apache.org/jira/browse/DRILL-7136
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Tools, Build & Test
>    Affects Versions: 1.16.0
>            Reporter: Robert Hou
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>         Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill
>
>
> I ran TPC-H query 17 with scale factor 1000. Here is the query:
> {noformat}
> select
>   sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem l,
>   part p
> where
>   p.p_partkey = l.l_partkey
>   and p.p_brand = 'Brand#13'
>   and p.p_container = 'JUMBO CAN'
>   and l.l_quantity < (
>     select
>       0.2 * avg(l2.l_quantity)
>     from
>       lineitem l2
>     where
>       l2.l_partkey = p.p_partkey
>   );
> {noformat}
> One of the hash agg operators has resized 6 times, so it should have 4M buckets. But the profile shows it has 64K buckets.
> I have attached a sample profile. In this profile, the hash agg operator is (04-02).
> {noformat}
> Operator Metrics
> Minor Fragment NUM_BUCKETS NUM_ENTRIES NUM_RESIZING RESIZING_TIME_MS NUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTES OUTPUT_RECORD_COUNT
> 04-00-02 65,536 748,746 6 364 1 582 0 813 582,653 18 26,316,456 401 1,631,943 25 26,176,350
> {noformat}

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
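The accounting described in the comment can be sketched as a small model. This is a hypothetical Python illustration, not Drill's HashTable code. The class `HashTableStats`, its method names, the 0.75 load factor, and clearing the entry count on reset are all assumptions. Four details come from the discussion above:

- the 64K default size;
- doubling on each resize, implied by the reporter's 64K-to-4M expectation after 6 resizes;
- the current size being reset on spill;
- the resize counter surviving that reset.

```python
# Hypothetical model of the Hash-Aggr stats accounting described in the comment.
# NOT Apache Drill's HashTable implementation; names and LOAD_FACTOR are
# illustrative assumptions.

DEFAULT_BUCKETS = 64 * 1024  # default hash-table size (64K), per the comment
LOAD_FACTOR = 0.75           # assumed resize threshold, for illustration only


class HashTableStats:
    """Tracks the profile metrics for one Hash-Aggr partition (model only)."""

    def __init__(self) -> None:
        self.num_buckets = DEFAULT_BUCKETS  # reported as NUM_BUCKETS (current size)
        self.num_entries = 0                # keys in the current table
        self.num_resizing = 0               # reported as NUM_RESIZING (accumulated)

    def put(self, count: int) -> None:
        """Insert `count` distinct keys; double the table while it is too full."""
        self.num_entries += count
        while self.num_entries > LOAD_FACTOR * self.num_buckets:
            self.num_buckets *= 2
            self.num_resizing += 1

    def reset_after_spill(self) -> None:
        """Spill: reallocate at the default size but keep num_resizing.

        Clearing num_entries here is an assumption of this model.
        """
        self.num_buckets = DEFAULT_BUCKETS
        self.num_entries = 0


stats = HashTableStats()
stats.put(1_600_000)                          # enough keys to force 6 doublings
print(stats.num_buckets, stats.num_resizing)  # 4194304 6  (64K * 2**6 = 4M)

stats.reset_after_spill()                     # partition spilled, table reset
print(stats.num_buckets, stats.num_resizing)  # 65536 6
```

After the reset, the model reports the same NUM_BUCKETS / NUM_RESIZING combination (65,536 buckets next to 6 resizes) that appears in the 04-00-02 row.

The bulk `put` checks the load factor in a `while` loop rather than after every key. This is a simplification, but it ends in the same bucket count and resize count as one-key-at-a-time inserts.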