Robert Hou created DRILL-7136:
---------------------------------

             Summary: Num_buckets for HashAgg in profile may be inaccurate
                 Key: DRILL-7136
                 URL: https://issues.apache.org/jira/browse/DRILL-7136
             Project: Apache Drill
          Issue Type: Bug
          Components: Tools, Build & Test
    Affects Versions: 1.16.0
            Reporter: Robert Hou
            Assignee: Pritesh Maker
             Fix For: 1.16.0
         Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill

I ran TPC-H query 17 at scale factor 1000. Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
    select
      0.2 * avg(l2.l_quantity)
    from
      lineitem l2
    where
      l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators resized 6 times, so it should have 4M buckets
(starting from 64K buckets and doubling on each resize, 65,536 × 2^6 = 4,194,304).
But the profile shows it has only 64K buckets.

I have attached a sample profile. In this profile, the affected hash agg operator is 04-02.
{noformat}
Operator Metrics
Minor Fragment  NUM_BUCKETS  NUM_ENTRIES  NUM_RESIZING  RESIZING_TIME_MS  NUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB  SPILL_CYCLE  INPUT_BATCH_COUNT  AVG_INPUT_BATCH_BYTES  AVG_INPUT_ROW_BYTES  INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTES  OUTPUT_RECORD_COUNT
04-00-02        65,536       748,746      6             364               1               582                 0         813          582,653            18                     26,316,456           401                 1,631,943           25                      26,176,350
{noformat}
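The mismatch between NUM_BUCKETS and NUM_RESIZING can be checked mechanically. Below is a minimal Python sketch of such a profile sanity check, assuming (this is not stated in the profile) that the HashAgg hash table starts at 65,536 buckets and doubles its bucket count on every resize. `expected_buckets` and `bucket_metric_consistent` are hypothetical helpers for illustration, not part of Drill.

```python
# Hypothetical consistency check for a HashAgg profile row.
# ASSUMPTION: the hash table starts with 65,536 buckets and doubles on each resize.
INITIAL_BUCKETS = 65_536


def expected_buckets(num_resizing: int, initial: int = INITIAL_BUCKETS) -> int:
    """Bucket count expected after `num_resizing` doublings of `initial`."""
    return initial * 2 ** num_resizing


def bucket_metric_consistent(num_buckets: int, num_resizing: int) -> bool:
    """True if the reported NUM_BUCKETS matches the count implied by NUM_RESIZING."""
    return num_buckets == expected_buckets(num_resizing)


# Metrics copied from minor fragment 04-00-02 of operator 04-02.
row = {"NUM_BUCKETS": 65_536, "NUM_RESIZING": 6}

exp = expected_buckets(row["NUM_RESIZING"])
print(f"expected {exp:,}, reported {row['NUM_BUCKETS']:,}")
# prints: expected 4,194,304, reported 65,536
print("consistent:", bucket_metric_consistent(row["NUM_BUCKETS"], row["NUM_RESIZING"]))
# prints: consistent: False
```

Under the stated assumption, any row where NUM_RESIZING is non-zero but NUM_BUCKETS still equals the initial capacity would be flagged, which is exactly the pattern seen for 04-00-02.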