Jim Apple created IMPALA-6031:
---------------------------------

             Summary: Distributed plan describes coordinator-only nodes as scanning
                 Key: IMPALA-6031
                 URL: https://issues.apache.org/jira/browse/IMPALA-6031
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.11.0
            Reporter: Jim Apple


In a cluster with one coordinator-only node and three executor-only nodes:

{noformat}
Query: explain select count(*) from web_sales a, web_sales b where a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
+------------------------------------------------------------------------------------------+
| Explain String                                                                           |
+------------------------------------------------------------------------------------------+
| Per-Host Resource Reservation: Memory=136.00MB                                           |
| Per-Host Resource Estimates: Memory=3.04GB                                               |
|                                                                                          |
| F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
|   PLAN-ROOT SINK                                                                         |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   |                                                                                      |
|   07:AGGREGATE [FINALIZE]                                                                |
|   |  output: count:merge(*)                                                              |
|   |  mem-estimate=10.00MB mem-reservation=0B                                             |
|   |  tuple-ids=2 row-size=8B cardinality=1                                               |
|   |                                                                                      |
|   06:EXCHANGE [UNPARTITIONED]                                                            |
|      mem-estimate=0B mem-reservation=0B                                                  |
|      tuple-ids=2 row-size=8B cardinality=1                                               |
|                                                                                          |
| F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4             |
|   DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED]                             |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   03:AGGREGATE                                                                           |
|   |  output: count(*)                                                                    |
|   |  mem-estimate=10.00MB mem-reservation=0B                                             |
|   |  tuple-ids=2 row-size=8B cardinality=1                                               |
|   |                                                                                      |
|   02:HASH JOIN [INNER JOIN, PARTITIONED]                                                 |
|   |  hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number |
|   |  runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number                  |
|   |  mem-estimate=2.95GB mem-reservation=136.00MB                                        |
|   |  tuple-ids=0,1 row-size=32B cardinality=720000376                                    |
|   |                                                                                      |
|   |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)]                                  |
|   |     mem-estimate=0B mem-reservation=0B                                               |
|   |     tuple-ids=1 row-size=16B cardinality=720000376                                   |
|   |                                                                                      |
|   04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)]                                     |
|      mem-estimate=0B mem-reservation=0B                                                  |
|      tuple-ids=0 row-size=16B cardinality=720000376                                      |
|                                                                                          |
| F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
|   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(a.ws_item_sk,a.ws_order_number)]      |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM]                                  |
|      partitions=1824/1824 files=1824 size=47.08GB                                        |
|      runtime filters: RF000 -> a.ws_item_sk, RF001 -> a.ws_order_number                  |
|      table stats: 720000376 rows total                                                   |
|      column stats: all                                                                   |
|      mem-estimate=80.00MB mem-reservation=0B                                             |
|      tuple-ids=0 row-size=16B cardinality=720000376                                      |
|                                                                                          |
| F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
|   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(b.ws_item_sk,b.ws_order_number)]      |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM]                                  |
|      partitions=1824/1824 files=1824 size=47.08GB                                        |
|      table stats: 720000376 rows total                                                   |
|      column stats: all                                                                   |
|      mem-estimate=80.00MB mem-reservation=0B                                             |
|      tuple-ids=1 row-size=16B cardinality=720000376                                      |
+------------------------------------------------------------------------------------------+
{noformat}

The plan reports hosts=4 for both scan fragments (F00 and F01), but after
running the query, the summary shows each scan running on only 3 hosts:

{noformat}
summary;
+-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
| Operator        | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                               |
+-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
| 07:AGGREGATE    | 1      | 0ns      | 0ns      | 1       | 1          | 28.00 KB  | 10.00 MB      | FINALIZE                             |
| 06:EXCHANGE     | 1      | 0ns      | 0ns      | 3       | 1          | 0 B       | 0 B           | UNPARTITIONED                        |
| 03:AGGREGATE    | 3      | 345.33ms | 378.00ms | 3       | 1          | 139.91 KB | 10.00 MB      |                                      |
| 02:HASH JOIN    | 3      | 90.39s   | 97.03s   | 720.00M | 720.00M    | 2.57 GB   | 2.95 GB       | INNER JOIN, PARTITIONED              |
| |--05:EXCHANGE  | 3      | 4.48s    | 4.65s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(b.ws_item_sk,b.ws_order_number) |
| |  01:SCAN HDFS | 3      | 59.31s   | 67.16s   | 720.00M | 720.00M    | 22.88 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales b       |
| 04:EXCHANGE     | 3      | 4.57s    | 4.87s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(a.ws_item_sk,a.ws_order_number) |
| 00:SCAN HDFS    | 3      | 21.09s   | 22.68s   | 720.00M | 720.00M    | 23.45 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales a       |
+-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
{noformat}

It looks to me like the distributed plan counts the coordinator-only node as a
host that will scan (hosts=4), but the coordinator does not scan; only the
three executor-only nodes do (#Hosts = 3).
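
For anyone who wants to check for this mismatch mechanically, here is a minimal sketch that compares the hosts=N value of each plan fragment in the explain text against the #Hosts column of the summary. It is not part of Impala: the function names (planned_hosts, actual_hosts, host_mismatches) and the regular expressions are assumptions based only on the output format shown in this report, and the excerpts are trimmed copies of the rows above.

```python
import re

# Patterns are assumptions derived from the explain/summary text in this
# report, not from any Impala API.
FRAGMENT_RE = re.compile(r"F\d+:PLAN FRAGMENT .*\bhosts=(\d+)")
PLAN_OP_RE = re.compile(r"^[|\s-]*(\d+):([A-Z]+(?: [A-Z]+)*)")
SUMMARY_ROW_RE = re.compile(r"(\d+):([A-Z]+(?: [A-Z]+)*)\s*\|\s*(\d+)\s*\|")


def planned_hosts(explain_text):
    """Return {operator id: hosts=N of the plan fragment containing it}."""
    hosts, current = {}, None
    for line in explain_text.splitlines():
        frag = FRAGMENT_RE.search(line)
        if frag:
            current = int(frag.group(1))
            continue
        op = PLAN_OP_RE.match(line)
        if op and current is not None:
            hosts[op.group(1)] = current
    return hosts


def actual_hosts(summary_text):
    """Return {operator id: #Hosts value from the summary table}."""
    hosts = {}
    for line in summary_text.splitlines():
        row = SUMMARY_ROW_RE.search(line)
        if row:
            hosts[row.group(1)] = int(row.group(3))
    return hosts


def host_mismatches(explain_text, summary_text):
    """Return {operator id: (planned, actual)} where the two counts differ."""
    planned = planned_hosts(explain_text)
    actual = actual_hosts(summary_text)
    return {
        op: (planned[op], actual[op])
        for op in sorted(planned)
        if op in actual and planned[op] != actual[op]
    }


# Trimmed excerpts of the explain and summary output from this report.
EXPLAIN_EXCERPT = """\
| F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
|   07:AGGREGATE [FINALIZE] |
| F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4 |
|   00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM] |
"""
SUMMARY_EXCERPT = """\
| 07:AGGREGATE    | 1      | 0ns      |
| 00:SCAN HDFS    | 3      | 21.09s   |
"""

print(host_mismatches(EXPLAIN_EXCERPT, SUMMARY_EXCERPT))
```

On these excerpts only operator 00 is flagged (planned 4, actual 3); the coordinator fragment agrees at 1 host.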




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
