Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12221 )

Change subject: [PROTOTYPE] IMPALA-5872: Test case builder for query planner
......................................................................


Patch Set 1:

The following example exports a testcase from an 8-node cluster and replays the 
plan on my local mini cluster (1 node).

On 8-node cluster (tpcds1000 SF, q30)

[xxxxxxxx:21000] default> use tpcds_1000_parquet;
Query: use tpcds_1000_parquet
[xxxxxxxx:21000] tpcds_1000_parquet> export testcase into outfile 
'hdfs:///tmp/' with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state,
 sum(wr_return_amt) as ctr_total_return
 from web_returns
     ,date_dim
     ,customer_address
 where wr_returned_date_sk = d_date_sk
   and d_year =2002
   and wr_returning_addr_sk = ca_address_sk
 group by wr_returning_customer_sk
         ,ca_state)
  select  
c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
       ,c_last_review_date,ctr_total_return
 from customer_total_return ctr1
     ,customer_address
     ,customer
 where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
   from customer_total_return ctr2
                    where ctr1.ctr_state = ctr2.ctr_state)
       and ca_address_sk = c_current_addr_sk
       and ca_state = 'IL'
       and ctr1.ctr_customer_sk = c_customer_sk
 order by 
c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                  
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                  ,c_last_review_date,ctr_total_return
limit 100;
Query: export testcase into outfile 'hdfs:///tmp/' with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state,
 sum(wr_return_amt) as ctr_total_return
 from web_returns
     ,date_dim
     ,customer_address
 where wr_returned_date_sk = d_date_sk
   and d_year =2002
   and wr_returning_addr_sk = ca_address_sk
 group by wr_returning_customer_sk
         ,ca_state)
  select  
c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
       ,c_last_review_date,ctr_total_return
 from customer_total_return ctr1
     ,customer_address
     ,customer
 where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
   from customer_total_return ctr2
                    where ctr1.ctr_state = ctr2.ctr_state)
       and ca_address_sk = c_current_addr_sk
       and ca_state = 'IL'
       and ctr1.ctr_customer_sk = c_customer_sk
 order by 
c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                  
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                  ,c_last_review_date,ctr_total_return
limit 100
Query submitted at: 2019-01-11 14:20:57 (Coordinator: http://xxxxxxxx:25000)
Query progress can be monitored at: 
http://xxxxxxxx:25000/query_plan?query_id=a74384f0af33b8ca:ca1342c000000000
+-----------------------------------------------------------------------------------------------------+
| Test case data output path                                                    
                      |
+-----------------------------------------------------------------------------------------------------+
| 
hdfs://xxxxxxxx:8020/tmp/impala-testcase-data-1dc27e8a-a293-48bb-b91e-8384550cebed
 |
+-----------------------------------------------------------------------------------------------------+
WARNINGS: Impala does not have READ_WRITE access to path 'hdfs://xxxxxxxx:8020/'

Fetched 1 row(s) in 1.46s
[xxxxxxxx:21000] tpcds_1000_parquet>

[user@host ~]$ ls -lSh impala-testcase-data-1dc27e8a-a293-48bb-b91e-8384550cebed
-rw-r--r-- 1 user user 503K Jan 11 14:22 
impala-testcase-data-1dc27e8a-a293-48bb-b91e-8384550cebed

Copy the file 
hdfs://xxxxxxxx:8020/tmp/impala-testcase-data-1dc27e8a-a293-48bb-b91e-8384550cebed
 into a target cluster.

-- Load the testcase on the local cluster. (the target cluster doesn't contain 
these Dbs/tables, so they are imported)

[localhost:21000] default> load testcase from 
'hdfs:///tmp/impala-testcase-data-1dc27e8a-a293-48bb-b91e-8384550cebed';
Query: load testcase from 
'hdfs:///tmp/impala-testcase-data-1dc27e8a-a293-48bb-b91e-8384550cebed'
+----------------------------------------------+
| summary                                      |
+----------------------------------------------+
| 1 db(s), 11 table(s) and 0 view(s) imported. |
+----------------------------------------------+
Fetched 1 row(s) in 1.07s

[localhost:21000] tpcds_1000_parquet> show tables in tpcds_1000_parquet;
Query: show tables in tpcds_1000_parquet
+------------------+
| name             |
+------------------+
| customer         |
| customer_address |
| date_dim         |
| web_returns      |
+------------------+
Fetched 4 row(s) in 0.00s
[localhost:21000] tpcds_1000_parquet>

[localhost:21000] tpcds_1000_parquet> set EXPLAIN_LEVEL=3;
EXPLAIN_LEVEL set to 3
[localhost:21000] tpcds_1000_parquet> set PLANNER_DEBUG_MODE=true;
PLANNER_DEBUG_MODE set to true
[localhost:21000] tpcds_1000_parquet> explain with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state,
 sum(wr_return_amt) as ctr_total_return
 from web_returns
     ,date_dim
     ,customer_address
 where wr_returned_date_sk = d_date_sk
   and d_year =2002
   and wr_returning_addr_sk = ca_address_sk
 group by wr_returning_customer_sk
         ,ca_state)
  select  
c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
       ,c_last_review_date,ctr_total_return
 from customer_total_return ctr1
     ,customer_address
     ,customer
 where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
   from customer_total_return ctr2
                    where ctr1.ctr_state = ctr2.ctr_state)
       and ca_address_sk = c_current_addr_sk
       and ca_state = 'IL'
       and ctr1.ctr_customer_sk = c_customer_sk
 order by 
c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                  
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                  ,c_last_review_date,ctr_total_return
limit 100;


here is a snippet of the explain plan

| F08:PLAN FRAGMENT [RANDOM] hosts=7 instances=7             <=====             
                                                                                
                                                                                
     
| Per-Host Resources: mem-estimate=52.95MB mem-reservation=20.94MB 
thread-reservation=2 runtime-filters-memory=3.00MB                              
                                                                                
                                           |
|   DATASTREAM SINK [FRAGMENT=F11, EXCHANGE=28, HASH(wr_returning_addr_sk)]     
                                                                                
                                                                                
                              |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
                                                                                
                                                                                
                              |
|   11:HASH JOIN [INNER JOIN, BROADCAST]                                        
                                                                                
                                                                                
                              |
|   |  hash predicates: wr_returned_date_sk = d_date_sk                         
                                                                                
                                                                                
                              |
|   |  fk/pk conjuncts: wr_returned_date_sk = d_date_sk                         
                                                                                
                                                                                
                              |
|   |  runtime filters: RF012[bloom] <- d_date_sk                               
                                                                                
                                                                                
                              |
|   |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0                                                            
                                                                                
                                       |
|   |  tuple-ids=7,8 row-size=24B cardinality=12290653                          
                                                                                
                                                                                
                              |
|   |  in pipelines: 08(GETNEXT), 09(OPEN)                                      
                                                                                
                                                                                
                              |
|   |                                                                           
                                                                                
                                                                                
                              |
|   |--27:EXCHANGE [BROADCAST]                                                  
                                                                                
                                                                                
                              |
|   |     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0          
                                                                                
                                                                                
                              |
|   |     tuple-ids=8 row-size=8B cardinality=373                               
                                                                                
                                                                                
                              |
|   |     in pipelines: 09(GETNEXT)                                             
                                                                                
                                                                                
                              |
|   |                                                                           
                                                                                
                                                                                
                              |
|   08:SCAN HDFS [tpcds_1000_parquet.web_returns, RANDOM]                       
                                                                                
                                                                                
                              |
|      partitions=2185/2185 files=2185 size=6.50GB                              
                                                                                
                                                                                
                              |
|      runtime filters: RF010[bloom] -> wr_returning_addr_sk, RF012[bloom] -> 
wr_returned_date_sk                                                             
                                                                                
                                |
|      stored statistics:                                                       
                                                                                
                                                                                
                              |
|        table: rows=71997522 size=6.50GB                                       
                                                                                
                                                                                
                              |
|        partitions: 2185/2185 rows=71997522                                    
                                                                                
                                                                                
                              |
|        columns: all                                                           
                                                                                
                                                                                
                              |
|      extrapolated-rows=disabled max-scan-range-rows=3239259                   
                                                                                
                                                                                
                              |
|      mem-estimate=48.00MB mem-reservation=16.00MB thread-reservation=1        
                                                                                
                                                                                
                              |
|      tuple-ids=7 row-size=16B cardinality=71997522                            
                                                                                
                                                                                
                              |
|      in pipelines: 08(GETNEXT)                                                
                                                                                
                                                                                
                              |
|                                                                               
                                                                                
                                                                                
                              |
| F09:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
                                                                                
                                                                                
                              |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=512.00KB 
thread-reservation=2                                                            
                                                                                
                                          |
|   DATASTREAM SINK [FRAGMENT=F08, EXCHANGE=27, BROADCAST]                      
                                                                                
                                                                                
                              |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
                                                                                
                                                                                
                              |
|   09:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM]                          
                                                                                
                                                                                
                              |
|      partitions=1/1 files=1 size=2.15MB                                       
                                                                                
                                                                                
                              |
|      predicates: d_year = CAST(2002 AS INT)                                   
                                                                                
                                                                                
                              |
|      stored statistics:                                                       
                                                                                
                                                                                
                              |
|        table: rows=73049 size=2.15MB                                          
                                                                                
                                                                                
                              |
|        columns: all                                                           
                                                                                
                                                                                
                              |
|      extrapolated-rows=disabled max-scan-range-rows=73049                     
                                                                                
                                                                                
                              |
|      parquet statistics predicates: d_year = CAST(2002 AS INT)                
                                                                                
                                                                                
                              |
|      parquet dictionary predicates: d_year = CAST(2002 AS INT)                
                                                                                
                                                                                
                              |
|      mem-estimate=32.00MB mem-reservation=512.00KB thread-reservation=1       
                                                                                
                                                                                
                              |
|      tuple-ids=8 row-size=8B cardinality=373                                  
                                                                                
                                                                                
                              |
|      in pipelines: 09(GETNEXT)                                                
                                                                                
                                                                                
                              |
|                                                                               
                                                                                
                                                                                
                              |
| F10:PLAN FRAGMENT [RANDOM] hosts=7 instances=7   <====                        
                                                                                
                                                                                
                                   |
| Per-Host Resources: mem-estimate=48.00MB mem-reservation=4.00MB 
thread-reservation=2                                                            
                                                                                
                                            |
|   DATASTREAM SINK [FRAGMENT=F11, EXCHANGE=29, HASH(ca_address_sk)]            
                                                                                
                                                                                
                              |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
                                                                                
                                                                                
                              |
|   10:SCAN HDFS [tpcds_1000_parquet.customer_address, RANDOM]                  
                                                                                
                                                                                
                              |
|      partitions=1/1 files=7 size=111.28MB                                     
                                                                                
                                                                                
                              |
|      stored statistics:                                                       
                                                                                
                                                                                
                              |
|        table: rows=6000000 size=111.28MB                                      
                                                                                
                                                                                
                              |
|        columns: all                                                           
                                                                                
                                                                                
                              |
|      extrapolated-rows=disabled max-scan-range-rows=858070                    
                                                                                
                                                                                
                              |
|      mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=1         
                                                                                
                                                                                
                              |
|      tuple-ids=9 row-size=18B cardinality=6000000                             
                                                                                
                                                                                
                              |
|      in pipelines: 10(GETNEXT)                                                
                                                                                
                                                                                
                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Fetched 353 row(s) in 0.39s


--
To view, visit http://gerrit.cloudera.org:8080/12221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
Gerrit-Change-Number: 12221
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jan 2019 22:33:41 +0000
Gerrit-HasComments: No

Reply via email to