Jim Apple created IMPALA-6031:
---------------------------------

             Summary: Distributed plan describes coordinator-only nodes as scanning
                 Key: IMPALA-6031
                 URL: https://issues.apache.org/jira/browse/IMPALA-6031
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.11.0
            Reporter: Jim Apple

In a cluster with one coordinator-only node and three executor-only nodes:

{noformat}
Query: explain select count(*) from web_sales a, web_sales b where a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
+------------------------------------------------------------------------------------------+
| Explain String                                                                           |
+------------------------------------------------------------------------------------------+
| Per-Host Resource Reservation: Memory=136.00MB                                           |
| Per-Host Resource Estimates: Memory=3.04GB                                               |
|                                                                                          |
| F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
|   PLAN-ROOT SINK                                                                         |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   |                                                                                      |
|   07:AGGREGATE [FINALIZE]                                                                |
|   |  output: count:merge(*)                                                              |
|   |  mem-estimate=10.00MB mem-reservation=0B                                             |
|   |  tuple-ids=2 row-size=8B cardinality=1                                               |
|   |                                                                                      |
|   06:EXCHANGE [UNPARTITIONED]                                                            |
|      mem-estimate=0B mem-reservation=0B                                                  |
|      tuple-ids=2 row-size=8B cardinality=1                                               |
|                                                                                          |
| F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4             |
|   DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED]                             |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   03:AGGREGATE                                                                           |
|   |  output: count(*)                                                                    |
|   |  mem-estimate=10.00MB mem-reservation=0B                                             |
|   |  tuple-ids=2 row-size=8B cardinality=1                                               |
|   |                                                                                      |
|   02:HASH JOIN [INNER JOIN, PARTITIONED]                                                 |
|   |  hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number |
|   |  runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number                  |
|   |  mem-estimate=2.95GB mem-reservation=136.00MB                                        |
|   |  tuple-ids=0,1 row-size=32B cardinality=720000376                                    |
|   |                                                                                      |
|   |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)]                                  |
|   |     mem-estimate=0B mem-reservation=0B                                               |
|   |     tuple-ids=1 row-size=16B cardinality=720000376                                   |
|   |                                                                                      |
|   04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)]                                     |
|      mem-estimate=0B mem-reservation=0B                                                  |
|      tuple-ids=0 row-size=16B cardinality=720000376                                      |
|                                                                                          |
| F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
|   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(a.ws_item_sk,a.ws_order_number)]      |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM]                                  |
|      partitions=1824/1824 files=1824 size=47.08GB                                        |
|      runtime filters: RF000 -> a.ws_item_sk, RF001 -> a.ws_order_number                  |
|      table stats: 720000376 rows total                                                   |
|      column stats: all                                                                   |
|      mem-estimate=80.00MB mem-reservation=0B                                             |
|      tuple-ids=0 row-size=16B cardinality=720000376                                      |
|                                                                                          |
| F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
|   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(b.ws_item_sk,b.ws_order_number)]      |
|   |  mem-estimate=0B mem-reservation=0B                                                  |
|   01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM]                                  |
|      partitions=1824/1824 files=1824 size=47.08GB                                        |
|      table stats: 720000376 rows total                                                   |
|      column stats: all                                                                   |
|      mem-estimate=80.00MB mem-reservation=0B                                             |
|      tuple-ids=1 row-size=16B cardinality=720000376                                      |
+------------------------------------------------------------------------------------------+
{noformat}

It looks like the scans are going to run on 4 hosts (F00 and F01 both report hosts=4), but after running the query, the summary shows them on only 3 hosts:

{noformat}
summary;
+-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
| Operator        | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                               |
+-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
| 07:AGGREGATE    | 1      | 0ns      | 0ns      | 1       | 1          | 28.00 KB  | 10.00 MB      | FINALIZE                             |
| 06:EXCHANGE     | 1      | 0ns      | 0ns      | 3       | 1          | 0 B       | 0 B           | UNPARTITIONED                        |
| 03:AGGREGATE    | 3      | 345.33ms | 378.00ms | 3       | 1          | 139.91 KB | 10.00 MB      |                                      |
| 02:HASH JOIN    | 3      | 90.39s   | 97.03s   | 720.00M | 720.00M    | 2.57 GB   | 2.95 GB       | INNER JOIN, PARTITIONED              |
| |--05:EXCHANGE  | 3      | 4.48s    | 4.65s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(b.ws_item_sk,b.ws_order_number) |
| |  01:SCAN HDFS | 3      | 59.31s   | 67.16s   | 720.00M | 720.00M    | 22.88 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales b       |
| 04:EXCHANGE     | 3      | 4.57s    | 4.87s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(a.ws_item_sk,a.ws_order_number) |
| 00:SCAN HDFS    | 3      | 21.09s   | 22.68s   | 720.00M | 720.00M    | 23.45 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales a       |
+-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
{noformat}

It looks to me like the distributed plan thinks the coordinator will scan, but the coordinator does not scan.
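
The discrepancy can also be checked mechanically. Below is a minimal Python sketch, not part of Impala: planned_hosts, actual_hosts, and mismatches are hypothetical helper names, and the regular expressions assume the explain and summary layouts quoted in this report. The inputs are abbreviated to the node header lines and the #Hosts column. On this data it flags the scan nodes 00 and 01, and also the other nodes of F02, whose fragment likewise reports hosts=4.

```python
import re

# Abbreviated excerpts of the explain and summary output quoted in this report.
EXPLAIN = """\
F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
  07:AGGREGATE [FINALIZE]
  06:EXCHANGE [UNPARTITIONED]
F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4
  03:AGGREGATE
  02:HASH JOIN [INNER JOIN, PARTITIONED]
  |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)]
  04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)]
F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4
  00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM]
F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4
  01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM]
"""

SUMMARY = """\
| 07:AGGREGATE    | 1      |
| 06:EXCHANGE     | 1      |
| 03:AGGREGATE    | 3      |
| 02:HASH JOIN    | 3      |
| |--05:EXCHANGE  | 3      |
| |  01:SCAN HDFS | 3      |
| 04:EXCHANGE     | 3      |
| 00:SCAN HDFS    | 3      |
"""

# Assumed layouts: fragment headers carry "hosts=N"; node headers look like "NN:NAME".
FRAGMENT_RE = re.compile(r"F\d+:PLAN FRAGMENT .* hosts=(\d+)")
NODE_RE = re.compile(r"(\d+):[A-Z]")
# Summary rows: "NN:NAME | <#Hosts> |" (the operator cell may start with tree pipes).
SUMMARY_RE = re.compile(r"(\d+):[A-Z ]+\|\s*(\d+)\s*\|")


def planned_hosts(explain):
    """Map each plan node id to the hosts=N value of its enclosing fragment."""
    hosts = {}
    current = None
    for line in explain.splitlines():
        frag = FRAGMENT_RE.search(line)
        if frag:
            current = int(frag.group(1))
            continue
        node = NODE_RE.search(line)
        if node and current is not None:
            hosts[node.group(1)] = current
    return hosts


def actual_hosts(summary):
    """Map each plan node id to the #Hosts value reported by the summary."""
    hosts = {}
    for line in summary.splitlines():
        row = SUMMARY_RE.search(line)
        if row:
            hosts[row.group(1)] = int(row.group(2))
    return hosts


def mismatches(explain, summary):
    """Return {node_id: (planned, actual)} for nodes whose host counts differ."""
    planned = planned_hosts(explain)
    actual = actual_hosts(summary)
    return {
        node: (planned[node], actual[node])
        for node in sorted(planned)
        if node in actual and planned[node] != actual[node]
    }


if __name__ == "__main__":
    for node, (plan, ran) in mismatches(EXPLAIN, SUMMARY).items():
        print(f"node {node}: plan says hosts={plan}, summary says #Hosts={ran}")
```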