Hello Tim Armstrong, I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/6459

to look at the new patch set (#8).

Change subject: IMPALA-4883: Union Codegen
......................................................................

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per-row tuple materialization.

Testing:
Ran the test_queries.py test locally in exhaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       194.504us  194.504us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       17.284us   17.284us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s202ms    2s934ms    3        1           115.00 KB  10.00 MB
00:UNION         3       32s514ms   34s926ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       158.373ms  216.085ms  28.80M   28.80M      489.71 MB  1.88 GB        tpcds_10_parquet.store_sales
|--03:SCAN HDFS  3       167.002ms  171.738ms  28.80M   28.80M      489.74 MB  1.88 GB        tpcds_10_parquet.store_sales
|--04:SCAN HDFS  3       125.331ms  145.496ms  28.80M   28.80M      489.57 MB  1.88 GB        tpcds_10_parquet.store_sales
|--05:SCAN HDFS  3       148.478ms  194.311ms  28.80M   28.80M      489.69 MB  1.88 GB        tpcds_10_parquet.store_sales
|--06:SCAN HDFS  3       143.995ms  162.781ms  28.80M   28.80M      489.57 MB  1.88 GB        tpcds_10_parquet.store_sales
|--07:SCAN HDFS  3       169.731ms  250.201ms  28.80M   28.80M      489.58 MB  1.88 GB        tpcds_10_parquet.store_sales
|--08:SCAN HDFS  3       164.110ms  254.374ms  28.80M   28.80M      489.61 MB  1.88 GB        tpcds_10_parquet.store_sales
|--09:SCAN HDFS  3       135.631ms  162.117ms  28.80M   28.80M      489.63 MB  1.88 GB        tpcds_10_parquet.store_sales
|--10:SCAN HDFS  3       138.736ms  167.778ms  28.80M   28.80M      489.67 MB  1.88 GB        tpcds_10_parquet.store_sales
01:SCAN HDFS     3       202.015ms  248.728ms  28.80M   28.80M      489.68 MB  1.88 GB        tpcds_10_parquet.store_sales

After: 20s177ms
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       174.617us  174.617us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       16.693us   16.693us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s830ms    3s615ms    3        1           115.00 KB  10.00 MB
00:UNION         3       4s296ms    5s258ms    288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       1s212ms    1s340ms    28.80M   28.80M      488.82 MB  1.88 GB        tpcds_10_parquet.store_sales
|--03:SCAN HDFS  3       1s387ms    1s570ms    28.80M   28.80M      489.37 MB  1.88 GB        tpcds_10_parquet.store_sales
|--04:SCAN HDFS  3       1s224ms    1s347ms    28.80M   28.80M      487.22 MB  1.88 GB        tpcds_10_parquet.store_sales
|--05:SCAN HDFS  3       1s245ms    1s321ms    28.80M   28.80M      489.25 MB  1.88 GB        tpcds_10_parquet.store_sales
|--06:SCAN HDFS  3       1s232ms    1s505ms    28.80M   28.80M      484.21 MB  1.88 GB        tpcds_10_parquet.store_sales
|--07:SCAN HDFS  3       1s348ms    1s518ms    28.80M   28.80M      488.20 MB  1.88 GB        tpcds_10_parquet.store_sales
|--08:SCAN HDFS  3       1s231ms    1s335ms    28.80M   28.80M      483.58 MB  1.88 GB        tpcds_10_parquet.store_sales
|--09:SCAN HDFS  3       1s179ms    1s349ms    28.80M   28.80M      482.76 MB  1.88 GB        tpcds_10_parquet.store_sales
|--10:SCAN HDFS  3       1s121ms    1s154ms    28.80M   28.80M      486.59 MB  1.88 GB        tpcds_10_parquet.store_sales
01:SCAN HDFS     3       1s284ms    1s523ms    28.80M   28.80M      486.70 MB  1.88 GB        tpcds_10_parquet.store_sales

Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/union-node-ir.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M testdata/workloads/functional-query/queries/QueryTest/union.test
7 files changed, 185 insertions(+), 53 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6459/8
--
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>