Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-4883: Union Codegen
......................................................................

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  194.504us  194.504us        1           1   28.00 KB  
      -1.00 B  FINALIZE
12:EXCHANGE            1   17.284us   17.284us        3           1          0  
      -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s202ms    2s934ms        3           1  115.00 KB  
     10.00 MB
00:UNION               3   32s514ms   34s926ms  288.01M     288.01M    3.08 MB  
            0
|--02:SCAN HDFS        3  158.373ms  216.085ms   28.80M      28.80M  489.71 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3  167.002ms  171.738ms   28.80M      28.80M  489.74 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3  125.331ms  145.496ms   28.80M      28.80M  489.57 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3  148.478ms  194.311ms   28.80M      28.80M  489.69 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3  143.995ms  162.781ms   28.80M      28.80M  489.57 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3  169.731ms  250.201ms   28.80M      28.80M  489.58 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3  164.110ms  254.374ms   28.80M      28.80M  489.61 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3  135.631ms  162.117ms   28.80M      28.80M  489.63 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3  138.736ms  167.778ms   28.80M      28.80M  489.67 MB  
      1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3  202.015ms  248.728ms   28.80M      28.80M  489.68 MB  
      1.88 GB  tpcds_10_parquet.store_sales

After: 20s177ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  174.617us  174.617us        1           1   28.00 KB  
      -1.00 B  FINALIZE
12:EXCHANGE            1   16.693us   16.693us        3           1          0  
      -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s830ms    3s615ms        3           1  115.00 KB  
     10.00 MB
00:UNION               3    4s296ms    5s258ms  288.01M     288.01M    3.08 MB  
            0
|--02:SCAN HDFS        3    1s212ms    1s340ms   28.80M      28.80M  488.82 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3    1s387ms    1s570ms   28.80M      28.80M  489.37 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3    1s224ms    1s347ms   28.80M      28.80M  487.22 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3    1s245ms    1s321ms   28.80M      28.80M  489.25 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3    1s232ms    1s505ms   28.80M      28.80M  484.21 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3    1s348ms    1s518ms   28.80M      28.80M  488.20 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3    1s231ms    1s335ms   28.80M      28.80M  483.58 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3    1s179ms    1s349ms   28.80M      28.80M  482.76 MB  
      1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3    1s121ms    1s154ms   28.80M      28.80M  486.59 MB  
      1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3    1s284ms    1s523ms   28.80M      28.80M  486.70 MB  
      1.88 GB  tpcds_10_parquet.store_sales

Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/union-node-ir.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M testdata/workloads/functional-query/queries/QueryTest/union.test
7 files changed, 183 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6459/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to