[
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mostafa Mokhtar updated HIVE-9368:
----------------------------------
Attachment: explain_fetch_column_stats_on.txt
explain_fetch_column_stats_off.txt
> Physical optimizer : Join order in Explain is different from join order
> provided by Calcite
> -------------------------------------------------------------------------------------------
>
> Key: HIVE-9368
> URL: https://issues.apache.org/jira/browse/HIVE-9368
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Vikram Dixit K
> Fix For: 0.15.0
>
> Attachments: explain_fetch_column_stats_off.txt,
> explain_fetch_column_stats_on.txt
>
>
> Join order in explain is different from that provided by Calcite, this was
> observed during the Fidelity POC.
> Logical plan from Calcite :
> {code}
> 2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner
> (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
> HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2],
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], lvl5_id=[$5],
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9],
> lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13],
> account_id=[$14], mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 636
> HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}],
> agg#0=[SUM($15)]): rowcount = 2.53152774E8, cumulative cost =
> {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 634
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24],
> $f6=[$6], $f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21],
> $f13=[$18], $f14=[$19], $f15=[*($13, $20)]): rowcount = 3.401053197411791E11,
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 632
> HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9],
> cost_center_lvl1_id=[$10], cost_pool_lvl6_id=[$11], activity_id=[$12],
> view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15], client_id=[$16],
> lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20],
> time_key_num0=[$0], activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3],
> ssn_id=[$4], account_id=[$5], driver_pct=[$6], lvl5_id=[$25],
> current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22],
> current_ind0=[$23], activity_id1=[$24]): rowcount = 3.401053197411791E11,
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 692
> HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)),
> =($12, $1))], joinType=[inner]): rowcount = 3.401053197411791E11, cumulative
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 690
> HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2],
> fprs_id=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount =
> 2.926396239E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 590
>
> HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]):
> rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
> HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount =
> 1.0871372980143067E8, cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu,
> 0.0 io}, id = 688
> HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount =
> 1.4392118216323378E7, cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io},
> id = 653
> HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2],
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], activity_id=[$5],
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9],
> lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount =
> 6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587
>
> HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount
> = 6870067.0, cumulative cost = {0}, id = 464
> HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51],
> activity_id=[$60]): rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0
> cpu, 0.0 io}, id = 628
> HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]):
> rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
> HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]):
> rowcount = 40683.0, cumulative cost = {0}, id = 467
> HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]):
> rowcount = 794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
> HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0,
> cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
> HiveTableScan(table=[[fidelity.fobi_fprs_dim_mv_orc]]):
> rowcount = 1589540.0, cumulative cost = {0}, id = 466
> {code}
> Plan #1 with Fetch column stats on
> {code}
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 depends on stages: Stage-1
> Stage-0 depends on stages: Stage-2
> Stage-3 depends on stages: Stage-0
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 4 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> DagName: mmokhtar_20150113185454_d7ce6ecf-2d50-45ed-8a88-6283bb091b0e:3
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: driver
> filterExpr: (((time_key_num is not null and plan_id is not
> null) and fprs_id is not null) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 2926396239 Data size: 468223398240
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((time_key_num is not null and plan_id is not
> null) and fprs_id is not null) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 2926396239 Data size: 468223398240
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: time_key_num (type: bigint), activity_id
> (type: bigint), plan_id (type: bigint), fprs_id (type: bigint), ssn_id (type:
> bigint), account_id (type: bigint), driver_pct (type: decimal(28,12))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6
> Statistics: Num rows: 2926396239 Data size:
> 468223398240 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: bigint), _col2 (type:
> bigint), _col3 (type: bigint), _col1 (type: bigint)
> sort order: ++++
> Map-reduce partition columns: _col0 (type: bigint),
> _col2 (type: bigint), _col3 (type: bigint), _col1 (type: bigint)
> Statistics: Num rows: 2926396239 Data size:
> 468223398240 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col4 (type: bigint), _col5 (type:
> bigint), _col6 (type: decimal(28,12))
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: balance
> filterExpr: (((activity_id is not null and fprs_id is not
> null) and time_key_num is not null) and plan_id is not null) (type: boolean)
> Statistics: Num rows: 6870067 Data size: 2102240502 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((activity_id is not null and fprs_id is not
> null) and time_key_num is not null) and plan_id is not null) (type: boolean)
> Statistics: Num rows: 6870067 Data size: 1483934472 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: scale (type: bigint), time_key_num (type:
> bigint), dataset_code (type: bigint), cost_center_lvl1_id (type: bigint),
> cost_pool_lvl6_id (type: bigint), activity_id (type: bigint), view_lvl1_id
> (type: bigint), from_lvl1_id (type: bigint), plan_id (type: bigint),
> client_id (type: bigint), lob_id (type: bigint), product_id (type: bigint),
> fprs_id (type: bigint), mtd_balance (type: decimal(28,12))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
> Statistics: Num rows: 6870067 Data size: 1483934472
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col5 (type: bigint)
> 1 _col3 (type: bigint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
> input vertices:
> 1 Map 5
> Statistics: Num rows: 57555202 Data size: 17151450196
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col12 (type: bigint)
> 1 _col2 (type: bigint)
> outputColumnNames: _col0, _col1, _col2, _col3,
> _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13,
> _col14, _col18
> input vertices:
> 1 Map 6
> Statistics: Num rows: 869509350 Data size:
> 266069861100 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: bigint), _col1 (type:
> bigint), _col10 (type: bigint), _col11 (type: bigint), _col12 (type: bigint),
> _col13 (type: decimal(28,12)), _col14 (type: string), _col18 (type: bigint),
> _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5
> (type: bigint), _col6 (type: bigint), _col7 (type: bigint), _col8 (type:
> bigint), _col9 (type: bigint)
> outputColumnNames: _col0, _col1, _col10, _col11,
> _col12, _col13, _col14, _col18, _col2, _col3, _col4, _col5, _col6, _col7,
> _col8, _col9
> Statistics: Num rows: 869509350 Data size:
> 266069861100 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col1 (type: bigint), _col8
> (type: bigint), _col12 (type: bigint), _col5 (type: bigint)
> sort order: ++++
> Map-reduce partition columns: _col1 (type:
> bigint), _col8 (type: bigint), _col12 (type: bigint), _col5 (type: bigint)
> Statistics: Num rows: 869509350 Data size:
> 266069861100 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint), _col2
> (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col6 (type:
> bigint), _col7 (type: bigint), _col9 (type: bigint), _col10 (type: bigint),
> _col11 (type: bigint), _col13 (type: decimal(28,12)), _col14 (type: string),
> _col18 (type: bigint)
> Execution mode: vectorized
> Map 5
> Map Operator Tree:
> TableScan
> alias: act
> filterExpr: (((current_ind = 'Y') and (rollup_key =
> 'TOTACT')) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 40683 Data size: 271025472 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((current_ind = 'Y') and (rollup_key =
> 'TOTACT')) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 20341 Data size: 5553093 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: lvl5_id (type: string), activity_id (type:
> bigint)
> outputColumnNames: _col0, _col3
> Statistics: Num rows: 20341 Data size: 1993418 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col3 (type: bigint)
> sort order: +
> Map-reduce partition columns: _col3 (type: bigint)
> Statistics: Num rows: 20341 Data size: 1993418 Basic
> stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: string)
> Execution mode: vectorized
> Map 6
> Map Operator Tree:
> TableScan
> alias: fprs
> filterExpr: ((current_ind = 'Y') and fprs_id is not null)
> (type: boolean)
> Statistics: Num rows: 1589540 Data size: 6862044180 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((current_ind = 'Y') and fprs_id is not null)
> (type: boolean)
> Statistics: Num rows: 794770 Data size: 80271770 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: lvl5_id (type: bigint), fprs_id (type:
> bigint)
> outputColumnNames: _col0, _col2
> Statistics: Num rows: 794770 Data size: 12716320 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col2 (type: bigint)
> sort order: +
> Map-reduce partition columns: _col2 (type: bigint)
> Statistics: Num rows: 794770 Data size: 12716320
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Reduce Operator Tree:
> Merge Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: bigint), _col2 (type: bigint), _col3 (type:
> bigint), _col1 (type: bigint)
> 1 _col1 (type: bigint), _col8 (type: bigint), _col12 (type:
> bigint), _col5 (type: bigint)
> outputColumnNames: _col4, _col5, _col6, _col7, _col8, _col9,
> _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col20,
> _col21, _col25
> Statistics: Num rows: 28151027141 Data size: 8389006088018
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col7 (type: bigint), _col8 (type: bigint),
> _col9 (type: bigint), _col10 (type: bigint), _col11 (type: bigint), _col21
> (type: string), _col13 (type: bigint), _col14 (type: bigint), _col15 (type:
> bigint), _col16 (type: bigint), _col17 (type: bigint), _col18 (type: bigint),
> _col25 (type: bigint), _col4 (type: bigint), _col5 (type: bigint), (_col20 *
> _col6) (type: decimal(38,24))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
> _col15
> Statistics: Num rows: 28151027141 Data size: 8389006088018
> Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: sum(_col15)
> keys: _col0 (type: bigint), _col1 (type: bigint), _col2
> (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type:
> string), _col6 (type: bigint), _col7 (type: bigint), _col8 (type: bigint),
> _col9 (type: bigint), _col10 (type: bigint), _col11 (type: bigint), _col12
> (type: bigint), _col13 (type: bigint), _col14 (type: bigint)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
> _col15
> Statistics: Num rows: 28151027141 Data size:
> 8389006088018 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: bigint), _col1 (type:
> bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint),
> _col5 (type: string), _col6 (type: bigint), _col7 (type: bigint), _col8
> (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11 (type:
> bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: bigint)
> sort order: +++++++++++++++
> Map-reduce partition columns: _col0 (type: bigint),
> _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4
> (type: bigint), _col5 (type: string), _col6 (type: bigint), _col7 (type:
> bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint),
> _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14
> (type: bigint)
> Statistics: Num rows: 28151027141 Data size:
> 8389006088018 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col15 (type: decimal(38,24))
> Reducer 3
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> keys: KEY._col0 (type: bigint), KEY._col1 (type: bigint),
> KEY._col2 (type: bigint), KEY._col3 (type: bigint), KEY._col4 (type: bigint),
> KEY._col5 (type: string), KEY._col6 (type: bigint), KEY._col7 (type: bigint),
> KEY._col8 (type: bigint), KEY._col9 (type: bigint), KEY._col10 (type:
> bigint), KEY._col11 (type: bigint), KEY._col12 (type: bigint), KEY._col13
> (type: bigint), KEY._col14 (type: bigint)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
> _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
> Statistics: Num rows: 28151027141 Data size: 8389006088018
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint),
> _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint),
> UDFToLong(_col5) (type: bigint), _col6 (type: bigint), _col7 (type: bigint),
> _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11
> (type: bigint), _col12 (type: bigint), _col13 (type: bigint),
> UDFToString(_col14) (type: string), CAST( _col15 AS decimal(28,12)) (type:
> decimal(28,12))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
> _col15
> Statistics: Num rows: 28151027141 Data size: 11260410856400
> Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 28151027141 Data size:
> 11260410856400 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
> Stage: Stage-2
> Dependency Collection
> Stage: Stage-0
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
> Stage: Stage-3
> Stats-Aggr Operator
> {code}
> Plan #2 with fetch column stats off
> {code}STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 depends on stages: Stage-1
> Stage-0 depends on stages: Stage-2
> Stage-3 depends on stages: Stage-0
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE)
> Map 3 <- Map 4 (BROADCAST_EDGE)
> Map 5 <- Map 3 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName: mmokhtar_20150113185454_7c350b7b-53fa-4bbb-a747-24ec0c2a6ab7:4
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: driver
> filterExpr: (((time_key_num is not null and plan_id is not
> null) and fprs_id is not null) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 2926396239 Data size: 468223398240
> Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: (((time_key_num is not null and plan_id is not
> null) and fprs_id is not null) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 182899765 Data size: 29263962400
> Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: time_key_num (type: bigint), activity_id
> (type: bigint), plan_id (type: bigint), fprs_id (type: bigint), ssn_id (type:
> bigint), account_id (type: bigint), driver_pct (type: decimal(28,12))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6
> Statistics: Num rows: 182899765 Data size: 29263962400
> Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: bigint), _col2 (type: bigint), _col3
> (type: bigint), _col1 (type: bigint)
> 1 _col1 (type: bigint), _col8 (type: bigint),
> _col12 (type: bigint), _col5 (type: bigint)
> outputColumnNames: _col4, _col5, _col6, _col7, _col8,
> _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18,
> _col20, _col21, _col25
> input vertices:
> 1 Map 5
> Statistics: Num rows: 201189745 Data size:
> 32190359337 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col7 (type: bigint), _col8 (type:
> bigint), _col9 (type: bigint), _col10 (type: bigint), _col11 (type: bigint),
> _col21 (type: string), _col13 (type: bigint), _col14 (type: bigint), _col15
> (type: bigint), _col16 (type: bigint), _col17 (type: bigint), _col18 (type:
> bigint), _col25 (type: bigint), _col4 (type: bigint), _col5 (type: bigint),
> (_col20 * _col6) (type: decimal(38,24))
> outputColumnNames: _col0, _col1, _col2, _col3,
> _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13,
> _col14, _col15
> Statistics: Num rows: 201189745 Data size:
> 32190359337 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> aggregations: sum(_col15)
> keys: _col0 (type: bigint), _col1 (type: bigint),
> _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5
> (type: string), _col6 (type: bigint), _col7 (type: bigint), _col8 (type:
> bigint), _col9 (type: bigint), _col10 (type: bigint), _col11 (type: bigint),
> _col12 (type: bigint), _col13 (type: bigint), _col14 (type: bigint)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3,
> _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13,
> _col14, _col15
> Statistics: Num rows: 201189745 Data size:
> 32190359337 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: bigint), _col1
> (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type:
> bigint), _col5 (type: string), _col6 (type: bigint), _col7 (type: bigint),
> _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11
> (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type:
> bigint)
> sort order: +++++++++++++++
> Map-reduce partition columns: _col0 (type:
> bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint),
> _col4 (type: bigint), _col5 (type: string), _col6 (type: bigint), _col7
> (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type:
> bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint),
> _col14 (type: bigint)
> Statistics: Num rows: 201189745 Data size:
> 32190359337 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col15 (type: decimal(38,24))
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
> alias: balance
> filterExpr: (((activity_id is not null and fprs_id is not
> null) and time_key_num is not null) and plan_id is not null) (type: boolean)
> Statistics: Num rows: 6870067 Data size: 2102240502 Basic
> stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: (((activity_id is not null and fprs_id is not
> null) and time_key_num is not null) and plan_id is not null) (type: boolean)
> Statistics: Num rows: 429380 Data size: 131390279 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: scale (type: bigint), time_key_num (type:
> bigint), dataset_code (type: bigint), cost_center_lvl1_id (type: bigint),
> cost_pool_lvl6_id (type: bigint), activity_id (type: bigint), view_lvl1_id
> (type: bigint), from_lvl1_id (type: bigint), plan_id (type: bigint),
> client_id (type: bigint), lob_id (type: bigint), product_id (type: bigint),
> fprs_id (type: bigint), mtd_balance (type: decimal(28,12))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
> Statistics: Num rows: 429380 Data size: 131390279 Basic
> stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col5 (type: bigint)
> 1 _col3 (type: bigint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
> input vertices:
> 1 Map 4
> Statistics: Num rows: 472318 Data size: 144529310
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col12 (type: bigint)
> sort order: +
> Map-reduce partition columns: _col12 (type: bigint)
> Statistics: Num rows: 472318 Data size: 144529310
> Basic stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: bigint), _col1
> (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type:
> bigint), _col5 (type: bigint), _col6 (type: bigint), _col7 (type: bigint),
> _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11
> (type: bigint), _col13 (type: decimal(28,12)), _col14 (type: string)
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: act
> filterExpr: (((current_ind = 'Y') and (rollup_key =
> 'TOTACT')) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 40683 Data size: 271025472 Basic
> stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: (((current_ind = 'Y') and (rollup_key =
> 'TOTACT')) and activity_id is not null) (type: boolean)
> Statistics: Num rows: 5085 Data size: 33875685 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: lvl5_id (type: string), activity_id (type:
> bigint)
> outputColumnNames: _col0, _col3
> Statistics: Num rows: 5085 Data size: 33875685 Basic
> stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col3 (type: bigint)
> sort order: +
> Map-reduce partition columns: _col3 (type: bigint)
> Statistics: Num rows: 5085 Data size: 33875685 Basic
> stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Execution mode: vectorized
> Map 5
> Map Operator Tree:
> TableScan
> alias: fprs
> filterExpr: ((current_ind = 'Y') and fprs_id is not null)
> (type: boolean)
> Statistics: Num rows: 1589540 Data size: 6862044180 Basic
> stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((current_ind = 'Y') and fprs_id is not null)
> (type: boolean)
> Statistics: Num rows: 397385 Data size: 1715511045 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: lvl5_id (type: bigint), fprs_id (type:
> bigint)
> outputColumnNames: _col0, _col2
> Statistics: Num rows: 397385 Data size: 1715511045
> Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col12 (type: bigint)
> 1 _col2 (type: bigint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
> _col18
> input vertices:
> 0 Map 3
> Statistics: Num rows: 519549 Data size: 158982244
> Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: bigint), _col1 (type:
> bigint), _col10 (type: bigint), _col11 (type: bigint), _col12 (type: bigint),
> _col13 (type: decimal(28,12)), _col14 (type: string), _col18 (type: bigint),
> _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5
> (type: bigint), _col6 (type: bigint), _col7 (type: bigint), _col8 (type:
> bigint), _col9 (type: bigint)
> outputColumnNames: _col0, _col1, _col10, _col11,
> _col12, _col13, _col14, _col18, _col2, _col3, _col4, _col5, _col6, _col7,
> _col8, _col9
> Statistics: Num rows: 519549 Data size: 158982244
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col1 (type: bigint), _col8
> (type: bigint), _col12 (type: bigint), _col5 (type: bigint)
> sort order: ++++
> Map-reduce partition columns: _col1 (type:
> bigint), _col8 (type: bigint), _col12 (type: bigint), _col5 (type: bigint)
> Statistics: Num rows: 519549 Data size: 158982244
> Basic stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: bigint), _col2
> (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col6 (type:
> bigint), _col7 (type: bigint), _col9 (type: bigint), _col10 (type: bigint),
> _col11 (type: bigint), _col13 (type: decimal(28,12)), _col14 (type: string),
> _col18 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> keys: KEY._col0 (type: bigint), KEY._col1 (type: bigint),
> KEY._col2 (type: bigint), KEY._col3 (type: bigint), KEY._col4 (type: bigint),
> KEY._col5 (type: string), KEY._col6 (type: bigint), KEY._col7 (type: bigint),
> KEY._col8 (type: bigint), KEY._col9 (type: bigint), KEY._col10 (type:
> bigint), KEY._col11 (type: bigint), KEY._col12 (type: bigint), KEY._col13
> (type: bigint), KEY._col14 (type: bigint)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
> _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
> Statistics: Num rows: 100594872 Data size: 16095179588 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint),
> _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint),
> UDFToLong(_col5) (type: bigint), _col6 (type: bigint), _col7 (type: bigint),
> _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11
> (type: bigint), _col12 (type: bigint), _col13 (type: bigint),
> UDFToString(_col14) (type: string), CAST( _col15 AS decimal(28,12)) (type:
> decimal(28,12))
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
> _col15
> Statistics: Num rows: 100594872 Data size: 16095179588
> Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 100594872 Data size: 16095179588
> Basic stats: COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
> Stage: Stage-2
> Dependency Collection
> Stage: Stage-0
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
> Stage: Stage-3
> Stats-Aggr Operator
> {code}
> Query used
> {code}
> explain insert INTO table fidelity.FCAP_AGG_PART_EXP_hive_decimal_decimal
> SELECT BALANCE.SCALE, BALANCE.TIME_KEY_NUM, BALANCE.DATASET_CODE,
> BALANCE.Cost_Center_LVL1_ID, BALANCE.Cost_Pool_LVL6_ID, act.lvl5_ID,
> BALANCE.VIEW_LVL1_ID, BALANCE.FROM_LVL1_ID, BALANCE.PLAN_ID,
> BALANCE.CLIENT_ID, BALANCE.LOB_ID, BALANCE.PRODUCT_ID, fprs.lvl5_ID
> fprs_lvl5_id, DRIVER.SSN_ID, DRIVER.ACCOUNT_ID, SUM ( BALANCE.MTD_BALANCE *
> DRIVER.DRIVER_PCT) AS MTD_BALANCE FROM
> fidelity.FCAP_AGG_PROD_EXP_NOFUND_decimal BALANCE JOIN
> fidelity.fcap_drivers_part_exp_inter_bucket_256 DRIVER ON
> BALANCE.TIME_KEY_NUM = DRIVER.TIME_KEY_NUM AND BALANCE.PLAN_ID =
> DRIVER.PLAN_ID AND BALANCE.FPRS_ID = DRIVER.FPRS_ID AND BALANCE.ACTIVITY_ID =
> DRIVER.ACTIVITY_ID INNER JOIN fidelity.fobi_fprs_dim_mv_orc fprs ON
> balance.FPRS_ID = fprs.fprs_id AND fprs.current_ind = 'Y' inner join
> fidelity.fobi_activity_dim_mv act ON BALANCE.activity_id = act.activity_id
> AND act.current_ind = 'Y' AND act.ROLLUP_KEY = 'TOTACT' GROUP BY
> BALANCE.SCALE, BALANCE.TIME_KEY_NUM, BALANCE.DATASET_CODE,
> BALANCE.Cost_center_LVL1_ID, BALANCE.Cost_Pool_LVL6_ID, act.lvl5_ID,
> BALANCE.VIEW_LVL1_ID, BALANCE.FROM_LVL1_ID, BALANCE.PLAN_ID,
> BALANCE.CLIENT_ID, BALANCE.LOB_ID, BALANCE.PRODUCT_ID, fprs.lvl5_ID,
> DRIVER.SSN_ID, DRIVER.ACCOUNT_ID;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)