LongShangRen created HIVE-15117: ----------------------------------- Summary: Partition filters are not pushed down with lateral view and undeterministic UDF Key: HIVE-15117 URL: https://issues.apache.org/jira/browse/HIVE-15117 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: LongShangRen Fix For: 1.2.1
sql with lateral view didn't push down partition column as expected!. here is how it can be reproduced. 1. *create test table* {quote} create table test_lateral_view (id bigint,json_cont string) partitioned by (vt string); {quote} 2. *explain below sql* {quote} select * from test_lateral_view a lateral view json_tuple(json_cont, 'iids', 'indexs') b as iids,indexs where a.vt = '2016-10-27' and rand()>0.5; {quote} here is my result: {quote} STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: a Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Lateral View Forward Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: id (type: bigint), json_cont (type: string), vt (type: string) outputColumnNames: id, json_cont, vt Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Lateral View Join Operator outputColumnNames: _col0, _col1, _col2, _col6, _col7 Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator {color:red} predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean) {color:red} Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE ListSink Select Operator expressions: json_cont (type: string), 'iids' (type: string), 'indexs' (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE UDTF Operator Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE function name: json_tuple Lateral View Join Operator outputColumnNames: _col0, _col1, _col2, _col6, _col7 Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE ListSink {quote} As you can see,the partition column is in filter operator,which means this sql will scan the whole table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)