Janaki Lahorani created HIVE-19940:
--------------------------------------
Summary: Push predicates with deterministic UDFs with RBO
Key: HIVE-19940
URL: https://issues.apache.org/jira/browse/HIVE-19940
Project: Hive
Issue Type: Bug
Reporter: Janaki Lahorani
Assignee: Janaki Lahorani
With RBO, predicates with any UDF doesn't get pushed down. It makes sense to
not pushdown the predicates with non-deterministic function as the meaning of
the query changes after the predicate is resolved to use the function. But
pushing a deterministic function is beneficial.
Test Case:
{code}
set hive.cbo.enable=false;
CREATE TABLE `testb`(
`cola` string COMMENT '',
`colb` string COMMENT '',
`colc` string COMMENT '')
PARTITIONED BY (
`part1` string,
`part2` string,
`part3` string)
STORED AS AVRO;
CREATE TABLE `testa`(
`col1` string COMMENT '',
`col2` string COMMENT '',
`col3` string COMMENT '',
`col4` string COMMENT '',
`col5` string COMMENT '')
PARTITIONED BY (
`part1` string,
`part2` string,
`part3` string)
STORED AS AVRO;
insert into testA partition (part1='US', part2='ABC', part3='123')
values ('12.34', '100', '200', '300', 'abc'),
('12.341', '1001', '2001', '3001', 'abcd');
insert into testA partition (part1='UK', part2='DEF', part3='123')
values ('12.34', '100', '200', '300', 'abc'),
('12.341', '1001', '2001', '3001', 'abcd');
insert into testA partition (part1='US', part2='DEF', part3='200')
values ('12.34', '100', '200', '300', 'abc'),
('12.341', '1001', '2001', '3001', 'abcd');
insert into testA partition (part1='CA', part2='ABC', part3='300')
values ('12.34', '100', '200', '300', 'abc'),
('12.341', '1001', '2001', '3001', 'abcd');
insert into testB partition (part1='CA', part2='ABC', part3='300')
values ('600', '700', 'abc'), ('601', '701', 'abcd');
insert into testB partition (part1='CA', part2='ABC', part3='400')
values ( '600', '700', 'abc'), ( '601', '701', 'abcd');
insert into testB partition (part1='UK', part2='PQR', part3='500')
values ('600', '700', 'abc'), ('601', '701', 'abcd');
insert into testB partition (part1='US', part2='DEF', part3='200')
values ( '600', '700', 'abc'), ('601', '701', 'abcd');
insert into testB partition (part1='US', part2='PQR', part3='123')
values ( '600', '700', 'abc'), ('601', '701', 'abcd');
-- views with deterministic functions
create view viewDeterministicUDFA partitioned on (vpart1, vpart2, vpart3) as
select
cast(col1 as decimal(38,18)) as vcol1,
cast(col2 as decimal(38,18)) as vcol2,
cast(col3 as decimal(38,18)) as vcol3,
cast(col4 as decimal(38,18)) as vcol4,
cast(col5 as char(10)) as vcol5,
cast(part1 as char(2)) as vpart1,
cast(part2 as char(3)) as vpart2,
cast(part3 as char(3)) as vpart3
from testa
where part1 in ('US', 'CA');
create view viewDeterministicUDFB partitioned on (vpart1, vpart2, vpart3) as
select
cast(cola as decimal(38,18)) as vcolA,
cast(colb as decimal(38,18)) as vcolB,
cast(colc as char(10)) as vcolC,
cast(part1 as char(2)) as vpart1,
cast(part2 as char(3)) as vpart2,
cast(part3 as char(3)) as vpart3
from testb
where part1 in ('US', 'CA');
explain
select vcol1, vcol2, vcol3, vcola, vcolb
from viewDeterministicUDFA a inner join viewDeterministicUDFB b
on a.vpart1 = b.vpart1
and a.vpart2 = b.vpart2
and a.vpart3 = b.vpart3
and a.vpart1 = 'US'
and a.vpart2 = 'DEF'
and a.vpart3 = '200';
{code}
Plan where the CAST is not pushed down.
{code}
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: testa
filterExpr: (part1) IN ('US', 'CA') (type: boolean)
Statistics: Num rows: 6 Data size: 13740 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: CAST( col1 AS decimal(38,18)) (type:
decimal(38,18)), CAST( col2 AS decimal(38,18)) (type: decimal(38,18)), CAST(
col3 AS decimal(38,18)) (type: decimal(38,18)), CAST( part1 AS CHAR(2)) (type:
char(2)), CAST( part2 AS CHAR(3)) (type: char(3)), CAST( part3 AS CHAR(3))
(type: char(3))
outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7
Statistics: Num rows: 6 Data size: 13740 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: ((_col5 = 'US') and (_col6 = 'DEF') and (_col7 =
'200')) (type: boolean)
Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: 'US' (type: char(2)), 'DEF' (type: char(3)),
'200' (type: char(3))
sort order: +++
Map-reduce partition columns: 'US' (type: char(2)), 'DEF'
(type: char(3)), '200' (type: char(3))
Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: decimal(38,18)), _col1 (type:
decimal(38,18)), _col2 (type: decimal(38,18))
TableScan
alias: testb
filterExpr: (part1) IN ('US', 'CA') (type: boolean)
Statistics: Num rows: 8 Data size: 12720 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: CAST( cola AS decimal(38,18)) (type:
decimal(38,18)), CAST( colb AS decimal(38,18)) (type: decimal(38,18)), CAST(
part1 AS CHAR(2)) (type: char(2)), CAST( part2 AS CHAR(3)) (type: char(3)),
CAST( part3 AS CHAR(3)) (type: char(3))
outputColumnNames: _col0, _col1, _col3, _col4, _col5
Statistics: Num rows: 8 Data size: 12720 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: ((_col5 = '200') and _col3 is not null and _col4 is
not null) (type: boolean)
Statistics: Num rows: 4 Data size: 6360 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: _col3 (type: char(2)), _col4 (type:
char(3)), '200' (type: char(3))
sort order: +++
Map-reduce partition columns: _col3 (type: char(2)), _col4
(type: char(3)), '200' (type: char(3))
Statistics: Num rows: 4 Data size: 6360 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: decimal(38,18)), _col1 (type:
decimal(38,18))
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col5 (type: char(2)), _col6 (type: char(3)), _col7 (type:
char(3))
1 _col3 (type: char(2)), _col4 (type: char(3)), _col5 (type:
char(3))
outputColumnNames: _col0, _col1, _col2, _col8, _col9
Statistics: Num rows: 4 Data size: 6996 Basic stats: COMPLETE Column
stats: NONE
Select Operator
expressions: _col0 (type: decimal(38,18)), _col1 (type:
decimal(38,18)), _col2 (type: decimal(38,18)), _col8 (type: decimal(38,18)),
_col9 (type: decimal(38,18))
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 4 Data size: 6996 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 4 Data size: 6996 Basic stats: COMPLETE
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)