[jira] [Commented] (HIVE-19940) Push predicates with deterministic UDFs with RBO

Hive QA (JIRA) Thu, 12 Jul 2018 10:29:19 -0700


    [ https://issues.apache.org/jira/browse/HIVE-19940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541995#comment-16541995 ]


Hive QA commented on HIVE-19940:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931239/HIVE-19940.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12561/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12561/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12561/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-07-12 17:27:41.500
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-12561/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-07-12 17:27:41.504
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 5ade740 HIVE-20088: Beeline config location path is assembled incorrectly (Denes Bodo via Zoltan Haindrich)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 5ade740 HIVE-20088: Beeline config location path is assembled incorrectly (Denes Bodo via Zoltan Haindrich)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-07-12 17:27:42.170
+ rm -rf ../yetus_PreCommit-HIVE-Build-12561
+ mkdir ../yetus_PreCommit-HIVE-Build-12561
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-12561
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-12561/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/test/results/clientpositive/masking_disablecbo_2.q.out:555
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/masking_disablecbo_2.q.out' with conflicts.
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:420: trailing whitespace.
                        sort order: 
/data/hiveptest/working/scratch/build.patch:635: trailing whitespace.
                        sort order: 
/data/hiveptest/working/scratch/build.patch:1715: trailing whitespace.
        Reducer 11 
error: patch failed: ql/src/test/results/clientpositive/masking_disablecbo_2.q.out:555
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/masking_disablecbo_2.q.out' with conflicts.
/data/hiveptest/working/scratch/build.patch:3290: new blank line at EOF.
+
U ql/src/test/results/clientpositive/masking_disablecbo_2.q.out
warning: 4 lines add whitespace errors.
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-12561
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931239 - PreCommit-HIVE-Build

> Push predicates with deterministic UDFs with RBO
> ------------------------------------------------
>
>                 Key: HIVE-19940
>                 URL: https://issues.apache.org/jira/browse/HIVE-19940
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Janaki Lahorani
>            Assignee: Janaki Lahorani
>            Priority: Major
>         Attachments: HIVE-19940.1.patch, HIVE-19940.2.patch, HIVE-19940.3.patch
>
>
> With RBO, predicates containing any UDF don't get pushed down. It makes sense
> not to push down predicates with non-deterministic functions, because
> rewriting the predicate in terms of such a function changes the meaning of
> the query. But pushing down predicates with deterministic functions is
> beneficial.
> Test Case:
> {code}
> set hive.cbo.enable=false;
> CREATE TABLE `testb`(
>    `cola` string COMMENT '',
>    `colb` string COMMENT '',
>    `colc` string COMMENT '')
> PARTITIONED BY (
>    `part1` string,
>    `part2` string,
>    `part3` string)
> STORED AS AVRO;
> CREATE TABLE `testa`(
>    `col1` string COMMENT '',
>    `col2` string COMMENT '',
>    `col3` string COMMENT '',
>    `col4` string COMMENT '',
>    `col5` string COMMENT '')
> PARTITIONED BY (
>    `part1` string,
>    `part2` string,
>    `part3` string)
> STORED AS AVRO;
> insert into testA partition (part1='US', part2='ABC', part3='123')
> values ('12.34', '100', '200', '300', 'abc'),
> ('12.341', '1001', '2001', '3001', 'abcd');
> insert into testA partition (part1='UK', part2='DEF', part3='123')
> values ('12.34', '100', '200', '300', 'abc'),
> ('12.341', '1001', '2001', '3001', 'abcd');
> insert into testA partition (part1='US', part2='DEF', part3='200')
> values ('12.34', '100', '200', '300', 'abc'),
> ('12.341', '1001', '2001', '3001', 'abcd');
> insert into testA partition (part1='CA', part2='ABC', part3='300')
> values ('12.34', '100', '200', '300', 'abc'),
> ('12.341', '1001', '2001', '3001', 'abcd');
> insert into testB partition (part1='CA', part2='ABC', part3='300')
> values ('600', '700', 'abc'), ('601', '701', 'abcd');
> insert into testB partition (part1='CA', part2='ABC', part3='400')
> values ( '600', '700', 'abc'), ( '601', '701', 'abcd');
> insert into testB partition (part1='UK', part2='PQR', part3='500')
> values ('600', '700', 'abc'), ('601', '701', 'abcd');
> insert into testB partition (part1='US', part2='DEF', part3='200')
> values ( '600', '700', 'abc'), ('601', '701', 'abcd');
> insert into testB partition (part1='US', part2='PQR', part3='123')
> values ( '600', '700', 'abc'), ('601', '701', 'abcd');
> -- views with deterministic functions
> create view viewDeterministicUDFA partitioned on (vpart1, vpart2, vpart3) as 
> select
>  cast(col1 as decimal(38,18)) as vcol1,
>  cast(col2 as decimal(38,18)) as vcol2,
>  cast(col3 as decimal(38,18)) as vcol3,
>  cast(col4 as decimal(38,18)) as vcol4,
>  cast(col5 as char(10)) as vcol5,
>  cast(part1 as char(2)) as vpart1,
>  cast(part2 as char(3)) as vpart2,
>  cast(part3 as char(3)) as vpart3
>  from testa
> where part1 in ('US', 'CA');
> create view viewDeterministicUDFB partitioned on (vpart1, vpart2, vpart3) as 
> select
>  cast(cola as decimal(38,18)) as vcolA,
>  cast(colb as decimal(38,18)) as vcolB,
>  cast(colc as char(10)) as vcolC,
>  cast(part1 as char(2)) as vpart1,
>  cast(part2 as char(3)) as vpart2,
>  cast(part3 as char(3)) as vpart3
>  from testb
> where part1 in ('US', 'CA');
> explain
> select vcol1, vcol2, vcol3, vcola, vcolb
> from viewDeterministicUDFA a inner join viewDeterministicUDFB b
> on a.vpart1 = b.vpart1
> and a.vpart2 = b.vpart2
> and a.vpart3 = b.vpart3
> and a.vpart1 = 'US'
> and a.vpart2 = 'DEF'
> and a.vpart3 = '200';
> {code}
> Plan in which the predicates on the CAST expressions are not pushed down:
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: testa
>             filterExpr: (part1) IN ('US', 'CA') (type: boolean)
>             Statistics: Num rows: 6 Data size: 13740 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: CAST( col1 AS decimal(38,18)) (type: decimal(38,18)), CAST( col2 AS decimal(38,18)) (type: decimal(38,18)), CAST( col3 AS decimal(38,18)) (type: decimal(38,18)), CAST( part1 AS CHAR(2)) (type: char(2)), CAST( part2 AS CHAR(3)) (type: char(3)), CAST( part3 AS CHAR(3)) (type: char(3))
>               outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7
>               Statistics: Num rows: 6 Data size: 13740 Basic stats: COMPLETE Column stats: NONE
>               Filter Operator
>                 predicate: ((_col5 = 'US') and (_col6 = 'DEF') and (_col7 = '200')) (type: boolean)
>                 Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: 'US' (type: char(2)), 'DEF' (type: char(3)), '200' (type: char(3))
>                   sort order: +++
>                   Map-reduce partition columns: 'US' (type: char(2)), 'DEF' (type: char(3)), '200' (type: char(3))
>                   Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: decimal(38,18)), _col1 (type: decimal(38,18)), _col2 (type: decimal(38,18))
>           TableScan
>             alias: testb
>             filterExpr: (part1) IN ('US', 'CA') (type: boolean)
>             Statistics: Num rows: 8 Data size: 12720 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: CAST( cola AS decimal(38,18)) (type: decimal(38,18)), CAST( colb AS decimal(38,18)) (type: decimal(38,18)), CAST( part1 AS CHAR(2)) (type: char(2)), CAST( part2 AS CHAR(3)) (type: char(3)), CAST( part3 AS CHAR(3)) (type: char(3))
>               outputColumnNames: _col0, _col1, _col3, _col4, _col5
>               Statistics: Num rows: 8 Data size: 12720 Basic stats: COMPLETE Column stats: NONE
>               Filter Operator
>                 predicate: ((_col5 = '200') and _col3 is not null and _col4 is not null) (type: boolean)
>                 Statistics: Num rows: 4 Data size: 6360 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col3 (type: char(2)), _col4 (type: char(3)), '200' (type: char(3))
>                   sort order: +++
>                   Map-reduce partition columns: _col3 (type: char(2)), _col4 (type: char(3)), '200' (type: char(3))
>                   Statistics: Num rows: 4 Data size: 6360 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: decimal(38,18)), _col1 (type: decimal(38,18))
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Inner Join 0 to 1
>           keys:
>             0 _col5 (type: char(2)), _col6 (type: char(3)), _col7 (type: char(3))
>             1 _col3 (type: char(2)), _col4 (type: char(3)), _col5 (type: char(3))
>           outputColumnNames: _col0, _col1, _col2, _col8, _col9
>           Statistics: Num rows: 4 Data size: 6996 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: _col0 (type: decimal(38,18)), _col1 (type: decimal(38,18)), _col2 (type: decimal(38,18)), _col8 (type: decimal(38,18)), _col9 (type: decimal(38,18))
>             outputColumnNames: _col0, _col1, _col2, _col3, _col4
>             Statistics: Num rows: 4 Data size: 6996 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 4 Data size: 6996 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
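The rule proposed in the issue is to push a filter below a projection only when the rewritten predicate is deterministic. The Java sketch below illustrates that rule on a minimal, hypothetical expression model. It is not Hive's actual ExprNodeDesc classes or its RBO predicate pushdown code. The names PushdownSketch, pushBelowSelect and CAST_CHAR2 are made up for illustration. The sketch replaces each view column in the predicate with the Select expression that defines it. It keeps the rewrite only if every function in the result is deterministic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PushdownSketch {

    /** Hypothetical, minimal expression tree (not Hive's ExprNodeDesc API). */
    interface Expr {
        boolean isDeterministic();
        Expr substitute(Map<String, Expr> columnMap);
        String render();
    }

    /** Reference to a column by name. */
    static final class Col implements Expr {
        final String name;
        Col(String name) { this.name = name; }
        public boolean isDeterministic() { return true; }
        // Replace a view column with the Select expression that defines it, if any.
        public Expr substitute(Map<String, Expr> m) { return m.getOrDefault(name, this); }
        public String render() { return name; }
    }

    /** String literal. */
    static final class Lit implements Expr {
        final String value;
        Lit(String value) { this.value = value; }
        public boolean isDeterministic() { return true; }
        public Expr substitute(Map<String, Expr> m) { return this; }
        public String render() { return "'" + value + "'"; }
    }

    /** Function call; the flag stands in for a UDF's declared determinism. */
    static final class Call implements Expr {
        final String fn;
        final boolean deterministic;
        final List<Expr> args;
        Call(String fn, boolean deterministic, Expr... args) {
            this.fn = fn;
            this.deterministic = deterministic;
            this.args = Arrays.asList(args);
        }
        // A call is deterministic only if the function and all its arguments are.
        public boolean isDeterministic() {
            return deterministic && args.stream().allMatch(Expr::isDeterministic);
        }
        public Expr substitute(Map<String, Expr> m) {
            Expr[] newArgs = args.stream().map(a -> a.substitute(m)).toArray(Expr[]::new);
            return new Call(fn, deterministic, newArgs);
        }
        public String render() {
            return fn + "(" + args.stream().map(Expr::render).collect(Collectors.joining(", ")) + ")";
        }
    }

    /**
     * Rewrites a filter that sits above a Select so that it can run below it.
     * Returns null when the rewritten predicate is non-deterministic and must stay above.
     */
    static Expr pushBelowSelect(Expr predicate, Map<String, Expr> selectExprs) {
        Expr rewritten = predicate.substitute(selectExprs);
        return rewritten.isDeterministic() ? rewritten : null;
    }

    public static void main(String[] args) {
        // vpart1 mirrors CAST(part1 AS CHAR(2)) from the test case's view.
        Map<String, Expr> view = new HashMap<>();
        view.put("vpart1", new Call("CAST_CHAR2", true, new Col("part1")));
        view.put("vrand", new Call("rand", false));

        Expr onCast = new Call("=", true, new Col("vpart1"), new Lit("US"));
        Expr onRand = new Call(">", true, new Col("vrand"), new Lit("0.5"));

        Expr pushed = pushBelowSelect(onCast, view);
        // prints "pushed: =(CAST_CHAR2(part1), 'US')"
        System.out.println(pushed == null ? "kept above Select" : "pushed: " + pushed.render());
        // prints "kept above Select"
        System.out.println(pushBelowSelect(onRand, view) == null ? "kept above Select" : "pushed");
    }
}
```

Here vpart1 is defined by a deterministic cast, so the filter on it is rewritten over part1 and can move below the Select, where it refers only to a base column. A column defined by rand() is different: a pushed-down copy of the filter would evaluate rand() a second time, so its values would not match those of the projected column. That filter therefore stays above the Select.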



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
