[jira] [Commented] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-20 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844504#comment-16844504
 ] 

Aditya Shah commented on HIVE-21739:


[~alangates] To verify the changes, I was trying to fix HIVE-21751 and run 
the tests present in the testutils module. I also noticed that the DB install 
tests in standalone-metastore are broken (HIVE-21758). Do the 
standalone-metastore DbInstall tests suffice as the testing requirement for 
schema changes, or are the testutils tests also performed?

Thanks, 

Aditya

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.patch
>
>
> Since the addition of the foreign key constraint between the database 
> ('DBS') table and the catalogs ('CTLGS') table in HIVE-18755, we are unable 
> to run a simple create database command with an older version of the 
> Metastore server. This is because older versions have a JDO schema matching 
> the older 'DBS' schema, which lacks the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}
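For context, a minimal sketch of the schema interaction behind this error, assuming a MySQL-backed metastore: the table, column, and constraint names come from the error message above, while the INSERT column list and the default-catalog idea are illustrative assumptions, not the actual patch.

{code:sql}
-- Illustrative sketch, not the HIVE-21739 patch.
-- HIVE-18755 ties each DBS row to a catalog via CTLG_FK1:
ALTER TABLE DBS ADD CONSTRAINT CTLG_FK1
  FOREIGN KEY (CTLG_NAME) REFERENCES CTLGS (NAME);

-- A pre-catalog metastore's JDO schema knows nothing about CTLG_NAME, so
-- its INSERT omits the column; the row then has no valid CTLG_NAME
-- referencing a CTLGS parent, and the database rejects it with the
-- BatchUpdateException quoted above (column list is hypothetical):
INSERT INTO DBS (DB_ID, NAME, DB_LOCATION_URI, OWNER_NAME, OWNER_TYPE)
VALUES (1, 'testdb', 'hdfs:///warehouse/testdb.db', 'hive', 'USER');

-- One backward-compatible option (an assumption about the direction of the
-- fix, not the committed change) is a server-side default catalog:
-- ALTER TABLE DBS MODIFY CTLG_NAME VARCHAR(256) DEFAULT 'hive';
{code}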



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-20 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Status: Open  (was: Patch Available)

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.patch
>
>
> Since the addition of the foreign key constraint between the database 
> ('DBS') table and the catalogs ('CTLGS') table in HIVE-18755, we are unable 
> to run a simple create database command with an older version of the 
> Metastore server. This is because older versions have a JDO schema matching 
> the older 'DBS' schema, which lacks the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-20 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Attachment: HIVE-21739.3.patch
Status: Patch Available  (was: Open)

Fixing unit tests

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.patch
>
>
> Since the addition of the foreign key constraint between the database 
> ('DBS') table and the catalogs ('CTLGS') table in HIVE-18755, we are unable 
> to run a simple create database command with an older version of the 
> Metastore server. This is because older versions have a JDO schema matching 
> the older 'DBS' schema, which lacks the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844462#comment-16844462
 ] 

Hive QA commented on HIVE-13781:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
15s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 26 new + 112 unchanged - 0 
fixed = 138 total (was 112) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17262/dev-support/hive-personality.sh
 |
| git revision | master / 8b8e702 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17262/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17262/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-13781.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502, but partition 
> 20160501's directory does not exist, then using the Tez engine to run hive 
> -e "select day,count(*) from a where xx=xx group by day" causes Hive to 
> throw a FileNotFoundException.
> MR, however, works.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> 

[jira] [Commented] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1689#comment-1689
 ] 

Hive QA commented on HIVE-21760:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969213/HIVE-21760.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16057 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17261/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17261/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17261/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12969213 - PreCommit-HIVE-Build

> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21760.1.patch
>
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
> plans containing a dummy operator, task generation fails.
> I am not sure what the root cause of the failure in task generation is, but 
> presumably it makes some assumption about plans containing a dummy operator.
> *Reproducer*
> {code:sql}
> SELECT `t`.`p_name`
> FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
> FROM `part`) AS `t`
> LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
> FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t2`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON 
> `t`.`size` = `t6`.`size`
> LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
> FROM (SELECT `p_type`, `p_size` + 1 AS `+`
> FROM `part`
> WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
> GROUP BY `p_type`, `p_size` + 1) AS `t9`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
> `t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
> WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
> (`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` 
> IS NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR 
> `t6`.`c` IS NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
> {code}
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>   at 

[jira] [Commented] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844436#comment-16844436
 ] 

Hive QA commented on HIVE-21760:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
19s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17261/dev-support/hive-personality.sh
 |
| git revision | master / 8b8e702 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17261/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21760.1.patch
>
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
> plans containing a dummy operator, task generation fails.
> I am not sure what the root cause of the failure in task generation is, but 
> presumably it makes some assumption about plans containing a dummy operator.
> *Reproducer*
> {code:sql}
> SELECT `t`.`p_name`
> FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
> FROM `part`) AS `t`
> LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
> FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t2`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON 
> `t`.`size` = `t6`.`size`
> LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
> FROM (SELECT `p_type`, `p_size` + 1 AS `+`
> FROM `part`
> WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
> GROUP BY `p_type`, `p_size` + 1) AS `t9`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
> `t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
> WHERE 

[jira] [Updated] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread zhangbutao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-13781:
--
Environment: (was: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0)

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-13781.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502, but partition 
> 20160501's directory does not exist, then using the Tez engine to run hive 
> -e "select day,count(*) from a where xx=xx group by day" causes Hive to 
> throw a FileNotFoundException.
> MR, however, works.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
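A sketch of the kind of guard a fix here could add (an illustrative assumption, not the contents of HIVE-13781.1.patch): filter the metastore's partition paths through a filesystem existence check before split generation lists them.

{code:java}
// Illustrative sketch, not the HIVE-13781 patch.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionPathFilter {
  /**
   * Returns only the partition directories that still exist on the
   * filesystem, so listing splits never touches a stale metastore path.
   */
  public static List<Path> existingPaths(Configuration conf, List<Path> dirs)
      throws IOException {
    List<Path> live = new ArrayList<>();
    for (Path dir : dirs) {
      FileSystem fs = dir.getFileSystem(conf);
      if (fs.exists(dir)) {
        live.add(dir);
      }
    }
    return live;
  }
}
{code}

With such a check applied during split generation, Tez would skip the deleted day=20160501 directory instead of failing, matching the MR behavior described above.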



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread zhangbutao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-13781:
--
Attachment: HIVE-13781.1.patch
Status: Patch Available  (was: Open)

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.1, 2.0.0, 0.14.0
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-13781.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502, but partition 
> 20160501's directory does not exist, then using the Tez engine to run hive 
> -e "select day,count(*) from a where xx=xx group by day" causes Hive to 
> throw a FileNotFoundException.
> MR, however, works.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread zhangbutao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-13781:
--
Attachment: (was: HIVE-13781.1.patch)

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502, but partition 
> 20160501's directory does not exist, then using the Tez engine to run hive 
> -e "select day,count(*) from a where xx=xx group by day" causes Hive to 
> throw a FileNotFoundException.
> MR, however, works.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844423#comment-16844423
 ] 

Jesus Camacho Rodriguez commented on HIVE-21760:


+1 (pending tests)

Can you create a clone of this issue as a follow-up to enable 
SharedWorkOptimizer for SMB joins? Thanks

> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21760.1.patch
>
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
> plans containing a dummy operator, task generation fails.
> I am not sure what the root cause of the failure in task generation is, but 
> presumably it makes some assumption about plans containing a dummy operator.
> *Reproducer*
> {code:sql}
> SELECT `t`.`p_name`
> FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
> FROM `part`) AS `t`
> LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
> FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t2`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON 
> `t`.`size` = `t6`.`size`
> LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
> FROM (SELECT `p_type`, `p_size` + 1 AS `+`
> FROM `part`
> WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
> GROUP BY `p_type`, `p_size` + 1) AS `t9`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
> `t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
> WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
> (`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` 
> IS NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR 
> `t6`.`c` IS NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
> {code}
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
>   at 
> 

[jira] [Updated] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21760:
---
Attachment: HIVE-21760.1.patch

> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21760.1.patch
>
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
> plans containing a dummy operator, task generation fails.
> I am not sure what the root cause of the failure in task generation is, but 
> presumably it makes some assumption about plans containing a dummy operator.
> *Reproducer*
> {code:sql}
> SELECT `t`.`p_name`
> FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
> FROM `part`) AS `t`
> LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
> FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t2`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON 
> `t`.`size` = `t6`.`size`
> LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
> FROM (SELECT `p_type`, `p_size` + 1 AS `+`
> FROM `part`
> WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
> GROUP BY `p_type`, `p_size` + 1) AS `t9`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
> `t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
> WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
> (`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` 
> IS NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR 
> `t6`.`c` IS NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
> {code}
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
>   at 
> 

[jira] [Updated] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21760:
---
Status: Patch Available  (was: Open)

> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21760.1.patch
>
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
> plans containing a dummy operator, task generation fails.
> I am not sure what the root cause of the failure in task generation is, but 
> presumably it makes some assumption about plans containing a dummy operator.
> *Reproducer*
> {code:sql}
> SELECT `t`.`p_name`
> FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
> FROM `part`) AS `t`
> LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
> FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t2`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON 
> `t`.`size` = `t6`.`size`
> LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
> FROM (SELECT `p_type`, `p_size` + 1 AS `+`
> FROM `part`
> WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
> GROUP BY `p_type`, `p_size` + 1) AS `t9`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
> `t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
> WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
> (`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` 
> IS NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR 
> `t6`.`c` IS NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
> {code}
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
>   at 
> 

[jira] [Updated] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21760:
---
Description: 
SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
plans containing a dummy operator, task generation fails.
I am not sure what the root cause of the failure in task generation is, but 
presumably it makes some assumption about plans containing a dummy operator.

*Reproducer*
{code:sql}
SELECT `t`.`p_name`
FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
FROM `part`) AS `t`
LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
FROM `part`
WHERE `p_size` IS NOT NULL
GROUP BY `p_size` + 1) AS `t2`
INNER JOIN (SELECT `p_size` + 1 AS `size`
FROM `part`
WHERE `p_size` IS NOT NULL
GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON `t`.`size` 
= `t6`.`size`
LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
FROM (SELECT `p_type`, `p_size` + 1 AS `+`
FROM `part`
WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
GROUP BY `p_type`, `p_size` + 1) AS `t9`
INNER JOIN (SELECT `p_size` + 1 AS `size`
FROM `part`
WHERE `p_size` IS NOT NULL
GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
`t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
(`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` IS 
NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR `t6`.`c` IS 
NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
{code}
{code:java}
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
at 
org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 

[jira] [Updated] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21760:
---
Component/s: Query Planning

> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
> plans containing a dummy operator, task generation fails.
> I am not sure what the root cause of the failure in task generation is, but 
> presumably it makes some assumption about plans containing a dummy operator.
> ==Reproducer==
> {code:sql}
> SELECT `t`.`p_name`
> FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
> FROM `part`) AS `t`
> LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
> FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t2`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON 
> `t`.`size` = `t6`.`size`
> LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
> FROM (SELECT `p_type`, `p_size` + 1 AS `+`
> FROM `part`
> WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
> GROUP BY `p_type`, `p_size` + 1) AS `t9`
> INNER JOIN (SELECT `p_size` + 1 AS `size`
> FROM `part`
> WHERE `p_size` IS NOT NULL
> GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
> `t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
> WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
> (`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` 
> IS NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR 
> `t6`.`c` IS NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
> {code}
> {code}
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
>   at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
>   at 
> 

[jira] [Updated] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21760:
---
Description: 
SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges 
plans containing a dummy operator, task generation fails.
I am not sure what the root cause of the failure in task generation is, but 
presumably it makes some assumption about plans containing a dummy operator.

==Reproducer==
{code:sql}
SELECT `t`.`p_name`
FROM (SELECT `p_name`, `p_type`, `p_size` + 1 AS `size`
FROM `part`) AS `t`
LEFT JOIN (SELECT `t5`.`size`, `t2`.`c`, `t2`.`ck`
FROM (SELECT `p_size` + 1 AS `+`, COUNT(*) AS `c`, COUNT(`p_type`) AS `ck`
FROM `part`
WHERE `p_size` IS NOT NULL
GROUP BY `p_size` + 1) AS `t2`
INNER JOIN (SELECT `p_size` + 1 AS `size`
FROM `part`
WHERE `p_size` IS NOT NULL
GROUP BY `p_size` + 1) AS `t5` ON `t2`.`+` = `t5`.`size`) AS `t6` ON `t`.`size` 
= `t6`.`size`
LEFT JOIN (SELECT `t9`.`p_type`, `t12`.`size`, TRUE AS `$f2`
FROM (SELECT `p_type`, `p_size` + 1 AS `+`
FROM `part`
WHERE `p_size` IS NOT NULL AND `p_type` IS NOT NULL
GROUP BY `p_type`, `p_size` + 1) AS `t9`
INNER JOIN (SELECT `p_size` + 1 AS `size`
FROM `part`
WHERE `p_size` IS NOT NULL
GROUP BY `p_size` + 1) AS `t12` ON `t9`.`+` = `t12`.`size`) AS `t14` ON 
`t`.`p_type` = `t14`.`p_type` AND `t`.`size` = `t14`.`size`
WHERE (`t14`.`$f2` IS NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL) AND 
(`t`.`p_type` IS NOT NULL OR `t6`.`c` = 0 OR `t6`.`c` IS NULL OR `t14`.`$f2` IS 
NOT NULL) AND (`t6`.`ck` < `t6`.`c` IS NOT TRUE OR `t6`.`c` = 0 OR `t6`.`c` IS 
NULL OR `t14`.`$f2` IS NOT NULL OR `t`.`p_type` IS NULL);
{code}

{code}
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.plan.TezWork.connect(TezWork.java:376)
at 
org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:470)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:641)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:278)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

[jira] [Assigned] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-21760:
--


> Sharedwork optimization should be bypassed for SMB joins
> 
>
> Key: HIVE-21760
> URL: https://issues.apache.org/jira/browse/HIVE-21760
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges plans 
> containing the dummy operator, task generation fails.
> I am not sure what the root cause of the task generation failure is, but 
> presumably it makes some assumption about plans containing the dummy operator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21758) DBInstall tests broken on master and branch-3.1

2019-05-20 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned HIVE-21758:
-


> DBInstall tests broken on master and branch-3.1
> ---
>
> Key: HIVE-21758
> URL: https://issues.apache.org/jira/browse/HIVE-21758
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Tests
>Affects Versions: 3.1.1
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
>
> The Oracle and SqlServer install and upgrade tests in standalone-metastore 
> fail in master and branch-3.1.  In the Oracle case it appears the docker 
> container that was used no longer exists.  For SqlServer the cause of the 
> failures is not immediately clear.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21757) ACID: use a new write id for compaction's output instead of the visibility id

2019-05-20 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844339#comment-16844339
 ] 

Gopal V commented on HIVE-21757:


Found the other data-loss comment in RB - https://reviews.apache.org/r/66485/

"That's a REPL event & the trouble with IOW is that it also destroys commits in 
progress with the new base_n files, where n > all previous open txns."

So even if you're compacting up to transaction 10, if transactions 11 and 12 are 
open, you cannot create a base_13 until 11 and 12 are done. The compactor now 
also blocks behind writes it is not compacting.
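
To make the naming concrete, here is a hedged illustration of the two schemes 
under discussion (directory names and digit widths are illustrative, not taken 
from the patch):

{noformat}
delta_0000009_0000009/      <- committed deltas up to write id 10
delta_0000010_0000010/
base_0000010_v0000013/      <- today: compactor output suffixed with the compactor's txn id (13)
base_0000011/               <- proposal: output under a freshly allocated write id
{noformat}

Under the proposed scheme, the new write id cannot safely become the base while 
lower transactions are still open, which is the blocking behaviour described 
above.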

> ACID: use a new write id for compaction's output instead of the visibility id
> -
>
> Key: HIVE-21757
> URL: https://issues.apache.org/jira/browse/HIVE-21757
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Vaibhav Gumashta
>Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To 
> control the visibility of the output directory, it uses 
> base_writeId_visibilityId, where visibilityId is the transaction id of the 
> transaction that the compactor ran in. Perhaps we can keep using the 
> base_writeId format, by allocating a new writeId for the compactor and 
> creating the new base/delta with that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21757) ACID: use a new write id for compaction's output instead of the visibility id

2019-05-20 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844325#comment-16844325
 ] 

Gopal V edited comment on HIVE-21757 at 5/20/19 9:50 PM:
-

Internally generating write-ids also broke the query cache and the materialized 
view staleness check, but those were really performance features, not data-loss 
issues caused by this idea.

Compaction of files has to be strictly local to the warehouse and ideally 
idempotent with respect to the table's versioning.


was (Author: gopalv):
Internally generating write-ids also broke the query cache and the materialized 
view staleness check, but those were really performance features, not data-loss 
issues caused by this idea.

> ACID: use a new write id for compaction's output instead of the visibility id
> -
>
> Key: HIVE-21757
> URL: https://issues.apache.org/jira/browse/HIVE-21757
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Vaibhav Gumashta
>Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To 
> control the visibility of the output directory, it uses 
> base_writeId_visibilityId, where visibilityId is the transaction id of the 
> transaction that the compactor ran in. Perhaps we can keep using the 
> base_writeId format, by allocating a new writeId for the compactor and 
> creating the new base/delta with that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21757) ACID: use a new write id for compaction's output instead of the visibility id

2019-05-20 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844325#comment-16844325
 ] 

Gopal V commented on HIVE-21757:


Internally generating write-ids also broke the query cache and the materialized 
view staleness check, but those were really performance features, not data-loss 
issues caused by this idea.

> ACID: use a new write id for compaction's output instead of the visibility id
> -
>
> Key: HIVE-21757
> URL: https://issues.apache.org/jira/browse/HIVE-21757
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Vaibhav Gumashta
>Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To 
> control the visibility of the output directory, it uses 
> base_writeId_visibilityId, where visibilityId is the transaction id of the 
> transaction that the compactor ran in. Perhaps we can keep using the 
> base_writeId format, by allocating a new writeId for the compactor and 
> creating the new base/delta with that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21757) ACID: use a new write id for compaction's output instead of the visibility id

2019-05-20 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844324#comment-16844324
 ] 

Gopal V commented on HIVE-21757:


-1 on this.

This was tried out before we switched to the visibility local-ids.

The knock-on effect was on replication, where the same write-id can get used on 
the remote side before the target side uses it for a real update.

> ACID: use a new write id for compaction's output instead of the visibility id
> -
>
> Key: HIVE-21757
> URL: https://issues.apache.org/jira/browse/HIVE-21757
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Vaibhav Gumashta
>Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To 
> control the visibility of the output directory, it uses 
> base_writeId_visibilityId, where visibilityId is the transaction id of the 
> transaction that the compactor ran in. Perhaps we can keep using the 
> base_writeId format, by allocating a new writeId for the compactor and 
> creating the new base/delta with that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21757) ACID: use a new write id for compaction's output instead of the visibility id

2019-05-20 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844284#comment-16844284
 ] 

Todd Lipcon commented on HIVE-21757:


This is particularly important if we want to be able to cache the file listings 
for a table based on the table's latest write ID. If the compactor doesn't 
change write IDs, but changes the set of files, then that caching strategy 
becomes impossible. Given that file listing is pretty expensive on cloud 
stores, caching them can be quite useful for low-latency queries.

It seems likely this could cause problems for things like replication as well.
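
As a hedged illustration of the caching strategy described above (all names and 
the listing helper here are hypothetical stand-ins, not Hive APIs), a 
write-id-keyed listing cache only stays correct if every file change bumps the 
write id:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache file listings keyed by (table, latest write id).
public class ListingCacheSketch {
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

  List<String> cachedListing(String table, long latestWriteId) {
    // Any committed write changes the key and so invalidates the entry.
    // A compactor that rewrites files without allocating a new write id
    // leaves a stale entry behind -- the problem described above.
    return cache.computeIfAbsent(table + "@" + latestWriteId, k -> listFiles(table));
  }

  private List<String> listFiles(String table) {
    // Placeholder for an expensive cloud-store listing.
    return List.of("base_0000010/bucket_00000");
  }
}
{code}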

> ACID: use a new write id for compaction's output instead of the visibility id
> -
>
> Key: HIVE-21757
> URL: https://issues.apache.org/jira/browse/HIVE-21757
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Vaibhav Gumashta
>Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To 
> control the visibility of the output directory, it uses 
> base_writeId_visibilityId, where visibilityId is the transaction id of the 
> transaction that the compactor ran in. Perhaps we can keep using the 
> base_writeId format, by allocating a new writeId for the compactor and 
> creating the new base/delta with that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18338) [Client, JDBC] Expose async interface through hive JDBC.

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844268#comment-16844268
 ] 

Hive QA commented on HIVE-18338:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969189/HIVE-18338.patch.4

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 16059 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestPartitionManagement.testPartitionDiscoveryTransactionalTable
 (batchId=222)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesBootstrapWithJsonMessage.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
 (batchId=248)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17260/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17260/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17260/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12969189 - PreCommit-HIVE-Build

> [Client, JDBC] Expose async interface through hive JDBC.
> 
>
> Key: HIVE-18338
> URL: https://issues.apache.org/jira/browse/HIVE-18338
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Affects Versions: 2.3.2
>Reporter: Amruth S
>Assignee: Amruth S
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18338.patch, HIVE-18338.patch.1, 
> HIVE-18338.patch.2, HIVE-18338.patch.3, HIVE-18338.patch.4
>
>
> This exposes an async API in HiveStatement (jdbc module).
> The JDBC interface has always had strictly synchronous APIs. 
> So the hive JDBC implementation also had to follow suit even though the hive 
> server is fully asynchronous.
> Developers trying to build proxies on top of hive servers end up writing a 
> thrift client from scratch to make it asynchronous and robust to its restarts.
> The common pattern is
>  # Submit query, get operation handle and store in a persistent store
>  # Poll and wait for completion
>  # Stream results
>  # In the event of restarts, restore OperationHandle from persistent store 
> and continue execution.
> The patch does 2 things
>  * exposes operation handle (once a query is submitted) 
> {{getOperationhandle()}} 
> Developers can persist this along with the actual hive server url 
> {{getJdbcUrl}}
>  * latch APIs 
> Developers can create a statement and latch on to an operation handle that 
> was persisted earlier. For latch, the statement should be created from the 
> actual hive server URI connection in which the query was submitted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18338) [Client, JDBC] Expose async interface through hive JDBC.

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844231#comment-16844231
 ] 

Hive QA commented on HIVE-18338:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
34s{color} | {color:blue} jdbc in master has 16 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
12s{color} | {color:red} jdbc: The patch generated 28 new + 7 unchanged - 0 
fixed = 35 total (was 7) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 17s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17260/dev-support/hive-personality.sh
 |
| git revision | master / 8b8e702 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17260/yetus/diff-checkstyle-jdbc.txt
 |
| modules | C: jdbc U: jdbc |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17260/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> [Client, JDBC] Expose async interface through hive JDBC.
> 
>
> Key: HIVE-18338
> URL: https://issues.apache.org/jira/browse/HIVE-18338
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Affects Versions: 2.3.2
>Reporter: Amruth S
>Assignee: Amruth S
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18338.patch, HIVE-18338.patch.1, 
> HIVE-18338.patch.2, HIVE-18338.patch.3, HIVE-18338.patch.4
>
>
> This exposes an async API in HiveStatement (jdbc module).
> The JDBC interface has always had strictly synchronous APIs. 
> So the hive JDBC implementation also had to follow suit even though the hive 
> server is fully asynchronous.
> Developers trying to build proxies on top of hive servers end up writing a 
> thrift client from scratch to make it asynchronous and robust to its restarts.
> The common pattern is
>  # Submit query, get operation handle and store in a persistent store
>  # Poll and wait for completion
>  # Stream results
>  # In the event of restarts, restore OperationHandle from persistent store 
> and continue execution.
> The patch does 2 things
>  * exposes operation handle (once a query is submitted) 
> {{getOperationhandle()}} 
> Developers can persist this along with the actual hive server url 
> {{getJdbcUrl}}
>  * latch APIs 
> Developers can create a statement and latch on to an operation handle that 
> was 

[jira] [Commented] (HIVE-21715) Adding a new partition specified by location (which is empty) leads to Exceptions

2019-05-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844228#comment-16844228
 ] 

Ashutosh Chauhan commented on HIVE-21715:
-

+1

> Adding a new partition specified by location (which is empty) leads to 
> Exceptions
> -
>
> Key: HIVE-21715
> URL: https://issues.apache.org/jira/browse/HIVE-21715
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21715.01.patch, HIVE-21715.01.patch, 
> HIVE-21715.02.patch, HIVE-21715.02.patch
>
>
> {code}
> create table supply (id int, part string, quantity int) partitioned by (day 
> int)
>stored as orc
>location 'hdfs:///tmp/a1'
>TBLPROPERTIES ('transactional'='true')
> ;
> alter table supply add partition (day=20110103) location 
>'hdfs:///tmp/a3';
> {code}
> check exception:
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: Wrong file format. Please 
> check the file's format.
>   at 
> org.apache.hadoop.hive.ql.exec.MoveTask.checkFileFormats(MoveTask.java:696)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:370)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:210)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> {code}
> If the format check is disabled, an exception happens from AcidUtils instead, 
> because during checking it doesn't expect the location to be empty.
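
As a hedged illustration of that second failure path (the table and location are 
from the reproducer above; hive.fileformat.check is the existing switch behind 
the MoveTask check, everything else unchanged):

{code:sql}
-- Illustrative only: with the format check off, the same ALTER TABLE now
-- fails later, from AcidUtils, because the partition location is empty.
SET hive.fileformat.check=false;
ALTER TABLE supply ADD PARTITION (day=20110103) LOCATION 'hdfs:///tmp/a3';
{code}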



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21683) ProxyFileSystem breaks with Hadoop trunk

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844219#comment-16844219
 ] 

Hive QA commented on HIVE-21683:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969187/hive-21683.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 16057 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitions
 (batchId=274)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitionsUnionAll
 (batchId=274)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomNonExistent
 (batchId=274)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighBytesRead 
(batchId=274)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes
 (batchId=274)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerSlowQueryElapsedTime
 (batchId=274)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerSlowQueryExecutionTime
 (batchId=274)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17259/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17259/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17259/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12969187 - PreCommit-HIVE-Build

> ProxyFileSystem breaks with Hadoop trunk
> 
>
> Key: HIVE-21683
> URL: https://issues.apache.org/jira/browse/HIVE-21683
> Project: Hive
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: hive-21683-javassist.patch, hive-21683-simple.patch, 
> hive-21683.patch
>
>
> When trying to run with a recent build of Hadoop which includes HADOOP-15229 
> I ran into the following stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/src/hive/itests/qtest/target/warehouse/src/kv1.txt, expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:793) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:636)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:153)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:354) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.lambda$openFileWithOptions$0(ChecksumFileSystem.java:846)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at org.apache.hadoop.util.LambdaUtils.eval(LambdaUtils.java:52) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.openFileWithOptions(ChecksumFileSystem.java:845)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.FileSystem$FSDataInputStreamBuilder.build(FileSystem.java:4522)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:115) 
> ~[hadoop-mapreduce-client-core-3.1.1.6.0.99.0-135.jar:?]{code}
> We need to add appropriate path-swizzling wrappers for the new APIs in 
> ProxyFileSystem23



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21683) ProxyFileSystem breaks with Hadoop trunk

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844205#comment-16844205
 ] 

Hive QA commented on HIVE-21683:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
18s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
20s{color} | {color:blue} shims/common in master has 6 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
22s{color} | {color:blue} shims/0.23 in master has 7 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
10s{color} | {color:red} shims/0.23: The patch generated 4 new + 65 unchanged - 
0 fixed = 69 total (was 65) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
7s{color} | {color:red} root: The patch generated 4 new + 79 unchanged - 0 
fixed = 83 total (was 79) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} shims/0.23 generated 0 new + 6 unchanged - 1 fixed = 
6 total (was 7) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  xml  compile  findbugs  
checkstyle  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17259/dev-support/hive-personality.sh
 |
| git revision | master / 8b8e702 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17259/yetus/diff-checkstyle-shims_0.23.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17259/yetus/diff-checkstyle-root.txt
 |
| modules | C: shims/common shims/0.23 . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17259/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> ProxyFileSystem breaks with Hadoop trunk
> 
>
> Key: HIVE-21683
> URL: https://issues.apache.org/jira/browse/HIVE-21683
> Project: Hive
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: hive-21683-javassist.patch, hive-21683-simple.patch, 

[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245426&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245426
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285701027
 
 

 ##
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##
 @@ -4438,6 +4438,10 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 LLAP_COLLECT_LOCK_METRICS("hive.llap.lockmetrics.collect", false,
 "Whether lock metrics (wait times, counts) are collected for LLAP "
 + "related locks"),
+LLAP_DECAY_METRIC_SIZE("hive.llap.metrics.decay.size", 1028,
 
 Review comment:
  We need to test/verify whether 1028 samples deliver a stable enough average. 
Also, why 1028 specifically?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245426)
Time Spent: 0.5h  (was: 20m)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245431
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285709384
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 ##
 @@ -3154,4 +3165,28 @@ public void taskInfoUpdated(TezTaskAttemptID attemptId, 
boolean isGuaranteed) {
 + attemptId + ", " + newState);
 sendUpdateMessageAsync(ti, newState);
   }
+
+  private void updateMetrics(TaskAttemptImpl taskAttempt) {
+// Only do it for map tasks
+if (!isMapTask(taskAttempt)) {
+  return;
+}
+// Check if this task was already assigned to a node
+NodeInfo nodeInfo = knownTasks.get(taskAttempt).assignedNode;
+if (nodeInfo == null) {
+  return;
+}
+
+metrics.addTaskLatency(nodeInfo.shortStringBase, 
taskAttempt.getFinishTime() - taskAttempt.getLaunchTime());
+  }
+
+  private boolean isMapTask(TaskAttemptImpl taskAttempt) {
+boolean isMapTask = false;
+for(TezCounter counter : taskAttempt.getCounters().getGroup("HIVE")) {
 
 Review comment:
  I am sure there are more efficient ways to figure out if this is a Map 
task. CounterGroupBase, for example, has a findCounter method, so iterating 
all counters of a group to find a specific one is not necessary. Maybe, 
instead of looking at the counters, there are other indicators (e.g. on the 
attached Vertex) giving away whether this is a Map task.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245431)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245433
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285714898
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/metrics/LlapTaskSchedulerMetrics.java
 ##
 @@ -276,6 +302,43 @@ private void getTaskSchedulerStats(MetricsRecordBuilder 
rb) {
 .addCounter(SchedulerPendingPreemptionTaskCount, 
pendingPreemptionTasksCount.value())
 .addCounter(SchedulerPreemptedTaskCount, preemptedTasksCount.value())
 .addCounter(SchedulerCompletedDagCount, completedDagcount.value());
+daemonTaskLatency.forEach((k, v) -> rb.addGauge(v, v.getMean()));
+  }
+
+  static class DaemonLatencyMetric implements MetricsInfo {
+private String name;
+private ExponentiallyDecayingReservoir reservoir;
 
 Review comment:
  What is the benefit of using an exponential decay here vs. a simple sliding 
window (e.g. guava stats)? Specifically with the low number of entries (right 
now 1028), a few more recent task execution times can have a huge impact on 
what we believe is the node's average over a longer period of time. The whole 
idea of using the average here is to smooth out the task execution times across 
a larger number of tasks. I fear that the exponential decay contradicts this 
by giving a few very recent values a high impact.
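
For reference, a minimal sketch of the reservoir being debated (Dropwizard 
Metrics' ExponentiallyDecayingReservoir; 1028 and 0.015 are also that library's 
defaults, which the patch exposes as hive.llap.metrics.decay.size/.alpha):

{code:java}
import com.codahale.metrics.ExponentiallyDecayingReservoir;
import com.codahale.metrics.Snapshot;

public class DecaySketch {
  public static void main(String[] args) {
    // size = 1028 samples, alpha = 0.015 (higher alpha biases towards newer values)
    ExponentiallyDecayingReservoir reservoir =
        new ExponentiallyDecayingReservoir(1028, 0.015);
    for (long latencyMs : new long[] {120, 95, 110, 300, 105}) {
      reservoir.update(latencyMs); // one sample per finished task attempt
    }
    Snapshot snap = reservoir.getSnapshot();
    System.out.printf("mean=%.1f ms, p99=%.1f ms%n",
        snap.getMean(), snap.get99thPercentile());
  }
}
{code}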
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245433)
Time Spent: 1.5h  (was: 1h 20m)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245425
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285701292
 
 

 ##
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##
 @@ -4438,6 +4438,10 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 LLAP_COLLECT_LOCK_METRICS("hive.llap.lockmetrics.collect", false,
 "Whether lock metrics (wait times, counts) are collected for LLAP "
 + "related locks"),
+LLAP_DECAY_METRIC_SIZE("hive.llap.metrics.decay.size", 1028,
+"The number of samples to keep in the sampling reservoir"),
+LLAP_DECAY_METRIC_ALPHA("hive.llap.metrics.decay.alpha", 0.015f,
+"Exponential decay factor; higher is more biased towards newer 
values"),
 
 Review comment:
   Listing valid ranges for alpha would help.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245425)
Time Spent: 20m  (was: 10m)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245432
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285712289
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/metrics/LlapTaskSchedulerMetrics.java
 ##
 @@ -254,6 +272,14 @@ public void setWmUnusedGuaranteed(int unusedGuaranteed) {
 wmUnusedGuaranteedCount.set(unusedGuaranteed);
   }
 
+  public void addTaskLatency(String key, long value) {
 
 Review comment:
  Rename key to daemonID and/or add a comment describing the parameters.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245432)
Time Spent: 1h 20m  (was: 1h 10m)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245427
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285704254
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 ##
 @@ -426,9 +428,12 @@ public LlapTaskSchedulerService(TaskSchedulerContext 
taskSchedulerContext, Clock
   this.pauseMonitor = new JvmPauseMonitor(conf);
   pauseMonitor.start();
   String displayName = "LlapTaskSchedulerMetrics-" + 
MetricsUtils.getHostName();
+  int decayMetricSampleSize = HiveConf.getIntVar(conf, 
ConfVars.LLAP_DECAY_METRIC_SIZE);
+  double decayMetricAlphaFactor = (double) HiveConf.getFloatVar(conf, 
ConfVars.LLAP_DECAY_METRIC_ALPHA);
 
 Review comment:
   Suggestion:
   Could be replaced with 
conf.getDouble(ConfVars.LLAP_DECAY_METRIC_ALPHA.varname);
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245427)
Time Spent: 40m  (was: 0.5h)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245428
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285709679
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 ##
 @@ -3154,4 +3165,28 @@ public void taskInfoUpdated(TezTaskAttemptID attemptId, 
boolean isGuaranteed) {
 + attemptId + ", " + newState);
 sendUpdateMessageAsync(ti, newState);
   }
+
+  private void updateMetrics(TaskAttemptImpl taskAttempt) {
+// Only do it for map tasks
+if (!isMapTask(taskAttempt)) {
+  return;
+}
+// Check if this task was already assigned to a node
+NodeInfo nodeInfo = knownTasks.get(taskAttempt).assignedNode;
+if (nodeInfo == null) {
+  return;
+}
+
+metrics.addTaskLatency(nodeInfo.shortStringBase, 
taskAttempt.getFinishTime() - taskAttempt.getLaunchTime());
+  }
+
+  private boolean isMapTask(TaskAttemptImpl taskAttempt) {
+boolean isMapTask = false;
+for(TezCounter counter : taskAttempt.getCounters().getGroup("HIVE")) {
+  if(counter.getName().startsWith("RECORDS_IN_Map")) {
+isMapTask = true;
 
 Review comment:
   and if you prefer to iterate the group, you should break here instead of 
comparing all counters to the end.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245428)
Time Spent: 50m  (was: 40m)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245430
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285711920
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/metrics/LlapTaskSchedulerMetrics.java
 ##
 @@ -55,6 +61,9 @@
   private final JvmMetrics jvmMetrics;
   private final String sessionId;
   private final MetricsRegistry registry;
+  private final int decayMetricSampleSize;
+  private final double decayMetricAlphaFactor;
+  private Map<String, DaemonLatencyMetric> daemonTaskLatency = new 
ConcurrentHashMap<>();
 
 Review comment:
  A comment telling what the key is would help.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245430)
Time Spent: 1h 10m  (was: 1h)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21740) Collect LLAP execution latency metrics

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21740?focusedWorklogId=245429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245429
 ]

ASF GitHub Bot logged work on HIVE-21740:
-

Author: ASF GitHub Bot
Created on: 20/May/19 18:34
Start Date: 20/May/19 18:34
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #633: HIVE-21740: 
Collect LLAP execution latency metrics
URL: https://github.com/apache/hive/pull/633#discussion_r285713028
 
 

 ##
 File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/metrics/LlapTaskSchedulerMetrics.java
 ##
 @@ -276,6 +302,43 @@ private void getTaskSchedulerStats(MetricsRecordBuilder 
rb) {
 .addCounter(SchedulerPendingPreemptionTaskCount, 
pendingPreemptionTasksCount.value())
 .addCounter(SchedulerPreemptedTaskCount, preemptedTasksCount.value())
 .addCounter(SchedulerCompletedDagCount, completedDagcount.value());
+daemonTaskLatency.forEach((k, v) -> rb.addGauge(v, v.getMean()));
 
 Review comment:
  Not sure that I understand the purpose of this line (storing the 
DaemonLatencyMetric as its own gauge upon reset)...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245429)
Time Spent: 1h  (was: 50m)

> Collect LLAP execution latency metrics
> --
>
> Key: HIVE-21740
> URL: https://issues.apache.org/jira/browse/HIVE-21740
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21740.2.patch, HIVE-21740.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Collect metrics for LLAP task execution times



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21646) Tez: Prevent TezTasks from escaping thread logging context

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844150#comment-16844150
 ] 

Hive QA commented on HIVE-21646:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969179/HIVE-21646.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 16057 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.tez.TestDynamicPartitionPruner.testSingleSourceMultipleFiltersOrdering1
 (batchId=330)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
 (batchId=246)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17258/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17258/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17258/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12969179 - PreCommit-HIVE-Build

> Tez: Prevent TezTasks from escaping thread logging context
> --
>
> Key: HIVE-21646
> URL: https://issues.apache.org/jira/browse/HIVE-21646
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-21646.1.patch, HIVE-21646.1.patch
>
>
> If hive.exec.parallel is set to true to parallelize MoveTasks or StatsTasks, 
> the Tez task does not benefit from a new thread and will lose all the thread 
> context of the current query.
> Multiple threads, even if they are spawned, will lock on SyncDagClient and 
> make progress sequentially.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18338) [Client, JDBC] Expose async interface through hive JDBC.

2019-05-20 Thread Amruth S (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amruth S updated HIVE-18338:

Attachment: HIVE-18338.patch.4

> [Client, JDBC] Expose async interface through hive JDBC.
> 
>
> Key: HIVE-18338
> URL: https://issues.apache.org/jira/browse/HIVE-18338
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Affects Versions: 2.3.2
>Reporter: Amruth S
>Assignee: Amruth S
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18338.patch, HIVE-18338.patch.1, 
> HIVE-18338.patch.2, HIVE-18338.patch.3, HIVE-18338.patch.4
>
>
> This exposes async API in HiveStatement (jdbc module)
> The JDBC interface always have had strict synchronous APIs. 
> So the hive JDBC implementation also had to follow it though the hive server 
> is fully asynchronous.
> Developers trying to build proxies on top of hive servers end up writing 
> thrift client from scratch to make it asynchronous and robust to its restarts.
> The common pattern is
>  # Submit query, get operation handle and store in a persistent store
>  # Poll and wait for completion
>  # Stream results
>  # In the event of restarts, restore OperationHandle from persistent store 
> and continue execution.
> The patch does 2 things
>  * exposes operation handle (once a query is submitted) 
> {{getOperationhandle()}} 
> Developers can persist this along with the actual hive server url 
> {{getJdbcUrl}}
>  * latch APIs 
> Developers can create a statement and latch on to an operation handle that 
> was persisted earlier. For latch, the statement should be created from the 
> actual hive server URI connection in which the query was submitted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18338) [Client, JDBC] Expose async interface through hive JDBC.

2019-05-20 Thread Amruth S (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amruth S updated HIVE-18338:

Description: 
This exposes an async API in HiveStatement (jdbc module).

The JDBC interface has always had strictly synchronous APIs. 
So the hive JDBC implementation also had to follow suit even though the hive 
server is fully asynchronous.

Developers trying to build proxies on top of hive servers end up writing a 
thrift client from scratch to make it asynchronous and robust to its restarts.
The common pattern is
 # Submit query, get operation handle and store in a persistent store
 # Poll and wait for completion
 # Stream results
 # In the event of restarts, restore OperationHandle from persistent store and 
continue execution.

The patch does 2 things
 * exposes operation handle (once a query is submitted) 
{{getOperationhandle()}} 
Developers can persist this along with the actual hive server url {{getJdbcUrl}}
 * latch APIs 
Developers can create a statement and latch on to an operation handle that was 
persisted earlier. For latch, the statement should be created from the actual 
hive server URI connection in which the query was submitted.

  was:
A lot of users are struggling and rewriting a lot of boilerplate over thrift to 
get purely asynchronous capability. 

The idea is to expose the operation handle, so that clients can persist it and 
later latch on to the same execution.

*Problem statement*

Hive JDBC currently exposes 2 methods related to asynchronous execution
*executeAsync()* - to trigger a query execution and return immediately.
*waitForOperationToComplete()* - which waits till the current execution is 
complete *blocking the user thread*.

This has one problem

If the client process goes down, there is no way to resume queries although 
hive server is completely asynchronous.
*Proposal*

If operation handle could be exposed, we can latch on to an active execution of 
a query.

*Code changes*

Operation handle is exposed. So client can keep a copy.
The latchSync() and latchAsync() methods take an operation handle and try to 
latch on to the current execution in the hive server, if present.
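
A hedged sketch of the submit/persist/latch pattern this describes 
(executeAsync() already exists on HiveStatement; getOperationHandle(), 
getJdbcUrl() and latchSync() are the proposed methods from this issue, and the 
persistence 'store' is a hypothetical placeholder):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.hive.jdbc.HiveStatement;

public class AsyncJdbcSketch {
  public static void main(String[] args) throws Exception {
    Connection conn =
        DriverManager.getConnection("jdbc:hive2://hs2-host:10000/default");
    HiveStatement stmt = (HiveStatement) conn.createStatement();

    // 1. Submit the query and return immediately.
    stmt.executeAsync("SELECT COUNT(*) FROM t");

    // 2. Persist the handle plus the server URL so a restarted client can
    //    resume (proposed APIs; 'store' is a hypothetical persistence layer).
    // store.save(stmt.getOperationHandle(), stmt.getJdbcUrl());

    // ... after a restart, on a connection to the same HiveServer2 instance:
    // HiveStatement fresh = (HiveStatement) conn.createStatement();
    // fresh.latchSync(store.load()); // blocks until the operation completes

    // 3. Stream the results once the operation is done.
    ResultSet rs = stmt.getResultSet();
    while (rs.next()) {
      System.out.println(rs.getLong(1));
    }
  }
}
{code}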


> [Client, JDBC] Expose async interface through hive JDBC.
> 
>
> Key: HIVE-18338
> URL: https://issues.apache.org/jira/browse/HIVE-18338
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Affects Versions: 2.3.2
>Reporter: Amruth S
>Assignee: Amruth S
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18338.patch, HIVE-18338.patch.1, 
> HIVE-18338.patch.2, HIVE-18338.patch.3
>
>
> This exposes an async API in HiveStatement (jdbc module).
> The JDBC interface has always had strictly synchronous APIs. 
> So the hive JDBC implementation also had to follow suit even though the hive 
> server is fully asynchronous.
> Developers trying to build proxies on top of hive servers end up writing a 
> thrift client from scratch to make it asynchronous and robust to its restarts.
> The common pattern is
>  # Submit query, get operation handle and store in a persistent store
>  # Poll and wait for completion
>  # Stream results
>  # In the event of restarts, restore OperationHandle from persistent store 
> and continue execution.
> The patch does 2 things
>  * exposes operation handle (once a query is submitted) 
> {{getOperationhandle()}} 
> Developers can persist this along with the actual hive server url 
> {{getJdbcUrl}}
>  * latch APIs 
> Developers can create a statement and latch on to an operation handle that 
> was persisted earlier. For latch, the statement should be created from the 
> actual hive server URI connection in which the query was submitted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21682) Concurrent queries in tez local mode fail

2019-05-20 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HIVE-21682:
--

Assignee: (was: Todd Lipcon)

I looked into this for a while, and after hacking on it for a few days I 
couldn't get things stable. After fixing the IOContext and ObjectRegistry 
issues, I hit problems because Tez and Hive assume they can write things into 
the current working directory (and different threads in the same process 
obviously share the same working directory). Chasing down all of those cases 
seemed like too much of a pain, so I just moved to a pseudo-distributed YARN 
cluster for our use case.

I think for this to work properly we'd need a Tez local mode which actually 
forks separate JVMs for each Tez child instead of using threads.

> Concurrent queries in tez local mode fail
> -
>
> Key: HIVE-21682
> URL: https://issues.apache.org/jira/browse/HIVE-21682
> Project: Hive
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Priority: Major
>
> As noted in TEZ-3420, Hive running with Tez local mode breaks if multiple 
> queries are submitted concurrently. As I noted 
> [there|https://issues.apache.org/jira/browse/TEZ-3420?focusedCommentId=16831937=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16831937]
>  it seems part of the problem is Hive's use of static global state for 
> IOContext in the case of Tez. Another issue is the use of a JVM-wide 
> ObjectRegistry



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21683) ProxyFileSystem breaks with Hadoop trunk

2019-05-20 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-21683:
---
Attachment: hive-21683.patch

> ProxyFileSystem breaks with Hadoop trunk
> 
>
> Key: HIVE-21683
> URL: https://issues.apache.org/jira/browse/HIVE-21683
> Project: Hive
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: hive-21683-javassist.patch, hive-21683-simple.patch, 
> hive-21683.patch
>
>
> When trying to run with a recent build of Hadoop which includes HADOOP-15229 
> I ran into the following stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/src/hive/itests/qtest/target/warehouse/src/kv1.txt, expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:793) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:636)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:153)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:354) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.lambda$openFileWithOptions$0(ChecksumFileSystem.java:846)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at org.apache.hadoop.util.LambdaUtils.eval(LambdaUtils.java:52) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.openFileWithOptions(ChecksumFileSystem.java:845)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.FileSystem$FSDataInputStreamBuilder.build(FileSystem.java:4522)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:115) 
> ~[hadoop-mapreduce-client-core-3.1.1.6.0.99.0-135.jar:?]{code}
> We need to add appropriate path-swizzling wrappers for the new APIs in 
> ProxyFileSystem23
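
For illustration, the kind of wrapper this calls for might look like the 
following (a sketch only, not the attached patch: the exact 
{{openFileWithOptions}} signature differs across Hadoop versions, and wiring 
the existing {{swizzleParamPath}} helper into it is an assumption):

{code:java}
// Sketch only, not the attached patch. Applies ProxyFileSystem's existing
// path-swizzling pattern to the openFile() builder path from HADOOP-15229.
// The exact openFileWithOptions signature varies across Hadoop versions.
@Override
protected CompletableFuture<FSDataInputStream> openFileWithOptions(
    final Path path, final OpenFileParameters parameters) throws IOException {
  // Rewrite the pfile:// path to the underlying scheme before delegating.
  return super.openFileWithOptions(swizzleParamPath(path), parameters);
}
{code}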



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21683) ProxyFileSystem breaks with Hadoop trunk

2019-05-20 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-21683:
---
Status: Patch Available  (was: Open)

> ProxyFileSystem breaks with Hadoop trunk
> 
>
> Key: HIVE-21683
> URL: https://issues.apache.org/jira/browse/HIVE-21683
> Project: Hive
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: hive-21683-javassist.patch, hive-21683-simple.patch, 
> hive-21683.patch
>
>
> When trying to run with a recent build of Hadoop which includes HADOOP-15229 
> I ran into the following stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/src/hive/itests/qtest/target/warehouse/src/kv1.txt, expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:793) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:636)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:153)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:354) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.lambda$openFileWithOptions$0(ChecksumFileSystem.java:846)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at org.apache.hadoop.util.LambdaUtils.eval(LambdaUtils.java:52) 
> ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.openFileWithOptions(ChecksumFileSystem.java:845)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.fs.FileSystem$FSDataInputStreamBuilder.build(FileSystem.java:4522)
>  ~[hadoop-common-3.1.1.6.0.99.0-135.jar:?]
> at 
> org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:115) 
> ~[hadoop-mapreduce-client-core-3.1.1.6.0.99.0-135.jar:?]{code}
> We need to add appropriate path-swizzling wrappers for the new APIs in 
> ProxyFileSystem23



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=245333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245333
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 17:01
Start Date: 20/May/19 17:01
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285679383
 
 

 ##
 File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapLoadGeneratorService.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Extra load generator service for LLAP.
+ */
+public class LlapLoadGeneratorService extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapLoadGeneratorService.class);
+  private long interval;
+  private float threshold;
+  private String[] victimsHostName;
+  private Thread[] threads;
+
+  public LlapLoadGeneratorService() {
+super("LlapLoadGeneratorService");
+  }
+
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
+super.serviceInit(conf);
+threshold = HiveConf.getFloatVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_UTILIZATION);
+victimsHostName = HiveConf.getTrimmedStringsVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_HOSTNAMES);
+interval = HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_INTERVAL, TimeUnit.MILLISECONDS);
+LOG.info("LlapLoadGeneratorService init with {} {} {}", interval, 
threshold, victimsHostName);
 
 Review comment:
  Should we add some basic validity/range checks here? E.g., that the 
threshold is between 0 and 1, and that the host names are actually valid?
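
For illustration, the kind of guard being suggested might look like this (a 
sketch, not part of the attached patch):

{code:java}
// Sketch of the suggested validation in serviceInit; not part of the patch.
if (threshold < 0.0f || threshold > 1.0f) {
  throw new IllegalArgumentException(
      "hive.test.load.utilization must be in [0, 1], got " + threshold);
}
{code}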
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245333)
Time Spent: 4h 10m  (was: 4h)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, HIVE-21732.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> For evaluating testing, it would be good to have a configurable way to inject 
> latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=245332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245332
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 17:01
Start Date: 20/May/19 17:01
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285685514
 
 

 ##
 File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapLoadGeneratorService.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Extra load generator service for LLAP.
+ */
+public class LlapLoadGeneratorService extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapLoadGeneratorService.class);
+  private long interval;
+  private float threshold;
+  private String[] victimsHostName;
+  private Thread[] threads;
+
+  public LlapLoadGeneratorService() {
+super("LlapLoadGeneratorService");
+  }
+
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
+super.serviceInit(conf);
+threshold = HiveConf.getFloatVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_UTILIZATION);
+victimsHostName = HiveConf.getTrimmedStringsVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_HOSTNAMES);
+interval = HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_INTERVAL, TimeUnit.MILLISECONDS);
+LOG.info("LlapLoadGeneratorService init with {} {} {}", interval, 
threshold, victimsHostName);
+  }
+
+  @Override
+  protected void serviceStart() throws UnknownHostException {
+String localHostName = InetAddress.getLocalHost().getHostName();
+LOG.debug("Local hostname is: {}", localHostName);
+for (String hostName : victimsHostName) {
+  if (hostName.equalsIgnoreCase(localHostName)) {
+LOG.debug("Starting load generator process on: {}", localHostName);
+threads = new Thread[Runtime.getRuntime().availableProcessors()];
+Random random = new Random();
+for (int i = 0; i < threads.length; i++) {
+  threads[i] = new Thread(new Runnable() {
+@Override
+public void run() {
+  while (!Thread.interrupted()) {
+if (random.nextFloat() <= threshold) {
+  // Keep it busy
+  long startTime = System.currentTimeMillis();
+  while (System.currentTimeMillis() - startTime < interval) {
 
 Review comment:
   BTW: We could also check for interrupted here. Otherwise, we won't interrupt 
the (potentially larger) interval when serviceStop is called.
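
For illustration, an interruptible version of the busy-wait might look like 
this (a sketch, not part of the attached patch):

{code:java}
// Sketch of the suggested interruptible busy-wait; not part of the patch.
long startTime = System.currentTimeMillis();
while (!Thread.currentThread().isInterrupted()
    && System.currentTimeMillis() - startTime < interval) {
  // keep the core busy until the interval elapses or the service is stopped
}
{code}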
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245332)
Time Spent: 4h  (was: 3h 50m)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, 

[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=245334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245334
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 17:01
Start Date: 20/May/19 17:01
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285686183
 
 

 ##
 File path: 
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapLoadGeneratorService.java
 ##
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.junit.Test;
+
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Test to make sure that the LLAP nodes are able to start with the load 
generator.
+ */
+public class TestLlapLoadGeneratorService {
+  @Test
+  public void testLoadGenerator() throws InterruptedException, 
UnknownHostException {
 
 Review comment:
   This test isn't testing anything but burning some CPU time w/o checking how 
much was really "wasted". If this is just used to debug the code (rather than 
really unit testing), it should have the @Ignore annotation.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245334)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, HIVE-21732.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> For evaluating testing, it would be good to have a configurable way to inject 
> latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=245335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245335
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 17:01
Start Date: 20/May/19 17:01
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285685088
 
 

 ##
 File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapLoadGeneratorService.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Extra load generator service for LLAP.
+ */
+public class LlapLoadGeneratorService extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapLoadGeneratorService.class);
+  private long interval;
+  private float threshold;
+  private String[] victimsHostName;
+  private Thread[] threads;
+
+  public LlapLoadGeneratorService() {
+super("LlapLoadGeneratorService");
+  }
+
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
+super.serviceInit(conf);
+threshold = HiveConf.getFloatVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_UTILIZATION);
+victimsHostName = HiveConf.getTrimmedStringsVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_HOSTNAMES);
+interval = HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_INTERVAL, TimeUnit.MILLISECONDS);
+LOG.info("LlapLoadGeneratorService init with {} {} {}", interval, 
threshold, victimsHostName);
+  }
+
+  @Override
+  protected void serviceStart() throws UnknownHostException {
+String localHostName = InetAddress.getLocalHost().getHostName();
+LOG.debug("Local hostname is: {}", localHostName);
+for (String hostName : victimsHostName) {
+  if (hostName.equalsIgnoreCase(localHostName)) {
+LOG.debug("Starting load generator process on: {}", localHostName);
+threads = new Thread[Runtime.getRuntime().availableProcessors()];
+Random random = new Random();
+for (int i = 0; i < threads.length; i++) {
+  threads[i] = new Thread(new Runnable() {
+@Override
+public void run() {
+  while (!Thread.interrupted()) {
+if (random.nextFloat() <= threshold) {
+  // Keep it busy
+  long startTime = System.currentTimeMillis();
+  while (System.currentTimeMillis() - startTime < interval) {
 
 Review comment:
  Pretty much all the time here is going to be spent in the kernel call 
(do_gettimeofday). Should we try to actually spend some time within the Java 
loop ourselves?
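
For illustration, one way to keep most of the load in user space between time 
checks (a sketch, not part of the attached patch; {{startTime}} and 
{{interval}} are the variables from the surrounding code):

{code:java}
// Sketch: burn cycles in user space between time checks instead of calling
// into the kernel on every iteration; not part of the attached patch.
long sink = 0;
while (System.currentTimeMillis() - startTime < interval) {
  for (int j = 0; j < 100_000; j++) {
    sink += (long) j * j; // plain arithmetic keeps the core busy
  }
}
{code}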
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245335)
Time Spent: 4h 20m  (was: 4h 10m)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, 

[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=245331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245331
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 17:01
Start Date: 20/May/19 17:01
Worklog Time Spent: 10m 
  Work Description: odraese commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285677877
 
 

 ##
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##
 @@ -627,6 +627,15 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "internal usage only, used only in test mode. If set false, the 
operation logs, and the " +
 "operation log directory will not be removed, so they can be found 
after the test runs."),
 
+HIVE_TEST_LOAD_HOSTNAMES("hive.test.load.hostnames", "",
+"Specify host names for load testing. (e.g., \"host1,host2,host3\"). 
Leave it empty if no " +
+"load generation is needed (eg. for production)."),
+HIVE_TEST_LOAD_INTERVAL("hive.test.load.interval", "10ms", new 
TimeValidator(TimeUnit.MILLISECONDS),
+"The interval length used for load generation in milliseconds"),
 
 Review comment:
  I think this description could use a little more detail. What happens per 
interval? Is it 10ms of load and then (how long?) no load? Is an interval one 
complete iteration, and does a factor of 0.2 mean that 2ms are load and 8ms 
are idle?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245331)
Time Spent: 3h 50m  (was: 3h 40m)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, HIVE-21732.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> For evaluating testing, it would be good to have a configurable way to inject 
> latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21646) Tez: Prevent TezTasks from escaping thread logging context

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844106#comment-16844106
 ] 

Hive QA commented on HIVE-21646:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
23s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17258/dev-support/hive-personality.sh
 |
| git revision | master / 8b8e702 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17258/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Tez: Prevent TezTasks from escaping thread logging context
> --
>
> Key: HIVE-21646
> URL: https://issues.apache.org/jira/browse/HIVE-21646
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-21646.1.patch, HIVE-21646.1.patch
>
>
> If hive.exec.parallel is set to true to parallelize MoveTasks or StatsTasks, 
> the Tez task does not benefit from a new thread and will lose all the thread 
> context of the current query.
> Multiple threads, even if they are spawned, will lock on SyncDagClient and 
> make progress sequentially.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21646) Tez: Prevent TezTasks from escaping thread logging context

2019-05-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844081#comment-16844081
 ] 

Ashutosh Chauhan commented on HIVE-21646:
-

+1

> Tez: Prevent TezTasks from escaping thread logging context
> --
>
> Key: HIVE-21646
> URL: https://issues.apache.org/jira/browse/HIVE-21646
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-21646.1.patch, HIVE-21646.1.patch
>
>
> If hive.exec.parallel is set to true to parallelize MoveTasks or StatsTasks, 
> the Tez task does not benefit from a new thread and will lose all the thread 
> context of the current query.
> Multiple threads, even if they are spawned, will lock on SyncDagClient and 
> make progress sequentially.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21646) Tez: Prevent TezTasks from escaping thread logging context

2019-05-20 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-21646:
---
Attachment: HIVE-21646.1.patch

> Tez: Prevent TezTasks from escaping thread logging context
> --
>
> Key: HIVE-21646
> URL: https://issues.apache.org/jira/browse/HIVE-21646
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-21646.1.patch, HIVE-21646.1.patch
>
>
> If hive.exec.parallel is set to true to parallelize MoveTasks or StatsTasks, 
> the Tez task does not benefit from a new thread and will lose all the thread 
> context of the current query.
> Multiple threads, even if they are spawned, will lock on SyncDagClient and 
> make progress sequentially.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21715) Adding a new partition specified by location (which is empty) leads to Exceptions

2019-05-20 Thread Laszlo Bodor (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843948#comment-16843948
 ] 

Laszlo Bodor commented on HIVE-21715:
-

+1

> Adding a new partition specified by location (which is empty) leads to 
> Exceptions
> -
>
> Key: HIVE-21715
> URL: https://issues.apache.org/jira/browse/HIVE-21715
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21715.01.patch, HIVE-21715.01.patch, 
> HIVE-21715.02.patch, HIVE-21715.02.patch
>
>
> {code}
> create table supply (id int, part string, quantity int) partitioned by (day 
> int)
>stored as orc
>location 'hdfs:///tmp/a1'
>TBLPROPERTIES ('transactional'='true')
> ;
> alter table supply add partition (day=20110103) location 
>'hdfs:///tmp/a3';
> {code}
> check exception:
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: Wrong file format. Please 
> check the file's format.
>   at 
> org.apache.hadoop.hive.ql.exec.MoveTask.checkFileFormats(MoveTask.java:696)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:370)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:210)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> {code}
> If the format check is disabled, an exception is thrown from AcidUtils, 
> because it does not expect the partition directory to be empty during 
> checking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843928#comment-16843928
 ] 

Hive QA commented on HIVE-21755:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969144/HIVE-21755.01.branch-3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 14431 tests 
executed
*Failed tests:*
{noformat}
TestBeeLineDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=260)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=260)
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=260)
TestMiniDruidKafkaCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=260)
TestTezPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=260)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_static_ptn_into_bucketed_table]
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=71)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[current_date_timestamp]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[external_jdbc_auth]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_data_using_job]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multi_in_clause]
 (batchId=176)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_2]
 (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sharedwork] 
(batchId=176)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=187)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=188)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=189)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=190)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_subquery_chain]
 (batchId=97)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.alterBogusCatalog
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.alterCatalog 
(batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.alterCatalogNoChange
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.createCatalog 
(batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.createExistingCatalog
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.createExistingCatalogWithIfNotExists
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveDatabase 
(batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveDatabaseWithExistingDbOfSameNameAlreadyInTargetCatalog
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveDbToNonExistentCatalog
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveNonExistentDatabase
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveNonExistentTable
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveTable 
(batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveTableToNonExistentDb
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveTableWithExistingTableOfSameNameAlreadyInTargetDatabase
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolCatalogOps.moveTableWithinCatalog
 (batchId=233)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolForMetastore.testValidateLocations
 (batchId=223)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolForMetastore.testValidateNullValues
 (batchId=223)
org.apache.hadoop.hive.metastore.tools.TestSchemaToolForMetastore.testValidateSequences
 (batchId=223)
org.apache.hadoop.hive.ql.TestWarehouseExternalDir.testManagedPaths 
(batchId=237)
org.apache.hadoop.hive.ql.parse.TestSQL11ReservedKeyWordsNegative$TestSQL11ReservedKeyWordsNegativeParametrized.testNegative[REAL]
 (batchId=288)
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation 
(batchId=245)
org.apache.hive.spark.client.rpc.TestRpc.testServerPort (batchId=312)
{noformat}

Test results: 

[jira] [Updated] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-13781:
--
Labels: pull-request-available  (was: )

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-13781.1.patch
>
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502 but partition 
> 20160501's directory does not exist, then running
> hive -e "select day,count(*) from a where xx=xx group by day"
> on the Tez engine throws a FileNotFoundException.
> MR works, though.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?focusedWorklogId=245029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245029
 ]

ASF GitHub Bot logged work on HIVE-13781:
-

Author: ASF GitHub Bot
Created on: 20/May/19 12:02
Start Date: 20/May/19 12:02
Worklog Time Spent: 10m 
  Work Description: zbtcrazybuddy commented on pull request #640: 
HIVE-13781 Tez Job failed with FileNotFoundException when partition dir doesnt 
exists
URL: https://github.com/apache/hive/pull/640
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245029)
Time Spent: 10m
Remaining Estimate: 0h

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-13781.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502 but partition 
> 20160501's directory does not exist, then running
> hive -e "select day,count(*) from a where xx=xx group by day"
> on the Tez engine throws a FileNotFoundException.
> MR works, though.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread zhangbutao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-13781:
--
Affects Version/s: 3.1.1

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
> Attachments: HIVE-13781.1.patch
>
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502 but partition 
> 20160501's directory does not exist, then running
> hive -e "select day,count(*) from a where xx=xx group by day"
> on the Tez engine throws a FileNotFoundException.
> MR works, though.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread zhangbutao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843884#comment-16843884
 ] 

zhangbutao commented on HIVE-13781:
---

Can anyone review this patch? I think this is a common problem that should be 
fixed as soon as possible. Thanks!

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
> Attachments: HIVE-13781.1.patch
>
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502 but partition 
> 20160501's directory does not exist, then running
> hive -e "select day,count(*) from a where xx=xx group by day"
> on the Tez engine throws a FileNotFoundException.
> MR works, though.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843880#comment-16843880
 ] 

Hive QA commented on HIVE-21755:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 15s{color} 
| {color:red} 
/data/hiveptest/logs/PreCommit-HIVE-Build-17257/patches/PreCommit-HIVE-Build-17257.patch
 does not apply to master. Rebase required? Wrong Branch? See 
http://cwiki.apache.org/confluence/display/Hive/HowToContribute for help. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17257/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when 
> changing data type of a column with constraints
> ---
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HIVE-21755.01.branch-3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2019-05-20 Thread zhangbutao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-13781:
--
Attachment: HIVE-13781.1.patch

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
> Attachments: HIVE-13781.1.patch
>
>
> When I have a partitioned table a with partition column "day", and the 
> metastore has partitions day=20160501 and day=20160502 but partition 
> 20160501's directory does not exist, then running
> hive -e "select day,count(*) from a where xx=xx group by day"
> on the Tez engine throws a FileNotFoundException.
> MR works, though.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesn't exist

2019-05-20 Thread zhangbutao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843877#comment-16843877
 ] 

zhangbutao commented on HIVE-13781:
---

This is because Tez does not handle the missing-partition situation. MR handles 
it before the AM task runs by creating a dummy file for each empty partition. 
Tez, however, collects input paths while the AM task is running, so it cannot 
handle empty partitions the way MR does. I think Hive should check that a 
partition's path exists before adding it to the input paths, and that is what 
the patch does.
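
A minimal sketch of the idea (hypothetical helper, not the actual patch): drop any 
partition directory that no longer exists before the path list reaches the 
InputFormat.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExistingPathFilter {
  /** Returns only the partition directories that still exist on the file system. */
  public static List<Path> keepExisting(Configuration conf, List<Path> partitionDirs)
      throws IOException {
    List<Path> existing = new ArrayList<>();
    for (Path dir : partitionDirs) {
      FileSystem fs = dir.getFileSystem(conf);
      if (fs.exists(dir)) {
        existing.add(dir); // directory is still present, keep it
      }
      // else: the partition directory was removed out of band; skipping it
      // avoids FileInputFormat failing later with InvalidInputException
    }
    return existing;
  }
}
{code}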

> Tez Job failed with FileNotFoundException when partition dir doesn't exist 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>
> When I have a partitioned table a with a partition column "day", where the metastore 
> has partitions day=20160501 and day=20160502 but partition 20160501's directory does 
> not exist, running hive -e "select day,count(*) from a where xx=xx group by day" on 
> the tez engine throws FileNotFoundException, while the same query works on mr.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21756) Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21756?focusedWorklogId=244998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-244998
 ]

ASF GitHub Bot logged work on HIVE-21756:
-

Author: ASF GitHub Bot
Created on: 20/May/19 11:12
Start Date: 20/May/19 11:12
Worklog Time Spent: 10m 
  Work Description: dlavati-hw commented on pull request #639: HIVE-21756 
Backport HIVE-21404: MSSQL upgrade script alters the wrong…
URL: https://github.com/apache/hive/pull/639
 
 
   … column (David Lavati via Zoltan Haindrich, Ashutosh Bapat)
   
   Change-Id: Ia85bc76c679ef29e9dcea261440a1836054a4e31
   Signed-off-by: Zoltan Haindrich 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 244998)
Time Spent: 10m
Remaining Estimate: 0h

> Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column
> -
>
> Key: HIVE-21756
> URL: https://issues.apache.org/jira/browse/HIVE-21756
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Backport of HIVE-21404 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21755:

Attachment: HIVE-21755.01.branch-3.patch
Status: Patch Available  (was: Open)

> Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when 
> changing data type of a column with constraints
> ---
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HIVE-21755.01.branch-3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21755?focusedWorklogId=244997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-244997
 ]

ASF GitHub Bot logged work on HIVE-21755:
-

Author: ASF GitHub Bot
Created on: 20/May/19 11:05
Start Date: 20/May/19 11:05
Worklog Time Spent: 10m 
  Work Description: dlavati commented on pull request #638: HIVE-21755: 
Backport HIVE-21462: Upgrading SQL server backed metastor…
URL: https://github.com/apache/hive/pull/638
 
 
   …e when changing data type of a column with constraints (Ashutosh Bapat, 
reviewed by Daniel Dai)
   
   Change-Id: I2577d93b97888bbe2770a75bfe589c995e90ca8d
   Signed-off-by: Daniel Dai 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 244997)
Time Spent: 10m
Remaining Estimate: 0h

> Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when 
> changing data type of a column with constraints
> ---
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HIVE-21755.01.branch-3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843865#comment-16843865
 ] 

Hive QA commented on HIVE-21741:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969143/HIVE-21741.01.branch-3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17256/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17256/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17256/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-05-20 11:01:14.009
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-17256/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z branch-3 ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-05-20 11:01:14.013
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 8b8e702 HIVE-21731 : Hive import fails, post upgrade of source 
3.0 cluster, to a target 4.0 cluster with strict managed table set to true. 
(Mahesh Kumar Behera reviewed by  Sankar Hariappan)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout branch-3
Switched to branch 'branch-3'
Your branch is behind 'origin/branch-3' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/branch-3
HEAD is now at fdaa342 HIVE-21685: Wrong simplification in query with multiple 
IN clauses (Jesus Camacho Rodriguez, reviewed by Zoltan Haindrich)
+ git merge --ff-only origin/branch-3
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-05-20 11:01:18.542
+ rm -rf ../yetus_PreCommit-HIVE-Build-17256
+ mkdir ../yetus_PreCommit-HIVE-Build-17256
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-17256
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-17256/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/itests/src/test/resources/testconfiguration.properties: does not exist 
in index
error: a/ql/src/test/queries/clientpositive/alter_partition_change_col.q: does 
not exist in index
error: a/ql/src/test/results/clientpositive/alter_partition_change_col.q.out: 
does not exist in index
error: 
a/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
 does not exist in index
error: 
a/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java:
 does not exist in index
error: a/standalone-metastore/src/main/resources/package.jdo: does not exist in 
index
error: a/standalone-metastore/src/main/sql/derby/hive-schema-3.2.0.derby.sql: 
does not exist in index
error: 
a/standalone-metastore/src/main/sql/derby/upgrade-3.1.0-to-3.2.0.derby.sql: 
does not exist in index
error: a/standalone-metastore/src/main/sql/mssql/hive-schema-3.2.0.mssql.sql: 
does not exist in index
error: 
a/standalone-metastore/src/main/sql/mssql/upgrade-3.1.0-to-3.2.0.mssql.sql: 
does not exist in index
error: a/standalone-metastore/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql: 
does not exist in index
error: 
a/standalone-metastore/src/main/sql/mysql/upgrade-3.1.0-to-3.2.0.mysql.sql: 
does not exist in index
error: a/standalone-metastore/src/main/sql/oracle/hive-schema-3.2.0.oracle.sql: 
does not exist in index
error: 
a/standalone-metastore/src/main/sql/oracle/upgrade-3.1.0-to-3.2.0.oracle.sql: 
does not exist in index
error: 
a/standalone-metastore/src/main/sql/postgres/hive-schema-3.2.0.postgres.sql: 
does not exist in index
error: 
a/standalone-metastore/src/main/sql/postgres/upgrade-3.1.0-to-3.2.0.postgres.sql:
 does not exist in index
Going to apply patch 

[jira] [Updated] (HIVE-21756) Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21756:

Description: Backport of HIVE-21404 to branch-3.  (was: Backport of 
HIVE-21462 to branch-3.)

> Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column
> -
>
> Key: HIVE-21756
> URL: https://issues.apache.org/jira/browse/HIVE-21756
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> Backport of HIVE-21404 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21756) Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati reassigned HIVE-21756:
---


> Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column
> -
>
> Key: HIVE-21756
> URL: https://issues.apache.org/jira/browse/HIVE-21756
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21755:

Summary: Backport HIVE-21462 to branch-3: Upgrading SQL server backed 
metastore when changing data type of a column with constraints  (was: Backport 
HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column)

> Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when 
> changing data type of a column with constraints
> ---
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21755) Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21755:

Summary: Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the 
wrong column  (was: Backport HIVE-21462 to branch-3: Upgrading SQL server 
backed metastore when changing data type of a column with constraints)

> Backport HIVE-21404 to branch-3: MSSQL upgrade script alters the wrong column
> -
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21755:

Description: Backport of HIVE-21462 to branch-3.  (was: This is an umbrella 
for backporting HIVE-20221 & the related fix of HIVE-20833 to branch-3.)

> Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when 
> changing data type of a column with constraints
> ---
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> Backport of HIVE-21462 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21755) Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati reassigned HIVE-21755:
---


> Backport HIVE-21462 to branch-3: Upgrading SQL server backed metastore when 
> changing data type of a column with constraints
> ---
>
> Key: HIVE-21755
> URL: https://issues.apache.org/jira/browse/HIVE-21755
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21741:

Attachment: HIVE-21741.01.branch-3.patch
Status: Patch Available  (was: Open)

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column 
> width for partition_params
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HIVE-21741.01.branch-3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843856#comment-16843856
 ] 

Hive QA commented on HIVE-21732:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12969133/HIVE-21732.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 16028 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=177)

[union_top_level.q,vector_create_struct_table.q,schema_evol_text_vecrow_part_all_primitive.q,materialized_view_partitioned_2.q,murmur_hash_migration.q,update_where_partitioned.q,materialized_view_rewrite_1.q,materialized_view_create_rewrite_time_window_2.q,drop_partition_with_stats.q,smb_mapjoin_14.q,skiphf_aggr.q,fold_varchar.q,auto_join_filters.q,materialized_view_partitioned_3.q,insert_orig_table.q,mergejoin.q,vector_if_expr_2.q,orc_split_elimination.q,vector_outer_join0.q,schema_evol_text_vec_part_all_primitive.q,vector_complex_all.q,auto_sortmerge_join_4.q,union3.q,windowing_windowspec2.q,vector_list_constant.q,auto_smb_mapjoin_14.q,vector_mapjoin_complex_values.q,results_cache_truncate.q,vector_join_filters.q,reduce_deduplicate_extended.q]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17255/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17255/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17255/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12969133 - PreCommit-HIVE-Build

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, HIVE-21732.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> For testing and evaluation, it would be good to have a configurable way to 
> inject latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.
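
A sketch of how the injection might be switched on, for illustration only: the keys 
hive.test.load.enabled and hive.test.load.hostnames appear verbatim in the attached 
pull request, while everything else here (class name, host names) is hypothetical.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class LoadInjectionConfigExample {
  /** Builds a configuration that enables synthetic load on two LLAP daemons. */
  public static Configuration testLoadConf() {
    Configuration conf = new Configuration();
    conf.setBoolean("hive.test.load.enabled", true);               // master switch
    conf.set("hive.test.load.hostnames", "llap-node1,llap-node2"); // victim daemons
    return conf;
  }
}
{code}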



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesn't exist

2019-05-20 Thread zhangbutao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-13781:
-

Assignee: zhangbutao

> Tez Job failed with FileNotFoundException when partition dir doesn't exist 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 2.0.0
> Environment: hive 0.14.0 ,tez-0.5.2,hadoop 2.6.0
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>
> When I have a partitioned table a with a partition column "day", where the metastore 
> has partitions day=20160501 and day=20160502 but partition 20160501's directory does 
> not exist, running hive -e "select day,count(*) from a where xx=xx group by day" on 
> the tez engine throws FileNotFoundException, while the same query works on mr.
> Repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?focusedWorklogId=244976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-244976
 ]

ASF GitHub Bot logged work on HIVE-21741:
-

Author: ASF GitHub Bot
Created on: 20/May/19 10:18
Start Date: 20/May/19 10:18
Worklog Time Spent: 10m 
  Work Description: dlavati commented on pull request #637: HIVE-21741 
Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width 
for partition_params
URL: https://github.com/apache/hive/pull/637
 
 
   
   Change-Id: I58c602c80649965986b8080c8a666fe40fa17821
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 244976)
Time Spent: 10m
Remaining Estimate: 0h

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column 
> width for partition_params
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21741:
--
Labels: pull-request-available  (was: )

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column 
> width for partition_params
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.2
>
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21741:

Summary: Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase 
column width for partition_params  (was: Backport HIVE-20221 & related fix 
HIVE-20833 to branch-3)

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column 
> width for partition_params
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21741:

Description: This is an umbrella for backporting HIVE-20221 & the related 
fix of HIVE-20833 to branch-3.  (was: This is an umbrella for backporting the 
following metastore-related tickets to branch-3:
 * HIVE-20221
 * HIVE-20833
 * HIVE-21404
 * HIVE-21462

 Also including a .gitignore improvement:
 * HIVE-21406)

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3

2019-05-20 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21741:

Summary: Backport HIVE-20221 & related fix HIVE-20833 to branch-3  (was: 
Backport metastore SQL commits to branch-3)

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> This is an umbrella for backporting the following metastore-related tickets 
> to branch-3:
>  * HIVE-20221
>  * HIVE-20833
>  * HIVE-21404
>  * HIVE-21462
>  Also including a .gitignore improvement:
>  * HIVE-21406



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843816#comment-16843816
 ] 

Hive QA commented on HIVE-21732:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
36s{color} | {color:blue} common in master has 62 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
43s{color} | {color:blue} llap-server in master has 81 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
20s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
21s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 21s{color} 
| {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
13s{color} | {color:red} llap-server: The patch generated 1 new + 36 unchanged 
- 0 fixed = 37 total (was 36) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
19s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17255/dev-support/hive-personality.sh
 |
| git revision | master / 8b8e702 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| mvninstall | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17255/yetus/patch-mvninstall-llap-server.txt
 |
| compile | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17255/yetus/patch-compile-llap-server.txt
 |
| javac | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17255/yetus/patch-compile-llap-server.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17255/yetus/diff-checkstyle-llap-server.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17255/yetus/patch-findbugs-llap-server.txt
 |
| modules | C: common llap-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17255/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, HIVE-21732.patch
>
>  Time Spent: 3h 40m
>  Remaining 

[jira] [Comment Edited] (HIVE-21660) Wrong result when union all and lateral view with explode is used

2019-05-20 Thread Ganesha Shreedhara (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830329#comment-16830329
 ] 

Ganesha Shreedhara edited comment on HIVE-21660 at 5/20/19 9:47 AM:


When a lateral view is used together with union all, the same FileSinkOperator 
object is visited twice in removeUnionOperators while it collects operators of 
FileSinkOperator type from all root operators (Ref: [source 
code|https://github.com/apache/hive/blame/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java#L293]).

 

This is because the operator tree for the second subquery, which contains the 
lateral view join, is formed as below:

 
{code:java}
        TS17
          |
        LVF18
       /     \
   SEL19    SEL20
     |        |
     |      UDTF22
      \      /
       LVJ21
         |
       SEL23
         |
        FS25{code}
 

The FS25 object is visited twice here.

The first visit sets the directory of the FileSinkOperator object to 
*tablePath+UNION_SUDBIR_PREFIX_2* (the linked size is 2 because it is the second 
subquery of the union all query).

When the same object is visited again, the directory of that object is reset 
to *(tablePath+UNION_SUDBIR_PREFIX_2)+(UNION_SUDBIR_PREFIX_1)*.

So the data written to the temp path formed from the specPath 
(*tablePath+UNION_SUDBIR_PREFIX_2*) is not moved to the final path properly.

 

This issue will be solved if we avoid setting the directory for the same object 
again.
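
A minimal sketch of that idea (hypothetical names, not the actual patch): track the 
FileSinkOperator objects that have already been processed in an identity-based set, 
so the union subdirectory suffix is appended at most once per operator, no matter 
how many root operators reach it.

{code:java}
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class UnionSubdirAssigner {
  // Identity semantics matter: the same operator object can be reached
  // through several root operators of the plan.
  private final Set<Object> processedSinks =
      Collections.newSetFromMap(new IdentityHashMap<>());

  /** Appends the union subdir suffix only on the first visit of each sink. */
  public String assignSubdir(Object fileSinkOp, String specPath, int subqueryPos) {
    if (!processedSinks.add(fileSinkOp)) {
      return specPath; // already handled via another root; do not append again
    }
    return specPath + "/HIVE_UNION_SUBDIR_" + subqueryPos;
  }
}
{code}

An identity set (rather than an equals-based HashSet) is the natural choice here, 
since the point is that the very same object instance is visited twice.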

 


was (Author: ganeshas):
When a lateral view is used together with union all, the same FileSinkOperator 
object is visited twice in removeUnionOperators while it collects operators of 
FileSinkOperator type from all root operators (Ref: [source 
code|https://github.com/apache/hive/blame/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java#L293]).
 

The first visit sets the directory of the FileSinkOperator object to 
*tablePath+UNION_SUDBIR_PREFIX_2* (the linked size is 2 because it is the second 
subquery of the union all query).

When the same object is visited again, the directory of that object is reset 
to *(tablePath+UNION_SUDBIR_PREFIX_2)+(UNION_SUDBIR_PREFIX_1)*.

So the data written to 
*tablePath+UNION_SUDBIR_PREFIX_2+UNION_SUDBIR_PREFIX_1* is not moved 
to the final path.

This issue will be solved if we avoid setting the directory for the same object 
again.  

 

The operator tree for the second subquery, which contains the lateral view join, 
is formed as below:

 

 
{code:java}
        TS17
          |
        LVF18
       /     \
   SEL19    SEL20
     |        |
     |      UDTF22
      \      /
       LVJ21
         |
       SEL23
         |
        FS25{code}
 

The FS25 object is visited twice here, which leads to this issue.

> Wrong result when union all and lateral view with explode is used
> ---
>
> Key: HIVE-21660
> URL: https://issues.apache.org/jira/browse/HIVE-21660
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.1.1
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21660.1.patch, HIVE-21660.patch
>
>
> There is data loss when data is inserted into a partitioned table using 
> union all and a lateral view with explode. 
>  
> *Steps to reproduce:*
>  
> {code:java}
> create table t1 (id int, dt string);
> insert into t1 values (2, '2019-04-01');
> create table t2( id int, dates array<string>);
> insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') 
> as dates;
> create table dst (id int) partitioned by (dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> insert overwrite table dst partition (dt)
> select t.id, t.dt from (
> select id, dt from t1
> union all
> select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) 
> t;
> select * from dst;
> {code}
>  
>  
> *Actual Result:*
> {code:java}
> +-----+--------------+
> | 2   | 2019-04-01   |
> +-----+--------------+{code}
>  
> *Expected Result* (run only the select part of the above insert query):
> {code:java}
> +-------+------------+
> | 2     | 2019-04-01 |
> | 1     | 2019-01-01 |
> | 1     | 2019-01-02 |
> | 1     | 2019-01-03 |
> +-------+------------+{code}
>  
> The data retrieved from the second table via union all and lateral view with 
> explode is missing. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread Peter Vary (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-21732:
--
Attachment: HIVE-21732.6.patch

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.6.patch, HIVE-21732.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> For testing and evaluation, it would be good to have a configurable way to 
> inject latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=244903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-244903
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 08:49
Start Date: 20/May/19 08:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285484659
 
 

 ##
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##
 @@ -627,6 +627,17 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "internal usage only, used only in test mode. If set false, the 
operation logs, and the " +
 "operation log directory will not be removed, so they can be found 
after the test runs."),
 
+HIVE_TEST_LOAD_ENABLED("hive.test.load.enabled", false,
+"Enables a CPU load testing"),
+HIVE_TEST_LOAD_HOSTNAMES("hive.test.load.hostnames", "",
+"Specify host names for load testing. (e.g., \"host1,host2,host3\"). " 
+
+"Only checked if hive.test.load.enabled is set. Leave it empty if no 
load generation is needed."),
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 244903)
Time Spent: 3.5h  (was: 3h 20m)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> For testing and evaluation, it would be good to have a configurable way to 
> inject latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=244904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-244904
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 08:49
Start Date: 20/May/19 08:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285484729
 
 

 ##
 File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapLoadGeneratorService.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Extra load generator service for LLAP.
+ */
+public class LlapLoadGeneratorService extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapLoadGeneratorService.class);
+  private long interval;
+  private float threshold;
+  private String[] victimsHostName;
+  private Thread[] threads;
+
+  public LlapLoadGeneratorService() {
+super("LlapLoadGeneratorService");
+  }
+
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
+super.serviceInit(conf);
+threshold = HiveConf.getFloatVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_UTILIZATION);
+victimsHostName = HiveConf.getTrimmedStringsVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_HOSTNAMES);
+interval = HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.HIVE_TEST_LOAD_INTERVAL, TimeUnit.MILLISECONDS);
+LOG.info("LlapLoadGeneratorService init with {} {} {}", interval, 
threshold, victimsHostName);
+  }
+
+  @Override
+  protected void serviceStart() throws UnknownHostException {
+String localHostName = InetAddress.getLocalHost().getHostName();
+LOG.debug("Local hostname is: {}", localHostName);
+for (String hostName : victimsHostName) {
+  if (hostName.equalsIgnoreCase(localHostName)) {
+LOG.debug("Starting load generator process on: {}", localHostName);
+threads = new Thread[Runtime.getRuntime().availableProcessors()];
+Random random = new Random();
+for (int i = 0; i < threads.length; i++) {
+  threads[i] = new Thread(new Runnable() {
+@Override
+public void run() {
+  while (!Thread.interrupted()) {
+if (random.nextFloat() <= threshold) {
+  // Keep it busy
+  long startTime = System.currentTimeMillis();
+  while (System.currentTimeMillis() - startTime < interval) {
+// active loop, do nothing
+  }
+} else {
+  // Keep it idle
+  try {
+Thread.sleep(interval);
+  } catch (InterruptedException e) {
+// In case of interrupt finish the load generation
+break;
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 244904)
Time Spent: 3h 40m  (was: 3.5h)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>

[jira] [Work logged] (HIVE-21732) Configurable injection of load for LLAP task execution

2019-05-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21732?focusedWorklogId=244900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-244900
 ]

ASF GitHub Bot logged work on HIVE-21732:
-

Author: ASF GitHub Bot
Created on: 20/May/19 08:47
Start Date: 20/May/19 08:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #634: HIVE-21732: 
Configurable injection of latency for LLAP task execution
URL: https://github.com/apache/hive/pull/634#discussion_r285483531
 
 

 ##
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##
 @@ -627,6 +627,14 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "internal usage only, used only in test mode. If set false, the 
operation logs, and the " +
 "operation log directory will not be removed, so they can be found 
after the test runs."),
 
+HIVE_TEST_LOAD_ENABLED("hive.test.load.enabled", false,
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 244900)
Time Spent: 3h 20m  (was: 3h 10m)

> Configurable injection of load for LLAP task execution
> --
>
> Key: HIVE-21732
> URL: https://issues.apache.org/jira/browse/HIVE-21732
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21732.2.patch, HIVE-21732.3.patch, 
> HIVE-21732.4.patch, HIVE-21732.5.patch, HIVE-21732.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> For testing and evaluation, it would be good to have a configurable way to 
> inject latency for LLAP tasks.
> The configuration should be able to control how much latency is injected into 
> each daemon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21697) Remove periodical full refresh in HMS cache

2019-05-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843735#comment-16843735
 ] 

Hive QA commented on HIVE-21697:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
39s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
54s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 
50s{color} | {color:blue} standalone-metastore/metastore-common in master has 
31 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
19s{color} | {color:blue} standalone-metastore/metastore-server in master has 
183 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} beeline in master has 44 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
30s{color} | {color:blue} hcatalog/server-extensions in master has 3 extant 
Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
43s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
53s{color} | {color:blue} itests/util in master has 44 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
42s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
20s{color} | {color:red} server-extensions in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
42s{color} | {color:red} hive-unit in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
20s{color} | {color:red} server-extensions in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
45s{color} | {color:red} hive-unit in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 20s{color} 
| {color:red} server-extensions in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 45s{color} 
| {color:red} hive-unit in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
31s{color} | {color:red} standalone-metastore/metastore-server: The patch 
generated 66 new + 1225 unchanged - 33 fixed = 1291 total (was 1258) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
18s{color} | {color:red} itests/hive-unit: The patch generated 3 new + 15 
unchanged - 1 fixed = 18 total (was 16) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} itests/util: The patch generated 2 new + 18 unchanged 
- 0 fixed = 20 total (was 18) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
22s{color} | {color:red} standalone-metastore/metastore-server generated 2 new 
+ 182 unchanged - 1 fixed = 184 total (was 183) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
20s{color} | {color:red} server-extensions in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
38s{color} | {color:red} hive-unit in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| 

[jira] [Commented] (HIVE-21660) Wrong result when union all and lateral view with explode is used

2019-05-20 Thread Ganesha Shreedhara (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843712#comment-16843712
 ] 

Ganesha Shreedhara commented on HIVE-21660:
---

Cc: [~hagleitn] [~pxiong] [~vgumashta]

> Wrong result when union all and lateral view with explode is used
> ---
>
> Key: HIVE-21660
> URL: https://issues.apache.org/jira/browse/HIVE-21660
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.1.1
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21660.1.patch, HIVE-21660.patch
>
>
> There is data loss when data is inserted into a partitioned table using 
> union all and lateral view with explode.
>  
> *Steps to reproduce:*
>  
> {code:java}
> create table t1 (id int, dt string);
> insert into t1 values (2, '2019-04-01');
> create table t2 (id int, dates array<string>);
> insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') 
> as dates;
> create table dst (id int) partitioned by (dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> insert overwrite table dst partition (dt)
> select t.id, t.dt from (
> select id, dt from t1
> union all
> select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) 
> t;
> select * from dst;
> {code}
>  
>  
> *Actual Result:*
> {code:java}
> +-----+-------------+
> | 2   | 2019-04-01  |
> +-----+-------------+
> {code}
>  
> *Expected Result* (run only the select part of the above insert query):
> {code:java}
> +-----+-------------+
> | 2   | 2019-04-01  |
> | 1   | 2019-01-01  |
> | 1   | 2019-01-02  |
> | 1   | 2019-01-03  |
> +-----+-------------+
> {code}
>  
> The rows retrieved from the second table via union all and lateral view with 
> explode are missing.
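> As a sanity check (a minimal sketch reusing the tables defined above), 
> running the inner select on its own returns all four expected rows:
> {code:java}
> select t.id, t.dt from (
> select id, dt from t1
> union all
> select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) t;
> {code}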



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21697) Remove periodical full refresh in HMS cache

2019-05-20 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21697:
--
Attachment: HIVE-21697.8.patch

> Remove periodical full refresh in HMS cache
> ---
>
> Key: HIVE-21697
> URL: https://issues.apache.org/jira/browse/HIVE-21697
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21697.1.patch, HIVE-21697.2.patch, 
> HIVE-21697.3.patch, HIVE-21697.4.patch, HIVE-21697.5.patch, 
> HIVE-21697.6.patch, HIVE-21697.7.patch, HIVE-21697.8.patch
>
>
> In HIVE-18661, we added periodic notification-based refresh to the HMS cache. 
> We shall remove the periodic full refresh to simplify the code, as it will no 
> longer be used. In the meantime, we introduced a mechanism to provide 
> monotonic reads through CachedStore.commitTransaction. This will no longer be 
> needed after HIVE-21637, so I will remove the related code as well. This 
> provides some performance benefits, including:
> 1. We don't have to slow down writes to catch up with the notification log. A 
> write can be applied immediately and tag the cache with its write ID.
> 2. We can read from the cache even while updateUsingNotificationEvents is 
> running. A read compares the write IDs in the cache, so monotonic reads are 
> guaranteed.
> I'd like to submit this patch separately from HIVE-21637 so it can be tested 
> independently. HMS will use the periodic notification-based refresh to update 
> the cache, and it will temporarily lift the monotonic-reads guarantee until 
> HIVE-21637 is checked in.
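> A minimal sketch of the write-ID check described above (the class and method 
> names are illustrative assumptions, not the actual CachedStore code):
> {code:java}
> import java.util.concurrent.ConcurrentHashMap;
> 
> // Illustrative sketch: serve an entry from the cache only if its tagged
> // write ID is at least as new as the ID the reader requires.
> public class WriteIdValidatedCache {
>   static class CachedEntry {
>     final long writeId;   // write ID tagged when the write was applied
>     final Object value;   // cached metadata object
>     CachedEntry(long writeId, Object value) {
>       this.writeId = writeId;
>       this.value = value;
>     }
>   }
> 
>   private final ConcurrentHashMap<String, CachedEntry> cache =
>       new ConcurrentHashMap<>();
> 
>   // Writes are applied immediately and tag the cache with their write ID.
>   public void put(String key, long writeId, Object value) {
>     cache.put(key, new CachedEntry(writeId, value));
>   }
> 
>   // Reads compare write IDs; a stale entry falls through to the backing
>   // store, preserving monotonic reads even while a background refresh runs.
>   public Object get(String key, long requiredWriteId) {
>     CachedEntry e = cache.get(key);
>     if (e != null && e.writeId >= requiredWriteId) {
>       return e.value;  // cache is fresh enough for this reader
>     }
>     return null;  // caller should read from the metastore DB instead
>   }
> }
> {code}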



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)