[jira] [Updated] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools

2018-01-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-18352:
--
Labels: pull-request-available  (was: )

> introduce a METADATAONLY option while doing REPL DUMP to allow integrations 
> of other tools 
> ---
>
> Key: HIVE-18352
> URL: https://issues.apache.org/jira/browse/HIVE-18352
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
>  Labels: pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18352.0.patch
>
>
> * Introduce a METADATAONLY option as part of the REPL DUMP command, which will 
> only dump events for DDL changes. This will be faster because we won't 
> need to scan files on HDFS for DML changes. 
> * Additionally, since we are only going to dump metadata operations, it might 
> be useful to include ACID tables as well via an option. This option 
> can be removed when ACID support is complete via HIVE-18320.
> It would be good to support the "WITH" clause as part of the REPL DUMP command 
> as well (repl dump already supports it via HIVE-17757) to achieve the above, as 
> that will require fewer changes to the syntax of the statement and provide 
> more flexibility in the future to include additional options. 
> {code}
> REPL DUMP [db_name] {FROM [event_id]} {TO [event_id]} {WITH 
> (['key'='value'],...)}
> {code}
> This will enable other tools, such as security, schema registry, and metadata 
> discovery tools, to use the replication subsystem for their needs as well. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools

2018-01-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312651#comment-16312651
 ] 

ASF GitHub Bot commented on HIVE-18352:
---

GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/286

HIVE-18352: introduce a METADATAONLY option while doing REPL DUMP to allow 
integrations of other tools



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-18352

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #286


commit c03814bd857cfd70a40aa7a5ec674e73cbfc63f9
Author: Anishek Agarwal 
Date:   2018-01-03T10:27:04Z

HIVE-18352: introduce a METADATAONLY option while doing REPL DUMP to allow 
integrations of other tools






[jira] [Commented] (HIVE-18221) test acid default

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312644#comment-16312644
 ] 

Hive QA commented on HIVE-18221:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904685/HIVE-18221.23.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 342 failed/errored test(s), 10959 tests 
executed
*Failed tests:*
{noformat}
TestAvroHCatLoader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestAvroHCatStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestBeeLineWithArgs - did not produce a TEST-*.xml file (likely timed out) 
(batchId=228)
TestBeelineConnectionUsingHiveSite - did not produce a TEST-*.xml file (likely 
timed out) (batchId=228)
TestBeelinePasswordOption - did not produce a TEST-*.xml file (likely timed 
out) (batchId=228)
TestBeelineWithUserHs2ConnectionFile - did not produce a TEST-*.xml file 
(likely timed out) (batchId=228)
TestCopyUtils - did not produce a TEST-*.xml file (likely timed out) 
(batchId=225)
TestCustomAuthentication - did not produce a TEST-*.xml file (likely timed out) 
(batchId=228)
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed 
out) (batchId=239)
TestDefaultHCatRecord - did not produce a TEST-*.xml file (likely timed out) 
(batchId=198)
TestE2EScenarios - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHCatDynamicPartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=194)
TestHCatExternalDynamicPartitioned - did not produce a TEST-*.xml file (likely 
timed out) (batchId=196)
TestHCatExternalNonPartitioned - did not produce a TEST-*.xml file (likely 
timed out) (batchId=197)
TestHCatExternalPartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=193)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed 
out) (batchId=239)
TestHCatHiveThriftCompatibility - did not produce a TEST-*.xml file (likely 
timed out) (batchId=239)
TestHCatInputFormat - did not produce a TEST-*.xml file (likely timed out) 
(batchId=197)
TestHCatInputFormatMethods - did not produce a TEST-*.xml file (likely timed 
out) (batchId=197)
TestHCatLoaderComplexSchema - did not produce a TEST-*.xml file (likely timed 
out) (batchId=190)
TestHCatLoaderEncryption - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHCatLoaderStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHCatMultiOutputFormat - did not produce a TEST-*.xml file (likely timed 
out) (batchId=197)
TestHCatMutableDynamicPartitioned - did not produce a TEST-*.xml file (likely 
timed out) (batchId=191)
TestHCatMutableNonPartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=197)
TestHCatMutablePartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=195)
TestHCatNonPartitioned - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)
TestHCatOutputFormat - did not produce a TEST-*.xml file (likely timed out) 
(batchId=197)
TestHCatPartitionPublish - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)
TestHCatPartitioned - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)
TestHCatSchema - did not produce a TEST-*.xml file (likely timed out) 
(batchId=198)
TestHCatSchemaUtils - did not produce a TEST-*.xml file (likely timed out) 
(batchId=198)
TestHCatStorerMulti - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHCatStorerWrapper - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHiveClientCache - did not produce a TEST-*.xml file (likely timed out) 
(batchId=197)
TestInputJobInfo - did not produce a TEST-*.xml file (likely timed out) 
(batchId=197)
TestJsonSerDe - did not produce a TEST-*.xml file (likely timed out) 
(batchId=198)
TestLazyHCatRecord - did not produce a TEST-*.xml file (likely timed out) 
(batchId=198)
TestMultiOutputFormat - did not produce a TEST-*.xml file (likely timed out) 
(batchId=197)
TestNotificationListener - did not produce a TEST-*.xml file (likely timed out) 
(batchId=200)
TestOrcHCatLoader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestOrcHCatStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestParquetHCatLoader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestParquetHCatStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestPassProperties - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)
TestPigHCatUtil - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestRCFileHCatLoader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestRCFileHCatStorer - did not produce a TEST-

[jira] [Updated] (HIVE-18381) Drop table operation isn't consider that hdfs acl privilege of the table location parent path

2018-01-04 Thread youchuikai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youchuikai updated HIVE-18381:
--
Status: Patch Available  (was: In Progress)

> Drop table operation isn't consider that hdfs acl privilege of the table 
> location parent path  
> ---
>
> Key: HIVE-18381
> URL: https://issues.apache.org/jira/browse/HIVE-18381
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
> Environment: hive-1.1.0-cdh5.8.4
>Reporter: youchuikai
>Assignee: youchuikai
>
> {code:sql}
> // the push user belongs to the test_rw group
> hive> dfs -getfacl /user/hive/warehouse1/test1.db;
> # file: /user/hive/warehouse1/test1.db
> # owner: root
> # group: hive
> user::rwx
> group::rwx
> group:test_r:r-x
> group:test_rw:rwx
> mask::rwx
> other::---
> default:user::rwx
> default:group::rwx
> default:group:test_r:r-x
> default:group:test_rw:rwx
> default:mask::rwx
> default:other::---
> hive> drop table test1.youck_66;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Table metadata 
> not deleted since hdfs://nameservice-test1/user/hive/warehouse1/test1.db is 
> not writable by push)
> {code}





[jira] [Work started] (HIVE-18381) Drop table operation isn't consider that hdfs acl privilege of the table location parent path

2018-01-04 Thread youchuikai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18381 started by youchuikai.
-


[jira] [Commented] (HIVE-18381) Drop table operation isn't consider that hdfs acl privilege of the table location parent path

2018-01-04 Thread youchuikai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312602#comment-16312602
 ] 

youchuikai commented on HIVE-18381:
---

*Proposed fix for this bug:*
{code:java}
Index: src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
--- src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java   (date 1515137061000)
+++ src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java   (date 1515137079737)
@@ -43,6 +43,8 @@
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.AclEntry;
+import org.apache.hadoop.fs.permission.AclStatus;
 import org.apache.hadoop.fs.permission.FsAction;
 import org.apache.hadoop.hive.common.FileUtils;
 import org.apache.hadoop.hive.common.HiveStatsUtils;
@@ -250,8 +252,10 @@
 return false;
 }
 final FileStatus stat;
+final AclStatus aclStas;
 try {
 stat = getFs(path).getFileStatus(path);
+aclStas = getFs(path).getAclStatus(path);
 } catch (FileNotFoundException fnfe){
 // File named by path doesn't exist; nothing to validate.
 return true;
@@ -266,23 +270,38 @@
 } catch (LoginException le) {
 throw new IOException(le);
 }
-String user = ugi.getShortUserName();
+String user = ugi.getShortUserName();   // kaikai
+String[] groups = ugi.getGroupNames(); // groups holds the metastore user's group information.
 //check whether owner can delete
 if (stat.getOwner().equals(user) &&
 stat.getPermission().getUserAction().implies(FsAction.WRITE)) {
 return true;
 }
+
 //check whether group of the user can delete
 if (stat.getPermission().getGroupAction().implies(FsAction.WRITE)) {
-String[] groups = ugi.getGroupNames();
 if (ArrayUtils.contains(groups, stat.getGroup())) {
 return true;
 }
 }
+
 //check whether others can delete (uncommon case!!)
 if (stat.getPermission().getOtherAction().implies(FsAction.WRITE)) {
 return true;
 }
+
+// Extra check: also honor named user/group entries in the access ACL.
+List<AclEntry> list = aclStas.getEntries();
+for (AclEntry aclEntry : list){
+if (!"DEFAULT".equals(aclEntry.getScope().toString()) && aclEntry.getPermission().implies(FsAction.WRITE) && aclEntry.getName() != null){
+if ("USER".equals(aclEntry.getType().toString()) && aclEntry.getName().equals(user)){
+LOG.info("acl user is " + aclEntry.getName() + "; hive cli user is " + user);
+return true;
+} else if ("GROUP".equals(aclEntry.getType().toString()) && ArrayUtils.contains(groups, aclEntry.getName())){
+return true;
+}
+}
+}
 return false;
 }
   /*

{code}
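
The extra ACL check in the patch above can be illustrated in isolation. The following is a minimal, self-contained sketch and not the Warehouse.java code: `Entry`, `Scope`, `Type`, `aclGrantsWrite`, and `sampleAcl` are hypothetical stand-ins for the Hadoop ACL classes, and the sketch deliberately ignores the ACL mask and the owner/group/other permission bits that the existing method already checks. The sample entries mirror the getfacl output quoted in the issue.

```java
import java.util.Arrays;
import java.util.List;

public class AclWriteCheck {
    enum Scope { ACCESS, DEFAULT }
    enum Type { USER, GROUP, MASK, OTHER }

    /** Hypothetical, simplified ACL entry; name is null for unnamed entries such as "user::rwx". */
    static final class Entry {
        final Scope scope;
        final Type type;
        final String name;
        final boolean write;

        Entry(Scope scope, Type type, String name, boolean write) {
            this.scope = scope;
            this.type = type;
            this.name = name;
            this.write = write;
        }
    }

    /** True if a named user or group entry in the access ACL grants write to the caller. */
    static boolean aclGrantsWrite(List<Entry> entries, String user, List<String> groups) {
        for (Entry e : entries) {
            // Skip default-scope entries, entries without write, and unnamed entries.
            if (e.scope != Scope.ACCESS || !e.write || e.name == null) {
                continue;
            }
            if (e.type == Type.USER && e.name.equals(user)) {
                return true;
            }
            if (e.type == Type.GROUP && groups.contains(e.name)) {
                return true;
            }
        }
        return false;
    }

    /** Entries modeled on the getfacl output in the issue description. */
    static List<Entry> sampleAcl() {
        return Arrays.asList(
            new Entry(Scope.ACCESS, Type.USER, null, true),        // user::rwx
            new Entry(Scope.ACCESS, Type.GROUP, null, true),       // group::rwx
            new Entry(Scope.ACCESS, Type.GROUP, "test_r", false),  // group:test_r:r-x
            new Entry(Scope.ACCESS, Type.GROUP, "test_rw", true),  // group:test_rw:rwx
            new Entry(Scope.ACCESS, Type.MASK, null, true),        // mask::rwx
            new Entry(Scope.ACCESS, Type.OTHER, null, false),      // other::---
            new Entry(Scope.DEFAULT, Type.GROUP, "test_rw", true)  // default:group:test_rw:rwx
        );
    }

    public static void main(String[] args) {
        // "push" is a member of test_rw, which has a named access entry with write.
        System.out.println(aclGrantsWrite(sampleAcl(), "push", Arrays.asList("test_rw")));
        // A user who is only in test_r gets no write access from the named entries.
        System.out.println(aclGrantsWrite(sampleAcl(), "push", Arrays.asList("test_r")));
    }
}
```

Comparing enum values and testing the name against null, as in this sketch, avoids relying on string identity checks when deciding whether an entry applies.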




[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes

2018-01-04 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312596#comment-16312596
 ] 

liyunzhang commented on HIVE-17573:
---

[~gopalv]: thanks for your reply and tool.  
bq.JDK9 seems to wake up the producer-consumer pair on the same NUMA zone (the 
IO elevator allocates, passes the array to the executor thread and executor 
passes it back instead of throwing it to GC deref).
 If I don't add {{-XX:+UseNUMA}}, I guess the NUMA handling optimization 
will not benefit the query. Is that right? UseNUMA is disabled by default.
bq.the IO elevator allocates, passes the array to the executor thread and 
executor passes it back instead of throwing it to GC deref
I guess this will reduce GC.

From my test results, I found that there is less GC in JDK9 than in JDK8 for 
long queries on Hive on Spark 
([link|https://docs.google.com/presentation/d/1cK9ZfUliAggH3NJzSvexTPwkXpbsM7Dm0o0kdmuFQUU/edit#slide=id.p]).
Maybe this is because G1GC is the default garbage collector in JDK9 and the 
purpose of 
[G1GC|https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm#JSGCT-GUID-0394E76A-1A8F-425E-A0D0-B48A3DC82B42]
is to reduce GC time.

> LLAP: JDK9 support fixes
> 
>
> Key: HIVE-17573
> URL: https://issues.apache.org/jira/browse/HIVE-17573
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> The perf diff between JDK8 -> JDK9 seems to be significant.  
> TPC-H Q6 on JDK8 takes 32s on a single node + 1 Tb scale warehouse. 
> TPC-H Q6 on JDK9 takes 19s on the same host + same data.
> The performance difference seems to come from better JIT and better NUMA 
> handling.





[jira] [Commented] (HIVE-4312) Make ORC SerDe support replace columns

2018-01-04 Thread Upendra Yadav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312592#comment-16312592
 ] 

Upendra Yadav commented on HIVE-4312:
-

Is there any plan to add this support?

> Make ORC SerDe support replace columns
> --
>
> Key: HIVE-4312
> URL: https://issues.apache.org/jira/browse/HIVE-4312
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>
> In the alterTable method of DDLTask.java there is an explicit list of SerDes 
> which support the replace columns command.  ORC should support this, at least 
> for partitioned tables, maybe not unpartitioned tables.
> This may be as simple as adding it to that list, but I suspect some 
> significant changes will be needed to make this work with the 
> CombineHiveInputFormat (e.g. where splits are combined and one split has a 
> column stored as a string and in the other it is stored as an int).





[jira] [Commented] (HIVE-18353) CompactorMR should call jobclient.close() to trigger cleanup

2018-01-04 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312583#comment-16312583
 ] 

Prabhu Joseph commented on HIVE-18353:
--

[~thejas] [~ekoifman] Can you review this when you get time? The failing test 
cases look unrelated.

> CompactorMR should call jobclient.close() to trigger cleanup
> 
>
> Key: HIVE-18353
> URL: https://issues.apache.org/jira/browse/HIVE-18353
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: HIVE-18353.1.patch, HIVE-18353.2.patch, HIVE-18353.patch
>
>
> HiveMetastore process is leaking TrustStore reloader threads when running 
> compaction as JobClient close is not called from CompactorMR - MAPREDUCE-6618 
> and MAPREDUCE-6621 
> {code}
> "Truststore reloader thread" #2814 daemon prio=1 os_prio=0 
> tid=0x00cdc800 nid=0x2f05a waiting on condition [0x7fdaef403000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:194)
> at java.lang.Thread.run(Thread.java:745)
> {code}
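
The leak pattern described above (a client that owns a background thread and is never closed) can be sketched without Hadoop. This is a hedged illustration only: `FakeClient` and `runJobs` are hypothetical stand-ins, not the JobClient or CompactorMR code, and the stand-in thread merely sleeps the way the reloader thread in the stack trace does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CloseTriggersCleanup {

    /** Hypothetical stand-in for a client that owns a background daemon thread. */
    static final class FakeClient implements AutoCloseable {
        private final Thread reloader;

        FakeClient(AtomicInteger liveThreads) {
            liveThreads.incrementAndGet();
            reloader = new Thread(() -> {
                try {
                    while (true) {
                        Thread.sleep(10_000); // periodic "reload" placeholder
                    }
                } catch (InterruptedException e) {
                    // close() interrupted us: fall through and exit
                } finally {
                    liveThreads.decrementAndGet();
                }
            }, "reloader (stand-in)");
            reloader.setDaemon(true);
            reloader.start();
        }

        @Override
        public void close() {
            reloader.interrupt();
            try {
                reloader.join(); // wait until the background thread has exited
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs the given number of jobs and returns how many background threads are still alive. */
    static int runJobs(int jobs, boolean closeClient) {
        AtomicInteger liveThreads = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            FakeClient client = new FakeClient(liveThreads);
            try {
                // submit the job and wait for it here
            } finally {
                if (closeClient) {
                    client.close();
                }
            }
        }
        return liveThreads.get();
    }

    public static void main(String[] args) {
        System.out.println("without close: " + runJobs(3, false) + " thread(s) left");
        System.out.println("with close:    " + runJobs(3, true) + " thread(s) left");
    }
}
```

Closing in a finally block (or with try-with-resources) means the cleanup runs even when the job fails, which is the behavior the issue asks for.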





[jira] [Commented] (HIVE-18221) test acid default

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312581#comment-16312581
 ] 

Hive QA commented on HIVE-18221:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m  1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 35s{color} | {color:red} ql: The patch generated 3 new + 356 unchanged - 0 fixed = 359 total (was 356) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 36s{color} | {color:red} root: The patch generated 3 new + 356 unchanged - 0 fixed = 359 total (was 356) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  xml  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 20c9a39 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8453/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8453/yetus/diff-checkstyle-root.txt |
| modules | C: ql hcatalog/core hcatalog/hcatalog-pig-adapter hcatalog/webhcat/java-client . itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8453/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> test acid default
> -
>
> Key: HIVE-18221
> URL: https://issues.apache.org/jira/browse/HIVE-18221
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-18221.01.patch, HIVE-18221.02.patch, 
> HIVE-18221.03.patch, HIVE-18221.04.patch, HIVE-18221.07.patch, 
> HIVE-18221.08.patch, HIVE-18221.09.patch, HIVE-18221.10.patch, 
> HIVE-18221.11.patch, HIVE-18221.12.patch, HIVE-18221.13.patch, 
> HIVE-18221.14.patch, HIVE-18221.16.patch, HIVE-18221.18.patch, 
> HIVE-18221.19.patch, HIVE-18221.20.patch, HIVE-18221.21.patch, 
> HIVE-18221.22.patch, HIVE-18221.23.patch
>
>






[jira] [Updated] (HIVE-18359) Extend grouping set limits from int to long

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18359:
-
Attachment: HIVE-18359.3.patch

> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch, 
> HIVE-18359.3.patch
>
>
> Grouping sets are broken for >32 columns because an Int is used for the bitmap 
> (and for the GROUPING__ID virtual column). This assumption breaks grouping 
> sets/rollups/cube when the number of participating aggregation columns is >32. 
> The easier fix would be to extend it to Long for now. The correct fix would be 
> to use BitSets everywhere, but that would require changing the GROUPING__ID 
> column type to binary, which would make predicates on GROUPING__ID difficult to deal with. 
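
The width limit described above can be shown with plain Java shifts. This is an illustrative sketch and not Hive's grouping-set code; `intBit` and `longBit` are hypothetical helper names. Java masks the shift count of an int to its low 5 bits, so the bit for column index 32 collides with the bit for column index 0, while a long bitmap keeps them distinct up to index 63.

```java
public class GroupingBitmapWidth {

    /** Bit for the given column index in an int bitmap (shift count is masked to 5 bits). */
    static int intBit(int columnIndex) {
        return 1 << columnIndex;
    }

    /** Bit for the given column index in a long bitmap (shift count is masked to 6 bits). */
    static long longBit(int columnIndex) {
        return 1L << columnIndex;
    }

    public static void main(String[] args) {
        // Column 32 wraps around to the same bit as column 0 in an int bitmap.
        System.out.println(intBit(32) == intBit(0));   // true
        // A long bitmap still distinguishes column 32 from column 0.
        System.out.println(longBit(32) == longBit(0)); // false
        System.out.println(longBit(32));               // 4294967296
    }
}
```

A long only moves the ceiling to 64 columns, which matches the issue's remark that it is the easier fix rather than the general one.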





[jira] [Commented] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312522#comment-16312522
 ] 

Hive QA commented on HIVE-18368:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904682/HIVE-18368.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11547 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8452/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8452/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8452/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904682 - PreCommit-HIVE-Build

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312508#comment-16312508
 ] 

Gopal V commented on HIVE-18375:


[~pauljackson123]: the first two queries run on HDP3. There is probably a fix 
that went in for this which isn't in the hive-2 branch.

{code}
0: jdbc:hive2://localhost:10007/tpcds_bin_par> EXPLAIN SELECT `first_name` 
`F_4`, `last_name` `F_5`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> FROM `employees`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> ORDER BY `emp_no` DESC;

Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 2 vectorized, llap
  File Output Operator [FS_9]
Select Operator [SEL_8] (rows=6 width=202)
  Output:["_col0","_col1"]
<-Map 1 [SIMPLE_EDGE] vectorized, llap
  SHUFFLE [RS_7]
Select Operator [SEL_6] (rows=6 width=202)
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0] (rows=6 width=202)

testing@employees,employees,Tbl:COMPLETE,Col:NONE,Output:["first_name","last_name","emp_no"]
{code}

{code}
0: jdbc:hive2://localhost:10007/tpcds_bin_par> 
0: jdbc:hive2://localhost:10007/tpcds_bin_par> EXPLAIN SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> FROM `employees`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> ORDER BY `emp_no` DESC;

Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized, llap
      File Output Operator [FS_8]
        Select Operator [SEL_7] (rows=6 width=202)
          Output:["_col0","_col1","_col2"]
        <-Map 1 [SIMPLE_EDGE] vectorized, llap
          SHUFFLE [RS_6]
            Select Operator [SEL_5] (rows=6 width=202)
              Output:["_col0","_col1","_col2"]
              TableScan [TS_0] (rows=6 width=202)
                testing@employees,employees,Tbl:COMPLETE,Col:NONE,Output:["first_name","emp_no","last_name"]
{code}
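
As a point of comparison for the join query in the quoted issue description, the expected row order can be checked in an engine that accepts ORDER BY on a column outside the projection. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in; it is not Hive and says nothing about Hive's planner. It only illustrates the intended semantics with the sample data from the issue (backticks and the `default` schema prefix are dropped for SQLite).

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_no INT, first_name VARCHAR(14), last_name VARCHAR(16));
INSERT INTO employees VALUES
  (1, 'Gottlob', 'Frege'),
  (2, 'Bertrand', 'Russell'),
  (3, 'Ludwig', 'Wittgenstein');
CREATE TABLE salaries (emp_no INT, salary INT, from_date DATE, to_date DATE);
INSERT INTO salaries VALUES
  (1, 10, '1900-01-01', '1900-01-31'),
  (1, 18, '1900-09-01', '1900-09-30'),
  (2, 15, '1940-03-01', '1950-01-01'),
  (3, 20, '1920-01-01', '1950-01-01');
""")

# max_salary is used only for ordering; it is not part of the projection.
rows = conn.execute("""
SELECT employees.last_name, employees.first_name
FROM employees
INNER JOIN
  (SELECT emp_no, MAX(salary) AS max_salary
   FROM salaries
   WHERE emp_no IS NOT NULL AND salary IS NOT NULL
   GROUP BY emp_no) AS t1
ON employees.emp_no = t1.emp_no
ORDER BY t1.max_salary DESC
""").fetchall()

for last_name, first_name in rows:
    print(last_name, first_name)
```

With the sample data the peak salaries are 20, 18 and 15, so the rows come back as Wittgenstein, Frege, Russell.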

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-04 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Status: Patch Available  (was: Open)

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18350.1.patch
>
>
> Insert statements create files with names ending with _0, 0001_0 etc. 
> However, load data uses the input file name. That results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For non-bucketed tables, Hive renames all the files regardless of how they 
> were named by the user.
> For bucketed tables, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs 
> to the same bucket. In strict mode, loading into bucketed tables is disabled.
> This will likely affect most of the tests that load data, which is pretty 
> significant.
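
The renaming idea for the non-bucketed case can be sketched roughly as follows. This is an illustration only, not the code in the attached patch: the function names are hypothetical, the six-digit zero-padded `NNNNNN_0` pattern is an assumption about what insert-style names look like, and assigning names in sorted order of the input file names is likewise an assumption.

```python
def insert_style_name(index, attempt=0):
    """Return a file name in the assumed insert-style pattern, e.g. 000001_0."""
    return f"{index:06d}_{attempt}"


def plan_renames(input_names):
    """Map user-supplied file names to insert-style names (non-bucketed case).

    Files are numbered in sorted order of their original names; this ordering
    is an assumption made for the sketch.
    """
    return {name: insert_style_name(i) for i, name in enumerate(sorted(input_names))}


plan = plan_renames(["part-b.txt", "data.csv", "part-a.txt"])
for old, new in plan.items():
    print(old, "->", new)
```

For bucketed tables the description implies the opposite direction: the target name would be derived from the bucket that the file's data belongs to, rather than from a running counter.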





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-04 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.1.patch

Only contains changes for bucketed tables.

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18350.1.patch
>
>
> Insert statements create files with names ending with _0, 0001_0 etc. 
> However, load data uses the input file name. That results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For non-bucketed tables, Hive renames all the files regardless of how they 
> were named by the user.
> For bucketed tables, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs 
> to the same bucket. In strict mode, loading into bucketed tables is disabled.
> This will likely affect most of the tests that load data, which is pretty 
> significant.





[jira] [Commented] (HIVE-18214) Flaky test: TestSparkClient

2018-01-04 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312482#comment-16312482
 ] 

Sahil Takiar commented on HIVE-18214:
-

[~aihuaxu] yes, that's correct. It sends a shutdown message to the 
{{RemoteDriver}} asynchronously. Then it creates another {{RemoteDriver}}, 
which leads to the exception.

Yeah, we could add logic to do that, but again it's not something that would 
happen in production because every {{RemoteDriver}} is spawned in a separate 
container. The {{RemoteDriver#main(String args[])}} is run in a YARN container, 
and each {{RemoteDriver}} creates a single {{SparkContext}} in its constructor.

We could just change {{TestSparkClient}} so that it always spawns the 
{{RemoteDriver}} in a separate process; I checked, and it only makes the test 
take an extra 20 seconds. The code to run the {{RemoteDriver}} in the 
local process was only ever meant for test purposes.

> Flaky test: TestSparkClient
> ---
>
> Key: HIVE-18214
> URL: https://issues.apache.org/jira/browse/HIVE-18214
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18214.1.patch
>
>
> Looks like there is a race condition in {{TestSparkClient#runTest}}. The test 
> creates a {{RemoteDriver}} in memory, which creates a {{JavaSparkContext}}. A 
> new {{JavaSparkContext}} is created for each test that is run. There is a 
> race condition where the {{RemoteDriver}} isn't given enough time to 
> shut down, so when the next test starts running it creates another 
> {{JavaSparkContext}} which causes an exception like 
> {{org.apache.spark.SparkException: Only one SparkContext may be running in 
> this JVM (see SPARK-2243)}}.
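
The race described above can be modelled in a few lines. This is a toy sketch, not the test's or Spark's code: `FakeContext` is a hypothetical stand-in for a resource that may exist only once per process, and the `threading.Event` stands in for the time an asynchronous shutdown takes. It shows that creating the second context fails when the asynchronous shutdown has not completed, and succeeds once the caller waits for it.

```python
import threading


class FakeContext:
    """Hypothetical stand-in for a one-per-process resource."""
    _active = False
    _lock = threading.Lock()

    def __init__(self):
        with FakeContext._lock:
            if FakeContext._active:
                raise RuntimeError("only one context may be running in this process")
            FakeContext._active = True

    def stop(self, may_finish):
        may_finish.wait()  # models a shutdown that takes some time
        with FakeContext._lock:
            FakeContext._active = False


def run_two_tests(wait_for_shutdown):
    first = FakeContext()
    may_finish = threading.Event()
    shutdown = threading.Thread(target=first.stop, args=(may_finish,))
    shutdown.start()  # asynchronous shutdown request
    if wait_for_shutdown:
        may_finish.set()
        shutdown.join()  # block until the first context is really gone
    try:
        FakeContext()  # the "next test" creates its own context
        outcome = "ok"
    except RuntimeError:
        outcome = "failed"
    finally:
        may_finish.set()
        shutdown.join()
    FakeContext._active = False  # reset the toy state for the next call
    return outcome


print(run_two_tests(wait_for_shutdown=False))  # failed
print(run_two_tests(wait_for_shutdown=True))   # ok
```

Running each driver in its own process sidesteps the problem differently: the one-per-process constraint can never be hit because no two contexts share a process.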





[jira] [Commented] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312478#comment-16312478
 ] 

Hive QA commented on HIVE-18368:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 50s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 32s{color} | {color:red} ql: The patch generated 1 new + 54 unchanged - 9 fixed = 55 total (was 63) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m  3s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 20c9a39 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8452/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8452/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Commented] (HIVE-18061) q.outs: be more selective with masking hdfs paths

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312469#comment-16312469
 ] 

Hive QA commented on HIVE-18061:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904598/HIVE-18061.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 172 failed/errored test(s), 11548 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=175)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_unencrypted_nonhdfs_external_tables] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[bucket5] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[bucket6] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cte_2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cte_4] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[empty_dir_in_table] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[except_distinct] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[external_table_with_space_in_location_path] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[file_with_header_footer] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[global_limit] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[import_exported_table] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[insert_into1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[insert_into2] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_all] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_distinct] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_nullscan] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_stats] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llapdecider] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[load_fs2] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[load_hdfs_file_with_space_in_the_name] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mapreduce1] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mapreduce2] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[multi_count_distinct_null] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge10] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge2] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge3] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge4] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge_diff_fs] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parallel_colstats] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_complex_types_vectorization] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_map_type_vectorization] (batchId=

[jira] [Commented] (HIVE-18359) Extend grouping set limits from int to long

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312461#comment-16312461
 ] 

Pengcheng Xiong commented on HIVE-18359:


LGTM +1 pending tests.  :)

> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch
>
>
> Grouping sets are broken for >32 columns because an Int is used for the 
> bitmap (and for the GROUPING__ID virtual column). This assumption breaks 
> grouping sets/rollups/cubes when the number of participating aggregation 
> columns is >32. The easier fix would be to extend it to Long for now. The 
> correct fix would be to use BitSets everywhere, but that would require 
> changing the GROUPING__ID column type to binary, which would make predicates 
> on GROUPING__ID difficult to deal with. 
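
The limitation can be illustrated with a toy bitmap. The sketch below is not Hive's implementation and does not claim to match how GROUPING__ID is actually encoded; `grouping_id` is a hypothetical helper that sets one bit per participating column and then truncates the result to a fixed width, which is enough to show why 32 bits cannot tell apart grouping sets that differ only in columns past position 31.

```python
def grouping_id(column_positions, width):
    """Build a bitmap with one bit per participating column, kept to `width` bits."""
    bitmap = 0
    for pos in column_positions:
        bitmap |= 1 << pos
    return bitmap & ((1 << width) - 1)  # emulate a fixed-width unsigned integer


cols_a = [0, 5]
cols_b = [0, 5, 40]  # column 40 does not fit into a 32-bit bitmap

# With 32 bits the two distinct grouping sets collapse to the same id.
print(grouping_id(cols_a, 32) == grouping_id(cols_b, 32))  # True

# With 64 bits they remain distinct.
print(grouping_id(cols_a, 64) == grouping_id(cols_b, 64))  # False
```

A 64-bit value only moves the ceiling to 64 columns, which is why the description calls it the easier fix rather than the correct one.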





[jira] [Commented] (HIVE-18061) q.outs: be more selective with masking hdfs paths

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312446#comment-16312446
 ] 

Hive QA commented on HIVE-18061:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 32s{color} | {color:green} The patch ql passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} itests/util: The patch generated 0 new + 188 unchanged - 6 fixed = 188 total (was 194) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 20c9a39 |
| Default Java | 1.8.0_111 |
| modules | C: ql itests/util U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8451/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> q.outs: be more selective with masking hdfs paths
> -
>
> Key: HIVE-18061
> URL: https://issues.apache.org/jira/browse/HIVE-18061
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
> Attachments: HIVE-18061.01.patch, HIVE-18061.02.patch
>
>
> Currently, any line that contains a path that looks like an HDFS location is 
> replaced with "masked pattern was here"...
> It might be relevant to record these messages, since even an exception 
> message might contain an HDFS location.
> Noticed in HIVE-18012.





[jira] [Updated] (HIVE-18359) Extend grouping set limits from int to long

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18359:
-
Attachment: HIVE-18359.2.patch

> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch
>
>
> Grouping sets are broken for >32 columns because an Int is used for the 
> bitmap (and for the GROUPING__ID virtual column). This assumption breaks 
> grouping sets/rollups/cubes when the number of participating aggregation 
> columns is >32. The easier fix would be to extend it to Long for now. The 
> correct fix would be to use BitSets everywhere, but that would require 
> changing the GROUPING__ID column type to binary, which would make predicates 
> on GROUPING__ID difficult to deal with. 





[jira] [Commented] (HIVE-18238) Driver execution may not have configuration changing sideeffects

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312432#comment-16312432
 ] 

Hive QA commented on HIVE-18238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904675/HIVE-18238.04wip01.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11121 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=161)

[dynamic_semijoin_reduction.q,materialized_view_create_rewrite_3.q,vectorization_pushdown.q,correlationoptimizer2.q,cbo_gby_empty.q,vectorization_short_regress.q,identity_project_remove_skip.q,mapjoin3.q,cross_product_check_1.q,unionDistinct_3.q,cbo_join.q,correlationoptimizer6.q,union_remove_26.q,cbo_rp_limit.q,vector_groupby_cube1.q,current_date_timestamp.q,union2.q,groupby2.q,schema_evol_text_vec_table.q,dynpart_sort_opt_vectorization.q,exchgpartition2lel.q,multiMapJoin1.q,sample10.q,vectorized_timestamp_ints_casts.q,vector_char_simple.q,auto_sortmerge_join_2.q,bucketizedhiveinputformat.q,vectorization_input_format_excludes.q,cte_mat_2.q,vectorization_8.q]
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=92)

[nopart_insert.q,insert_into_with_schema.q,input41.q,having1.q,create_table_failure3.q,database_drop_not_empty_restrict.q,windowing_after_orderby.q,orderbysortby.q,subquery_select_distinct2.q,authorization_uri_alterpart_loc.q,udf_last_day_error_1.q,create_table_failure4.q,semijoin5.q,udf_format_number_wrong4.q,deletejar.q,exim_11_nonpart_noncompat_sorting.q,show_tables_bad_db2.q,drop_func_nonexistent.q,nopart_load.q,alter_table_non_partitioned_table_cascade.q,load_wrong_fileformat.q,lockneg_try_db_lock_conflict.q,udf_field_wrong_args_len.q,create_table_failure2.q,groupby2_map_skew_multi_distinct.q,udf_min.q,authorization_update_noupdatepriv.q,show_columns2.q,authorization_insert_noselectpriv.q,orc_replace_columns3_acid.q,udf_instr_wrong_args_len.q,compare_double_bigint.q,authorization_set_nonexistent_conf.q,alter_rename_partition_failure3.q,split_sample_wrong_format2.q,create_with_fk_pk_same_tab.q,authorization_show_roles_no_admin.q,materialized_view_authorization_rebuild_no_grant.q,unionLimit.q,authorization_revoke_table_fail2.q,authorization_insert_noinspriv.q,duplicate_insert3.q,authorization_desc_table_nosel.q,invalid_select_column.q,stats_noscan_non_native.q,orc_change_serde_acid.q,create_or_replace_view7.q,exim_07_nonpart_noncompat_ifof.q,udf_concat_ws_wrong2.q,fileformat_bad_class.q,merge_negative_2.q,exim_15_part_nonpart.q,authorization_not_owner_drop_view.q,external1.q,authorization_uri_insert.q,create_with_fk_wrong_ref.q,columnstats_tbllvl_incorrect_column.q,authorization_show_parts_nosel.q,merge_negative_1.q,authorization_not_owner_drop_tab.q,external2.q,authorization_deletejar.q,temp_table_create_like_partitions.q,udf_greatest_error_1.q,ptf_negative_AggrFuncsWithNoGBYNoPartDef.q,alter_view_as_select_not_exist.q,touch1.q,groupby3_map_skew_multi_distinct.q,exchange_partition_neg_partition_missing.q,groupby_cube_multi_gby.q,columnstats_tbllvl.q,drop_invalid_constraint2.q,alter_table_add_partition.q,update_not_acid.q,archive5.q,alter_table_constraint_invalid
_pk_col.q,ivyDownload.q,udf_instr_wrong_type.q,bad_sample_clause.q,authorization_not_owner_drop_tab2.q,authorization_alter_db_owner.q,show_columns1.q,orc_type_promotion3.q,create_view_failure8.q,strict_join.q,udf_add_months_error_1.q,groupby_cube2.q,drop_partition_filter_failure.q,groupby_cube1.q,groupby_rollup1.q,genericFileFormat.q,authorization_create_macro1.q,invalid_cast_from_binary_4.q,drop_invalid_constraint1.q,serde_regex.q,show_partitions1.q,invalid_cast_from_binary_6.q,create_with_multi_pk_constraint.q,udf_field_wrong_type.q,groupby_grouping_sets4.q,groupby_grouping_sets3.q,load_data_into_acid.q,insertsel_fail.q,udf_locate_wrong_type.q,orc_type_promotion1_acid.q,set_table_property.q,create_or_replace_view2.q,groupby_grouping_sets2.q,alter_view_failure.q,distinct_windowing_failure1.q,invalid_t_alter2.q,alter_table_constraint_invalid_fk_col1.q,invalid_varchar_length_2.q,authorization_show_grant_otheruser_alltabs.q,subquery_windowing_corr.q,compact_non_acid_table.q,authorization_view_4.q,authorization_disallow_transform.q,materialized_view_authorization_rebuild_other.q,authorization_fail_4.q,dbtxnmgr_nodblock.q,set_hiveconf_internal_variable1.q,input_part0_neg.q,udf_printf_wrong3.q,load_orc_negative2.q,druid_buckets.q,archive2.q,authorization_addjar.q,invalid_sum_syntax.q,insert_into_with_schema1.q,udf_add_months_error_2.q,dyn_part_max_per_node.q,authorization_revoke_table_fail1.q,udf_printf_wrong2.q,archive_multi3.q,udf_printf_wrong1.q,subquery_subquery_chain.q,authorization_view_disable_cbo_4.q,no_matching_udf.q,char_pad_

[jira] [Assigned] (HIVE-18381) Drop table operation doesn't consider the hdfs acl privilege of the table location parent path

2018-01-04 Thread youchuikai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youchuikai reassigned HIVE-18381:
-


> Drop table operation doesn't consider the hdfs acl privilege of the table 
> location parent path
> ---
>
> Key: HIVE-18381
> URL: https://issues.apache.org/jira/browse/HIVE-18381
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
> Environment: hive-1.1.0-cdh5.8.4
>Reporter: youchuikai
>Assignee: youchuikai
>
> {code:sql}
> // the push user belongs to the test_rw group
> hive> dfs -getfacl /user/hive/warehouse1/test1.db;
> # file: /user/hive/warehouse1/test1.db
> # owner: root
> # group: hive
> user::rwx
> group::rwx
> group:test_r:r-x
> group:test_rw:rwx
> mask::rwx
> other::---
> default:user::rwx
> default:group::rwx
> default:group:test_r:r-x
> default:group:test_rw:rwx
> default:mask::rwx
> default:other::---
> hive> drop table test1.youck_66;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Table metadata 
> not deleted since hdfs://nameservice-test1/user/hive/warehouse1/test1.db is 
> not writable by push)
> {code}
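
The mismatch in the transcript above can be illustrated with a simplified permission model. The sketch below is neither Hive's nor HDFS's code; both functions are hypothetical and the ACL evaluation is a reduced POSIX-style version that leaves out named user entries because none appear in the listing. It assumes that `push` belongs only to `test_rw`, as the first comment line in the transcript suggests. It contrasts a check that looks only at the owner/group/other bits with one that also consults the named group entries and the mask.

```python
def without_dashes(perms):
    """Turn a string such as 'r-x' into the set {'r', 'x'}."""
    return {c for c in perms if c != "-"}


def mode_bits_allow(user, groups, owner, owning_group, acl, wanted):
    """Check only the classic owner/group/other entries."""
    if user == owner:
        perms = acl["user"]
    elif owning_group in groups:
        perms = acl["group"]
    else:
        perms = acl["other"]
    return wanted <= without_dashes(perms)


def acl_allows(user, groups, owner, owning_group, acl, wanted):
    """Simplified ACL check: also honours named group entries and the mask."""
    mask = without_dashes(acl["mask"])
    if user == owner:
        return wanted <= without_dashes(acl["user"])
    candidates = []
    if owning_group in groups:
        candidates.append(acl["group"])
    candidates += [p for g, p in acl["named_groups"].items() if g in groups]
    if candidates:
        return any(wanted <= (without_dashes(p) & mask) for p in candidates)
    return wanted <= without_dashes(acl["other"])


# Access entries copied from the getfacl output in the report.
acl = {
    "user": "rwx",
    "group": "rwx",
    "named_groups": {"test_r": "r-x", "test_rw": "rwx"},
    "mask": "rwx",
    "other": "---",
}

print(mode_bits_allow("push", {"test_rw"}, "root", "hive", acl, {"w"}))  # False
print(acl_allows("push", {"test_rw"}, "root", "hive", acl, {"w"}))       # True
```

Under these assumptions the bits-only check denies write access while the ACL-aware check grants it, which matches the symptom reported in the issue.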





[jira] [Reopened] (HIVE-18326) LLAP Tez scheduler - only preempt tasks if there's a dependency between them

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reopened HIVE-18326:
-

Reverted the patch; it looks like it breaks in some cases. I am looking into 
it; it appears that the DAG info doesn't have the entire DAG, or something like 
that, for some combination of union and multi-insert.

> LLAP Tez scheduler - only preempt tasks if there's a dependency between them
> 
>
> Key: HIVE-18326
> URL: https://issues.apache.org/jira/browse/HIVE-18326
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-18326.01.patch, HIVE-18326.02.patch, 
> HIVE-18326.patch
>
>
> It is currently possible for e.g. two sides of a union (or a join for that 
> matter) to have slightly different priorities. We don't want to preempt 
> running tasks on one side in favor of the other side in such cases.





[jira] [Commented] (HIVE-18328) Improve schematool validator to report duplicate rows for column statistics

2018-01-04 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312406#comment-16312406
 ] 

Naveen Gangam commented on HIVE-18328:
--

The test failures do not appear to be related to the patch. The previous builds 
have the same failures and some more. So +1 for me. [~aihuaxu] Could you please 
review this when you get a chance? Thanks

> Improve schematool validator to report duplicate rows for column statistics
> ---
>
> Key: HIVE-18328
> URL: https://issues.apache.org/jira/browse/HIVE-18328
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-18328.patch
>
>
> By design, in the {{TAB_COL_STATS}} table of the HMS schema, there should be 
> ONE AND ONLY ONE row, representing its statistics, for each column defined in 
> Hive. A combination of DB_NAME, TABLE_NAME and COLUMN_NAME constitutes a 
> primary key/unique row.
> Each time the statistics are computed for a column, this row is updated. 
> However, if somehow, via the BDR/replication process, we end up with multiple 
> rows in this table for a given column, the HMS server fails to recompute the 
> statistics thereafter.
> So it would be good to detect this data anomaly via the schema validation 
> tool.
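The kind of check described above can be sketched as a GROUP BY / HAVING query over the three key columns. The sketch below runs it against an in-memory SQLite table from Python; it is an assumption-laden illustration, not the query used by the patch. The table here carries only DB_NAME, TABLE_NAME and COLUMN_NAME (the real {{TAB_COL_STATS}} table has more columns), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for TAB_COL_STATS: only the three key columns are modeled.
conn.execute(
    "CREATE TABLE TAB_COL_STATS (DB_NAME TEXT, TABLE_NAME TEXT, COLUMN_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO TAB_COL_STATS VALUES (?, ?, ?)",
    [
        ("default", "t1", "a"),
        ("default", "t1", "b"),
        ("default", "t1", "b"),  # invented duplicate row for the same column
    ],
)

# Report every (db, table, column) combination that has more than one row.
DUPLICATE_QUERY = """
SELECT DB_NAME, TABLE_NAME, COLUMN_NAME, COUNT(*) AS row_count
FROM TAB_COL_STATS
GROUP BY DB_NAME, TABLE_NAME, COLUMN_NAME
HAVING COUNT(*) > 1
"""
duplicates = conn.execute(DUPLICATE_QUERY).fetchall()
print(duplicates)
```

An empty result means every column has at most one statistics row.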





[jira] [Updated] (HIVE-18361) Extend shared work optimizer to reuse computation beyond work boundaries

2018-01-04 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-18361:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Regenerated q files and pushed to master, thanks for reviewing [~ashutoshc]!

> Extend shared work optimizer to reuse computation beyond work boundaries
> 
>
> Key: HIVE-18361
> URL: https://issues.apache.org/jira/browse/HIVE-18361
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-18361.01.patch, HIVE-18361.02.patch, 
> HIVE-18361.patch
>
>
> Follow-up of the work in HIVE-16867.
> HIVE-16867 introduced an optimization that identifies scans on input tables 
> that can be merged and reuses the computation that is done in the work 
> containing those scans. In particular, we traverse both parts of the plan 
> upstream and reuse the operators if possible.
> Currently, the optimizer will not go beyond the output edge(s) of that work. 
> This extension removes that limitation.





[jira] [Commented] (HIVE-18238) Driver execution may not have configuration changing sideeffects

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312399#comment-16312399
 ] 

Hive QA commented on HIVE-18238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
42s{color} | {color:red} ql: The patch generated 17 new + 1288 unchanged - 7 
fixed = 1305 total (was 1295) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m  
8s{color} | {color:red} cli: The patch generated 1 new + 38 unchanged - 1 fixed 
= 39 total (was 39) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} hcatalog/core: The patch generated 0 new + 33 
unchanged - 1 fixed = 33 total (was 34) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} The patch hcatalog-pig-adapter passed checkstyle 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} The patch server-extensions passed checkstyle 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} The patch hive-unit passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 3f5148d |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8450/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8450/yetus/diff-checkstyle-cli.txt |
| modules | C: ql cli hcatalog/core hcatalog/hcatalog-pig-adapter hcatalog/server-extensions itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8450/yetus.txt |
| Powered by | Apache Yetus   http://yetus.apache.org |


This message was automatically generated.



> Driver execution may not have configuration changing sideeffects 
> -
>
> Key: HIVE-18238
> URL: https://issues.apache.org/jira/browse/HIVE-18238
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Zoltan H

[jira] [Commented] (HIVE-18366) Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead of hbase.table.name as the table name property

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312374#comment-16312374
 ] 

Hive QA commented on HIVE-18366:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904671/HIVE-18366.1.patch

{color:green}SUCCESS:{color} +1 due to 18 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 11517 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=165)

[vector_interval_2.q,schema_evol_orc_acid_table_update.q,metadataonly1.q,auto_join_nulls.q,metadata_only_queries_with_filters.q,schema_evol_text_nonvec_part_all_complex.q,alter_merge_orc.q,vector_between_columns.q,vector_char_cast.q,vector_groupby_grouping_sets6.q,join_filters.q,udaf_collect_set_2.q,update_after_multiple_inserts.q,offset_limit_ppd_optimizer.q,materialized_view_describe.q,orc_merge_incompat1.q,vectorized_parquet_types.q,vector_windowing_gby2.q,explainanalyze_2.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,vector_char_2.q,schema_evol_orc_acidvec_part.q,vector_groupby_3.q,materialized_view_create_rewrite_multi_db.q,acid_no_buckets.q,cbo_rp_gby.q,auto_sortmerge_join_9.q,vector_groupby_grouping_id2.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[external_table_ppd] 
(batchId=96)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_storage_queries]
 (batchId=99)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_ddl] 
(batchId=98)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8449/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8449/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8449/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904671 - PreCommit-HIVE-Build

> Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead 
> of hbase.table.name as the table name property
> 
>
> Key: HIVE-18366
> URL: https://issues.apache.org/jira/browse/HIVE-18366
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Handler
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18366.1.patch
>
>
> HBase 2.0 changes the table name property to 
> hbase.mapreduce.hfileoutputformat.table.name. HiveHFileOutputFormat is using 
> the new property name while HiveHBaseTableOutputFormat is not. If we create 
> the table as follows, HiveHBaseTableOutputFormat is used which still uses the 
> old property hbase.table.name.
> {noformat}
> create table hbase_table2(key int, val string) s

[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312371#comment-16312371
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], I am sorry, but I see that all of your cases above involve 
ORDER BY. Which simpler issue do you mean?

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}
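As a cross-check of the result the reporter expects, the second query (ordering by a column that is not projected) can be run against SQLite from Python. This is only a sketch of the expected ordering, not a statement about Hive's behavior: SQLite is a different engine, the `default` schema qualifier and the backticks are dropped, and the data is the sample data from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_no INT, first_name VARCHAR(14), last_name VARCHAR(16));
INSERT INTO employees VALUES
  (1, 'Gottlob', 'Frege'), (2, 'Bertrand', 'Russell'), (3, 'Ludwig', 'Wittgenstein');
CREATE TABLE salaries (emp_no INT, salary INT, from_date DATE, to_date DATE);
INSERT INTO salaries VALUES
  (1, 10, '1900-01-01', '1900-01-31'), (1, 18, '1900-09-01', '1900-09-30'),
  (2, 15, '1940-03-01', '1950-01-01'), (3, 20, '1920-01-01', '1950-01-01');
""")

# max_salary is used for ordering but is not part of the projection.
rows = conn.execute("""
SELECT employees.last_name, employees.first_name
FROM employees
INNER JOIN
 (SELECT emp_no, MAX(salary) AS max_salary
  FROM salaries
  WHERE emp_no IS NOT NULL AND salary IS NOT NULL
  GROUP BY emp_no) AS t1
ON employees.emp_no = t1.emp_no
ORDER BY t1.max_salary DESC
""").fetchall()
print(rows)
```

With the sample data the peak salaries are 20, 18 and 15, so the rows come back as Wittgenstein, Frege, Russell.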





[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Paul Jackson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312359#comment-16312359
 ] 

Paul Jackson commented on HIVE-18375:
-

There is no doubt these are the same issue. What do you think about the simpler 
issue in my comment that does not involve ORDER BY?



[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312344#comment-16312344
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], if possible, could you try Hive master? As this is a new 
feature in HIVE-15160 targeting version 3.0, I doubt it is available in any 
published version yet.



[jira] [Commented] (HIVE-18366) Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead of hbase.table.name as the table name property

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312309#comment-16312309
 ] 

Hive QA commented on HIVE-18366:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
13s{color} | {color:red} itests/util: The patch generated 1 new + 11 unchanged 
- 0 fixed = 12 total (was 11) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 3f5148d |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8449/yetus/diff-checkstyle-itests_util.txt |
| modules | C: hbase-handler hcatalog/webhcat/svr itests/hcatalog-unit itests/util U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8449/yetus.txt |
| Powered by | Apache Yetus   http://yetus.apache.org |


This message was automatically generated.



> Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead 
> of hbase.table.name as the table name property
> 
>
> Key: HIVE-18366
> URL: https://issues.apache.org/jira/browse/HIVE-18366
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Handler
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18366.1.patch
>
>
> HBase 2.0 changes the table name property to 
> hbase.mapreduce.hfileoutputformat.table.name. HiveHFileOutputFormat is using 
> the new property name while HiveHBaseTableOutputFormat is not. If we create 
> the table as follows, HiveHBaseTableOutputFormat is used which still uses the 
> old property hbase.table.name.
> {noformat}
> create table hbase_table2(key int, val string) stored by 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties 
> ('hbase.columns.mapping' = ':key,cf:val') tblproperties 
> ('hbase.mapreduce.hfileoutputformat.table.name' = 
> 'positive_hbase_handler_bulk')
> {noformat}
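One way to picture the property change is a lookup that prefers the new name and falls back to the old one. This is a hypothetical sketch, not the logic in HIVE-18366.1.patch: the function name `resolve_table_name` and the fallback behavior are assumptions; only the two property names and the sample table name come from the description above.

```python
NEW_KEY = "hbase.mapreduce.hfileoutputformat.table.name"  # HBase 2.0 property name
OLD_KEY = "hbase.table.name"                              # older property name


def resolve_table_name(props):
    """Prefer the new property name and fall back to the old one (assumed policy)."""
    name = props.get(NEW_KEY) or props.get(OLD_KEY)
    if not name:
        raise KeyError(f"neither {NEW_KEY} nor {OLD_KEY} is set")
    return name


print(resolve_table_name({NEW_KEY: "positive_hbase_handler_bulk"}))
```

An output format that reads only the old key would miss the table name set in the statement above, which is the mismatch the issue describes.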





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Status: Patch Available  (was: Open)

Done... I am trying to test it on a cluster, but the cluster I'm using is down.

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> 
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch, 
> HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java) may 
> grow indefinitely when LLAP IO is faster than the processing pipeline. Since we 
> don't have backpressure to slow down the IO, this can lead to indefinite growth 
> of pending data, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume 
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS 
> .. FOR COLUMNS, which also gathers bitvectors: a case of fast IO and slow 
> processing.
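The missing backpressure can be illustrated with a bounded queue: when the buffer is full, the reader blocks instead of piling up more pending data. This is a generic Python sketch of the idea, not the LlapRecordReader change in the attached patches; the names `run_pipeline`, `reader` and `processor`, the sentinel, and the capacity of 4 are assumptions for the example.

```python
import queue
import threading


def run_pipeline(num_batches, capacity):
    """Move batches from a fast reader to a processor through a bounded queue."""
    pending = queue.Queue(maxsize=capacity)
    consumed = []
    max_pending = 0

    def reader():
        nonlocal max_pending
        for batch in range(num_batches):
            pending.put(batch)  # blocks while `capacity` batches are already pending
            max_pending = max(max_pending, pending.qsize())
        pending.put(None)  # sentinel: no more data

    def processor():
        while True:
            batch = pending.get()
            if batch is None:
                break
            consumed.append(batch)

    threads = [threading.Thread(target=reader), threading.Thread(target=processor)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed, max_pending


consumed, max_pending = run_pipeline(num_batches=100, capacity=4)
print(len(consumed), max_pending)
```

However fast the reader is, the number of pending batches never exceeds the configured capacity, so memory use stays bounded at the cost of stalling the IO side.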





[jira] [Commented] (HIVE-18096) add a user-friendly show plan command

2018-01-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312300#comment-16312300
 ] 

Sergey Shelukhin commented on HIVE-18096:
-

Some minor comments. +1 otherwise, I can commit after the update.

> add a user-friendly show plan command
> -
>
> Key: HIVE-18096
> URL: https://issues.apache.org/jira/browse/HIVE-18096
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Harish Jaiprakash
> Attachments: HIVE-18096.01.patch, HIVE-18096.02.patch
>
>
> This lets an admin get an overview of a resource plan.
> We need to try to do this using sysdb. 
> If that is not possible to do in a nice way, we'd do a text-based one like 
> query explain, or desc extended table.





[jira] [Commented] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-04 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312290#comment-16312290
 ] 

Jason Dere commented on HIVE-18269:
---

Looks ok I think .. can you submit the patch so we can see precommit test 
results?

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> 
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch, 
> HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
>
> pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow 
> indefinitely when Llap IO is faster than processing pipeline. Since we don't 
> have backpressure to slow down the IO, this can lead to indefinite growth of 
> pending data leading to severe GC pressure and eventually lead to OOM.
> This specific instance of LLAP was running on HDFS on top of EBS volume 
> backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS 
> .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case.





[jira] [Updated] (HIVE-14615) Temp table leaves behind insert command

2018-01-04 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-14615:
--
Attachment: HIVE-14615.3.patch

> Temp table leaves behind insert command
> ---
>
> Key: HIVE-14615
> URL: https://issues.apache.org/jira/browse/HIVE-14615
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Chaoyu Tang
>Assignee: Andrew Sherman
> Attachments: HIVE-14615.1.patch, HIVE-14615.2.patch, 
> HIVE-14615.3.patch
>
>
> {code}
> create table test (key int, value string);
> insert into test values (1, 'val1');
> show tables;
> test
> values__tmp__table__1
> {code}
> The temp table values__tmp__table__1 results from the insert into ... values statement
> and exists until the session is closed.





[jira] [Commented] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312285#comment-16312285
 ] 

Hive QA commented on HIVE-18349:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904687/HIVE-18349.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 610 failed/errored test(s), 11516 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=165)

[vector_interval_2.q,schema_evol_orc_acid_table_update.q,metadataonly1.q,auto_join_nulls.q,metadata_only_queries_with_filters.q,schema_evol_text_nonvec_part_all_complex.q,alter_merge_orc.q,vector_between_columns.q,vector_char_cast.q,vector_groupby_grouping_sets6.q,join_filters.q,udaf_collect_set_2.q,update_after_multiple_inserts.q,offset_limit_ppd_optimizer.q,materialized_view_describe.q,orc_merge_incompat1.q,vectorized_parquet_types.q,vector_windowing_gby2.q,explainanalyze_2.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,vector_char_2.q,schema_evol_orc_acidvec_part.q,vector_groupby_3.q,materialized_view_create_rewrite_multi_db.q,acid_no_buckets.q,cbo_rp_gby.q,auto_sortmerge_join_9.q,vector_groupby_grouping_id2.q]
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[colstats_all_nulls] 
(batchId=245)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alterColumnStatsPart] 
(batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alterColumnStats] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_update_status]
 (batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_column_stats]
 (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
 (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status_disable_bitvector]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[analyze_tbl_part] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_deep_filters]
 (batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby2] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_limit] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_part] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_union] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[array_size_estimation] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_10] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_1] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5a] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats2] 
(batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal_native] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bitvector] (batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_SortUnionTransposeRule]
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join0] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[c

[jira] [Commented] (HIVE-18004) investigate deriving app name from JDBC connection for pool mapping

2018-01-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312276#comment-16312276
 ] 

Sergey Shelukhin commented on HIVE-18004:
-

We will go with (2) - url arguments

> investigate deriving app name from JDBC connection for pool mapping
> ---
>
> Key: HIVE-18004
> URL: https://issues.apache.org/jira/browse/HIVE-18004
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> There are some client info fields that popular apps (Tableau, etc) might 
> populate; this might allow us to map queries to pools based on an application 
> used. Need to take a look (see the doc for an example API we might look into)





[jira] [Updated] (HIVE-14498) Freshness period for query rewriting using materialized views

2018-01-04 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14498:
---
Attachment: HIVE-14498.04.patch

> Freshness period for query rewriting using materialized views
> -
>
> Key: HIVE-14498
> URL: https://issues.apache.org/jira/browse/HIVE-14498
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14498.01.patch, HIVE-14498.02.patch, 
> HIVE-14498.03.patch, HIVE-14498.04.patch, HIVE-14498.patch
>
>
> Once we have query rewriting in place (HIVE-14496), one of the main issues is 
> data freshness in the materialized views.
> Since we will not support view maintenance at first, we could include a 
> HiveConf property to configure a max freshness period (_n timeunits_). If a 
> query arrives and the materialized view was last populated (by create, 
> refresh, etc.) more than _n_ ago, then we should not use it for 
> rewriting the query.
> Optionally, we could print a warning for the user indicating that the 
> materialized view was not used because it was not fresh.
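
A minimal sketch of the freshness check described above, assuming the max freshness period is available as a java.time.Duration. The class and method names are hypothetical and are not taken from the patch; the HiveConf property that would supply the period is not shown.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; not part of the HIVE-14498 patch. */
public class MaterializedViewFreshness {

    /**
     * Returns true if the view was last populated no more than
     * maxFreshness before now, i.e. it may be used for rewriting.
     */
    public static boolean isFresh(Instant lastPopulated, Duration maxFreshness, Instant now) {
        Duration age = Duration.between(lastPopulated, now);
        return age.compareTo(maxFreshness) <= 0;
    }
}
```

Passing the current time in explicitly, rather than calling Instant.now() inside the method, keeps the check deterministic in tests.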





[jira] [Commented] (HIVE-18367) Describe Extended output is truncated on a table with an explicit row format containing tabs or newlines.

2018-01-04 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312271#comment-16312271
 ] 

Andrew Sherman commented on HIVE-18367:
---

Test failures look unrelated to this change.
[~pvary] please could you review?

> Describe Extended output is truncated on a table with an explicit row format 
> containing tabs or newlines.
> -
>
> Key: HIVE-18367
> URL: https://issues.apache.org/jira/browse/HIVE-18367
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-18367.1.patch
>
>
> 'Describe Extended' dumps information about a table. The protocol for sending 
> this data relies on tabs and newlines to separate pieces of data. If a table 
> has 'FIELDS terminated by XXX' or 'LINES terminated by XXX' where XXX is a 
> tab or newline, then the output seen by the user is prematurely truncated. Fix 
> this by replacing tabs and newlines in the table description with “\t” and 
> “\n” respectively.
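
The replacement described above can be sketched as follows. This is an illustration only, not the code in HIVE-18367.1.patch; DescribeEscaper and escapeControlChars are hypothetical names.

```java
/** Hypothetical helper; illustrates the escaping idea only. */
public class DescribeEscaper {

    /** Replaces literal tab and newline characters with the two-character sequences \t and \n. */
    public static String escapeControlChars(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\t':
                    sb.append("\\t");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

After escaping, the description no longer contains the characters the protocol uses as separators, so the client should not cut the output short.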





[jira] [Commented] (HIVE-18214) Flaky test: TestSparkClient

2018-01-04 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312261#comment-16312261
 ] 

Aihua Xu commented on HIVE-18214:
-

[~stakiar] Trying to understand the issue. So when one test finishes, the rpc is 
closing and it will close RemoteDriver to stop the SparkContext, but since that 
happens asynchronously, we don't know when it really shuts down?

One thought: maybe we should add logic to always make sure that only one 
JavaSparkContext instance is created in one JVM. If one already exists and we 
try to create a new one, we can shut down the existing one and create a new one. 
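
A rough sketch of that thought, using AutoCloseable as a stand-in for JavaSparkContext so that it stays self-contained. SingleInstanceHolder is a hypothetical name, and this assumes that closing the old instance is synchronous, which is exactly the property in question for the real SparkContext.

```java
import java.util.function.Supplier;

/** Hypothetical holder that allows at most one live instance at a time. */
public class SingleInstanceHolder<T extends AutoCloseable> {
    private T current;

    /** Closes the existing instance (if any) before creating the new one. */
    public synchronized T replace(Supplier<T> factory) throws Exception {
        if (current != null) {
            current.close();
            current = null;
        }
        current = factory.get();
        return current;
    }

    /** Returns the live instance, or null if none has been created. */
    public synchronized T get() {
        return current;
    }
}
```

The synchronized methods only serialize callers of the holder; they do not help if the underlying shutdown itself completes asynchronously.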

> Flaky test: TestSparkClient
> ---
>
> Key: HIVE-18214
> URL: https://issues.apache.org/jira/browse/HIVE-18214
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18214.1.patch
>
>
> Looks like there is a race condition in {{TestSparkClient#runTest}}. The test 
> creates a {{RemoteDriver}} in memory, which creates a {{JavaSparkContext}}. A 
> new {{JavaSparkContext}} is created for each test that is run. There is a 
> race condition where the {{RemoteDriver}} isn't given enough time to 
> shut down, so when the next test starts running it creates another 
> {{JavaSparkContext}} which causes an exception like 
> {{org.apache.spark.SparkException: Only one SparkContext may be running in 
> this JVM (see SPARK-2243)}}.





[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312255#comment-16312255
 ] 

Pengcheng Xiong commented on HIVE-18375:


May be related to HIVE-15160.

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> Instead, it fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}





[jira] [Updated] (HIVE-18275) add HS2-level WM metrics

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18275:

Status: Patch Available  (was: Open)

> add HS2-level WM metrics
> 
>
> Key: HIVE-18275
> URL: https://issues.apache.org/jira/browse/HIVE-18275
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18275.patch
>
>
> E.g. time spent in pool queue. Some existing UIs use perflogger output, so we 
> should also include that.





[jira] [Updated] (HIVE-18275) add HS2-level WM metrics

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18275:

Attachment: HIVE-18275.patch

A small patch. Turns out there isn't much to HS2 per-query metrics besides 
perflogger at the moment ;)
[~thejas] can you take a look?

> add HS2-level WM metrics
> 
>
> Key: HIVE-18275
> URL: https://issues.apache.org/jira/browse/HIVE-18275
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18275.patch
>
>
> E.g. time spent in pool queue. Some existing UIs use perflogger output, so we 
> should also include that.





[jira] [Updated] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18349:
-
Attachment: HIVE-18349.5.patch

One minor fix: when we already throw a MetaException (for example, when dropping 
the default database), we ignore the MetaException from the failed commit transaction. 

> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch, HIVE-18349.5.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. Currently 
> it's hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.
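
Point 1 above can be sketched as a wrapper that always records the final status, whether the operation returns or throws. This is a hypothetical illustration: AuditedEvent, its run method and the log line format are assumptions and do not come from the metastore code or the attached patches.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

/** Hypothetical wrapper that logs the final status of a metastore-style event. */
public class AuditedEvent {

    public static <T> T run(String eventName, Callable<T> body, Consumer<String> auditLog)
            throws Exception {
        boolean success = false;
        try {
            T result = body.call();
            // A boolean result (as returned by drop_table) doubles as the status;
            // any other normal return is treated as success.
            success = !(result instanceof Boolean) || (Boolean) result;
            return result;
        } finally {
            // Runs on both normal return and exception, so the status is never lost.
            auditLog.accept("event=" + eventName + " status=" + (success ? "success" : "failed"));
        }
    }
}
```

Taking the log sink as a Consumer keeps the sketch independent of any particular logging framework.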





[jira] [Updated] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18349:
-
Attachment: (was: HIVE-18349.5.patch)

> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch, HIVE-18349.5.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. Currently 
> it's hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.





[jira] [Updated] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18349:
-
Attachment: HIVE-18349.5.patch

> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch, HIVE-18349.5.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. Currently 
> it's hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.





[jira] [Commented] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312227#comment-16312227
 ] 

Hive QA commented on HIVE-18349:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
31s{color} | {color:red} standalone-metastore: The patch generated 35 new + 957 
unchanged - 19 fixed = 992 total (was 976) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 3f5148d |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8448/yetus/diff-checkstyle-standalone-metastore.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8448/yetus/whitespace-eol.txt 
|
| modules | C: standalone-metastore itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8448/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. Currently 
> it's hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.




[jira] [Commented] (HIVE-18361) Extend shared work optimizer to reuse computation beyond work boundaries

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312203#comment-16312203
 ] 

Hive QA commented on HIVE-18361:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904647/HIVE-18361.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 11091 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=166)

[materialized_view_create.q,schema_evol_orc_acid_part_update.q,orc_ppd_varchar.q,optimize_join_ptp.q,count_dist_rewrite.q,vector_nvl.q,join_nullsafe.q,vectorized_mapjoin.q,cross_prod_1.q,vectorized_shufflejoin.q,autoColumnStats_10.q,tez_smb_1.q,limit_pushdown.q,tez_vector_dynpart_hashjoin_1.q,vector_inner_join.q,subquery_notin.q,vector_coalesce_2.q,table_access_keys_stats.q,subquery_null_agg.q,filter_join_breaktask.q,mapjoin_decimal.q,column_table_stats.q,alter_merge_2_orc.q,columnstats_part_coltype.q,explainanalyze_2.q,union4.q,stats_based_fetch_decision.q,auto_sortmerge_join_10.q,extrapolate_part_stats_partial_ndv.q,vector_decimal_udf2.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=169)

[join_is_not_distinct_from.q,tez_nway_join.q,tez_schema_evolution.q,bucket_map_join_tez1.q,vector_multi_insert.q,insert_update_delete.q,temp_table.q,cte_1.q,autoColumnStats_2.q,partition_pruning.q,vectorization_17.q,orc_merge8.q,orc_merge_incompat2.q,bucket_groupby.q,vector_outer_join4.q,vector_nullsafe_join.q,orc_merge7.q,bucketpruning1.q,schema_evol_orc_acidvec_table.q,vector_grouping_sets.q,vector_outer_join5.q,vector_groupby6.q,bucketmapjoin1.q,auto_sortmerge_join_5.q,auto_join0.q,load_dyn_part1.q,vector_windowing.q,schema_evol_orc_nonvec_part_all_primitive.q,auto_sortmerge_join_11.q,orc_merge_incompat_writer_version.q]
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)

[udf_invalid.q,authorization_uri_export.q,druid_datasource2.q,view_update.q,default_partition_name.q,authorization_public_create.q,load_wrong_fileformat_rc_seq.q,altern1.q,describe_xpath1.q,drop_view_failure2.q,orc_replace_columns2_acid.q,temp_table_rename.q,invalid_select_column_with_subquery.q,udf_trunc_error1.q,insert_view_failure.q,dbtxnmgr_nodbunlock.q,authorization_show_columns.q,cte_recursion.q,load_part_nospec.q,clusterbyorderby.q,orc_type_promotion2.q,ctas_noperm_loc.q,duplicate_alias_in_transform.q,invalid_create_tbl2.q,part_col_complex_type.q,authorization_drop_db_empty.q,smb_mapjoin_14.q,subquery_scalar_multi_rows.q,alter_partition_coltype_2columns.q,subquery_corr_in_agg.q,authorization_show_grant_otheruser_wtab.q,regex_col_groupby.q,udaf_collect_set_unsupported.q,ptf_negative_DuplicateWindowAlias.q,exim_22_export_authfail.q,udf_likeany_wrong1.q,groupby_key.q,ambiguous_col.q,groupby3_multi_distinct.q,authorization_alter_drop_ptn.q,invalid_cast_from_binary_5.q,show_create_table_does_not_exist.q,exim_20_managed_location_over_existing.q,interval_3.q,authorization_compile.q,join35.q,merge_negative_3.q,udf_concat_ws_wrong3.q,create_or_replace_view8.q,split_sample_out_of_range.q,alter_concatenate_indexed_table.q,authorization_show_grant_otherrole.q,create_with_constraints_duplicate_name.q,invalid_stddev_samp_syntax.q,authorization_view_disable_cbo_7.q,autolocal1.q,analyze_view.q,exim_14_nonpart_part.q,avro_non_nullable_union.q,load_orc_negative_part.q,drop_view_failure1.q,columnstats_partlvl_invalid_values_autogather.q,exim_13_nonnative_import.q,alter_table_wrong_regex.q,add_partition_with_whitelist.q,udf_next_day_error_2.q,authorization_select.q,udf_trunc_error2.q,authorization_view_7.q,udf_format_number_wrong5.q,touch2.q,exim_03_nonpart_noncompat_colschema.q,orc_type_promotion1.q,lateral_view_alias.q,show_tables_bad_db1.q,unset_table_property.q,alter_non_native.q,nvl_mismatch_type.q,load_orc_negative3.q,authorization_create_role_no_admin.q,invalid_distinct1.
q,authorization_grant_server.q,orc_type_promotion3_acid.q,show_tables_bad1.q,macro_unused_parameter.q,drop_invalid_constraint3.q,char_pad_convert_fail3.q,exim_23_import_exist_authfail.q,drop_invalid_constraint4.q,archive1.q,subquery_multiple_cols_in_select.q,drop_index_failure.q,change_hive_hdfs_session_path.q,udf_trunc_error3.q,invalid_variance_syntax.q,authorization_truncate_2.q,invalid_avg_syntax.q,invalid_select_column_with_tablename.q,mm_truncate_cols.q,groupby_grouping_sets1.q,druid_location.q,groupby2_multi_distinct.q,authorization_sba_drop_table.q,dynamic_partitions_with_whitelist.q,delete_non_acid_table.q,udf_greatest_error_2.q,create_with_constraints_validate.q,authorization_view_6.q,show_tablestatus.q,describe_xpath3.q,duplicate_alias_in_transform_schema.q,create_with_fk_uk_same_tab.q,authorization_create_tbl.q,udtf_not

[jira] [Commented] (HIVE-18214) Flaky test: TestSparkClient

2018-01-04 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312200#comment-16312200
 ] 

Sahil Takiar commented on HIVE-18214:
-

[~pvary] thanks for taking a look.

* In production, {{RemoteDriver}} is run in a dedicated container; however, we 
have some unit tests which run it in the local process, so in production it's 
not really possible to hit this issue
* I'm not a fan of exposing these methods publicly either, I can add a 
{{\@VisibleForTesting}} annotation; {{RemoteDriver}} is already marked as 
{{\@Private}}

The only other way I can think of doing this is to change the 
{{TestSparkClient}} so it runs the {{RemoteDriver}} in a dedicated process (so 
similar to what we do in production). The test will take longer to run, but we 
won't hit this issue.

> Flaky test: TestSparkClient
> ---
>
> Key: HIVE-18214
> URL: https://issues.apache.org/jira/browse/HIVE-18214
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18214.1.patch
>
>
> Looks like there is a race condition in {{TestSparkClient#runTest}}. The test 
> creates a {{RemoteDriver}} in memory, which creates a {{JavaSparkContext}}. A 
> new {{JavaSparkContext}} is created for each test that is run. There is a 
> race condition where the {{RemoteDriver}} isn't given enough time to 
> shut down, so when the next test starts running it creates another 
> {{JavaSparkContext}} which causes an exception like 
> {{org.apache.spark.SparkException: Only one SparkContext may be running in 
> this JVM (see SPARK-2243)}}.
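
The race described above can be reproduced in miniature without Spark. The following is a minimal sketch, not the code from the attached patch: {{SingletonContext}} and {{FakeDriver}} are hypothetical stand-ins for {{JavaSparkContext}} and {{RemoteDriver}}, and the one-instance-per-JVM rule is simulated with a static flag. It shows why a test that creates the next driver without waiting for the previous shutdown fails, and how blocking on a shutdown latch avoids the collision.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a resource that allows one live instance per JVM. */
final class SingletonContext implements AutoCloseable {
    private static final AtomicBoolean ACTIVE = new AtomicBoolean(false);

    SingletonContext() {
        // Fails if a previous instance has not been closed yet.
        if (!ACTIVE.compareAndSet(false, true)) {
            throw new IllegalStateException("Only one context may be running in this JVM");
        }
    }

    @Override
    public void close() {
        ACTIVE.set(false);
    }
}

/** Hypothetical driver whose shutdown completes asynchronously on another thread. */
final class FakeDriver {
    private final SingletonContext context = new SingletonContext();
    private final CountDownLatch stopped = new CountDownLatch(1);

    /** Requests shutdown; the context is released after the given delay. */
    void shutdownAsync(long delayMillis) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                context.close();
                stopped.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Blocks until shutdown has finished; returns false on timeout or interrupt. */
    boolean awaitShutdown(long timeoutMillis) {
        try {
            return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Creating a second {{FakeDriver}} right after {{shutdownAsync}} throws, while creating it after {{awaitShutdown}} returns true succeeds. Running the driver in a dedicated process, as suggested in the comment, sidesteps the problem in a different way.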





[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-04 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312178#comment-16312178
 ] 

Sahil Takiar commented on HIVE-16484:
-

Test failures are unrelated. I updated the RB and added a few notes to explain 
what the code is doing: https://reviews.apache.org/r/58684/
[~xuefuz], [~lirui] can you review?

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, 
> HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, 
> HIVE-16484.6.patch, HIVE-16484.7.patch, HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners





[jira] [Updated] (HIVE-18052) Run p-tests on mm tables

2018-01-04 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-18052:
--
Attachment: HIVE-18052.16.patch

patch 16 is the same as patch 156. Adding this since patch 15 is not tested.

> Run p-tests on mm tables
> 
>
> Key: HIVE-18052
> URL: https://issues.apache.org/jira/browse/HIVE-18052
> Project: Hive
>  Issue Type: Task
>Reporter: Steve Yeom
>Assignee: Steve Yeom
> Attachments: HIVE-18052.1.patch, HIVE-18052.10.patch, 
> HIVE-18052.11.patch, HIVE-18052.12.patch, HIVE-18052.13.patch, 
> HIVE-18052.14.patch, HIVE-18052.15.patch, HIVE-18052.16.patch, 
> HIVE-18052.2.patch, HIVE-18052.3.patch, HIVE-18052.4.patch, 
> HIVE-18052.5.patch, HIVE-18052.6.patch, HIVE-18052.7.patch, 
> HIVE-18052.8.patch, HIVE-18052.9.patch
>
>






[jira] [Comment Edited] (HIVE-18052) Run p-tests on mm tables

2018-01-04 Thread Steve Yeom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312174#comment-16312174
 ] 

Steve Yeom edited comment on HIVE-18052 at 1/4/18 11:22 PM:


patch 16 is the same as patch 156. Adding this since patch 15 is not run by the 
p-test system.


was (Author: steveyeom2017):
patch 16 is the same as patch 156. Adding this since patch 15 is not tested.

> Run p-tests on mm tables
> 
>
> Key: HIVE-18052
> URL: https://issues.apache.org/jira/browse/HIVE-18052
> Project: Hive
>  Issue Type: Task
>Reporter: Steve Yeom
>Assignee: Steve Yeom
> Attachments: HIVE-18052.1.patch, HIVE-18052.10.patch, 
> HIVE-18052.11.patch, HIVE-18052.12.patch, HIVE-18052.13.patch, 
> HIVE-18052.14.patch, HIVE-18052.15.patch, HIVE-18052.16.patch, 
> HIVE-18052.2.patch, HIVE-18052.3.patch, HIVE-18052.4.patch, 
> HIVE-18052.5.patch, HIVE-18052.6.patch, HIVE-18052.7.patch, 
> HIVE-18052.8.patch, HIVE-18052.9.patch
>
>






[jira] [Comment Edited] (HIVE-18052) Run p-tests on mm tables

2018-01-04 Thread Steve Yeom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312174#comment-16312174
 ] 

Steve Yeom edited comment on HIVE-18052 at 1/4/18 11:22 PM:


patch 16 is the same as patch 15. Adding this since patch 15 is not run by the 
p-test system.


was (Author: steveyeom2017):
patch 16 is the same as patch 156. Adding this since patch 15 is not run by the 
p-test system.

> Run p-tests on mm tables
> 
>
> Key: HIVE-18052
> URL: https://issues.apache.org/jira/browse/HIVE-18052
> Project: Hive
>  Issue Type: Task
>Reporter: Steve Yeom
>Assignee: Steve Yeom
> Attachments: HIVE-18052.1.patch, HIVE-18052.10.patch, 
> HIVE-18052.11.patch, HIVE-18052.12.patch, HIVE-18052.13.patch, 
> HIVE-18052.14.patch, HIVE-18052.15.patch, HIVE-18052.16.patch, 
> HIVE-18052.2.patch, HIVE-18052.3.patch, HIVE-18052.4.patch, 
> HIVE-18052.5.patch, HIVE-18052.6.patch, HIVE-18052.7.patch, 
> HIVE-18052.8.patch, HIVE-18052.9.patch
>
>






[jira] [Commented] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312173#comment-16312173
 ] 

Thejas M Nair commented on HIVE-18349:
--

+1 pending tests


> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch
>
>
> 1) Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. 
> Currently it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP or 
> enable when debug logging is enabled.
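
Point 1 above can be illustrated with a small wrapper that records the final status of an operation whether it returns or throws. This is a minimal sketch under stated assumptions, not the metastore's actual audit logger: the class {{AuditedCall}}, the in-memory log, and the {{cmd=... status=...}} line format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper that logs the final status of each audited operation. */
final class AuditedCall {
    // In-memory stand-in for an audit log; a real system would use a logger.
    private final List<String> auditLog = new ArrayList<>();

    /** Runs the operation and always records whether it succeeded or failed. */
    boolean run(String operation, BooleanSupplier body) {
        boolean success = false;
        try {
            success = body.getAsBoolean();
            return success;
        } finally {
            // Reached on normal return and on exception, so no outcome is lost.
            auditLog.add("cmd=" + operation + " status=" + (success ? "success" : "failed"));
        }
    }

    List<String> entries() {
        return auditLog;
    }
}
```

Writing the status in a {{finally}} block means an operation that throws is recorded as failed, which is the case the description says currently goes unlogged.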





[jira] [Updated] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-18349:
-
Summary: Misc metastore changes for debuggability, error on commit txn 
failures  (was: Misc metastore changes for debuggability)

> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch
>
>
> 1) Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. 
> Currently it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP or 
> enable when debug logging is enabled.





[jira] [Updated] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-04 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-18379:
--
Status: Patch Available  (was: Open)

> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
> Attachments: HIVE-18379.01.patch
>
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.





[jira] [Updated] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-04 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-18379:
--
Attachment: HIVE-18379.01.patch

> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
> Attachments: HIVE-18379.01.patch
>
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.





[jira] [Assigned] (HIVE-18275) add HS2-level WM metrics

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18275:
---

Assignee: Sergey Shelukhin

> add HS2-level WM metrics
> 
>
> Key: HIVE-18275
> URL: https://issues.apache.org/jira/browse/HIVE-18275
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> E.g. time spent in pool queue. Some existing UIs use perflogger output, so we 
> should also include that.





[jira] [Commented] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312155#comment-16312155
 ] 

Sergey Shelukhin commented on HIVE-18269:
-

[~prasanth_j] [~jdere] [~gopalv] can someone please review this patch? thnx

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> 
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch, 
> HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
>
> The pendingData linked list in the Llap IO elevator (LlapRecordReader.java) may 
> grow indefinitely when Llap IO is faster than the processing pipeline. Since we 
> don't have backpressure to slow down the IO, this can lead to indefinite growth 
> of pending data, causing severe GC pressure and eventually OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume 
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS 
> .. FOR COLUMNS, which also gathers bitvectors: a fast IO, slow processing case.
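
One common way to add the missing backpressure is to bound the pending queue so that a fast producer has to wait for the consumer. The following is a minimal sketch of that general technique, not the approach taken in the attached patch: {{BoundedPendingData}} is a hypothetical class, and the capacity and timeout values are assumptions chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded replacement for an unbounded pending-data list. */
final class BoundedPendingData<T> {
    private final BlockingQueue<T> pending;

    BoundedPendingData(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Producer side (the IO thread). Waits up to the timeout for a free slot,
     * so a fast reader is slowed to the pace of the consumer.
     * Returns false if no slot became free in time or the thread was interrupted.
     */
    boolean offer(T batch, long timeoutMillis) {
        try {
            return pending.offer(batch, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Consumer side (the processing pipeline). Returns null when nothing is pending. */
    T poll() {
        return pending.poll();
    }

    int size() {
        return pending.size();
    }
}
```

With a capacity of 2, a third {{offer}} fails until the consumer calls {{poll}}, so memory held by pending batches stays bounded regardless of how fast the IO side runs.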





[jira] [Updated] (HIVE-18377) avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests

2018-01-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18377:
--
Status: Patch Available  (was: Open)

> avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests
> 
>
> Key: HIVE-18377
> URL: https://issues.apache.org/jira/browse/HIVE-18377
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test, Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-18377.02.patch
>
>
> many UTs (e.g. TestHCatMultiOutputFormat, 
> BeelineWithHS2ConnectionFileTestBase, TestOperationLoggingAPIWithMr, 
> HCatBaseTest and many others)
> explicitly set 
> {{hiveConf.set(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "false");}}
> It would be better if they picked up the settings from 
> data/conf/hive-site.xml.
> This adds consistency and makes it possible to run all tests with a known 
> config (or at least to approach this).
> The outline of the process is:
> 1. build copies {{\*-site.xml files from data/conf/\*\*/\*-site.xml}} to 
> target/testconf/
> 2. HiveConf picks up target/testconf/hive-site.xml
> 3. Various forms of *CliDriver may explicitly specify (e.g. 
> MiniLlapLocalCliConfig) which hive-site.xml to use
>  
> The first step is to see how many explicit settings of 
> HIVE_SUPPORT_CONCURRENCY can be removed w/o breaking the tests.
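
The effect of an explicit per-test setting can be shown with plain {{java.util.Properties}}. This is a minimal sketch, not the real {{HiveConf}}: {{TestConfSketch}} is a hypothetical class, and the site value {{"true"}} is an assumption made for illustration rather than the actual content of data/conf/hive-site.xml.

```java
import java.util.Properties;

/** Hypothetical stand-in for test configuration lookup; not the real HiveConf. */
final class TestConfSketch {
    static final String KEY = "hive.support.concurrency";

    /** Plays the role of the shared site file; the value here is assumed. */
    static Properties siteDefaults() {
        Properties site = new Properties();
        site.setProperty(KEY, "true");
        return site;
    }

    /** A per-test config that falls back to the shared site values. */
    static Properties newConf(Properties site) {
        return new Properties(site);
    }
}
```

A config created with {{newConf}} sees the shared value until a test sets the key explicitly, after which that test no longer follows the site file. Removing the explicit settings lets a change to the shared file reach every test.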





[jira] [Updated] (HIVE-18377) avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests

2018-01-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18377:
--
Attachment: HIVE-18377.02.patch

> avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests
> 
>
> Key: HIVE-18377
> URL: https://issues.apache.org/jira/browse/HIVE-18377
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test, Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-18377.02.patch
>
>
> many UTs (e.g. TestHCatMultiOutputFormat, 
> BeelineWithHS2ConnectionFileTestBase, TestOperationLoggingAPIWithMr, 
> HCatBaseTest and many others)
> explicitly set 
> {{hiveConf.set(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "false");}}
> It would be better if they picked up the settings from 
> data/conf/hive-site.xml.
> This adds consistency and makes it possible to run all tests with a known 
> config (or at least to approach this).
> The outline of the process is:
> 1. build copies {{\*-site.xml files from data/conf/\*\*/\*-site.xml}} to 
> target/testconf/
> 2. HiveConf picks up target/testconf/hive-site.xml
> 3. Various forms of *CliDriver may explicitly specify (e.g. 
> MiniLlapLocalCliConfig) which hive-site.xml to use
>  
> The first step is to see how many explicit settings of 
> HIVE_SUPPORT_CONCURRENCY can be removed w/o breaking the tests.





[jira] [Commented] (HIVE-18349) Misc metastore changes for debuggability

2018-01-04 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312147#comment-16312147
 ] 

Prasanth Jayachandran commented on HIVE-18349:
--

[~thejas] Can you please take another look? Some more changes went into this 
patch. 

> Misc metastore changes for debuggability
> 
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch
>
>
> 1) Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. 
> Currently it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP or 
> enable when debug logging is enabled.





[jira] [Updated] (HIVE-18349) Misc metastore changes for debuggability

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18349:
-
Attachment: HIVE-18349.4.patch

The updated patch covers some more places where failure to commit will throw an 
exception. 

> Misc metastore changes for debuggability
> 
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch
>
>
> 1) Hive metastore audit event log/metastore log does not log the final status 
> (success or failed) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. 
> Currently it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP or 
> enable when debug logging is enabled.





[jira] [Assigned] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-04 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom reassigned HIVE-18379:
-

Assignee: Steve Yeom

> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.





[jira] [Commented] (HIVE-18361) Extend shared work optimizer to reuse computation beyond work boundaries

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312134#comment-16312134
 ] 

Hive QA commented on HIVE-18361:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 19s{color} | {color:red} common: The patch generated 2 new + 932 unchanged - 0 fixed = 934 total (was 932) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 34s{color} | {color:red} ql: The patch generated 19 new + 42 unchanged - 4 fixed = 61 total (was 46) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 18s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 3f5148d |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8447/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8447/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql itests U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8447/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Extend shared work optimizer to reuse computation beyond work boundaries
> 
>
> Key: HIVE-18361
> URL: https://issues.apache.org/jira/browse/HIVE-18361
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Attachments: HIVE-18361.01.patch, HIVE-18361.02.patch, 
> HIVE-18361.patch
>
>
> Follow-up of the work in HIVE-16867.
> HIVE-16867 introduced an optimization that identifies scans on input tables 
> that can be merged and reuses the computation that is done in the work 
> containing those scans. In particular, we traverse both parts of the plan 
> upstream and reuse the operators if possible.
> Currently, the optimizer will not go beyond the output edge(s) of that work. 
> This extension removes that limitation.





[jira] [Updated] (HIVE-18221) test acid default

2018-01-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18221:
--
Attachment: HIVE-18221.23.patch

> test acid default
> -
>
> Key: HIVE-18221
> URL: https://issues.apache.org/jira/browse/HIVE-18221
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-18221.01.patch, HIVE-18221.02.patch, 
> HIVE-18221.03.patch, HIVE-18221.04.patch, HIVE-18221.07.patch, 
> HIVE-18221.08.patch, HIVE-18221.09.patch, HIVE-18221.10.patch, 
> HIVE-18221.11.patch, HIVE-18221.12.patch, HIVE-18221.13.patch, 
> HIVE-18221.14.patch, HIVE-18221.16.patch, HIVE-18221.18.patch, 
> HIVE-18221.19.patch, HIVE-18221.20.patch, HIVE-18221.21.patch, 
> HIVE-18221.22.patch, HIVE-18221.23.patch
>
>






[jira] [Commented] (HIVE-18376) Update committer-list

2018-01-04 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312128#comment-16312128
 ] 

Chris Drome commented on HIVE-18376:


Thanks. Committed.

> Update committer-list
> -
>
> Key: HIVE-18376
> URL: https://issues.apache.org/jira/browse/HIVE-18376
> Project: Hive
>  Issue Type: Bug
>Reporter: Chris Drome
>Assignee: Chris Drome
>Priority: Trivial
> Attachments: HIVE-18376.1.patch
>
>
> Adding new entry to committer-list:
> {noformat}
> +
> +cdrome 
> +Chris Drome 
> + href="https://www.oath.com/";>Oath 
> +
> {noformat}





[jira] [Resolved] (HIVE-18376) Update committer-list

2018-01-04 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome resolved HIVE-18376.

Resolution: Fixed

> Update committer-list
> -
>
> Key: HIVE-18376
> URL: https://issues.apache.org/jira/browse/HIVE-18376
> Project: Hive
>  Issue Type: Bug
>Reporter: Chris Drome
>Assignee: Chris Drome
>Priority: Trivial
> Attachments: HIVE-18376.1.patch
>
>
> Adding new entry to committer-list:
> {noformat}
> +
> +cdrome 
> +Chris Drome 
> + href="https://www.oath.com/";>Oath 
> +
> {noformat}





[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Status: Patch Available  (was: Open)

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: HIVE-18368.1.patch

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Commented] (HIVE-18328) Improve schematool validator to report duplicate rows for column statistics

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312111#comment-16312111
 ] 

Hive QA commented on HIVE-18328:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12903348/HIVE-18328.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat] (batchId=177)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8446/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8446/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8446/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12903348 - PreCommit-HIVE-Build

> Improve schematool validator to report duplicate rows for column statistics
> ---
>
> Key: HIVE-18328
> URL: https://issues.apache.org/jira/browse/HIVE-18328
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-18328.patch
>
>
> By design, in the {{TAB_COL_STATS}} table of the HMS schema, there should be 
> ONE AND ONLY ONE row, representing its statistics, for each column defined in 
> Hive. A combination of DB_NAME, TABLE_NAME and COLUMN_NAME constitutes a 
> primary key/unique row.
> Each time the statistics are computed for a column, this row is updated. 
> However, if somehow, via the BDR/replication process, we end up with multiple 
> rows in this table for a given column, the HMS server fails to recompute the 
> statistics thereafter.
> So it would be good to detect this data anomaly via the schema validation 
> tool.
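
The check described above amounts to grouping rows by the (DB_NAME, TABLE_NAME, COLUMN_NAME) key and flagging any key that occurs more than once. Below is a minimal sketch of that idea in Python, working on in-memory rows. The function name and the sample rows are hypothetical, and this is not the schematool implementation:

```python
from collections import Counter


def find_duplicate_stats(rows):
    """Return {(db, table, column): count} for every key seen more than once.

    Each row is assumed to be a dict holding the three key columns named in
    the issue description.
    """
    counts = Counter(
        (row["DB_NAME"], row["TABLE_NAME"], row["COLUMN_NAME"]) for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample rows; real data would come from TAB_COL_STATS.
sample_rows = [
    {"DB_NAME": "default", "TABLE_NAME": "t", "COLUMN_NAME": "a"},
    {"DB_NAME": "default", "TABLE_NAME": "t", "COLUMN_NAME": "a"},
    {"DB_NAME": "default", "TABLE_NAME": "t", "COLUMN_NAME": "b"},
]
duplicates = find_duplicate_stats(sample_rows)
```

An empty result would mean every column has at most one statistics row, which is the state the description calls for.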





[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: Spark UI - Named RDDs.png

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Commented] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312109#comment-16312109
 ] 

Sahil Takiar commented on HIVE-18368:
-

* Spark provides a nice RDD graph via {{RDD#toDebugString}} - I replaced the 
{{SparkPlan#logSparkPlan}} and {{SparkUtilities#rddGraphToString}} with this 
graph. It includes all the info from both of these graphs, plus more info. It's 
very similar to the info that is shown in the Spark Web UI. An example is 
below.
* Added explicit names for each RDD; the name is derived from the name of the 
{{BaseWork}} that corresponds to the RDD, along with the {{SparkEdgeProperty}} 
(if there is one). The example below shows this in detail.
** The nice thing about adding explicit names is that they show up in the Spark Web 
UI too, which can be very useful for mapping a Hive Explain Plan to the Spark 
RDD DAG
** The name includes the number of partitions for the RDD as well as whether or 
not the RDD is cached
* I originally wanted to find a way to display this in the {{EXPLAIN EXTENDED}} 
output, but for now that may be a bit difficult, because the {{SparkPlan}} is 
only generated in the {{RemoteDriver}} - it's probably possible to generate the 
{{SparkPlan}} somewhere in the {{ExplainTask}}, but I'll save that for a later 
JIRA
* The Spark RDD Graph is printed at INFO level, which I think should help with 
debugging
* I've attached a screenshot of what the Spark Web UI looks like with named 
RDDs

Spark RDD Graph:

{code}
(1) Reducer 5 (1) MapPartitionsRDD[25] at mapPartitionsToPair at ReduceTran.java:41 []
 |  Reducer 5 (SORT, 1) ShuffledRDD[24] at sortByKey at SortByShuffler.java:51 []
 +-(166) Reducer 4 (166) MapPartitionsRDD[23] at mapPartitionsToPair at ReduceTran.java:41 []
 |   Reducer 4 (PARTITION-LEVEL SORT, 166) ShuffledRDD[22] at repartitionAndSortWithinPartitions at SortByShuffler.java:57 []
 +-(328) UnionRDD (328) UnionRDD[21] at union at SparkPlan.java:70 []
 |   Reducer 3 (328) MapPartitionsRDD[19] at mapPartitionsToPair at ReduceTran.java:41 []
 |   Reducer 3 (PARTITION-LEVEL SORT, 328) ShuffledRDD[18] at repartitionAndSortWithinPartitions at SortByShuffler.java:57 []
 +-(874) UnionRDD (874) UnionRDD[17] at union at SparkPlan.java:70 []
 |   UnionRDD (874) UnionRDD[16] at union at SparkPlan.java:70 []
 |   Reducer 2 (437) MapPartitionsRDD[11] at mapPartitionsToPair at ReduceTran.java:41 []
 |   Reducer 2 (GROUP, 437) MapPartitionsRDD[10] at groupByKey at GroupByShuffler.java:31 []
 |   ShuffledRDD[9] at groupByKey at GroupByShuffler.java:31 []
 +-(0) Map 1 (0) MapPartitionsRDD[8] at mapPartitionsToPair at MapTran.java:41 []
|  Map 1 (store_sales, 0) HadoopRDD[4] at hadoopRDD at SparkPlanGenerator.java:203 []
 |   Reducer 8 (437) MapPartitionsRDD[14] at mapPartitionsToPair at ReduceTran.java:41 []
 |   Reducer 8 (GROUP PARTITION-LEVEL SORT, 437) ShuffledRDD[13] at repartitionAndSortWithinPartitions at SortByShuffler.java:57 []
 +-(0) Map 7 (0) MapPartitionsRDD[12] at mapPartitionsToPair at MapTran.java:41 []
|  Map 7 (store_sales, 0) HadoopRDD[5] at hadoopRDD at SparkPlanGenerator.java:203 []
 |   Map 10 (0) MapPartitionsRDD[15] at mapPartitionsToPair at MapTran.java:41 []
 |   Map 10 (store, 0) HadoopRDD[6] at hadoopRDD at SparkPlanGenerator.java:203 []
 |   Map 11 (0) MapPartitionsRDD[20] at mapPartitionsToPair at MapTran.java:41 []
 |   Map 11 (item, 0) HadoopRDD[7] at hadoopRDD at SparkPlanGenerator.java:203 []
{code}
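
The RDD names in the graph above follow a visible pattern: the work name, then an optional label (shuffle type or table name) and the partition count in parentheses. The following Python sketch reproduces that pattern for illustration only. The function and parameter names are hypothetical and not taken from the patch, and the way a cached RDD is marked is an assumption, since the graph above contains no cached RDD:

```python
def rdd_name(work_name, num_partitions, label=None, cached=False):
    """Build a display name such as 'Reducer 4 (PARTITION-LEVEL SORT, 166)'.

    label is the optional shuffle type or table name seen in the graph.
    The 'cached' marker is an assumed format, not taken from the output.
    """
    parts = []
    if label:
        parts.append(label)
    parts.append(str(num_partitions))
    if cached:
        parts.append("cached")  # assumption: real format not shown in the source
    return f"{work_name} ({', '.join(parts)})"


plain = rdd_name("Reducer 5", 1)
shuffled = rdd_name("Reducer 4", 166, label="PARTITION-LEVEL SORT")
scanned = rdd_name("Map 1", 0, label="store_sales")
```

The three calls mirror names that appear in the graph for Reducer 5, Reducer 4 and Map 1.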

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Commented] (HIVE-18376) Update committer-list

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312106#comment-16312106
 ] 

Mithun Radhakrishnan commented on HIVE-18376:
-

+1. Welcome to the fold! :)

> Update committer-list
> -
>
> Key: HIVE-18376
> URL: https://issues.apache.org/jira/browse/HIVE-18376
> Project: Hive
>  Issue Type: Bug
>Reporter: Chris Drome
>Assignee: Chris Drome
>Priority: Trivial
> Attachments: HIVE-18376.1.patch
>
>
> Adding new entry to committer-list:
> {noformat}
> +
> +cdrome 
> +Chris Drome 
> + href="https://www.oath.com/";>Oath 
> +
> {noformat}





[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Summary: Improve Spark Debug RDD Graph  (was: Improve SparkPlan Graph)

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Commented] (HIVE-18378) Explain plan should show if a Map/Reduce Work is being cached

2018-01-04 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312093#comment-16312093
 ] 

Sahil Takiar commented on HIVE-18378:
-

The {{CombineEquivalentWorkResolver}} should also print something in the log 
every time it decides to combine two work objects. Right now it doesn't print 
anything at the INFO level.

> Explain plan should show if a Map/Reduce Work is being cached
> -
>
> Key: HIVE-18378
> URL: https://issues.apache.org/jira/browse/HIVE-18378
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>
> It would be nice if the explain plan showed what {{MapWork}} / {{ReduceWork}} 
> objects are being cached by Spark.
> The {{CombineEquivalentWorkResolver}} is the only code that triggers Spark 
> cache-ing, so we should be able to modify it so that it displays if a work 
> object will be cached or not.





[jira] [Updated] (HIVE-16601) Display Session Id and Query Name / Id in Spark UI

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16601:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-17718

> Display Session Id and Query Name / Id in Spark UI
> --
>
> Key: HIVE-16601
> URL: https://issues.apache.org/jira/browse/HIVE-16601
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16601.1.patch, HIVE-16601.2.patch, 
> HIVE-16601.3.patch, HIVE-16601.4.patch, HIVE-16601.5.patch, 
> HIVE-16601.6.patch, HIVE-16601.7.patch, HIVE-16601.8.patch, Spark UI 
> Applications List.png, Spark UI Jobs List.png
>
>
> We should display the session id for each HoS Application Launched, and the 
> Query Name / Id and Dag Id for each Spark job launched. Hive-on-MR does 
> something similar via the {{mapred.job.name}} parameter. The query name is 
> displayed in the Job Name of the MR app.
> The changes here should also allow us to leverage the config 
> {{hive.query.name}} for HoS.
> This should help with debuggability of HoS applications. The Hive-on-Tez UI 
> does something similar.
> Related issues for Hive-on-Tez: HIVE-12357, HIVE-12523





[jira] [Assigned] (HIVE-18377) avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests

2018-01-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18377:
-


> avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests
> 
>
> Key: HIVE-18377
> URL: https://issues.apache.org/jira/browse/HIVE-18377
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test, Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> many UTs (e.g. TestHCatMultiOutputFormat, 
> BeelineWithHS2ConnectionFileTestBase, TestOperationLoggingAPIWithMr, 
> HCatBaseTest and many others)
> explicitly set 
> {{hiveConf.set(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "false");}}
> It would be better if they picked up the settings from 
> data/conf/hive-site.xml.
> It adds consistency and makes it possible to run all tests with known config 
> (at least approach this).
> The outline of the process is:
> 1. build copies {{\*-site.xml files from data/conf/\*\*/\*-site.xml}} to 
> target/testconf/
> 2. HiveConf picks up target/testconf/hive-site.xml
> 3. Various forms of *CliDriver may explicitly specify (e.g. 
> MiniLlapLocalCliConfig) which hive-site.xml to use
>  
> The first step is to see how many explicit settings of 
> HIVE_SUPPORT_CONCURRENCY can be removed w/o breaking the tests.





[jira] [Updated] (HIVE-18074) do not show rejected tasks as killed in query UI

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18074:

Issue Type: Task  (was: Sub-task)
Parent: (was: HIVE-17481)

> do not show rejected tasks as killed in query UI
> 
>
> Key: HIVE-18074
> URL: https://issues.apache.org/jira/browse/HIVE-18074
> Project: Hive
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the commandline query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case means something else, and this 
> scenario doesn't exist because AM doesn't continuously try to queue tasks. We 
> could change LLAP queue to use sort of a pull model (would also allow for 
> better duplicate scheduling), but for now we should fix the UI





[jira] [Updated] (HIVE-18238) Driver execution may not have configuration changing sideeffects

2018-01-04 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-18238:

Attachment: HIVE-18238.04wip01.patch

I can't seem to reproduce these test failures... If I run the 
{{TestNegativeCliDriver}} at the current master, it also starts failing with OOM 
errors. However, I've ensured that {{Driver.destroy}} is called; it was 
already being called after every failed command...

> Driver execution may not have configuration changing sideeffects 
> -
>
> Key: HIVE-18238
> URL: https://issues.apache.org/jira/browse/HIVE-18238
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-18238.01wip01.patch, HIVE-18238.02.patch, 
> HIVE-18238.03.patch, HIVE-18238.04wip01.patch
>
>
> {{Driver}} executes sql statements which use "hiveconf" settings;
> but the {{Driver}} itself may *not* change the configuration...
> I've found an example; which shows how hazardous this is...
> {code}
> set hive.mapred.mode=strict;
> select "${hiveconf:hive.mapred.mode}";
> create table t (a int);
> analyze table t compute statistics;
> select "${hiveconf:hive.mapred.mode}";
> {code}
> currently; the last select returns {{nonstrict}} because of 
> [this|https://github.com/apache/hive/blob/7ddd915bf82a68c8ab73b0c4ca409f1a6d43d227/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1696]





[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Paul Jackson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312051#comment-16312051
 ] 

Paul Jackson commented on HIVE-18375:
-

This seems related. Posting as a comment, but perhaps it should be in its own 
bug report.

Order By cannot see fields if they are projected with an alias.

The first two queries fail with:
{code}SemanticException [Error 10004]: line 7:9 Invalid table alias or column 
reference 'emp_no': (possible column names are: f_4, f_3, f_5){code}
The last two succeed.
{code:SQL}
SELECT `first_name` `F_4`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `emp_no` DESC;

SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `emp_no` DESC;

SELECT `first_name` `F_4`, `emp_no`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `emp_no` DESC;

SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `F_3` DESC;
{code}
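
The four queries above are consistent with a simple rule: ORDER BY is resolved only against the output names of the select list, compared case-insensitively. The Python sketch below models that reported behaviour. It is not Hive's resolver, the function names are hypothetical, and the lowercase comparison is an assumption based on the error message listing f_4, f_3, f_5:

```python
def output_names(select_list):
    """select_list holds (column, alias_or_None) pairs; return visible names."""
    return [(alias or column).lower() for column, alias in select_list]


def order_by_resolves(select_list, order_column):
    """True if the ORDER BY column matches one of the select-list output names."""
    return order_column.lower() in output_names(select_list)


# The four queries from the comment, in order: (select list, ORDER BY column).
queries = [
    ([("first_name", "F_4"), ("last_name", "F_5")], "emp_no"),
    ([("first_name", "F_4"), ("emp_no", "F_3"), ("last_name", "F_5")], "emp_no"),
    ([("first_name", "F_4"), ("emp_no", None), ("last_name", "F_5")], "emp_no"),
    ([("first_name", "F_4"), ("emp_no", "F_3"), ("last_name", "F_5")], "F_3"),
]
results = [order_by_resolves(select, column) for select, column in queries]
```

Under this model the first two queries fail to resolve and the last two succeed, matching the report. The expected behaviour would be for ORDER BY to also see the underlying table columns.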

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}





[jira] [Updated] (HIVE-18376) Update committer-list

2018-01-04 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-18376:
---
Attachment: HIVE-18376.1.patch

> Update committer-list
> -
>
> Key: HIVE-18376
> URL: https://issues.apache.org/jira/browse/HIVE-18376
> Project: Hive
>  Issue Type: Bug
>Reporter: Chris Drome
>Assignee: Chris Drome
>Priority: Trivial
> Attachments: HIVE-18376.1.patch
>
>
> Adding new entry to committer-list:
> {noformat}
> +
> +cdrome 
> +Chris Drome 
> + href="https://www.oath.com/";>Oath 
> +
> {noformat}





[jira] [Commented] (HIVE-18328) Improve schematool validator to report duplicate rows for column statistics

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312040#comment-16312040
 ] 

Hive QA commented on HIVE-18328:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} beeline: The patch generated 6 new + 88 unchanged - 0 fixed = 94 total (was 88) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m  1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 3f5148d |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8446/yetus/diff-checkstyle-beeline.txt |
| modules | C: beeline itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8446/yetus.txt |
| Powered by | Apache Yetus   http://yetus.apache.org |


This message was automatically generated.



> Improve schematool validator to report duplicate rows for column statistics
> ---
>
> Key: HIVE-18328
> URL: https://issues.apache.org/jira/browse/HIVE-18328
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-18328.patch
>
>
> By design, in the {{TAB_COL_STATS}} table of the HMS schema, there should be 
> ONE AND ONLY ONE row, representing its statistics, for each column defined in 
> Hive. A combination of DB_NAME, TABLE_NAME and COLUMN_NAME constitutes a 
> primary key/unique row.
> Each time the statistics are computed for a column, this row is updated. 
> However, if somehow, via the BDR/replication process, we end up with multiple 
> rows in this table for a given column, the HMS server fails to recompute the 
> statistics thereafter.
> So it would be good to detect this data anomaly via the schema validation 
> tool.





[jira] [Assigned] (HIVE-18376) Update committer-list

2018-01-04 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome reassigned HIVE-18376:
--


> Update committer-list
> -
>
> Key: HIVE-18376
> URL: https://issues.apache.org/jira/browse/HIVE-18376
> Project: Hive
>  Issue Type: Bug
>Reporter: Chris Drome
>Assignee: Chris Drome
>Priority: Trivial
>
> Adding new entry to committer-list:
> {noformat}
> +
> +cdrome 
> +Chris Drome 
> + href="https://www.oath.com/";>Oath 
> +
> {noformat}





[jira] [Updated] (HIVE-18334) Cannot JOIN ON result of COALESCE

2018-01-04 Thread Paul Jackson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Jackson updated HIVE-18334:

Component/s: Query Processor

> Cannot JOIN ON result of COALESCE 
> --
>
> Key: HIVE-18334
> URL: https://issues.apache.org/jira/browse/HIVE-18334
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> A join is returning no results when the ON clause is equating the results of 
> two COALESCE functions. To reproduce:
> {code:SQL}
> CREATE TABLE t5 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> CREATE TABLE t6 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> INSERT INTO t5 VALUES
>   (10, 'FOO', NULL, NULL),
>   (20, 'BAR', NULL, NULL),
>   (NULL, NULL, 7300, 'LARRY'),
>   (NULL, NULL, 7400, 'MOE'),
>   (NULL, NULL, 7500, 'CURLY');
> INSERT INTO t6 VALUES
>   (10, 'LENNON', NULL, NULL),
>   (20, 'MCCARTNEY', NULL, NULL),
>   (NULL, NULL, 7300, 'READY'),
>   (NULL, NULL, 7400, 'WILLING'),
>   (NULL, NULL, 7500, 'ABLE');
> -- Fails with 0 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`)
> -- Full cross with where clause works (in nonstrict mode), returning 5 results
> SELECT *
> FROM t5
> JOIN t6
> WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno`
> -- Strange that coalescing the same field returns 2 results...
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`)
> -- ...and coalescing the other field returns 3 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`)
> {code}
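
For reference, the row counts one would expect from the three COALESCE joins above can be worked out by hand. The Python sketch below does so under standard SQL semantics, where NULL never equals NULL. The helper names are hypothetical and the code only models the expected result; it does not touch Hive:

```python
def coalesce(*values):
    """Return the first non-None argument, or None if all are None."""
    return next((v for v in values if v is not None), None)


def inner_join(left, right, key):
    """Pair rows whose keys are equal; a None key never matches (SQL NULL)."""
    return [
        (a, b)
        for a in left
        for b in right
        if key(a) is not None and key(a) == key(b)
    ]


# Rows are (dno, dname, eno, ename), copied from the INSERT statements above.
t5 = [
    (10, "FOO", None, None),
    (20, "BAR", None, None),
    (None, None, 7300, "LARRY"),
    (None, None, 7400, "MOE"),
    (None, None, 7500, "CURLY"),
]
t6 = [
    (10, "LENNON", None, None),
    (20, "MCCARTNEY", None, None),
    (None, None, 7300, "READY"),
    (None, None, 7400, "WILLING"),
    (None, None, 7500, "ABLE"),
]

eno_or_dno = inner_join(t5, t6, lambda r: coalesce(r[2], r[0]))
dno_only = inner_join(t5, t6, lambda r: coalesce(r[0], r[0]))
eno_only = inner_join(t5, t6, lambda r: coalesce(r[2], r[2]))
```

By this reckoning the first join should return 5 rows, not 0, while the 2 and 3 rows reported for the single-field joins are what one would expect.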





[jira] [Resolved] (HIVE-18374) Update committer-list

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan resolved HIVE-18374.
-
Resolution: Fixed

Thanks, [~thejas]! :] Committed.

> Update committer-list
> -
>
> Key: HIVE-18374
> URL: https://issues.apache.org/jira/browse/HIVE-18374
> Project: Hive
>  Issue Type: Task
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Trivial
> Attachments: HIVE-18374.1.patch
>
>
> I'm afraid I need to make a trivial change to my organization affiliation:
> {code:xml}
> 
> mithun
> Mithun Radhakrishnan
> https://oath.com/";>Oath
> 
> {code}





[jira] [Updated] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Paul Jackson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Jackson updated HIVE-18375:

Description: 
Given these tables:
{code:SQL}
CREATE TABLE employees (
emp_no  INT,
first_name  VARCHAR(14),
last_name   VARCHAR(16)
);
insert into employees values
(1, 'Gottlob', 'Frege'),
(2, 'Bertrand', 'Russell'),
(3, 'Ludwig', 'Wittgenstein');
CREATE TABLE salaries (
emp_no  INT,
salary  INT,
from_date   DATE,
to_date DATE
);
insert into salaries values
(1, 10, '1900-01-01', '1900-01-31'),
(1, 18, '1900-09-01', '1900-09-30'),
(2, 15, '1940-03-01', '1950-01-01'),
(3, 20, '1920-01-01', '1950-01-01');
{code}
This query returns the names of the employees ordered by their peak salary:
{code:SQL}
SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
FROM `default`.`employees`
INNER JOIN
 (SELECT `emp_no`, MAX(`salary`) `max_salary`
  FROM `default`.`salaries`
  WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
  GROUP BY `emp_no`) AS `t1`
ON `employees`.`emp_no` = `t1`.`emp_no`
ORDER BY `t1`.`max_salary` DESC;
{code}
However, this should still work even if the max_salary is not part of the 
projection:
{code:SQL}
SELECT `employees`.`last_name`, `employees`.`first_name`
FROM `default`.`employees`
INNER JOIN
 (SELECT `emp_no`, MAX(`salary`) `max_salary`
  FROM `default`.`salaries`
  WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
  GROUP BY `emp_no`) AS `t1`
ON `employees`.`emp_no` = `t1`.`emp_no`
ORDER BY `t1`.`max_salary` DESC;
{code}
However, that fails with this error:
{code}
Error while compiling statement: FAILED: SemanticException [Error 10004]: line 
9:9 Invalid table alias or column reference 't1': (possible column names are: 
last_name, first_name)
{code}

FWIW, this also fails:
{code:SQL}
SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` AS 
`max_sal`
FROM `default`.`employees`
INNER JOIN
 (SELECT `emp_no`, MAX(`salary`) `max_salary`
  FROM `default`.`salaries`
  WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
  GROUP BY `emp_no`) AS `t1`
ON `employees`.`emp_no` = `t1`.`emp_no`
ORDER BY `t1`.`max_salary` DESC;
{code}
But this succeeds:
{code:SQL}
SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` AS 
`max_sal`
FROM `default`.`employees`
INNER JOIN
 (SELECT `emp_no`, MAX(`salary`) `max_salary`
  FROM `default`.`salaries`
  WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
  GROUP BY `emp_no`) AS `t1`
ON `employees`.`emp_no` = `t1`.`emp_no`
ORDER BY `max_sal` DESC;
{code}

  was:
Given these tables:
{code:SQL}
CREATE TABLE employees (
emp_no  INT,
first_name  VARCHAR(14),
last_name   VARCHAR(16)
);
insert into employees values
(1, 'Gottlob', 'Frege'),
(2, 'Bertrand', 'Russell'),
(3, 'Ludwig', 'Wittgenstein');
CREATE TABLE salaries (
emp_no  INT,
salary  INT,
from_date   DATE,
to_date DATE
);
insert into salaries values
(1, 10, '1900-01-01', '1900-01-31'),
(1, 18, '1900-09-01', '1900-09-30'),
(2, 15, '1940-03-01', '1950-01-01'),
(3, 20, '1920-01-01', '1950-01-01');
{code}
This query returns the names of the employees ordered by their peak salary:
{code:SQL}
SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
FROM `default`.`employees`
INNER JOIN
 (SELECT `emp_no`, MAX(`salary`) `max_salary`
  FROM `default`.`salaries`
  WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
  GROUP BY `emp_no`) AS `t1`
ON `employees`.`emp_no` = `t1`.`emp_no`
ORDER BY `t1`.`max_salary` DESC;
{code}
However, this should still work even if the max_salary is not part of the 
projection:
{code:SQL}
SELECT `employees`.`last_name`, `employees`.`first_name`
FROM `default`.`employees`
INNER JOIN
 (SELECT `emp_no`, MAX(`salary`) `max_salary`
  FROM `default`.`salaries`
  WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
  GROUP BY `emp_no`) AS `t1`
ON `employees`.`emp_no` = `t1`.`emp_no`
ORDER BY `t1`.`max_salary` DESC;
{code}
However, that fails with this error:
{code}
Error while compiling statement: FAILED: SemanticException [Error 10004]: line 
9:9 Invalid table alias or column reference 't1': (possible column names are: 
last_name, first_name)
{code}


> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:

[jira] [Commented] (HIVE-18366) Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead of hbase.table.name as the table name property

2018-01-04 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312014#comment-16312014
 ] 

Aihua Xu commented on HIVE-18366:
-

Attached patch-1: replace the old property name with the new one. Also added a 
qtest to make sure the right property name is used.

> Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead 
> of hbase.table.name as the table name property
> 
>
> Key: HIVE-18366
> URL: https://issues.apache.org/jira/browse/HIVE-18366
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Handler
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18366.1.patch
>
>
> HBase 2.0 changes the table name property to 
> hbase.mapreduce.hfileoutputformat.table.name. HiveHFileOutputFormat is using 
> the new property name while HiveHBaseTableOutputFormat is not. If we create 
> the table as follows, HiveHBaseTableOutputFormat is used which still uses the 
> old property hbase.table.name.
> {noformat}
> create table hbase_table2(key int, val string) stored by 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties 
> ('hbase.columns.mapping' = ':key,cf:val') tblproperties 
> ('hbase.mapreduce.hfileoutputformat.table.name' = 
> 'positive_hbase_handler_bulk')
> {noformat}





[jira] [Commented] (HIVE-18274) add AM level metrics for WM

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312017#comment-16312017
 ] 

Hive QA commented on HIVE-18274:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904657/HIVE-18274.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 11516 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=161)

[dynamic_semijoin_reduction.q,materialized_view_create_rewrite_3.q,vectorization_pushdown.q,correlationoptimizer2.q,cbo_gby_empty.q,vectorization_short_regress.q,identity_project_remove_skip.q,mapjoin3.q,cross_product_check_1.q,unionDistinct_3.q,cbo_join.q,correlationoptimizer6.q,union_remove_26.q,cbo_rp_limit.q,vector_groupby_cube1.q,current_date_timestamp.q,union2.q,groupby2.q,schema_evol_text_vec_table.q,dynpart_sort_opt_vectorization.q,exchgpartition2lel.q,multiMapJoin1.q,sample10.q,vectorized_timestamp_ints_casts.q,vector_char_simple.q,auto_sortmerge_join_2.q,bucketizedhiveinputformat.q,vectorization_input_format_excludes.q,cte_mat_2.q,vectorization_8.q]
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[rcfile_format_nonpart]
 (batchId=248)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=177)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges 
(batchId=284)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8445/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8445/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8445/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904657 - PreCommit-HIVE-Build

> add AM level metrics for WM
> ---
>
> Key: HIVE-18274
> URL: https://issues.apache.org/jira/browse/HIVE-18274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18274.01.patch, HIVE-18274.patch
>
>
> Unused guaranteed tasks (1 metric); guaranteed/speculative tasks x 
> updated/update in progress (4 metrics).
> It should be possible to view those over time as the query is (was) running, 
> to detect any anomalies. This jira is just to save the correct metrics.





[jira] [Updated] (HIVE-18366) Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead of hbase.table.name as the table name property

2018-01-04 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-18366:

Status: Patch Available  (was: Open)

> Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead 
> of hbase.table.name as the table name property
> 
>
> Key: HIVE-18366
> URL: https://issues.apache.org/jira/browse/HIVE-18366
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Handler
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18366.1.patch
>
>
> HBase 2.0 changes the table name property to 
> hbase.mapreduce.hfileoutputformat.table.name. HiveHFileOutputFormat is using 
> the new property name while HiveHBaseTableOutputFormat is not. If we create 
> the table as follows, HiveHBaseTableOutputFormat is used which still uses the 
> old property hbase.table.name.
> {noformat}
> create table hbase_table2(key int, val string) stored by 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties 
> ('hbase.columns.mapping' = ':key,cf:val') tblproperties 
> ('hbase.mapreduce.hfileoutputformat.table.name' = 
> 'positive_hbase_handler_bulk')
> {noformat}





[jira] [Updated] (HIVE-18368) Improve SparkPlan Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Description: 
The {{SparkPlan}} class does some logging to show the mapping between different 
{{SparkTran}}, what shuffle types are used, and what trans are cached. However, 
there is room for improvement.

When debug logging is enabled the RDD graph is logged, but there isn't much 
information printed about each RDD.

We should combine both of the graphs and improve them. We could even make the 
Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.

Ideally, the final graph shows a clear relationship between Tran objects, RDDs, 
and BaseWorks. Edges should include information about the number of partitions, 
shuffle types, Spark operations used, etc.

  was:
The {{SparkPlan}} class does some logging to show the mapping between different 
{{SparkTran}}s, what shuffle types are used, and what trans are cached. 
However, there is room for improvement.

When debug logging is enabled the RDD graph is logged, but there isn't much 
information printed about each RDD.

We should combine both of the graphs and improve them. We could even make the 
Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.

Ideally, the final graph shows a clear relationship between Tran objects, RDDs, 
and BaseWorks. Edges should include information about the number of partitions, 
shuffle types, Spark operations used, etc.


> Improve SparkPlan Graph
> ---
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.





[jira] [Updated] (HIVE-18366) Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead of hbase.table.name as the table name property

2018-01-04 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-18366:

Attachment: HIVE-18366.1.patch

> Update HBaseSerDe to use hbase.mapreduce.hfileoutputformat.table.name instead 
> of hbase.table.name as the table name property
> 
>
> Key: HIVE-18366
> URL: https://issues.apache.org/jira/browse/HIVE-18366
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Handler
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18366.1.patch
>
>
> HBase 2.0 changes the table name property to 
> hbase.mapreduce.hfileoutputformat.table.name. HiveHFileOutputFormat is using 
> the new property name while HiveHBaseTableOutputFormat is not. If we create 
> the table as follows, HiveHBaseTableOutputFormat is used which still uses the 
> old property hbase.table.name.
> {noformat}
> create table hbase_table2(key int, val string) stored by 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties 
> ('hbase.columns.mapping' = ':key,cf:val') tblproperties 
> ('hbase.mapreduce.hfileoutputformat.table.name' = 
> 'positive_hbase_handler_bulk')
> {noformat}





[jira] [Updated] (HIVE-17718) Hive on Spark Debugging Improvements

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17718:

Summary: Hive on Spark Debugging Improvements  (was: Spark Logging 
Improvements)

> Hive on Spark Debugging Improvements
> 
>
> Key: HIVE-17718
> URL: https://issues.apache.org/jira/browse/HIVE-17718
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> There are multiple places where it is hard to debug HoS - e.g. the HoS Remote 
> Driver and Client, the Spark RDD graph, etc.





[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311995#comment-16311995
 ] 

Andrew Sherman commented on HIVE-18375:
---

Great bug report. 
[~minions] do you want to take a look at this?

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> This should still work even if max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
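
For comparison, the behaviour the report expects can be checked outside Hive. The following is only a sketch: it uses Python's standard sqlite3 module as a stand-in engine (an assumption, not Hive itself), drops the backticks and the `default` database prefix, and wraps the query in a hypothetical helper named peak_salary_order. It loads the same rows as the report and orders by t1.max_salary without projecting it.

```python
import sqlite3


def peak_salary_order():
    """Return (last_name, first_name) rows ordered by a subquery column
    that is NOT part of the projection. SQLite is used as a stand-in."""
    conn = sqlite3.connect(":memory:")
    # Same tables and rows as in the bug report.
    conn.executescript("""
        CREATE TABLE employees (
            emp_no INT, first_name VARCHAR(14), last_name VARCHAR(16));
        INSERT INTO employees VALUES
            (1, 'Gottlob', 'Frege'),
            (2, 'Bertrand', 'Russell'),
            (3, 'Ludwig', 'Wittgenstein');
        CREATE TABLE salaries (
            emp_no INT, salary INT, from_date DATE, to_date DATE);
        INSERT INTO salaries VALUES
            (1, 10, '1900-01-01', '1900-01-31'),
            (1, 18, '1900-09-01', '1900-09-30'),
            (2, 15, '1940-03-01', '1950-01-01'),
            (3, 20, '1920-01-01', '1950-01-01');
    """)
    # The failing query from the report: max_salary is used only in ORDER BY.
    rows = conn.execute("""
        SELECT employees.last_name, employees.first_name
        FROM employees
        INNER JOIN
          (SELECT emp_no, MAX(salary) AS max_salary
           FROM salaries
           WHERE emp_no IS NOT NULL AND salary IS NOT NULL
           GROUP BY emp_no) AS t1
        ON employees.emp_no = t1.emp_no
        ORDER BY t1.max_salary DESC
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for last_name, first_name in peak_salary_order():
        print(last_name, first_name)
```

With the sample data the peak salaries are 20, 18 and 15, so the expected order is Wittgenstein, Frege, Russell. This says nothing about where the Hive analyzer goes wrong; it only pins down the result the query should produce.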





[jira] [Updated] (HIVE-18349) Misc metastore changes for debuggability

2018-01-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18349:
-
Attachment: HIVE-18349.3.patch

Fixed the TestMetaStoreEndFunctionListener test failure. The other test failures seem 
to be happening on master already.

> Misc metastore changes for debuggability
> 
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final status 
> (success or failure) of the event. Some operations, for example drop_table, 
> return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners and other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout configurable when using a connection pool. Currently 
> it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.





[jira] [Commented] (HIVE-18274) add AM level metrics for WM

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311935#comment-16311935
 ] 

Hive QA commented on HIVE-18274:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
10s{color} | {color:red} llap-tez: The patch generated 7 new + 174 unchanged - 
4 fixed = 181 total (was 178) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}  8m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 3f5148d |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8445/yetus/diff-checkstyle-llap-tez.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8445/yetus/whitespace-eol.txt 
|
| modules | C: llap-tez U: llap-tez |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8445/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |


This message was automatically generated.



> add AM level metrics for WM
> ---
>
> Key: HIVE-18274
> URL: https://issues.apache.org/jira/browse/HIVE-18274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18274.01.patch, HIVE-18274.patch
>
>
> Unused guaranteed tasks (1 metric); guaranteed/speculative tasks x 
> updated/update in progress (4 metrics).
> It should be possible to view those over time as the query is (was) running, 
> to detect any anomalies. This jira is just to save the correct metrics.





[jira] [Commented] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools

2018-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311920#comment-16311920
 ] 

Hive QA commented on HIVE-18352:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904562/HIVE-18352.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 117 failed/errored test(s), 11546 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_blobstore]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_local]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_warehouse]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_local_to_blobstore]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore_nonpart]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_local]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse_nonpart]
 (batchId=248)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_local_to_blobstore]
 (batchId=248)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_00_nonpart_empty] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_01_nonpart] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_02_00_part_empty] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_02_part] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_03_nonpart_over_compat]
 (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_all_part] 
(batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_evolved_parts] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_05_some_part] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_06_one_part] 
(batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_07_all_part_over_nonoverlap]
 (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_08_nonpart_rename] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_09_part_spec_nonoverlap]
 (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_10_external_managed]
 (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_11_managed_external]
 (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_12_external_location]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_13_managed_location]
 (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_14_managed_location_over_existing]
 (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_15_external_part] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_16_part_external] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_17_part_managed] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_18_part_external] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_19_00_part_external_location]
 (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_19_part_external_location]
 (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_20_part_managed_location]
 (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_22_import_exist_authsuccess]
 (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_23_import_part_authsuccess]
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_24_import_nonexist_authsuccess]
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_hidden_files] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_2_exim_basic] 
(batchId=78)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (b

[jira] [Updated] (HIVE-18328) Improve schematool validator to report duplicate rows for column statistics

2018-01-04 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-18328:
-
Status: Patch Available  (was: Open)

> Improve schematool validator to report duplicate rows for column statistics
> ---
>
> Key: HIVE-18328
> URL: https://issues.apache.org/jira/browse/HIVE-18328
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-18328.patch
>
>
> By design, in the {{TAB_COL_STATS}} table of the HMS schema, there should be 
> ONE AND ONLY ONE row, representing its statistics, for each column defined in 
> Hive. The combination of DB_NAME, TABLE_NAME and COLUMN_NAME constitutes a 
> primary key/unique row.
> Each time the statistics are computed for a column, this row is updated. 
> However, if somehow, via a BDR/replication process, we end up with multiple 
> rows in this table for a given column, the HMS server fails to recompute the 
> statistics thereafter.
> So it would be good to detect this data anomaly via the schema validation 
> tool.
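
The uniqueness rule described above can be checked with a GROUP BY ... HAVING query. The following is only a sketch of that idea, not the schematool implementation from the attached patch: it runs against an in-memory SQLite database through Python's sqlite3 module, the table is reduced to the three key columns named in the description plus an assumed CS_ID surrogate key, and the helper names find_duplicate_col_stats and demo are hypothetical.

```python
import sqlite3


def find_duplicate_col_stats(conn):
    """Return (db, table, column, row_count) for every column that has
    more than one statistics row in TAB_COL_STATS."""
    return conn.execute("""
        SELECT DB_NAME, TABLE_NAME, COLUMN_NAME, COUNT(*) AS row_count
        FROM TAB_COL_STATS
        GROUP BY DB_NAME, TABLE_NAME, COLUMN_NAME
        HAVING COUNT(*) > 1
        ORDER BY DB_NAME, TABLE_NAME, COLUMN_NAME
    """).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the HMS table (assumption): only the key
    # columns from the description plus a surrogate id.
    conn.execute("""
        CREATE TABLE TAB_COL_STATS (
            CS_ID INTEGER PRIMARY KEY,
            DB_NAME TEXT, TABLE_NAME TEXT, COLUMN_NAME TEXT)
    """)
    conn.executemany(
        "INSERT INTO TAB_COL_STATS (DB_NAME, TABLE_NAME, COLUMN_NAME) "
        "VALUES (?, ?, ?)",
        [
            ("default", "employees", "emp_no"),
            ("default", "employees", "emp_no"),  # duplicate row
            ("default", "salaries", "salary"),
        ],
    )
    duplicates = find_duplicate_col_stats(conn)
    conn.close()
    return duplicates


if __name__ == "__main__":
    for db, table, column, count in demo():
        print(f"{db}.{table}.{column}: {count} rows")
```

A validator built on this idea would report each returned tuple as an anomaly; an empty result means the table satisfies the one-row-per-column rule.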





[jira] [Updated] (HIVE-18274) add AM level metrics for WM

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18274:

Attachment: HIVE-18274.01.patch

Rebased and updated the patch. I also noticed while modifying that for a new 
task, symmetrical to a finished task, only one counter needs to be updated.

> add AM level metrics for WM
> ---
>
> Key: HIVE-18274
> URL: https://issues.apache.org/jira/browse/HIVE-18274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18274.01.patch, HIVE-18274.patch
>
>
> Unused guaranteed tasks (1 metric); guaranteed/speculative tasks x 
> updated/update in progress (4 metrics).
> It should be possible to view those over time as the query is (was) running, 
> to detect any anomalies. This jira is just to save the correct metrics.




