[jira] [Updated] (HIVE-1478) Non-boolean expression in WHERE clause throws exception

2016-11-15 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-1478:
---
Attachment: HIVE-1478.1.patch

Report this problem more clearly, before the query starts executing.

> Non-boolean expression in WHERE clause throws exception
> ---
>
> Key: HIVE-1478
> URL: https://issues.apache.org/jira/browse/HIVE-1478
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-1478.1.patch
>
>
> If the expression in the where clause does not evaluate to a boolean, the job 
> will fail with the following exception in the task logs:
> Query:
> SELECT key FROM src WHERE 1;
> Exception in mapper:
> 2010-07-21 17:00:31,460 FATAL ExecMapper: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"key":"238","value":"val_238"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:180)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Boolean
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400)
>   ... 5 more
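
The root cause is that FilterOperator only discovers the non-boolean predicate when it casts the evaluated result at runtime. A compile-time check along these lines (the helper class, method, and message text are illustrative, not Hive's actual API) would report the problem before the query starts executing, as the attached patch intends:

```java
// Sketch: reject a non-boolean WHERE predicate during query compilation
// rather than letting FilterOperator fail with a ClassCastException at
// runtime. Class and method names here are hypothetical.
public class FilterTypeCheck {
    static void validatePredicateType(String resultTypeName) {
        if (!"boolean".equalsIgnoreCase(resultTypeName)) {
            throw new IllegalArgumentException(
                "Filter expression evaluated to type " + resultTypeName
                + "; a boolean expression is required in the WHERE clause");
        }
    }

    public static void main(String[] args) {
        validatePredicateType("boolean"); // e.g. WHERE key = '238'
        try {
            validatePredicateType("int"); // e.g. SELECT key FROM src WHERE 1
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at compile time: " + e.getMessage());
        }
    }
}
```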



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-1478) Non-boolean expression in WHERE clause throws exception

2016-11-15 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-1478:
---
Status: Patch Available  (was: Open)

> Non-boolean expression in WHERE clause throws exception
> ---
>
> Key: HIVE-1478
> URL: https://issues.apache.org/jira/browse/HIVE-1478
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-1478.1.patch
>
>
> If the expression in the where clause does not evaluate to a boolean, the job 
> will fail with the following exception in the task logs:
> Query:
> SELECT key FROM src WHERE 1;
> Exception in mapper:
> 2010-07-21 17:00:31,460 FATAL ExecMapper: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"key":"238","value":"val_238"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:180)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Boolean
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400)
>   ... 5 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15221) Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception

2016-11-15 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-15221:
---
Attachment: HIVE-15221.1.patch

Patch uploaded.
Hi [~alangates], could you please review it?

> Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception
> --
>
> Key: HIVE-15221
> URL: https://issues.apache.org/jira/browse/HIVE-15221
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15221.1.patch
>
>
> I see in the current master version:
>  percentage = (double) usedMemory / (double) maxHeapSize;
> If percentage > maxMemoryUsage, a MapJoinMemoryExhaustionException is thrown.
> In my opinion, running is better than failing: throwing the exception only if 
> percentage > maxMemoryUsage still holds after a System.gc() may be better.
> The original check also has a problem: 1) consuming much memory causes a GC 
> (e.g. a young GC), then the check after adding a row passes; 2) consuming much 
> memory does not cause a GC, then the check after adding rows throws the 
> exception. Sometimes 2) occurs even though it contains fewer rows than 1).
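
The proposal can be sketched as follows (a sketch only; Hive's real MapJoinMemoryExhaustionHandler code differs): when the usage ratio first exceeds the threshold, trigger a GC and re-check before giving up:

```java
// Sketch of the proposed checkMemoryStatus change: when the usage ratio
// first exceeds the threshold, run System.gc() and re-check before
// throwing. Names are illustrative, not Hive's actual implementation.
public class MemoryCheckSketch {
    static boolean overThreshold(long usedMemory, long maxHeapSize, double maxMemoryUsage) {
        double percentage = (double) usedMemory / (double) maxHeapSize;
        return percentage > maxMemoryUsage;
    }

    static void checkMemoryStatus(double maxMemoryUsage) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long used = rt.totalMemory() - rt.freeMemory();
        if (overThreshold(used, max, maxMemoryUsage)) {
            System.gc(); // give collectable rows a chance to be reclaimed first
            used = rt.totalMemory() - rt.freeMemory();
            if (overThreshold(used, max, maxMemoryUsage)) {
                throw new RuntimeException(
                    "MapJoinMemoryExhaustionException: hash table loading exceeded memory limit");
            }
        }
    }
}
```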



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15221) Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception

2016-11-15 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-15221:
---
Description: 
I see in the current master version:

 percentage = (double) usedMemory / (double) maxHeapSize;

If percentage > maxMemoryUsage, a MapJoinMemoryExhaustionException is thrown.

In my opinion, running is better than failing: throwing the exception only if 
percentage > maxMemoryUsage still holds after a System.gc() may be better.

The original check also has a problem: 1) consuming much memory causes a GC 
(e.g. a young GC), then the check after adding a row passes; 2) consuming much 
memory does not cause a GC, then the check after adding rows throws the 
exception.

Sometimes 2) occurs even though it contains fewer rows than 1).

  was:
i see in the current master version

 percentage = (double) usedMemory / (double) maxHeapSize;

if  percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException

in my opinion, running is better than fail. after System.gc, ' if percentage > 
maxMemoryUsage, then throw MapJoinMemoryExhaustionException' maybe better

And original checking way has a problem: a) consuming much memory cause gc (e.g 
young gc), then check after adding row and pass. 2) consuming much memory does 
not cause gc, then check after adding rows but throw Exception
sometimes 2) occurs, but it contains fewer rows than 1).


> Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception
> --
>
> Key: HIVE-15221
> URL: https://issues.apache.org/jira/browse/HIVE-15221
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
>
> I see in the current master version:
>  percentage = (double) usedMemory / (double) maxHeapSize;
> If percentage > maxMemoryUsage, a MapJoinMemoryExhaustionException is thrown.
> In my opinion, running is better than failing: throwing the exception only if 
> percentage > maxMemoryUsage still holds after a System.gc() may be better.
> The original check also has a problem: 1) consuming much memory causes a GC 
> (e.g. a young GC), then the check after adding a row passes; 2) consuming much 
> memory does not cause a GC, then the check after adding rows throws the 
> exception. Sometimes 2) occurs even though it contains fewer rows than 1).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure

2016-11-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669690#comment-15669690
 ] 

Rui Li commented on HIVE-15202:
---

[~ekoifman], is there any plan to implement this on Hive side? Or do you mean 
users have to avoid such concurrent compactions themselves?

> Concurrent compactions for the same partition may generate malformed folder 
> structure
> -
>
> Key: HIVE-15202
> URL: https://issues.apache.org/jira/browse/HIVE-15202
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>
> If two compactions run concurrently on a single partition, it may generate 
> folder structure like this: (nested base dir)
> {noformat}
> drwxr-xr-x   - root supergroup  0 2016-11-14 22:23 
> /user/hive/warehouse/test/z=1/base_007/base_007
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_0
> -rw-r--r--   3 root supergroup611 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_1
> -rw-r--r--   3 root supergroup614 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_2
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_3
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_4
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_5
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_6
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_7
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_8
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_9
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure

2016-11-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669657#comment-15669657
 ] 

Eugene Koifman commented on HIVE-15202:
---

The right solution would be not to allow 2 concurrent compactions on the same 
partition.
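
One way to enforce that constraint (a sketch of the idea only, not the compactor's actual locking mechanism) is a per-partition guard that lets only one compaction claim a partition at a time:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: allow at most one compaction per partition. In Hive this logic
// would live in the compaction initiator/worker; the class here is
// illustrative.
public class CompactionGuard {
    private final ConcurrentHashMap<String, Boolean> inProgress = new ConcurrentHashMap<>();

    /** Returns true only for the single caller that claims the partition. */
    public boolean tryStart(String partition) {
        return inProgress.putIfAbsent(partition, Boolean.TRUE) == null;
    }

    /** Releases the partition once the compaction finishes or aborts. */
    public void finish(String partition) {
        inProgress.remove(partition);
    }
}
```

A second compactor calling {{tryStart("z=1")}} while the first still holds the partition would be turned away instead of producing a nested base directory.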

> Concurrent compactions for the same partition may generate malformed folder 
> structure
> -
>
> Key: HIVE-15202
> URL: https://issues.apache.org/jira/browse/HIVE-15202
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>
> If two compactions run concurrently on a single partition, it may generate 
> folder structure like this: (nested base dir)
> {noformat}
> drwxr-xr-x   - root supergroup  0 2016-11-14 22:23 
> /user/hive/warehouse/test/z=1/base_007/base_007
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_0
> -rw-r--r--   3 root supergroup611 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_1
> -rw-r--r--   3 root supergroup614 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_2
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_3
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_4
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_5
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_6
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_7
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_8
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_9
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15204) Hive-HBase integration throws "java.lang.ClassNotFoundException: NULL::character varying" (Postgres)

2016-11-15 Thread Anshuman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669627#comment-15669627
 ] 

Anshuman commented on HIVE-15204:
-

Tested: setting datanucleus.rdbms.initializeColumnInfo to NONE does not resolve 
the issue.

Property configured:
{noformat}
<property>
  <name>datanucleus.rdbms.initializeColumnInfo</name>
  <value>NONE</value>
  <description>DN default on Postgres Bug</description>
</property>
{noformat}

Result:
FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character 
varying

> Hive-HBase integration throws "java.lang.ClassNotFoundException: 
> NULL::character varying" (Postgres)
> 
>
> Key: HIVE-15204
> URL: https://issues.apache.org/jira/browse/HIVE-15204
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
> Environment: apache-hive-2.1.0-bin
> hbase-1.1.1
>Reporter: Anshuman
>  Labels: Postgres
>
> When doing Hive-HBase integration, we have observed that current Apache 
> Hive 2.x is not able to recognise 'NULL::character varying' (a variant 
> representation of NULL in Postgres) properly and throws a 
> java.lang.ClassNotFoundException.
> Exception:
> ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: 
> NULL::character varying
> java.lang.RuntimeException: java.lang.ClassNotFoundException: NULL::character 
> varying
> 
> Caused by: java.lang.ClassNotFoundException: NULL::character varying
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> Reason:
> org.apache.hadoop.hive.ql.metadata.Table.java
> final public Class getInputFormatClass() {
> if (inputFormatClass == null) {
>   try {
> String className = tTable.getSd().getInputFormat();
> if (className == null) {  /* If className is the Postgres variant of 
> NULL, i.e. 'NULL::character varying', control goes to the else block 
> and the error is thrown. */
>   if (getStorageHandler() == null) {
> return null;
>   }
>   inputFormatClass = getStorageHandler().getInputFormatClass();
> } else {
>   inputFormatClass = (Class)
> Class.forName(className, true, 
> Utilities.getSessionSpecifiedClassLoader());
> }
>   } catch (ClassNotFoundException e) {
> throw new RuntimeException(e);
>   }
> }
> return inputFormatClass;
>   }
> Steps to reproduce:
> Hive 2.x (e.g. apache-hive-2.1.0-bin) and HBase (e.g. hbase-1.1.1)
> 1. Install and configure Hive, if it is not already installed.
> 2. Install and configure HBase, if it is not already installed.
> 3. Configure the hive-site.xml File (as per recommended steps)
> 4. Provide necessary jars to Hive (as per recommended steps)
> 5. Create table in HBase as shown below -
> create 'hivehbase', 'ratings'
> put 'hivehbase', 'row1', 'ratings:userid', 'user1'
> put 'hivehbase', 'row1', 'ratings:bookid', 'book1'
> put 'hivehbase', 'row1', 'ratings:rating', '1'
>  
> put 'hivehbase', 'row2', 'ratings:userid', 'user2'
> put 'hivehbase', 'row2', 'ratings:bookid', 'book1'
> put 'hivehbase', 'row2', 'ratings:rating', '3'
>  
> put 'hivehbase', 'row3', 'ratings:userid', 'user2'
> put 'hivehbase', 'row3', 'ratings:bookid', 'book2'
> put 'hivehbase', 'row3', 'ratings:rating', '3'
>  
> put 'hivehbase', 'row4', 'ratings:userid', 'user2'
> put 'hivehbase', 'row4', 'ratings:bookid', 'book4'
> put 'hivehbase', 'row4', 'ratings:rating', '1'
> 6. Create external table as shown below -
> CREATE EXTERNAL TABLE hbasehive_table
> (key string, userid string, bookid string, rating int) 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES 
> ("hbase.columns.mapping" = 
> ":key,ratings:userid,ratings:bookid,ratings:rating")
> TBLPROPERTIES ("hbase.table.name" = "hivehbase");
> 7. select * from hbasehive_table;
> FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character 
> varying
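
Given the code above, a defensive fix can be sketched as normalizing the Postgres NULL variant before the Class.forName call (a sketch under the assumption that the metastore backend can hand back the literal string 'NULL::character varying'; the helper below is hypothetical, not Hive's committed solution):

```java
// Sketch: treat the literal string a Postgres-backed metastore can return
// ('NULL::character varying') the same as a null class name, so the
// storage-handler branch is taken instead of Class.forName failing.
// normalizeClassName is a hypothetical helper.
public class InputFormatNameSketch {
    static String normalizeClassName(String className) {
        if (className == null || className.startsWith("NULL::")) {
            return null;
        }
        return className;
    }
}
```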



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15129:
-
Assignee: Rajesh Balamohan

> LLAP : Enhance cache hits for stripe metadata across queries
> 
>
> Key: HIVE-15129
> URL: https://issues.apache.org/jira/browse/HIVE-15129
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, 
> HIVE-15129.3.patch
>
>
> When multiple queries are run in LLAP, stripe metadata cache misses were 
> observed even though enough memory was available. 
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655.
>  Even in cases when data was found in the cache, it wasn't getting used, as 
> {{globalInc}} changed from query to query. Creating a superset of existing 
> indexes with {{globalInc}} would be helpful. 
> This would be a lot more beneficial in cloud storage, where opening and 
> reading small amounts of data can be expensive compared to HDFS.
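
The superset idea can be sketched with column-inclusion masks (illustrative only; OrcEncodedDataReader's actual structures differ): instead of discarding a cached entry whose column set doesn't match the new query, widen it to the union so later queries hit it:

```java
// Sketch: widen a cached boolean column-inclusion mask so stripe metadata
// cached for one query stays usable when a later query needs more columns.
public class IncludeMaskSketch {
    /** True if the cached mask already includes every requested column. */
    static boolean covers(boolean[] cached, boolean[] requested) {
        for (int i = 0; i < requested.length; i++) {
            if (requested[i] && (i >= cached.length || !cached[i])) {
                return false;
            }
        }
        return true;
    }

    /** Union of the two masks: a superset usable by both queries. */
    static boolean[] union(boolean[] cached, boolean[] requested) {
        boolean[] merged = new boolean[Math.max(cached.length, requested.length)];
        for (int i = 0; i < merged.length; i++) {
            merged[i] = (i < cached.length && cached[i])
                     || (i < requested.length && requested[i]);
        }
        return merged;
    }
}
```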



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15129:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~rajesh.balamohan] for the patch!

> LLAP : Enhance cache hits for stripe metadata across queries
> 
>
> Key: HIVE-15129
> URL: https://issues.apache.org/jira/browse/HIVE-15129
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, 
> HIVE-15129.3.patch
>
>
> When multiple queries are run in LLAP, stripe metadata cache misses were 
> observed even though enough memory was available. 
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655.
>  Even in cases when data was found in the cache, it wasn't getting used, as 
> {{globalInc}} changed from query to query. Creating a superset of existing 
> indexes with {{globalInc}} would be helpful. 
> This would be a lot more beneficial in cloud storage, where opening and 
> reading small amounts of data can be expensive compared to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15129:
-
Affects Version/s: 2.2.0

> LLAP : Enhance cache hits for stripe metadata across queries
> 
>
> Key: HIVE-15129
> URL: https://issues.apache.org/jira/browse/HIVE-15129
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, 
> HIVE-15129.3.patch
>
>
> When multiple queries are run in LLAP, stripe metadata cache misses were 
> observed even though enough memory was available. 
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655.
>  Even in cases when data was found in the cache, it wasn't getting used, as 
> {{globalInc}} changed from query to query. Creating a superset of existing 
> indexes with {{globalInc}} would be helpful. 
> This would be a lot more beneficial in cloud storage, where opening and 
> reading small amounts of data can be expensive compared to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669588#comment-15669588
 ] 

Hive QA commented on HIVE-15219:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839105/HIVE-15219.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10680 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2146/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2146/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2146/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839105 - PreCommit-HIVE-Build

> LLAP: Allow additional slider global parameters to be set while creating the 
> LLAP package
> -
>
> Key: HIVE-15219
> URL: https://issues.apache.org/jira/browse/HIVE-15219
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15219.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15220) WebHCat test driver not capturing end time of test accurately

2016-11-15 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-15220:
--
Status: Patch Available  (was: Open)

> WebHCat test driver not capturing end time of test accurately
> -
>
> Key: HIVE-15220
> URL: https://issues.apache.org/jira/browse/HIVE-15220
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
>Priority: Trivial
> Attachments: HIVE-15220.1.patch
>
>
> The WebHCat e2e test suite prints a message when ending a test run:
> {noformat}
> Ending test  at 1479264720
> {noformat}
> Currently it is not capturing the end time correctly.
> NO PRECOMMIT TESTS
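
The shape of the fix (the actual driver is Perl in Hive's e2e harness; this Java sketch only illustrates the timing bug) is to sample the clock after the test body completes, rather than reusing a timestamp taken before it ran:

```java
// Sketch: the end timestamp must be taken after the test finishes, not
// copied from a value captured earlier in the run.
public class TestTimer {
    /** Runs the test and returns the epoch-seconds timestamp at completion. */
    public static long runAndTimeEnd(Runnable test) {
        test.run();
        return System.currentTimeMillis() / 1000L; // epoch seconds, as in "Ending test at ..."
    }
}
```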



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15220) WebHCat test driver not capturing end time of test accurately

2016-11-15 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-15220:
--
Attachment: HIVE-15220.1.patch

Attaching the patch with the change.
[~thejas] [~daijy] can you please review?

> WebHCat test driver not capturing end time of test accurately
> -
>
> Key: HIVE-15220
> URL: https://issues.apache.org/jira/browse/HIVE-15220
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
>Priority: Trivial
> Attachments: HIVE-15220.1.patch
>
>
> The WebHCat e2e test suite prints a message when ending a test run:
> {noformat}
> Ending test  at 1479264720
> {noformat}
> Currently it is not capturing the end time correctly.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15220) WebHCat test driver not capturing end time of test accurately

2016-11-15 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-15220:
--
Summary: WebHCat test driver not capturing end time of test accurately  
(was: WebHCat test not capturing end time of test accurately)

> WebHCat test driver not capturing end time of test accurately
> -
>
> Key: HIVE-15220
> URL: https://issues.apache.org/jira/browse/HIVE-15220
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
>Priority: Trivial
>
> The WebHCat e2e test suite prints a message when ending a test run:
> {noformat}
> Ending test  at 1479264720
> {noformat}
> Currently it is not capturing the end time correctly.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669501#comment-15669501
 ] 

Hive QA commented on HIVE-15218:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839101/HIVE-15218.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 189 failed/errored test(s), 10694 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join0] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join31] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_14] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_15] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_cross_product_check_2]
 (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer5] 
(batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cross_product_check_2] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[identity_project_remove_skip]
 (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join29] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[runtime_skewjoin_mapjoin_spark]
 (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_onesideskew] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_25] 
(batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subq_where_serialization]
 (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_multiinsert] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union22] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] 
(batchId=70)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2]
 (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[except_distinct] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_all] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_distinct]
 (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_merge] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_nullscan] 
(batchId=131)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[tez_union_dynamic_partition]
 (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_10]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cluster] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[column_access_stats]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[constprog_dpp]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 

[jira] [Commented] (HIVE-15204) Hive-HBase integration throws "java.lang.ClassNotFoundException: NULL::character varying" (Postgres)

2016-11-15 Thread Anshuman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669397#comment-15669397
 ] 

Anshuman commented on HIVE-15204:
-

This is almost a show-stopper for 2.1.0 Hive-HBase users. Can we plan a fix 
for 2.1.1?

> Hive-HBase integration throws "java.lang.ClassNotFoundException: 
> NULL::character varying" (Postgres)
> 
>
> Key: HIVE-15204
> URL: https://issues.apache.org/jira/browse/HIVE-15204
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
> Environment: apache-hive-2.1.0-bin
> hbase-1.1.1
>Reporter: Anshuman
>  Labels: Postgres
>
> When doing Hive-HBase integration, we have observed that current Apache 
> Hive 2.x is not able to recognise 'NULL::character varying' (a variant 
> representation of NULL in Postgres) properly and throws a 
> java.lang.ClassNotFoundException.
> Exception:
> ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: 
> NULL::character varying
> java.lang.RuntimeException: java.lang.ClassNotFoundException: NULL::character 
> varying
> 
> Caused by: java.lang.ClassNotFoundException: NULL::character varying
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> Reason:
> org.apache.hadoop.hive.ql.metadata.Table.java
> {code}
> final public Class getInputFormatClass() {
>   if (inputFormatClass == null) {
>     try {
>       String className = tTable.getSd().getInputFormat();
>       // If className is one of the Postgres variants of NULL, i.e.
>       // 'NULL::character varying', it is not a Java null, so control
>       // goes to the else block and Class.forName throws the error.
>       if (className == null) {
>         if (getStorageHandler() == null) {
>           return null;
>         }
>         inputFormatClass = getStorageHandler().getInputFormatClass();
>       } else {
>         inputFormatClass = (Class) Class.forName(className, true,
>             Utilities.getSessionSpecifiedClassLoader());
>       }
>     } catch (ClassNotFoundException e) {
>       throw new RuntimeException(e);
>     }
>   }
>   return inputFormatClass;
> }
> {code}
> Steps to reproduce:
> Hive 2.x (e.g. apache-hive-2.1.0-bin) and HBase (e.g. hbase-1.1.1)
> 1. Install and configure Hive, if it is not already installed.
> 2. Install and configure HBase, if it is not already installed.
> 3. Configure the hive-site.xml File (as per recommended steps)
> 4. Provide necessary jars to Hive (as per recommended steps)
> 5. Create table in HBase as shown below -
> create 'hivehbase', 'ratings'
> put 'hivehbase', 'row1', 'ratings:userid', 'user1'
> put 'hivehbase', 'row1', 'ratings:bookid', 'book1'
> put 'hivehbase', 'row1', 'ratings:rating', '1'
>  
> put 'hivehbase', 'row2', 'ratings:userid', 'user2'
> put 'hivehbase', 'row2', 'ratings:bookid', 'book1'
> put 'hivehbase', 'row2', 'ratings:rating', '3'
>  
> put 'hivehbase', 'row3', 'ratings:userid', 'user2'
> put 'hivehbase', 'row3', 'ratings:bookid', 'book2'
> put 'hivehbase', 'row3', 'ratings:rating', '3'
>  
> put 'hivehbase', 'row4', 'ratings:userid', 'user2'
> put 'hivehbase', 'row4', 'ratings:bookid', 'book4'
> put 'hivehbase', 'row4', 'ratings:rating', '1'
> 6. Create external table as shown below -
> CREATE EXTERNAL TABLE hbasehive_table
> (key string, userid string, bookid string, rating int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES
> ("hbase.columns.mapping" =
> ":key,ratings:userid,ratings:bookid,ratings:rating")
> TBLPROPERTIES ("hbase.table.name" = "hivehbase");
> 7. select * from hbasehive_table;
> FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character 
> varying
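The null check in the code quoted above compares against a Java null only, so a literal string like 'NULL::character varying' coming back from a Postgres-backed metastore falls through to Class.forName and fails. A minimal sketch of one possible guard; the helper name isNullLike is illustrative, not Hive's actual code:

```java
// Hypothetical helper: treat Postgres NULL variants stored in the
// metastore (e.g. "NULL::character varying") the same as a real null
// before attempting Class.forName(...).
public class InputFormatClassName {
    static boolean isNullLike(String className) {
        return className == null
            || className.trim().isEmpty()
            || className.trim().toUpperCase().startsWith("NULL::");
    }

    public static void main(String[] args) {
        System.out.println(isNullLike("NULL::character varying"));            // true
        System.out.println(isNullLike("org.apache.hadoop.mapred.TextInputFormat")); // false
    }
}
```

With such a guard, the storage-handler fallback would be taken instead of attempting to load a nonsense class name.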



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15217) Add watch mode to llap status tool

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669392#comment-15669392
 ] 

Hive QA commented on HIVE-15217:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839100/HIVE-15217.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10679 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=108)

[tez_joins_explain.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,union_remove_6_subq.q,groupby2_map_multi_distinct.q,load_dyn_part9.q,multi_insert_gby2.q,vectorization_11.q,groupby_position.q,avro_compression_enabled_native.q,smb_mapjoin_8.q,join21.q,auto_join16.q]
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] 
(batchId=89)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2144/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2144/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2144/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839100 - PreCommit-HIVE-Build

> Add watch mode to llap status tool
> --
>
> Key: HIVE-15217
> URL: https://issues.apache.org/jira/browse/HIVE-15217
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-15217.1.patch
>
>
> There is a few seconds of overhead in launching the llap status command. To 
> avoid this, we can add a "watch" mode to the llap status tool that refreshes 
> the status after a configured interval.
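A minimal sketch of what such a watch loop could look like: query the status repeatedly at a fixed interval instead of re-launching the tool each time. The names here (watch, statusSource) are illustrative, not the actual tool's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class WatchMode {
    // Poll statusSource 'iterations' times, sleeping 'intervalMs' between
    // polls; re-querying avoids paying the tool's startup cost each time.
    static List<String> watch(Supplier<String> statusSource, long intervalMs,
                              int iterations) throws InterruptedException {
        List<String> snapshots = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            snapshots.add(statusSource.get());
            if (i < iterations - 1) {
                Thread.sleep(intervalMs); // configured refresh interval
            }
        }
        return snapshots;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = watch(() -> "RUNNING", 10, 3);
        System.out.println(out.size()); // 3 snapshots collected
    }
}
```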





[jira] [Updated] (HIVE-15197) count and sum query on empty table, returning empty output

2016-11-15 Thread vishal.rajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vishal.rajan updated HIVE-15197:

Description: 
When the below query is run in Hive 1.2.0 it returns 'NULL NULL 0' on an 
empty table, but when the same query is run on Hive 2.1.0, nothing is returned 
on an empty table. (Both tables are ORC external tables.)
hive 1.2.0-
hive>  SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone;
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.79 sec   HDFS Read: 7354 HDFS Write: 
114 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 790 msec
OK
NULL NULL 0
Time taken: 38.168 seconds, Fetched: 1 row(s)

-hive 2.1.0-
hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
2016-11-14 19:06:15,421 WARN  [Thread-215] mapreduce.JobResourceUploader 
(JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing 
not performed. Implement the Tool interface and execute your application with 
ToolRunner to remedy this.
2016-11-14 19:06:19,222 INFO  [Thread-215] input.FileInputFormat 
(FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2016-11-14 19:06:20,000 INFO  [Thread-215] mapreduce.JobSubmitter 
(JobSubmitter.java:submitJobInternal(198)) - number of splits:0
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0

2016-11-14 19:06:39,405 Stage-1 map = 0%,  reduce = 0%
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 28.302 seconds



  was:
When the below query is run in Hive 1.2.0 it returns 'NULL NULL 0' on an 
empty table, but when the same query is run on Hive 2.1.0, nothing is returned 
on an empty table.
hive 1.2.0(ORC)-
hive>  SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone;
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.79 sec   HDFS Read: 7354 HDFS Write: 
114 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 790 msec
OK
NULL NULL 0
Time taken: 38.168 seconds, Fetched: 1 row(s)

-hive 2.1.0(ORC)-
hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
2016-11-14 19:06:15,421 WARN  [Thread-215] mapreduce.JobResourceUploader 
(JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing 
not performed. Implement the Tool interface and execute your application with 
ToolRunner to remedy this.
2016-11-14 19:06:19,222 INFO  [Thread-215] input.FileInputFormat 
(FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2016-11-14 19:06:20,000 INFO  [Thread-215] mapreduce.JobSubmitter 
(JobSubmitter.java:submitJobInternal(198)) - number of splits:0
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0

2016-11-14 19:06:39,405 Stage-1 map = 0%,  reduce = 0%
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 28.302 seconds




> count and sum query on empty table, returning empty output 
> ---
>
> Key: HIVE-15197
> URL: https://issues.apache.org/jira/browse/HIVE-15197
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: vishal.rajan
>
> When the below query is run in Hive 1.2.0 it returns 'NULL NULL 0' on an 
> empty table, but when the same query is run on Hive 2.1.0, nothing is 
> returned on an empty table. (Both tables are ORC external tables.)
> hive 1.2.0-
> hive>  SELECT sum(destination_pincode),sum(length(source_city)),count(*)  
> from test_stage.geo_zone;
> MapReduce Jobs Launched: 
> Stage-Stage-1: Map: 1   Cumulative CPU: 4.79 sec   HDFS Read: 7354 HDFS 
> Write: 114 SUCCESS
> Total MapReduce CPU Time Spent: 4 seconds 790 msec
> OK
> NULL NULL 0
> Time taken: 38.168 seconds, Fetched: 1 row(s)
> -hive 2.1.0-
> 
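The regression above comes down to global-aggregate semantics: with no GROUP BY, sum/count must emit exactly one result row even over zero input rows (sum evaluating to NULL, count to 0). A small illustration of the expected behavior, not Hive code:

```java
import java.util.Collections;
import java.util.List;

public class EmptyAggregate {
    // Returns {sum, count} for a global aggregate over the given rows;
    // SQL semantics: sum over zero rows is NULL, count(*) is 0, and
    // exactly one "row" is always produced.
    static Object[] sumAndCount(List<Integer> rows) {
        long count = rows.size();
        Long sum = null; // stays null when there are no rows
        for (int v : rows) {
            sum = (sum == null ? 0L : sum) + v;
        }
        return new Object[] { sum, count };
    }

    public static void main(String[] args) {
        Object[] r = sumAndCount(Collections.emptyList());
        System.out.println(r[0] + "\t" + r[1]); // null (i.e. SQL NULL) and 0
    }
}
```

Hive 1.2.0's behavior matches this; the 2.1.0 run with zero splits produces no row at all, which is the bug being reported.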

[jira] [Updated] (HIVE-15197) count and sum query on empty table, returning empty output

2016-11-15 Thread vishal.rajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vishal.rajan updated HIVE-15197:

Description: 
When the below query is run in Hive 1.2.0 it returns 'NULL NULL 0' on an 
empty table, but when the same query is run on Hive 2.1.0, nothing is returned 
on an empty table.
hive 1.2.0(ORC)-
hive>  SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone;
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.79 sec   HDFS Read: 7354 HDFS Write: 
114 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 790 msec
OK
NULL NULL 0
Time taken: 38.168 seconds, Fetched: 1 row(s)

-hive 2.1.0(ORC)-
hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
2016-11-14 19:06:15,421 WARN  [Thread-215] mapreduce.JobResourceUploader 
(JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing 
not performed. Implement the Tool interface and execute your application with 
ToolRunner to remedy this.
2016-11-14 19:06:19,222 INFO  [Thread-215] input.FileInputFormat 
(FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2016-11-14 19:06:20,000 INFO  [Thread-215] mapreduce.JobSubmitter 
(JobSubmitter.java:submitJobInternal(198)) - number of splits:0
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0

2016-11-14 19:06:39,405 Stage-1 map = 0%,  reduce = 0%
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 28.302 seconds



  was:
When the below query is run in Hive 1.2.0 it returns 'NULL NULL 0' on an 
empty table, but when the same query is run on Hive 2.1.0, nothing is returned 
on an empty table.
hive 1.2.0 (avro table)-
hive>  SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone;
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.79 sec   HDFS Read: 7354 HDFS Write: 
114 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 790 msec
OK
NULL NULL 0
Time taken: 38.168 seconds, Fetched: 1 row(s)

-hive 2.1.0(ORC)-
hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*)  from 
test_stage.geo_zone
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
2016-11-14 19:06:15,421 WARN  [Thread-215] mapreduce.JobResourceUploader 
(JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing 
not performed. Implement the Tool interface and execute your application with 
ToolRunner to remedy this.
2016-11-14 19:06:19,222 INFO  [Thread-215] input.FileInputFormat 
(FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2016-11-14 19:06:20,000 INFO  [Thread-215] mapreduce.JobSubmitter 
(JobSubmitter.java:submitJobInternal(198)) - number of splits:0
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0

2016-11-14 19:06:39,405 Stage-1 map = 0%,  reduce = 0%
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 28.302 seconds




> count and sum query on empty table, returning empty output 
> ---
>
> Key: HIVE-15197
> URL: https://issues.apache.org/jira/browse/HIVE-15197
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: vishal.rajan
>
> When the below query is run in Hive 1.2.0 it returns 'NULL NULL 0' on an 
> empty table, but when the same query is run on Hive 2.1.0, nothing is 
> returned on an empty table.
> hive 1.2.0(ORC)-
> hive>  SELECT sum(destination_pincode),sum(length(source_city)),count(*)  
> from test_stage.geo_zone;
> MapReduce Jobs Launched: 
> Stage-Stage-1: Map: 1   Cumulative CPU: 4.79 sec   HDFS Read: 7354 HDFS 
> Write: 114 SUCCESS
> Total MapReduce CPU Time Spent: 4 seconds 790 msec
> OK
> NULL NULL 0
> Time taken: 38.168 seconds, Fetched: 1 row(s)
> -hive 2.1.0(ORC)-
> hive> SELECT 

[jira] [Commented] (HIVE-10901) Optimize multi column distinct queries

2016-11-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669316#comment-15669316
 ] 

Pengcheng Xiong commented on HIVE-10901:


[~gopalv], I am not sure how many reducers were used in Jenkins, but it may be 
related to what you described last time.

> Optimize multi column distinct queries
> 
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, 
> HIVE-10901.patch
>
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.





[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure

2016-11-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669292#comment-15669292
 ] 

Rui Li commented on HIVE-15202:
---

Also pinging [~ekoifman]. Seems you're quite knowledgeable about transactions :)

> Concurrent compactions for the same partition may generate malformed folder 
> structure
> -
>
> Key: HIVE-15202
> URL: https://issues.apache.org/jira/browse/HIVE-15202
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>
> If two compactions run concurrently on a single partition, it may generate 
> folder structure like this: (nested base dir)
> {noformat}
> drwxr-xr-x   - root supergroup  0 2016-11-14 22:23 
> /user/hive/warehouse/test/z=1/base_007/base_007
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_0
> -rw-r--r--   3 root supergroup611 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_1
> -rw-r--r--   3 root supergroup614 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_2
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_3
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_4
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_5
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_6
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_7
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_8
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_9
> {noformat}
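The nested base directory shown in the listing above is what happens when a second compaction renames its output into a destination that already exists. A sketch mimicking HDFS-style rename semantics with java.nio; the directory names here are illustrative, not the actual compactor paths:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NestedBaseDir {
    // Mimics HDFS FileSystem.rename semantics: if dst exists and is a
    // directory, src is moved *inside* dst rather than replacing it,
    // which produces the nested base-dir layout.
    static Path hdfsStyleRename(Path src, Path dst) throws IOException {
        if (Files.isDirectory(dst)) {
            Path nested = dst.resolve(src.getFileName());
            Files.move(src, nested);
            return nested; // e.g. .../base_x/base_y
        }
        Files.move(src, dst);
        return dst;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("part");
        Path tmp1 = Files.createDirectory(root.resolve("compaction1"));
        Path tmp2 = Files.createDirectory(root.resolve("compaction2"));
        Path base = root.resolve("base");
        hdfsStyleRename(tmp1, base);               // first compaction wins
        Path second = hdfsStyleRename(tmp2, base); // second nests inside it
        System.out.println(second.getParent().getFileName()); // "base"
    }
}
```

Guarding the rename with an exists-check (or taking a lock per partition) would avoid the nesting.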





[jira] [Commented] (HIVE-14982) Remove some reserved keywords in 2.2

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669264#comment-15669264
 ] 

Hive QA commented on HIVE-14982:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839090/HIVE-14982.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10680 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=108)

[tez_joins_explain.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,union_remove_6_subq.q,groupby2_map_multi_distinct.q,load_dyn_part9.q,multi_insert_gby2.q,vectorization_11.q,groupby_position.q,avro_compression_enabled_native.q,smb_mapjoin_8.q,join21.q,auto_join16.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus 
(batchId=207)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2143/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2143/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2143/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839090 - PreCommit-HIVE-Build

> Remove some reserved keywords in 2.2
> 
>
> Key: HIVE-14982
> URL: https://issues.apache.org/jira/browse/HIVE-14982
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14982.01.patch
>
>
> It seems that CACHE, DAYOFWEEK, and VIEWS are reserved keywords in master. 
> This conflicts with the SQL:2011 standard.





[jira] [Commented] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode

2016-11-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669260#comment-15669260
 ] 

Gopal V commented on HIVE-15218:


LGTM +1 - tests pending.

The Statistics object doesn't seem to be referenced on the execution side at 
all.

The question remains why the {{removeField(kryo, AbstractOperatorDesc.class, 
"statistics");}} call doesn't remove this field when serializing in the first 
place.
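For context, a minimal illustration (using plain java.io serialization rather than Kryo) of why making a field non-transient changes what travels with the serialized plan: transient fields are skipped at write time, everything else is written and must be deserializable on the other side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class OpDesc implements Serializable {
        String name = "TS";
        transient String statistics = "rows=42"; // skipped during serialization
    }

    // Serialize and immediately deserialize the object.
    static OpDesc roundTrip(OpDesc d) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(d);
        oos.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (OpDesc) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        OpDesc copy = roundTrip(new OpDesc());
        System.out.println(copy.name);       // "TS" survives the round trip
        System.out.println(copy.statistics); // null: transient, not serialized
    }
}
```

With Kryo the mechanics differ (class IDs must be registered or resolvable), but the same principle applies: once statistics is non-transient, its type must be deserializable wherever the plan is loaded.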

> Kryo Exception on subsequent run of a query in LLAP mode
> 
>
> Key: HIVE-15218
> URL: https://issues.apache.org/jira/browse/HIVE-15218
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15218.1.patch
>
>
> Following exception is observed when running TPCDS query19 during concurrency 
> test
> {code}
> Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, 
> diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error 
> while running task ( failure ) : 
> attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57)
>   at 
> org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 15 more
> Caused by: java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469)
>   at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55)
>   ... 18 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> Encountered unregistered class ID: 63
> Serialization trace:
> statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc)
> conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>   at 

[jira] [Commented] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package

2016-11-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669236#comment-15669236
 ] 

Gopal V commented on HIVE-15219:


Does the user have to provide properly escaped JSON, or some other form of 
serialized string, for this to work?

> LLAP: Allow additional slider global parameters to be set while creating the 
> LLAP package
> -
>
> Key: HIVE-15219
> URL: https://issues.apache.org/jira/browse/HIVE-15219
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15219.patch
>
>






[jira] [Updated] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package

2016-11-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15219:
--
Status: Patch Available  (was: Open)

> LLAP: Allow additional slider global parameters to be set while creating the 
> LLAP package
> -
>
> Key: HIVE-15219
> URL: https://issues.apache.org/jira/browse/HIVE-15219
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15219.patch
>
>






[jira] [Updated] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package

2016-11-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15219:
--
Attachment: HIVE-15219.patch

[~gopalv] - could you please take a look.

The initial requirement was for the UI port to be specified. I think it's 
better to allow a free-form string to set additional parameters, instead of 
adding a new llap CLI option for every parameter.
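A free-form string would presumably be split into key/value pairs before being applied as slider global options. A hypothetical sketch, assuming a ';'-separated k=v format; the actual delimiter and option name in the patch may differ:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SliderParams {
    // Parse "k1=v1;k2=v2;..." into an ordered map of global parameters.
    static Map<String, String> parse(String raw) {
        Map<String, String> params = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return params;
        }
        for (String pair : raw.split(";")) {
            String[] kv = pair.split("=", 2); // value may itself contain '='
            params.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> m =
            parse("site.global.ui_port=15002;site.global.mem=4096");
        System.out.println(m.size()); // 2 parameters parsed
    }
}
```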

> LLAP: Allow additional slider global parameters to be set while creating the 
> LLAP package
> -
>
> Key: HIVE-15219
> URL: https://issues.apache.org/jira/browse/HIVE-15219
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15219.patch
>
>






[jira] [Updated] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15218:
-
Status: Patch Available  (was: Open)

> Kryo Exception on subsequent run of a query in LLAP mode
> 
>
> Key: HIVE-15218
> URL: https://issues.apache.org/jira/browse/HIVE-15218
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15218.1.patch
>
>
> Following exception is observed when running TPCDS query19 during concurrency 
> test
> {code}
> Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, 
> diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error 
> while running task ( failure ) : 
> attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57)
>   at 
> org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 15 more
> Caused by: java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469)
>   at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55)
>   ... 18 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> Encountered unregistered class ID: 63
> Serialization trace:
> statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc)
> conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:215)
>   at 
> 

[jira] [Commented] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode

2016-11-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669145#comment-15669145
 ] 

Prasanth Jayachandran commented on HIVE-15218:
--

[~gopalv] found that HIVE-8769 changed the statistics object from transient 
to non-transient. 
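For background: Kryo's FieldSerializer, like plain Java serialization, skips fields marked {{transient}}, so making the field non-transient means the statistics object now travels with the serialized plan. A minimal sketch of the transient distinction using stock Java serialization (the {{PlanNode}} class below is illustrative, not Hive's):

```java
import java.io.*;

public class TransientDemo {
    static class PlanNode implements Serializable {
        String name = "tableScan";
        // Marked transient: skipped during serialization, null after deserialization.
        transient String statistics = "rows=1000";
    }

    // Serialize and immediately deserialize a node to observe what survives.
    static PlanNode roundTrip(PlanNode node) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(node);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (PlanNode) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlanNode copy = roundTrip(new PlanNode());
        System.out.println(copy.name);        // tableScan: regular field survives
        System.out.println(copy.statistics);  // null: transient field is skipped
    }
}
```

Once the field is non-transient, the serializer follows it into the statistics class, which is where an unregistered class can surface as the Kryo error above.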

> Kryo Exception on subsequent run of a query in LLAP mode
> 
>
> Key: HIVE-15218
> URL: https://issues.apache.org/jira/browse/HIVE-15218
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15218.1.patch
>
>
> Following exception is observed when running TPCDS query19 during concurrency 
> test
> {code}
> Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, 
> diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error 
> while running task ( failure ) : 
> attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57)
>   at 
> org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 15 more
> Caused by: java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469)
>   at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55)
>   ... 18 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> Encountered unregistered class ID: 63
> Serialization trace:
> statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc)
> conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>   at 
> 

[jira] [Updated] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15218:
-
Attachment: HIVE-15218.1.patch

[~gopalv] can you please review this change?

> Kryo Exception on subsequent run of a query in LLAP mode
> 
>
> Key: HIVE-15218
> URL: https://issues.apache.org/jira/browse/HIVE-15218
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15218.1.patch
>
>
> Following exception is observed when running TPCDS query19 during concurrency 
> test
> {code}
> Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, 
> diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error 
> while running task ( failure ) : 
> attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57)
>   at 
> org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 15 more
> Caused by: java.lang.RuntimeException: Failed to load plan: 
> hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi...
>   at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469)
>   at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55)
>   ... 18 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> Encountered unregistered class ID: 63
> Serialization trace:
> statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc)
> conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>   at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>   at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:215)
>   at 
> 

[jira] [Commented] (HIVE-10901) Optimize multi column distinct queries

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669134#comment-15669134
 ] 

Hive QA commented on HIVE-10901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839085/HIVE-10901.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10680 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=102)

[skewjoinopt3.q,smb_mapjoin_4.q,timestamp_comparison.q,union_remove_10.q,mapreduce2.q,bucketmapjoin_negative.q,udf_in_file.q,auto_join12.q,skewjoin.q,vector_left_outer_join.q,semijoin.q,skewjoinopt9.q,smb_mapjoin_3.q,stats10.q,nullgroup4.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct]
 (batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2142/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839085 - PreCommit-HIVE-Build

> Optimize multi column distinct queries
> 
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, 
> HIVE-10901.patch
>
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded to multiple-column cases too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15217) Add watch mode to llap status tool

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15217:
-
Status: Patch Available  (was: Open)

> Add watch mode to llap status tool
> --
>
> Key: HIVE-15217
> URL: https://issues.apache.org/jira/browse/HIVE-15217
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-15217.1.patch
>
>
> There is a few seconds of overhead when launching the llap status command. To 
> avoid this, we can add a "watch" mode to the llap status tool that refreshes 
> the status after a configured interval. 
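As a rough illustration (not the actual patch), the watch mode amounts to an in-process refresh loop, so the JVM startup cost is paid once rather than on every invocation; {{fetchStatus}} below is a stand-in for the real status call:

```java
public class WatchLoop {
    // Stand-in for the real LLAP status fetch; here it just counts refreshes.
    static int refreshes = 0;

    static String fetchStatus() {
        refreshes++;
        return "RUNNING";
    }

    // Poll the status every intervalMs for maxIterations, reusing the same JVM
    // so per-invocation startup cost is incurred only once.
    static void watch(long intervalMs, int maxIterations) throws InterruptedException {
        for (int i = 0; i < maxIterations; i++) {
            System.out.println(fetchStatus());
            if (i < maxIterations - 1) {
                Thread.sleep(intervalMs);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        watch(10, 3);  // refresh three times, 10 ms apart
    }
}
```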



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15217) Add watch mode to llap status tool

2016-11-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15217:
-
Attachment: HIVE-15217.1.patch

[~sseth] Can you please review the patch?

> Add watch mode to llap status tool
> --
>
> Key: HIVE-15217
> URL: https://issues.apache.org/jira/browse/HIVE-15217
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-15217.1.patch
>
>
> There is a few seconds of overhead when launching the llap status command. To 
> avoid this, we can add a "watch" mode to the llap status tool that refreshes 
> the status after a configured interval. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15216) Files on S3 are deleted one by one in INSERT OVERWRITE queries

2016-11-15 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15216:

Issue Type: Bug  (was: Sub-task)
Parent: (was: HIVE-14269)

> Files on S3 are deleted one by one in INSERT OVERWRITE queries
> --
>
> Key: HIVE-15216
> URL: https://issues.apache.org/jira/browse/HIVE-15216
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sahil Takiar
>
> When running {{INSERT OVERWRITE}} queries the files to overwrite are deleted 
> one by one. The reason is that, by default, hive.exec.stagingdir is inside 
> the target table directory.
> Ideally Hive would just delete the entire table directory, but it can't do 
> that since the staging data is also inside the directory. Instead it deletes 
> each file one-by-one, which is very slow.
> There are a few ways to fix this:
> 1: Move the staging directory outside the table location. This can be done by 
>  setting hive.exec.stagingdir to a different location when running on S3. It 
> would be nice if users didn't have to explicitly set this when running on S3 
> and things just worked out-of-the-box. My understanding is that 
> hive.exec.stagingdir was only added to support HDFS encryption zones. Since 
> S3 doesn't have encryption zones, there should be no problem with using the 
> value of hive.exec.scratchdir to store all intermediate data instead.
> 2: Multi-thread the delete operations
> 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op
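Option 2 could look roughly like the sketch below, which fans the deletes out over a thread pool instead of issuing them serially. This is illustrative only: it uses {{java.nio.file}} on local disk, whereas the real change would go through the Hadoop {{FileSystem}} API:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelDelete {
    // Delete all given paths concurrently and wait for every delete to finish.
    static void deleteAll(List<Path> paths, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path p : paths) {
                futures.add(pool.submit(() -> {
                    try {
                        Files.deleteIfExists(p);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();  // propagate any failure from the worker threads
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("t1");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            files.add(Files.createFile(dir.resolve("part-" + i)));
        }
        deleteAll(files, 4);
        System.out.println(Files.list(dir).count());  // 0
        Files.delete(dir);
    }
}
```

On S3, where each delete is a round-trip request, parallelism helps but a true bulk-delete call (option 3) would still be cheaper.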



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15114) Remove extra MoveTask operators

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669061#comment-15669061
 ] 

Sahil Takiar edited comment on HIVE-15114 at 11/16/16 2:01 AM:
---

[~spena] can we add some sets for other execution engines as well? I haven't 
tested the 2nd patch, but the first patch doesn't seem to take effect for HoS.


was (Author: stakiar):
[~spena] can we add some sets for other execution engines as well? I haven't 
test the 2nd patch, but the first patch doesn't seem to take affect for HoS.

> Remove extra MoveTask operators
> ---
>
> Key: HIVE-15114
> URL: https://issues.apache.org/jira/browse/HIVE-15114
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sergio Peña
> Attachments: HIVE-15114.WIP.1.patch, HIVE-15114.WIP.2.patch
>
>
> When running simple insert queries (e.g. {{INSERT INTO TABLE ... VALUES 
> ...}}) an extraneous {{MoveTask}} is created.
> This is problematic when the scratch directory is on S3 since renames require 
> copying the entire dataset.
> For simple queries (like the one above), there are two MoveTasks. The first 
> one moves the output data from one file in the scratch directory to another 
> file in the scratch directory. The second MoveTask moves the data from the 
> scratch directory to its final table location.
> The first MoveTask should not be necessary. The goal of this JIRA is to 
> remove it. This should help improve performance when running on S3.
> It seems that the first Move might be caused by a dependency resolution 
> problem in the optimizer, where a dependent task doesn't get properly removed 
> when the task it depends on is filtered by a condition resolver.
> A dummy {{MoveTask}} is added in the 
> {{GenMapRedUtils.createMRWorkForMergingFiles}} method. This method creates a 
> conditional task which launches a job to merge small output files at the end 
> of the query. At the end of the conditional job there is a MoveTask. 
> Even though Hive decides that the conditional merge job is not needed, it 
> seems the MoveTask is still added to the plan.
> It seems this extra {{MoveTask}} may have been added intentionally, though it 
> is not yet clear why. The {{ConditionalResolverMergeFiles}} says that one of 
> three tasks will be returned: move task only, merge task only, or merge task 
> followed by a move task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15114) Remove extra MoveTask operators

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669061#comment-15669061
 ] 

Sahil Takiar commented on HIVE-15114:
-

[~spena] can we add some sets for other execution engines as well? I haven't 
tested the 2nd patch, but the first patch doesn't seem to take effect for HoS.

> Remove extra MoveTask operators
> ---
>
> Key: HIVE-15114
> URL: https://issues.apache.org/jira/browse/HIVE-15114
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sergio Peña
> Attachments: HIVE-15114.WIP.1.patch, HIVE-15114.WIP.2.patch
>
>
> When running simple insert queries (e.g. {{INSERT INTO TABLE ... VALUES 
> ...}}) an extraneous {{MoveTask}} is created.
> This is problematic when the scratch directory is on S3 since renames require 
> copying the entire dataset.
> For simple queries (like the one above), there are two MoveTasks. The first 
> one moves the output data from one file in the scratch directory to another 
> file in the scratch directory. The second MoveTask moves the data from the 
> scratch directory to its final table location.
> The first MoveTask should not be necessary. The goal of this JIRA is to 
> remove it. This should help improve performance when running on S3.
> It seems that the first Move might be caused by a dependency resolution 
> problem in the optimizer, where a dependent task doesn't get properly removed 
> when the task it depends on is filtered by a condition resolver.
> A dummy {{MoveTask}} is added in the 
> {{GenMapRedUtils.createMRWorkForMergingFiles}} method. This method creates a 
> conditional task which launches a job to merge small output files at the end 
> of the query. At the end of the conditional job there is a MoveTask. 
> Even though Hive decides that the conditional merge job is not needed, it 
> seems the MoveTask is still added to the plan.
> It seems this extra {{MoveTask}} may have been added intentionally, though it 
> is not yet clear why. The {{ConditionalResolverMergeFiles}} says that one of 
> three tasks will be returned: move task only, merge task only, or merge task 
> followed by a move task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15199:

Comment: was deleted

(was: [~spena] this actually affects any {{INSERT INTO}} query that needs to 
insert multiple files into the target table location. Each rename operation 
will basically overwrite the same file again and again, so all data will be 
lost except data from the last rename op.

Note this seems to be a regression of HIVE-12988, which first checked if the 
destination file existed before renaming it.)
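The existence check described above can be sketched as follows: before moving a file into the target directory, probe for a name collision and pick a fresh name instead of overwriting. This is illustrative {{java.nio.file}} code, not Hive's actual implementation (the {{_copy_N}} suffix is a hypothetical naming scheme):

```java
import java.io.IOException;
import java.nio.file.*;

public class SafeRename {
    // Rename src into destDir under the given name; if that name already exists,
    // append _copy_N rather than overwriting the existing file.
    static Path renameNoClobber(Path src, Path destDir, String name) throws IOException {
        Path dest = destDir.resolve(name);
        int n = 1;
        while (Files.exists(dest)) {
            dest = destDir.resolve(name + "_copy_" + n++);
        }
        return Files.move(src, dest);  // no REPLACE_EXISTING: never clobbers
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("t1");
        Path a = Files.write(Files.createTempFile("ins", null), "name1".getBytes());
        Path b = Files.write(Files.createTempFile("ins", null), "name2".getBytes());
        renameNoClobber(a, dir, "000000_0");
        Path second = renameNoClobber(b, dir, "000000_0");
        System.out.println(second.getFileName());    // 000000_0_copy_1
        System.out.println(Files.list(dir).count()); // 2: both rows kept
    }
}
```

Without the check, the second move lands on the same destination name and the first insert's data is lost, which matches the symptom in the report below.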

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run against an S3 table, when the scratch directory 
> is also on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15199:

Comment: was deleted

(was: Jumped the gun on this. This isn't true, only happens when multiple 
insert intos are run.)

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run against an S3 table, when the scratch directory 
> is also on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669041#comment-15669041
 ] 

Sahil Takiar commented on HIVE-15199:
-

Jumped the gun on this. This isn't true in general; it only happens when 
multiple INSERT INTOs are run.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run against an S3 table, when the scratch directory 
> is also on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668996#comment-15668996
 ] 

Sergey Shelukhin edited comment on HIVE-14990 at 11/16/16 1:36 AM:
---

Updated test list to fix/declare irrelevant before closing this. Only updated 
the TestCliDriver list so far; haven't made my way through the rest yet.
{noformat}
TestCliDriver:
stats_list_bucket
show_tablestatus
vector_udf2
list_bucket_dml_14
autoColumnStats_9
stats_noscan_2
symlink_text_input_format
temp_table_precedence
offset_limit_global_optimizer
rand_partitionpruner2
materialized_view_authorization_sqlstd,materialized_*
merge_dynamic_partition, merge_dynamic_partition*
orc_vectorization_ppd
parquet_join2
repl_3_exim_metadata
sample6
sample_islocalmode_hook
smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7
orc_createas1
exim_16_part_external,exim_17_part_managed,


TestEncryptedHDFSCliDriver:
encryption_ctas
encryption_drop_partition 
encryption_insert_values
encryption_join_unencrypted_tbl
encryption_load_data_to_encrypted_tables

MiniLlapLocal:
exchgpartition2lel
cbo_rp_lineage2
create_merge_compressed
deleteAnalyze
delete_where_no_match
delete_where_non_partitioned
dynpart_sort_optimization
escape2
insert1
lineage2
lineage3
orc_llap
schema_evol_orc_nonvec_part
schema_evol_orc_vec_part
schema_evol_text_nonvec_part
schema_evol_text_vec_part
schema_evol_text_vecrow_part
smb_mapjoin_6
tez_dml
union_fast_stats
update_all_types
update_tmp_table
update_where_no_match
update_where_non_partitioned
vector_outer_join1
vector_outer_join4

MiniLlap:
load_fs2
orc_ppd_basic
external_table_with_space_in_location_path
file_with_header_footer
import_exported_table
schemeAuthority,schemeAuthority2
table_nonprintable

Minimr:
infer_bucket_sort_map_operators
infer_bucket_sort_merge
infer_bucket_sort_reducers_power_two
root_dir_external_table
scriptfile1

TestSymlinkTextInputFormat#testCombine 
TestJdbcWithLocalClusterSpark, etc.
{noformat}


was (Author: sershe):
Updated test list to fix/declare irrelevant before closing this 
{noformat}
TestCliDriver:
stats_list_bucket
show_tablestatus
vector_udf2
list_bucket_dml_14
autoColumnStats_9
stats_noscan_2
symlink_text_input_format
temp_table_precedence
offset_limit_global_optimizer
rand_partitionpruner2
materialized_view_authorization_sqlstd,materialized_*
merge_dynamic_partition, merge_dynamic_partition*
orc_vectorization_ppd
parquet_join2
repl_3_exim_metadata
sample6
sample_islocalmode_hook
smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7
orc_createas1
exim_16_part_external,exim_17_part_managed,


TestEncryptedHDFSCliDriver:
encryption_ctas
encryption_drop_partition 
encryption_insert_values
encryption_join_unencrypted_tbl
encryption_load_data_to_encrypted_tables

MiniLlapLocal:
exchgpartition2lel
cbo_rp_lineage2
create_merge_compressed
deleteAnalyze
delete_where_no_match
delete_where_non_partitioned
dynpart_sort_optimization
escape2
insert1
lineage2
lineage3
orc_llap
schema_evol_orc_nonvec_part
schema_evol_orc_vec_part
schema_evol_text_nonvec_part
schema_evol_text_vec_part
schema_evol_text_vecrow_part
smb_mapjoin_6
tez_dml
union_fast_stats
update_all_types
update_tmp_table
update_where_no_match
update_where_non_partitioned
vector_outer_join1
vector_outer_join4

MiniLlap:
load_fs2
orc_ppd_basic
external_table_with_space_in_location_path
file_with_header_footer
import_exported_table
schemeAuthority,schemeAuthority2
table_nonprintable

Minimr:
infer_bucket_sort_map_operators
infer_bucket_sort_merge
infer_bucket_sort_reducers_power_two
root_dir_external_table
scriptfile1

TestSymlinkTextInputFormat#testCombine 
TestJdbcWithLocalClusterSpark, etc.
{noformat}

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.10.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).

[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668996#comment-15668996
 ] 

Sergey Shelukhin commented on HIVE-14990:
-

Updated test list to fix/declare irrelevant before closing this 
{noformat}
TestCliDriver:
stats_list_bucket
show_tablestatus
vector_udf2
list_bucket_dml_14
autoColumnStats_9
stats_noscan_2
symlink_text_input_format
temp_table_precedence
offset_limit_global_optimizer
rand_partitionpruner2
materialized_view_authorization_sqlstd,materialized_*
merge_dynamic_partition, merge_dynamic_partition*
orc_vectorization_ppd
parquet_join2
repl_3_exim_metadata
sample6
sample_islocalmode_hook
smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7
orc_createas1
exim_16_part_external,exim_17_part_managed,


TestEncryptedHDFSCliDriver:
encryption_ctas
encryption_drop_partition 
encryption_insert_values
encryption_join_unencrypted_tbl
encryption_load_data_to_encrypted_tables

MiniLlapLocal:
exchgpartition2lel
cbo_rp_lineage2
create_merge_compressed
deleteAnalyze
delete_where_no_match
delete_where_non_partitioned
dynpart_sort_optimization
escape2
insert1
lineage2
lineage3
orc_llap
schema_evol_orc_nonvec_part
schema_evol_orc_vec_part
schema_evol_text_nonvec_part
schema_evol_text_vec_part
schema_evol_text_vecrow_part
smb_mapjoin_6
tez_dml
union_fast_stats
update_all_types
update_tmp_table
update_where_no_match
update_where_non_partitioned
vector_outer_join1
vector_outer_join4

MiniLlap:
load_fs2
orc_ppd_basic
external_table_with_space_in_location_path
file_with_header_footer
import_exported_table
schemeAuthority,schemeAuthority2
table_nonprintable

Minimr:
infer_bucket_sort_map_operators
infer_bucket_sort_merge
infer_bucket_sort_reducers_power_two
root_dir_external_table
scriptfile1

TestSymlinkTextInputFormat#testCombine 
TestJdbcWithLocalClusterSpark, etc.
{noformat}

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.10.patch, HIVE-14990.patch
>
>
> Expected failures:
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because the methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668974#comment-15668974
 ] 

Hive QA commented on HIVE-14990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839083/HIVE-14990.10.patch

{color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 682 failed/errored test(s), 10025 tests 
executed
*Failed tests:*
{noformat}
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
TestHCatClientNotification - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
TestHCatHiveThriftCompatibility - did not produce a TEST-*.xml file (likely 
timed out) (batchId=217)
TestPigHBaseStorageHandler - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_concatenate_indexed_table]
 (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2_orc] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_3] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_stats] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_11] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_14] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_15] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_5] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin11] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin12] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin13] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin8] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative3] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1]
 (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_3]
 (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5]
 (batchId=52)

[jira] [Updated] (HIVE-14982) Remove some reserved keywords in 2.2

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14982:
---
Status: Patch Available  (was: Open)

> Remove some reserved keywords in 2.2
> 
>
> Key: HIVE-14982
> URL: https://issues.apache.org/jira/browse/HIVE-14982
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14982.01.patch
>
>
> It seems that CACHE, DAYOFWEEK, and VIEWS are reserved keywords in master. This 
> conflicts with the SQL:2011 standard.





[jira] [Commented] (HIVE-14982) Remove some reserved keywords in 2.2

2016-11-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668971#comment-15668971
 ] 

Pengcheng Xiong commented on HIVE-14982:


[~ashutoshc] or [~sershe], could you take a quick look? Thanks.

> Remove some reserved keywords in 2.2
> 
>
> Key: HIVE-14982
> URL: https://issues.apache.org/jira/browse/HIVE-14982
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14982.01.patch
>
>
> It seems that CACHE, DAYOFWEEK, and VIEWS are reserved keywords in master. This 
> conflicts with the SQL:2011 standard.





[jira] [Updated] (HIVE-14982) Remove some reserved keywords in 2.2

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14982:
---
Attachment: HIVE-14982.01.patch

> Remove some reserved keywords in 2.2
> 
>
> Key: HIVE-14982
> URL: https://issues.apache.org/jira/browse/HIVE-14982
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14982.01.patch
>
>
> It seems that CACHE, DAYOFWEEK, and VIEWS are reserved keywords in master. This 
> conflicts with the SQL:2011 standard.





[jira] [Updated] (HIVE-10901) Optimize multi column distinct queries

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10901:
---
Status: Patch Available  (was: Open)

> Optimize multi column distinct queries
> 
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, 
> HIVE-10901.patch
>
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.





[jira] [Updated] (HIVE-10901) Optimize multi column distinct queries

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10901:
---
Status: Open  (was: Patch Available)

> Optimize multi column distinct queries
> 
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, 
> HIVE-10901.patch
>
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.





[jira] [Updated] (HIVE-10901) Optimize multi column distinct queries

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10901:
---
Attachment: HIVE-10901.03.patch

> Optimize multi column distinct queries
> 
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, 
> HIVE-10901.patch
>
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.





[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Attachment: HIVE-14990.10.patch

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.10.patch, HIVE-14990.patch
>
>
> Expected failures:
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because the methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Comment Edited] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668755#comment-15668755
 ] 

Sahil Takiar edited comment on HIVE-15199 at 11/15/16 11:39 PM:


[~spena] this actually affects any {{INSERT INTO}} query that needs to insert 
multiple files into the target table location. Each rename operation will 
basically overwrite the same file again and again, so all data will be lost 
except data from the last rename op.

Note this seems to be a regression of HIVE-12988, which first checked if the 
destination file existed before renaming it.
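
As a minimal illustration of the safeguard HIVE-12988 had in place, the sketch below picks a fresh destination name when the target already exists, so a second rename cannot silently clobber the first file. It is a hypothetical stand-in using {{java.nio}} (the class name, the {{_copy_N}} naming scheme, and the local filesystem are all assumptions; Hive's real code works against Hadoop's {{FileSystem}} API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Hive's actual code: before renaming a file into
// the destination directory, check whether the target name is taken and pick
// a fresh name instead of overwriting.
public class SafeRename {
    static Path uniqueTarget(Path destDir, String fileName) {
        Path target = destDir.resolve(fileName);
        int copy = 1;
        while (Files.exists(target)) {
            // e.g. 000000_0 -> 000000_0_copy_1; the naming scheme is illustrative only
            target = destDir.resolve(fileName + "_copy_" + copy++);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempDirectory("dest");
        // two "task output" files that would collide on the same destination name
        Path t1 = Files.createTempDirectory("src1").resolve("000000_0");
        Path t2 = Files.createTempDirectory("src2").resolve("000000_0");
        Files.writeString(t1, "row-1");
        Files.writeString(t2, "row-2");

        Files.move(t1, uniqueTarget(dest, "000000_0"));
        Files.move(t2, uniqueTarget(dest, "000000_0"));

        // both files survive instead of the second rename clobbering the first
        System.out.println(Files.list(dest).count());  // prints 2
    }
}
```

Running it moves two task outputs both named {{000000_0}} into the same directory, and both survive, which is exactly what the sequential renames described above fail to guarantee.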


was (Author: stakiar):
[~spena] this actually affects any {{INSERT INTO}} query that needs to insert 
multiple files into the target table location will lose data. Each rename 
operation will basically overwrite the same file again and again.

Note this seems to be a regression of HIVE-12988, which first checked if the 
destination file existed before renaming it.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run on an S3 table, when the scratch directory is 
> also on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}





[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668755#comment-15668755
 ] 

Sahil Takiar commented on HIVE-15199:
-

[~spena] this is actually much worse than I thought. Any {{INSERT INTO}} query 
that needs to insert multiple files into the target table location will lose 
data: each rename operation will basically overwrite the same file again and 
again.

Note this seems to be a regression of HIVE-12988, which first checked if the 
destination file existed before renaming it.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run on an S3 table, when the scratch directory is 
> also on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}





[jira] [Comment Edited] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668755#comment-15668755
 ] 

Sahil Takiar edited comment on HIVE-15199 at 11/15/16 11:38 PM:


[~spena] this actually affects any {{INSERT INTO}} query that needs to insert 
multiple files into the target table location will lose data. Each rename 
operation will basically overwrite the same file again and again.

Note this seems to be a regression of HIVE-12988, which first checked if the 
destination file existed before renaming it.


was (Author: stakiar):
[~spena] this is actually much worse that I thought. Any {{INSERT INTO}} query 
that needs to insert multiple files into the target table location will lose 
data, each rename operation will basically overwrite the same file again and 
again.

Note this seems to be a regression of HIVE-12988, which first checked if the 
destination file existed before renaming it.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run on an S3 table, when the scratch directory is 
> also on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}





[jira] [Updated] (HIVE-13557) Make interval keyword optional while specifying DAY in interval arithmetic

2016-11-15 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-13557:

Attachment: HIVE-13557.3.patch

patch #3 (#2 on reviewboard)

qtest updates and other minor improvements, like fixing the reported argument in 
exceptions

valid intervals: {{1 day}}, {{(1+x) day}}, {{'1' year}}, {{('1') year}}, 
optionally with the interval keyword

> Make interval keyword optional while specifying DAY in interval arithmetic
> --
>
> Key: HIVE-13557
> URL: https://issues.apache.org/jira/browse/HIVE-13557
> Project: Hive
>  Issue Type: Sub-task
>  Components: Types
>Reporter: Ashutosh Chauhan
>Assignee: Zoltan Haindrich
> Attachments: HIVE-13557.1.patch, HIVE-13557.1.patch, 
> HIVE-13557.1.patch, HIVE-13557.2.patch, HIVE-13557.3.patch
>
>
> Currently we support expressions like: {code}
> WHERE SOLD_DATE BETWEEN ((DATE('2000-01-31'))  - INTERVAL '30' DAY) AND 
> DATE('2000-01-31')
> {code}
> We should support:
> {code}
> WHERE SOLD_DATE BETWEEN ((DATE('2000-01-31')) + (-30) DAY) AND 
> DATE('2000-01-31')
> {code}
>   





[jira] [Commented] (HIVE-15215) Files on S3 are deleted one by one in INSERT OVERWRITE queries

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668735#comment-15668735
 ] 

Sahil Takiar commented on HIVE-15215:
-

Here is the code that triggers the file-by-file delete (inside the {{Hive.java}} 
class):

{code}
replaceFiles(...) {
  ...
  FileSystem fs2 = oldPath.getFileSystem(conf);
  if (fs2.exists(oldPath)) {
    // Do not delete oldPath if:
    //  - destf is a subdir of oldPath
    //if ( !(fs2.equals(destf.getFileSystem(conf)) && FileUtils.isSubDir(oldPath, destf, fs2)))
    isOldPathUnderDestf = FileUtils.isSubDir(oldPath, destf, fs2);
    if (isOldPathUnderDestf) {
      // if oldPath is destf or its subdir, it should definitely be deleted, otherwise its
      // existing content might result in incorrect (extra) data.
      // But not sure why we changed not to delete the oldPath in HIVE-8750 if it is
      // not the destf or its subdir?
      oldPathDeleted = FileUtils.trashFilesUnderDir(fs2, oldPath, conf);
    }
  }
  ...
}
{code}
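
Option 2 from the issue description (multi-threading the delete operations) could be sketched roughly as below; the class name, pool size, and the use of {{java.nio}} in place of Hadoop's {{FileSystem}} are assumptions for illustration, not Hive code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: submit one delete per file to a small thread pool
// instead of deleting sequentially. On S3 each delete is a network round-trip,
// so running them concurrently hides most of the per-file latency.
public class ParallelDelete {
    static int deleteAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Boolean>> results = new ArrayList<>();
        for (Path f : files) {
            results.add(pool.submit(() -> Files.deleteIfExists(f)));
        }
        int deleted = 0;
        for (Future<Boolean> r : results) {
            if (r.get()) deleted++;  // get() also propagates any delete failure
        }
        pool.shutdown();
        return deleted;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("tbl");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            files.add(Files.createFile(dir.resolve("part-" + i)));
        }
        System.out.println(deleteAll(files, 4));  // prints 8
    }
}
```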

> Files on S3 are deleted one by one in INSERT OVERWRITE queries
> --
>
> Key: HIVE-15215
> URL: https://issues.apache.org/jira/browse/HIVE-15215
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>
> When running {{INSERT OVERWRITE}} queries the files to overwrite are deleted 
> one by one. The reason is that, by default, hive.exec.stagingdir is inside 
> the target table directory.
> Ideally Hive would just delete the entire table directory, but it can't do 
> that since the staging data is also inside the directory. Instead it deletes 
> each file one-by-one, which is very slow.
> There are a few ways to fix this:
> 1: Move the staging directory outside the table location. This can be done by 
>  setting hive.exec.stagingdir to a different location when running on S3. It 
> would be nice if users didn't have to explicitly set this when running on S3 
> and things just worked out-of-the-box. My understanding is that 
> hive.exec.stagingdir was only added to support HDFS encryption zones. Since 
> S3 doesn't have encryption zones, there should be no problem with using the 
> value of hive.exec.scratchdir to store all intermediate data instead.
> 2: Multi-thread the delete operations
> 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op





[jira] [Resolved] (HIVE-15216) Files on S3 are deleted one by one in INSERT OVERWRITE queries

2016-11-15 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved HIVE-15216.
-
Resolution: Duplicate

> Files on S3 are deleted one by one in INSERT OVERWRITE queries
> --
>
> Key: HIVE-15216
> URL: https://issues.apache.org/jira/browse/HIVE-15216
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>
> When running {{INSERT OVERWRITE}} queries the files to overwrite are deleted 
> one by one. The reason is that, by default, hive.exec.stagingdir is inside 
> the target table directory.
> Ideally Hive would just delete the entire table directory, but it can't do 
> that since the staging data is also inside the directory. Instead it deletes 
> each file one-by-one, which is very slow.
> There are a few ways to fix this:
> 1: Move the staging directory outside the table location. This can be done by 
>  setting hive.exec.stagingdir to a different location when running on S3. It 
> would be nice if users didn't have to explicitly set this when running on S3 
> and things just worked out-of-the-box. My understanding is that 
> hive.exec.stagingdir was only added to support HDFS encryption zones. Since 
> S3 doesn't have encryption zones, there should be no problem with using the 
> value of hive.exec.scratchdir to store all intermediate data instead.
> 2: Multi-thread the delete operations
> 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op





[jira] [Resolved] (HIVE-15214) LLAP: Offer a "slow" mode to debug race conditions in package builder

2016-11-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved HIVE-15214.

Resolution: Duplicate

> LLAP: Offer a "slow" mode to debug race conditions in package builder 
> --
>
> Key: HIVE-15214
> URL: https://issues.apache.org/jira/browse/HIVE-15214
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> HIVE-15125 is enabled by default; add an option to disable the parallel 
> generation of data.





[jira] [Commented] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668674#comment-15668674
 ] 

Hive QA commented on HIVE-15211:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839043/HIVE-15211.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10696 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2131/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2131/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2131/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839043 - PreCommit-HIVE-Build

> Provide support for complex expressions in ON clauses for INNER joins
> -
>
> Key: HIVE-15211
> URL: https://issues.apache.org/jira/browse/HIVE-15211
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15211.patch
>
>
> Currently, we have some restrictions on the predicates that we can use in ON 
> clauses for inner joins (we have those restrictions for outer joins too, but 
> we will tackle that in a follow-up). Semantically equivalent queries can be 
> expressed if the predicate is introduced in the WHERE clause, but we would 
> like the user to be able to express it in both the ON and WHERE clauses, as in 
> standard SQL.
> This patch is an extension to overcome these restrictions for inner joins.
> It will allow writing queries that currently fail in Hive, such as:
> {code:sql}
> -- Disjunctions
> SELECT *
> FROM src1 JOIN src
> ON (src1.key=src.key
>   OR src1.value between 100 and 102
>   OR src.value between 100 and 102)
> LIMIT 10;
> -- Conjunction with multiple inputs references in one side
> SELECT *
> FROM src1 JOIN src
> ON (src1.key+src.key >= 100
>   AND src1.key+src.key <= 102)
> LIMIT 10;
> -- Conjunct with no references
> SELECT *
> FROM src1 JOIN src
> ON (src1.value between 100 and 102
>   AND src.value between 100 and 102
>   AND true)
> LIMIT 10;
> {code}





[jira] [Updated] (HIVE-15125) LLAP: Parallelize slider package generator

2016-11-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15125:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
 Release Note:  LLAP: Parallelize slider package generator (Gopal V, 
reviewed by Sergey Shelukhin)
   Status: Resolved  (was: Patch Available)

> LLAP: Parallelize slider package generator
> --
>
> Key: HIVE-15125
> URL: https://issues.apache.org/jira/browse/HIVE-15125
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.2.0
>
> Attachments: HIVE-15125.1.patch, HIVE-15125.1.patch
>
>
> The metastore init + download of functions takes approx 4 seconds.
> This is enough time to complete all the other operations in parallel.





[jira] [Updated] (HIVE-14089) complex type support in LLAP IO is broken

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14089:

Attachment: HIVE-14089.13.patch

Timeouts look spurious; some of my other jiras also had timeouts. Trying again 
with the same patch.

> complex type support in LLAP IO is broken 
> --
>
> Key: HIVE-14089
> URL: https://issues.apache.org/jira/browse/HIVE-14089
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14089.04.patch, HIVE-14089.05.patch, 
> HIVE-14089.06.patch, HIVE-14089.07.patch, HIVE-14089.08.patch, 
> HIVE-14089.09.patch, HIVE-14089.10.patch, HIVE-14089.10.patch, 
> HIVE-14089.10.patch, HIVE-14089.11.patch, HIVE-14089.12.patch, 
> HIVE-14089.13.patch, HIVE-14089.WIP.2.patch, HIVE-14089.WIP.3.patch, 
> HIVE-14089.WIP.patch
>
>
> HIVE-13617 is causing MiniLlapCliDriver following test failures
> {code}
> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
> {code}





[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15057:

Attachment: HIVE-15057.wip.patch

> Support other types of operators (other than SELECT)
> 
>
> Key: HIVE-15057
> URL: https://issues.apache.org/jira/browse/HIVE-15057
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15057.wip.patch
>
>
> Currently only SELECT operators are supported for nested column pruning. We 
> should add support for other types of operators so the optimization can work 
> for complex queries.





[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15057:

Attachment: (was: HIVE-15057.wip.patch)

> Support other types of operators (other than SELECT)
> 
>
> Key: HIVE-15057
> URL: https://issues.apache.org/jira/browse/HIVE-15057
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Chao Sun
>Assignee: Chao Sun
>
> Currently only SELECT operators are supported for nested column pruning. We 
> should add support for other types of operators so the optimization can work 
> for complex queries.





[jira] [Updated] (HIVE-15148) disallow loading data into bucketed tables (by default)

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15148:

Attachment: HIVE-15148.03.patch

Retrying again... Spark failures look spurious

> disallow loading data into bucketed tables (by default)
> ---
>
> Key: HIVE-15148
> URL: https://issues.apache.org/jira/browse/HIVE-15148
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15148.01.patch, HIVE-15148.02.patch, 
> HIVE-15148.03.patch, HIVE-15148.patch
>
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.
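The file-name dependence mentioned above can be illustrated with a standalone sketch (a hypothetical helper, not Hive's actual code): conventionally bucketed files are named like 000000_0, 000001_0, ..., with the bucket id encoded in the digits before the underscore, so deriving the bucket from the name fails for arbitrarily named files brought in by LOAD DATA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketFileNames {

    // Conventional bucket files look like "000001_0" (bucket 1, copy 0).
    private static final Pattern BUCKET_FILE = Pattern.compile("^(\\d+)_\\d+.*");

    // Returns the bucket id encoded in the file name, or -1 when the name
    // does not follow the convention -- exactly the situation created by
    // loading arbitrarily named local files into a bucketed table.
    public static int bucketIdFromFileName(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(bucketIdFromFileName("000001_0"));                      // 1
        System.out.println(bucketIdFromFileName("smallsrcsortbucket1outof4.txt")); // -1
    }
}
```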





[jira] [Commented] (HIVE-11072) Add data validation between Hive metastore upgrades tests

2016-11-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668596#comment-15668596
 ] 

Aihua Xu commented on HIVE-11072:
-

The new patch looks good to me. [~ctang.ma] do you have more comments on it?

> Add data validation between Hive metastore upgrades tests
> -
>
> Key: HIVE-11072
> URL: https://issues.apache.org/jira/browse/HIVE-11072
> Project: Hive
>  Issue Type: New Feature
>  Components: Tests
>Reporter: Sergio Peña
>Assignee: Naveen Gangam
> Attachments: HIVE-11072.1.patch, HIVE-11072.2.patch, 
> HIVE-11072.3.patch, HIVE-11072.4.patch, HIVE-11072.5.patch, 
> HIVE-11072.to-be-committed.patch
>
>
> An existing Hive metastore upgrade test is running on Hive jenkins. However, 
> these scripts test only the database schema upgrade, not data validation 
> between upgrades.
> We should validate data between metastore version upgrades. Using data 
> validation, we may ensure that data won't be damaged, or corrupted when 
> upgrading the Hive metastore.
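One simple shape such a validation could take (a sketch under assumed names, not the actual test harness): snapshot the contents of each metastore table before the schema upgrade, snapshot again afterwards, and diff.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UpgradeDataValidator {

    // Record a per-table row count (a real harness would hash full rows,
    // not just count them, to catch in-place corruption).
    public static Map<String, Integer> snapshot(Map<String, List<String>> tables) {
        Map<String, Integer> counts = new TreeMap<>();
        tables.forEach((name, rows) -> counts.put(name, rows.size()));
        return counts;
    }

    // The upgrade is considered data-safe when the snapshots agree.
    public static boolean survivedUpgrade(Map<String, Integer> before,
                                          Map<String, Integer> after) {
        return before.equals(after);
    }
}
```

Run against the real metastore, the snapshots would come from SELECTs over each schema table before and after the upgrade scripts are applied.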





[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Attachment: HIVE-14990.10.patch

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Updated] (HIVE-15207) Implement a capability to detect incorrect sequence numbers

2016-11-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15207:

Status: Patch Available  (was: Open)

Initial patch: added a 'validate' option to hiveSchemaTool and the logic to 
detect invalid sequence numbers.
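The invariant being validated can be sketched as follows (illustrative names; the real check runs SQL against the metastore's SEQUENCE_TABLE): for every tracked sequence, NEXT_VAL must be strictly greater than the maximum id already handed out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SequenceValidator {

    // nextVals: sequence name -> NEXT_VAL from SEQUENCE_TABLE
    // maxIds:   sequence name -> MAX(id) observed in the corresponding table
    public static List<String> findBrokenSequences(Map<String, Long> nextVals,
                                                   Map<String, Long> maxIds) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, Long> e : nextVals.entrySet()) {
            Long maxId = maxIds.get(e.getKey());
            // A sequence is broken when it would hand out an id already in use.
            if (maxId != null && e.getValue() <= maxId) {
                broken.add(e.getKey());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        Map<String, Long> nextVals = new TreeMap<>();
        nextVals.put("org.apache.hadoop.hive.metastore.model.MTable", 10L);
        Map<String, Long> maxIds = new TreeMap<>();
        maxIds.put("org.apache.hadoop.hive.metastore.model.MTable", 12L);
        // NEXT_VAL (10) lags MAX(id) (12), so this sequence is reported
        System.out.println(findBrokenSequences(nextVals, maxIds));
    }
}
```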

> Implement a capability to detect incorrect sequence numbers
> ---
>
> Key: HIVE-15207
> URL: https://issues.apache.org/jira/browse/HIVE-15207
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15207.1.patch
>
>
> We have seen that the next sequence number is smaller than max(id) for certain 
> tables. It seems to be caused by a thread-safety issue in HMS, but we are still 
> not sure whether it has been fully fixed. This task tries to detect such issues.





[jira] [Updated] (HIVE-15207) Implement a capability to detect incorrect sequence numbers

2016-11-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15207:

Attachment: HIVE-15207.1.patch

> Implement a capability to detect incorrect sequence numbers
> ---
>
> Key: HIVE-15207
> URL: https://issues.apache.org/jira/browse/HIVE-15207
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15207.1.patch
>
>
> We have seen that the next sequence number is smaller than max(id) for certain 
> tables. It seems to be caused by a thread-safety issue in HMS, but we are still 
> not sure whether it has been fully fixed. This task tries to detect such issues.





[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668582#comment-15668582
 ] 

Sergey Shelukhin commented on HIVE-14990:
-

Looks like the Spark failures may be caused by a spurious pom file change.

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Attachment: HIVE-14990.10.patch

After the merge.

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Attachment: (was: HIVE-14990.09.patch)

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668562#comment-15668562
 ] 

Sergey Shelukhin commented on HIVE-14990:
-

Ditto for the TestHive, etc. failures - caused by output differences due to MM id 
fields that will go away.

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.09.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668554#comment-15668554
 ] 

Sergey Shelukhin commented on HIVE-14990:
-

Relevant q test failures for future branch merge (w/non-MM tables, after MM 
patch):
load_dyn_part1, autoColumnStats_2 and _1,  escape2, load_dyn_part2, 
dynpart_sort_opt_vectorization, orc_createas1, combine3, update_tmp_table, 
delete_where_non_partitioned, delete_where_no_match, update_where_no_match, 
update_where_non_partitioned, update_all_types

I suspect many ACID failures are due to the incomplete ACID type patch. 
The rest are output-file changes that are either correct or will go away after 
ACID integration.

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.09.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets





[jira] [Commented] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668553#comment-15668553
 ] 

Hive QA commented on HIVE-15057:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839024/HIVE-15057.wip.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 478 failed/errored test(s), 10399 tests 
executed
*Failed tests:*
{noformat}
TestCBOMaxNumToCNF - did not produce a TEST-*.xml file (likely timed out) 
(batchId=255)
TestCBORuleFiredOnlyOnce - did not produce a TEST-*.xml file (likely timed out) 
(batchId=255)
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=11)

[auto_join18.q,input1_limit.q,load_dyn_part3.q,autoColumnStats_4.q,auto_sortmerge_join_14.q,drop_table.q,bucket_map_join_tez2.q,auto_join33.q,merge4.q,parquet_external_time.q,storage_format_descriptor.q,mapjoin_hook.q,multi_column_in_single.q,schema_evol_orc_nonvec_table.q,cbo_rp_subq_in.q,authorization_view_disable_cbo_4.q,list_bucket_dml_2.q,cbo_rp_semijoin.q,char_2.q,union_remove_14.q,non_ascii_literal2.q,load_part_authsuccess.q,auto_sortmerge_join_15.q,explain_rearrange.q,varchar_union1.q,input21.q,vector_udf2.q,groupby_cube_multi_gby.q,bucketmapjoin8.q,union34.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=14)

[push_or.q,skewjoinopt16.q,bucket3.q,acid_join.q,drop_partitions_filter3.q,schema_evol_text_nonvec_table.q,mrr.q,auto_join15.q,orc_ppd_schema_evol_2b.q,having2.q,regex_col.q,udf_tinyint.q,vector_interval_1.q,semijoin5.q,constprog_dpp.q,skewjoinopt13.q,cbo_rp_auto_join0.q,udf_reflect2.q,udf_div.q,auto_sortmerge_join_6.q,vector_groupby4.q,cbo_SortUnionTransposeRule.q,union_remove_24.q,update_where_non_partitioned.q,annotate_stats_part.q,list_bucket_dml_4.q,join22.q,udf_xpath_short.q,merge_join_1.q,join33.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=4)

[join_vc.q,varchar_join1.q,join7.q,insert_values_tmp_table.q,json_serde_tsformat.q,tez_union2.q,script_env_var1.q,bucketsortoptimize_insert_8.q,stats16.q,union20.q,inputddl5.q,select_transform_hint.q,parallel_join1.q,compute_stats_string.q,union_remove_7.q,union27.q,optional_outer.q,vector_include_no_sel.q,insert0.q,folder_predicate.q,groupby_cube1.q,groupby7_map_multi_single_reducer.q,join_reorder4.q,vector_interval_arithmetic.q,smb_mapjoin_17.q,groupby7_map.q,input_part10.q,udf_mask_show_first_n.q,union.q,cbo_udf_udaf.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=5)

[ptf_general_queries.q,correlationoptimizer9.q,auto_join_reordering_values.q,sample2.q,decimal_join.q,mapjoin_subquery2.q,join43.q,bucket_if_with_path_filter.q,udf_month.q,mapjoin1.q,avro_partitioned_native.q,join25.q,nullformatdir.q,authorization_admin_almighty1.q,udf_avg.q,cte_mat_4.q,groupby3.q,cbo_rp_union.q,udaf_covar_samp.q,exim_03_nonpart_over_compat.q,udf_logged_in_user.q,index_stale.q,union12.q,skewjoinopt2.q,skewjoinopt18.q,colstats_all_nulls.q,bucketsortoptimize_insert_2.q,quote2.q,udf_classloader.q,authorization_owner_actions.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=58)

[touch.q,auto_sortmerge_join_13.q,join4.q,join35.q,filter_cond_pushdown2.q,except_distinct.q,vector_left_outer_join2.q,udf_ucase.q,udf_ceil.q,vectorized_ptf.q,exim_25_export_parentpath_has_inaccessible_children.q,udf_array.q,join_filters.q,udf_current_user.q,acid_vectorization.q,join_reorder3.q,auto_join19.q,distinct_windowing_no_cbo.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,database_properties.q,partition_varchar1.q,vector_groupby_3.q,udf_sort_array.q,cte_6.q,vector_mr_diff_schema_alias.q,rcfile_union.q,explain_logical.q,interval_3.q]
TestColumnPrunerProcCtx - did not produce a TEST-*.xml file (likely timed out) 
(batchId=255)
TestGenMapRedUtilsUsePartitionColumnsNegative - did not produce a TEST-*.xml 
file (likely timed out) (batchId=255)
TestHiveMetaStoreChecker - did not produce a TEST-*.xml file (likely timed out) 
(batchId=255)
TestNegativePartitionPrunerCompactExpr - did not produce a TEST-*.xml file 
(likely timed out) (batchId=255)
TestPositivePartitionPrunerCompactExpr - did not produce a TEST-*.xml file 
(likely timed out) (batchId=255)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=116)

[load_dyn_part2.q,smb_mapjoin_7.q,vectorization_5.q,smb_mapjoin_2.q,ppd_join_filter.q,column_access_stats.q,vector_between_in.q,vectorized_string_funcs.q,vectorization_1.q,bucket_map_join_2.q,groupby4_map_skew.q,groupby_ppr_multi_distinct.q,temp_table_join1.q,vectorized_case.q,stats_noscan_1.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=117)


[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Status: Patch Available  (was: Open)

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}





[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-11-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Attachment: HIVE-15200.01.patch

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}





[jira] [Comment Edited] (HIVE-15194) Hive on Tez - Hive Runtime Error while closing operators

2016-11-15 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664624#comment-15664624
 ] 

Wei Zheng edited comment on HIVE-15194 at 11/15/16 10:03 PM:
-

Thanks [~gopalv] for the quick analysis. But I think either 
isHashMapOnDisk() is true or we have an in-memory hashmap:
{code}
/* It may happen that there's not enough memory to instantiate a hashmap for the partition.
 * In that case, we don't create the hashmap, but pretend the hashmap is directly "spilled".
 */
public HashPartition(int initialCapacity, float loadFactor, int wbSize, long maxProbeSize,
    boolean createHashMap, String spillLocalDirs) {
  if (createHashMap) {
    // Probe space should be at least equal to the size of our designated wbSize
    maxProbeSize = Math.max(maxProbeSize, wbSize);
    hashMap = new BytesBytesMultiHashMap(initialCapacity, loadFactor, wbSize, maxProbeSize);
  } else {
    hashMapSpilledOnCreation = true;
    hashMapOnDisk = true;
  }
  this.spillLocalDirs = spillLocalDirs;
  this.initialCapacity = initialCapacity;
  this.loadFactor = loadFactor;
  this.wbSize = wbSize;
}
{code}
[~ssmane3.tech] It would be helpful if you could attach the hive.log. Thanks.


was (Author: wzheng):
Thanks [~gopalv] for the quick analysis. But I think either 
isHashMapOnDisk() is false or we have an in-memory hashmap
{code}
/* It may happen that there's not enough memory to instantiate a hashmap for the partition.
 * In that case, we don't create the hashmap, but pretend the hashmap is directly "spilled".
 */
public HashPartition(int initialCapacity, float loadFactor, int wbSize, long maxProbeSize,
    boolean createHashMap, String spillLocalDirs) {
  if (createHashMap) {
    // Probe space should be at least equal to the size of our designated wbSize
    maxProbeSize = Math.max(maxProbeSize, wbSize);
    hashMap = new BytesBytesMultiHashMap(initialCapacity, loadFactor, wbSize, maxProbeSize);
  } else {
    hashMapSpilledOnCreation = true;
    hashMapOnDisk = true;
  }
  this.spillLocalDirs = spillLocalDirs;
  this.initialCapacity = initialCapacity;
  this.loadFactor = loadFactor;
  this.wbSize = wbSize;
}
{code}
[~ssmane3.tech] It would be helpful if you could attach the hive.log. Thanks.

> Hive on Tez - Hive Runtime Error while closing operators
> 
>
> Key: HIVE-15194
> URL: https://issues.apache.org/jira/browse/HIVE-15194
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 2.1.0
> Environment: Hive 2.1.0 
> Tez 0.8.4
> 4 Nodes x CentOS-6 x64 (32GB Memory, 8 CPUs)
> Hadoop 2.7.1
>Reporter: Shankar M
>
> Please help me solve the issue below.
> --
> I am setting below commands in hive CLI: 
> set hive.execution.engine=tez;
> set hive.vectorized.execution.enabled = true;
> set hive.vectorized.execution.reduce.enabled = true;
> set hive.cbo.enable=true;
> set hive.compute.query.using.stats=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> SET hive.tez.container.size=4096;
> SET hive.tez.java.opts=-Xmx3072m;
> --
> {code}
> hive> CREATE TABLE tmp_parquet_newtable STORED AS PARQUET AS 
> > select a.* from orc_very_large_table a where a.event = 1 and EXISTS 
> (SELECT 1 FROM tmp_small_parquet_table b WHERE b.session_id = a.session_id ) ;
> Query ID = hadoop_20161114132930_65843cb3-557c-4b42-b662-2901caf5be2d
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1479059955967_0049)
> --
> VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 1 .       container      FAILED    384         44        0      340      26       0
> Map 2 ..      container   SUCCEEDED      1          1        0        0       0       0
> --
> VERTICES: 01/02  [===>>---] 11%   ELAPSED TIME: 43.76 s
> --
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1479059955967_0049_2_01, 
> diagnostics=[Task failed, 

[jira] [Updated] (HIVE-15180) Extend JSONMessageFactory to store additional information about metadata objects on different table events

2016-11-15 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15180:

Attachment: HIVE-15180.3.patch

> Extend JSONMessageFactory to store additional information about metadata 
> objects on different table events
> --
>
> Key: HIVE-15180
> URL: https://issues.apache.org/jira/browse/HIVE-15180
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15180.1.patch, HIVE-15180.2.patch, 
> HIVE-15180.3.patch
>
>
> We want the {{NOTIFICATION_LOG}} table to capture additional information 
> about the metadata objects when {{DbNotificationListener}} captures different 
> events for a table (create/drop/alter).





[jira] [Commented] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668294#comment-15668294
 ] 

Sergey Shelukhin commented on HIVE-15205:
-

There's no patch so I cannot tell :)

> Create ReplDumpTask/ReplDumpWork for dumping out metadata
> -
>
> Key: HIVE-15205
> URL: https://issues.apache.org/jira/browse/HIVE-15205
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> The current bootstrap code generates dump metadata during semantic analysis 
> which breaks security and task/work abstraction. It also uses existing 
> classes (from the Export/Import world) for code reuse, but as a result 
> ends up dealing with a lot of if-then-elses. It makes sense to have a cleaner 
> abstraction which uses ReplDumpTask and ReplDumpWork (to configure the Task). 
> Also perhaps worth evaluating ReplLoadTask/ReplLoadWork for load side.
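The Task/Work split referred to here follows Hive's usual pattern: a serializable *Work object carries configuration produced at analysis time, and the *Task consumes it at execution time. A hypothetical skeleton (the class names mirror the jira's proposal, but the bodies are illustrative only):

```java
import java.io.Serializable;

// Carries only configuration; produced during semantic analysis.
class ReplDumpWork implements Serializable {
    final String dbName;
    final String dumpRoot;

    ReplDumpWork(String dbName, String dumpRoot) {
        this.dbName = dbName;
        this.dumpRoot = dumpRoot;
    }
}

// Performs the dump at execution time, restoring the task/work abstraction
// that the description says the bootstrap code currently breaks.
class ReplDumpTask {
    private final ReplDumpWork work;

    ReplDumpTask(ReplDumpWork work) {
        this.work = work;
    }

    int execute() {
        System.out.println("dumping metadata for " + work.dbName
            + " to " + work.dumpRoot);
        return 0; // 0 == success, matching Hive task conventions
    }

    public static void main(String[] args) {
        new ReplDumpTask(new ReplDumpWork("default", "/tmp/repl")).execute();
    }
}
```

Because the Work object is all that crosses the analysis/execution boundary, no dump I/O needs to happen inside the semantic analyzer.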





[jira] [Commented] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata

2016-11-15 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668286#comment-15668286
 ] 

Vaibhav Gumashta commented on HIVE-15205:
-

[~sershe] This jira doesn't touch ImportSemanticAnalyzer etc. It is resolving 
some of the problems you are pointing to in the original jira. All the classes 
here are new (unless you have created ones with similar names). 

> Create ReplDumpTask/ReplDumpWork for dumping out metadata
> -
>
> Key: HIVE-15205
> URL: https://issues.apache.org/jira/browse/HIVE-15205
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> The current bootstrap code generates dump metadata during semantic analysis 
> which breaks security and task/work abstraction. It also uses existing 
> classes (from the Export/Import world) for code reuse, but as a result 
> ends up dealing with a lot of if-then-elses. It makes sense to have a cleaner 
> abstraction which uses ReplDumpTask and ReplDumpWork (to configure the Task). 
> Also perhaps worth evaluating ReplLoadTask/ReplLoadWork for load side.





[jira] [Updated] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins

2016-11-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15211:
---
Status: Patch Available  (was: In Progress)

> Provide support for complex expressions in ON clauses for INNER joins
> -
>
> Key: HIVE-15211
> URL: https://issues.apache.org/jira/browse/HIVE-15211
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15211.patch
>
>
> Currently, we have some restrictions on the predicates that we can use in ON 
> clauses for inner joins (we have those restrictions for outer joins too, but 
> we will tackle that in a follow-up). Semantically equivalent queries can be 
> expressed if the predicate is introduced in the WHERE clause, but we would 
> like users to be able to express it in both the ON and WHERE clauses, as in 
> standard SQL.
> This patch is an extension that overcomes these restrictions for inner joins.
> It will allow writing queries that currently fail in Hive, such as:
> {code:sql}
> -- Disjunctions
> SELECT *
> FROM src1 JOIN src
> ON (src1.key=src.key
>   OR src1.value between 100 and 102
>   OR src.value between 100 and 102)
> LIMIT 10;
> -- Conjunction with multiple inputs references in one side
> SELECT *
> FROM src1 JOIN src
> ON (src1.key+src.key >= 100
>   AND src1.key+src.key <= 102)
> LIMIT 10;
> -- Conjunct with no references
> SELECT *
> FROM src1 JOIN src
> ON (src1.value between 100 and 102
>   AND src.value between 100 and 102
>   AND true)
> LIMIT 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins

2016-11-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15211:
---
Attachment: HIVE-15211.patch

> Provide support for complex expressions in ON clauses for INNER joins
> -
>
> Key: HIVE-15211
> URL: https://issues.apache.org/jira/browse/HIVE-15211
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15211.patch
>
>
> Currently, we have some restrictions on the predicates that can be used in ON 
> clauses for inner joins (we have those restrictions for outer joins too, but 
> we will tackle those in a follow-up). Semantically equivalent queries can be 
> expressed by introducing the predicate in the WHERE clause instead, but users 
> should be able to express it in either the ON or the WHERE clause, as in 
> standard SQL.
> This patch extends Hive to overcome these restrictions for inner joins,
> allowing queries that currently fail in Hive, such as:
> {code:sql}
> -- Disjunctions
> SELECT *
> FROM src1 JOIN src
> ON (src1.key=src.key
>   OR src1.value between 100 and 102
>   OR src.value between 100 and 102)
> LIMIT 10;
> -- Conjunction with multiple inputs references in one side
> SELECT *
> FROM src1 JOIN src
> ON (src1.key+src.key >= 100
>   AND src1.key+src.key <= 102)
> LIMIT 10;
> -- Conjunct with no references
> SELECT *
> FROM src1 JOIN src
> ON (src1.value between 100 and 102
>   AND src.value between 100 and 102
>   AND true)
> LIMIT 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins

2016-11-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15211 started by Jesus Camacho Rodriguez.
--
> Provide support for complex expressions in ON clauses for INNER joins
> -
>
> Key: HIVE-15211
> URL: https://issues.apache.org/jira/browse/HIVE-15211
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15211.patch
>
>
> Currently, we have some restrictions on the predicates that can be used in ON 
> clauses for inner joins (we have those restrictions for outer joins too, but 
> we will tackle those in a follow-up). Semantically equivalent queries can be 
> expressed by introducing the predicate in the WHERE clause instead, but users 
> should be able to express it in either the ON or the WHERE clause, as in 
> standard SQL.
> This patch extends Hive to overcome these restrictions for inner joins,
> allowing queries that currently fail in Hive, such as:
> {code:sql}
> -- Disjunctions
> SELECT *
> FROM src1 JOIN src
> ON (src1.key=src.key
>   OR src1.value between 100 and 102
>   OR src.value between 100 and 102)
> LIMIT 10;
> -- Conjunction with multiple inputs references in one side
> SELECT *
> FROM src1 JOIN src
> ON (src1.key+src.key >= 100
>   AND src1.key+src.key <= 102)
> LIMIT 10;
> -- Conjunct with no references
> SELECT *
> FROM src1 JOIN src
> ON (src1.value between 100 and 102
>   AND src.value between 100 and 102
>   AND true)
> LIMIT 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15208) Query string should be HTML encoded for Web UI

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668235#comment-15668235
 ] 

Hive QA commented on HIVE-15208:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839019/HIVE-15208.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10679 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=92)

[bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2129/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2129/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2129/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839019 - PreCommit-HIVE-Build

> Query string should be HTML encoded for Web UI
> --
>
> Key: HIVE-15208
> URL: https://issues.apache.org/jira/browse/HIVE-15208
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: HIVE-15208.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins

2016-11-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15211:
---
Summary: Provide support for complex expressions in ON clauses for INNER 
joins  (was: Extends support for complex expressions in inner joins ON clauses)

> Provide support for complex expressions in ON clauses for INNER joins
> -
>
> Key: HIVE-15211
> URL: https://issues.apache.org/jira/browse/HIVE-15211
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Currently, we have some restrictions on the predicates that can be used in ON 
> clauses for inner joins (we have those restrictions for outer joins too, but 
> we will tackle those in a follow-up). Semantically equivalent queries can be 
> expressed by introducing the predicate in the WHERE clause instead, but users 
> should be able to express it in either the ON or the WHERE clause, as in 
> standard SQL.
> This patch extends Hive to overcome these restrictions for inner joins,
> allowing queries that currently fail in Hive, such as:
> {code:sql}
> -- Disjunctions
> SELECT *
> FROM src1 JOIN src
> ON (src1.key=src.key
>   OR src1.value between 100 and 102
>   OR src.value between 100 and 102)
> LIMIT 10;
> -- Conjunction with multiple inputs references in one side
> SELECT *
> FROM src1 JOIN src
> ON (src1.key+src.key >= 100
>   AND src1.key+src.key <= 102)
> LIMIT 10;
> -- Conjunct with no references
> SELECT *
> FROM src1 JOIN src
> ON (src1.value between 100 and 102
>   AND src.value between 100 and 102
>   AND true)
> LIMIT 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668141#comment-15668141
 ] 

Sergey Shelukhin commented on HIVE-14990:
-

Hmm, this seems to have excluded the test code that makes all tables MM

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.09.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because the methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668051#comment-15668051
 ] 

Hive QA commented on HIVE-14990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839017/HIVE-14990.09.patch

{color:green}SUCCESS:{color} +1 due to 55 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 100 failed/errored test(s), 9991 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[combine3] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view_partitioned] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cteViews] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_all_part] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_05_some_part] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_16_part_external] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_17_part_managed] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_18_part_external] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_19_00_part_external_location]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition2]
 (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition3]
 (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all2] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_conversions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_insertonly_acid] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_createas1] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat3]
 (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat]
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_1_drop] (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_tablestatus] 
(batchId=69)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1]
 (batchId=131)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=131)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[autoColumnStats_1]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[autoColumnStats_2]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_unionDistinct_2]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_no_match]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_non_partitioned]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_into_with_schema]
 (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part2]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[unionDistinct_2]
 (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_types]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_tmp_table]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_no_match]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_non_partitioned]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=156)

[jira] [Commented] (HIVE-15151) Bootstrap support for replv2

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668032#comment-15668032
 ] 

Sergey Shelukhin commented on HIVE-15151:
-

Please see comment in parent JIRA... making sweeping code changes and moves 
with FIXME comments that intend to make more sweeping changes and moves 
repeatedly breaks anyone who does any work in parallel. Changes like that 
should be made with one commit or on the branch. 

> Bootstrap support for replv2
> 
>
> Key: HIVE-15151
> URL: https://issues.apache.org/jira/browse/HIVE-15151
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15151.2.patch, HIVE-15151.3.patch, 
> HIVE-15151.3.patch, HIVE-15151.4.patch, HIVE-15151.addendum.patch, 
> HIVE-15151.patch
>
>
> We need to support the ability to bootstrap an initial state, dumping out 
> currently existing dbs/tables, etc, so that incremental replication can take 
> over from that point. To this end, we should implement commands such as REPL 
> DUMP, REPL LOAD, REPL STATUS, as described over at 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668018#comment-15668018
 ] 

Sergey Shelukhin commented on HIVE-15205:
-

Temporary -1
See the comment in parent JIRA "Is it possible to do work in the branch? This 
causes immense conflicts with hive-14535 branch, and I see tons of comments 
that purport with FIXMEs and stuff to move code around and refactor this and 
that.
I think this should be done on the branch and merged once "

> Create ReplDumpTask/ReplDumpWork for dumping out metadata
> -
>
> Key: HIVE-15205
> URL: https://issues.apache.org/jira/browse/HIVE-15205
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> The current bootstrap code generates dump metadata during semantic analysis, 
> which breaks security and the task/work abstraction. It also reuses existing 
> classes (from the Export/Import world), but as a result ends up dealing with 
> a lot of if-then-elses. It makes sense to have a cleaner abstraction that 
> uses ReplDumpTask and ReplDumpWork (to configure the Task). It is also 
> perhaps worth evaluating ReplLoadTask/ReplLoadWork for the load side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668018#comment-15668018
 ] 

Sergey Shelukhin edited comment on HIVE-15205 at 11/15/16 7:27 PM:
---

Temporary -1
See the comment in parent JIRA "Is it possible to do work in the branch? This 
causes immense conflicts with hive-14535 branch, and I see tons of comments 
that purport with FIXMEs and stuff to move code around and refactor this and 
that.
I think this should be done on the branch and merged once when it's ready"


was (Author: sershe):
Temporary -1
See the comment in parent JIRA "Is it possible to do work in the branch? This 
causes immense conflicts with hive-14535 branch, and I see tons of comments 
that purport with FIXMEs and stuff to move code around and refactor this and 
that.
I think this should be done on the branch and merged once "

> Create ReplDumpTask/ReplDumpWork for dumping out metadata
> -
>
> Key: HIVE-15205
> URL: https://issues.apache.org/jira/browse/HIVE-15205
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> The current bootstrap code generates dump metadata during semantic analysis, 
> which breaks security and the task/work abstraction. It also reuses existing 
> classes (from the Export/Import world), but as a result ends up dealing with 
> a lot of if-then-elses. It makes sense to have a cleaner abstraction that 
> uses ReplDumpTask and ReplDumpWork (to configure the Task). It is also 
> perhaps worth evaluating ReplLoadTask/ReplLoadWork for the load side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14841) Replication - Phase 2

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668011#comment-15668011
 ] 

Sergey Shelukhin commented on HIVE-14841:
-

Is it possible to do work in the branch? This causes immense conflicts with 
hive-14535 branch, and I see tons of comments that purport with FIXMEs and 
stuff to move code around and refactor this and that.
I think this should be done on the branch and merged once when ready, so that 
conflicts with parallel changes to the code affected by the moves are minimized.

> Replication - Phase 2
> -
>
> Key: HIVE-14841
> URL: https://issues.apache.org/jira/browse/HIVE-14841
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> Per email sent out to the dev list, the current implementation of replication 
> in hive has certain drawbacks, for instance :
> * Replication follows a rubberbanding pattern, wherein different tables/ptns 
> can be in a different/mixed state on the destination, so that unless all 
> events are caught up on, we do not have an equivalent warehouse. Thus, this 
> only satisfies DR cases, not load-balancing use cases, and the secondary 
> warehouse is really only seen as a backup, rather than as a live warehouse 
> that trails the primary.
> * The base implementation is a naive implementation, and has several 
> performance problems, including a large amount of duplication of data for 
> subsequent events, as mentioned in HIVE-13348, having to copy out entire 
> partitions/tables when just a delta of files might be sufficient/etc. Also, 
> using EXPORT/IMPORT allows us a simple implementation, but at the cost of 
> tons of temporary space, much of which is not actually applied at the 
> destination.
> Thus, to track this, we now create a new branch (repl2) and an uber-JIRA (this 
> one) to track experimental development towards improving this situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15057:

Attachment: (was: HIVE-15057.wip.patch)

> Support other types of operators (other than SELECT)
> 
>
> Key: HIVE-15057
> URL: https://issues.apache.org/jira/browse/HIVE-15057
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15057.wip.patch
>
>
> Currently only SELECT operators are supported for nested column pruning. We 
> should add support for other types of operators so the optimization can work 
> for complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15057:

Attachment: HIVE-15057.wip.patch

> Support other types of operators (other than SELECT)
> 
>
> Key: HIVE-15057
> URL: https://issues.apache.org/jira/browse/HIVE-15057
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15057.wip.patch
>
>
> Currently only SELECT operators are supported for nested column pruning. We 
> should add support for other types of operators so the optimization can work 
> for complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15208) Query string should be HTML encoded for Web UI

2016-11-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667957#comment-15667957
 ] 

Xuefu Zhang commented on HIVE-15208:


+1

> Query string should be HTML encoded for Web UI
> --
>
> Key: HIVE-15208
> URL: https://issues.apache.org/jira/browse/HIVE-15208
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: HIVE-15208.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667931#comment-15667931
 ] 

Steve Loughran commented on HIVE-15199:
---

sounds related to HADOOP-13402

I am not going to express any opinion about what is "the correct" behaviour we 
should expect from rename, as I don't think anyone knows that. If you look at 
the [FS 
Specification|https://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/filesystem/filesystem.html]
 we're pretty explicit that rename is hard, and that different filesystems 
behave differently.

I'm not defending S3A here, just noting I'm not 100% sure of what HDFS does 
itself here, and how that compares to the semantics of POSIX's rename call 
(which is different from the Unix command-line {{mv}} operation).
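To make the rename-semantics point above concrete, here is a minimal sketch. It uses plain java.nio on a local filesystem (not HDFS or S3A — that substitution is an assumption for illustration only) to show that a rename onto an existing target either fails or silently overwrites, depending on whether the caller opts in:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameSemantics {
    // Returns true if moving src onto an existing dst succeeds without
    // REPLACE_EXISTING. POSIX rename() silently overwrites the target;
    // the NIO default refuses and throws instead.
    public static boolean plainMoveOverwrites(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst);   // no REPLACE_EXISTING option
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;           // src is left untouched
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path src = Files.write(dir.resolve("src.txt"), "new".getBytes());
        Path dst = Files.write(dir.resolve("dst.txt"), "old".getBytes());

        // NIO's default move refuses to clobber an existing target...
        System.out.println(plainMoveOverwrites(src, dst)); // false
        // ...while REPLACE_EXISTING opts in to POSIX-style overwrite.
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        System.out.println(new String(Files.readAllBytes(dst))); // new
    }
}
```

Different Hadoop filesystems sit at different points between these two behaviours, which is why the FS specification avoids declaring one of them "the correct" rename.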

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run on an S3 table, when the scratch directory is 
> also saved on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15208) Query string should be HTML encoded for Web UI

2016-11-15 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-15208:
---
Status: Patch Available  (was: Open)

Submitted a patch that HTML-escapes the query string.
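For illustration, a minimal sketch of the kind of escaping such a patch involves. The helper below is hypothetical — the actual patch may well use a library escaper such as Commons Lang's StringEscapeUtils.escapeHtml4 rather than hand-rolled code:

```java
public class QueryStringEscaper {
    // Minimal HTML escape covering the characters that enable tag or
    // attribute injection when a user-supplied query string is rendered
    // into the Web UI page.
    public static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break; // must come first conceptually
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM t WHERE name = '<script>alert(1)</script>'";
        // Rendered escaped, the query displays as text instead of executing.
        System.out.println(escapeHtml(query));
    }
}
```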

> Query string should be HTML encoded for Web UI
> --
>
> Key: HIVE-15208
> URL: https://issues.apache.org/jira/browse/HIVE-15208
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: HIVE-15208.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15208) Query string should be HTML encoded for Web UI

2016-11-15 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-15208:
---
Attachment: HIVE-15208.1.patch

> Query string should be HTML encoded for Web UI
> --
>
> Key: HIVE-15208
> URL: https://issues.apache.org/jira/browse/HIVE-15208
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: HIVE-15208.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Attachment: HIVE-14990.09.patch

There are massive conflicts with master due to the replication patch. I will 
attach the diff for now, before the merge (it basically undoes some master 
patches).

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.09.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because the methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-15 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667862#comment-15667862
 ] 

Sahil Takiar commented on HIVE-15199:
-

The code block Sergio posted looks like it could have some major 
inefficiencies, even for HDFS. If my understanding is correct, the code 
basically tries to rename the data with the suffix {{... + "_copy_" + 
counter}}; if that fails (because the file already exists), it increments the 
counter and tries again. This doesn't sound like a scalable solution: if there 
are 1000 files under the directory, any insert will require explicitly 
checking for the existence of files from {{... + "_copy_0"}} to 
{{... + "_copy_1000"}}. On HDFS, and especially on S3, this doesn't seem to be 
a very efficient approach (it would be good to confirm this behavior).

If the logic above is indeed what happens, there are a few different ways to 
fix it:

1: Append a UUID to the end of the file name rather than using a counter; 
since UUIDs are globally unique, there should be no chance of conflict.
2: Append the query id plus a synchronized counter ({{private synchronized long 
counter}}) to the file name.
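The trade-off between the two naming schemes can be sketched as follows. The helper names are hypothetical, and the sketch probes an in-memory set of names instead of a real filesystem, so it only models the number of existence checks, not actual I/O cost:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class CopySuffix {
    // Counter probing, as described above: up to O(n) existence checks per
    // insert against a directory that already holds n "_copy_k" files.
    public static String nextByCounter(String base, Set<String> existing) {
        int counter = 1;
        String candidate = base;
        while (existing.contains(candidate)) {
            candidate = base + "_copy_" + counter++;
        }
        return candidate;
    }

    // UUID suffix: one name generation, no probing; collision probability
    // is negligible because UUIDs are effectively globally unique.
    public static String nextByUuid(String base) {
        return base + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        String base = "000000_0";
        existing.add(base);
        // Each counter-based insert re-scans all previously taken names.
        for (int i = 0; i < 3; i++) {
            existing.add(nextByCounter(base, existing));
        }
        System.out.println(existing.contains(base + "_copy_1")); // true
        // The UUID scheme needs no lookup at all.
        System.out.println(nextByUuid(base));
    }
}
```

On a blobstore like S3, where each existence check is a remote call, the difference between probing and generate-once naming is what makes option 1 attractive.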

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
>
> Any INSERT INTO statement run on an S3 table, when the scratch directory is 
> also saved on S3, deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15209) Set hive.strict.checks.cartesian.product to false by default

2016-11-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667850#comment-15667850
 ] 

Jesus Camacho Rodriguez commented on HIVE-15209:


[~xuefuz], OK, then it does not violate compliance... but I would argue it is 
better to disable it by default and let the admin decide whether users should 
be able to execute cartesian products or not. That would give us better test 
coverage and the possibility to randomly run a cartesian product if we want to.

Btw, a cartesian product does not need to be specified explicitly; it might 
also be produced by the Calcite optimizer, e.g. if a filter on a constant 
equality lets us prune both inputs of a join.

> Set hive.strict.checks.cartesian.product to false by default
> 
>
> Key: HIVE-15209
> URL: https://issues.apache.org/jira/browse/HIVE-15209
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15209.patch
>
>
> If we aim to make Hive compliant with SQL, we should disable this property by 
> default, as expressing a cartesian product, though inefficient, is perfectly 
> valid in SQL.
> Further, if we express complex predicates in the ON clause of a SQL query, we 
> might not be able to push these predicates to the join operator; however, we 
> should still be able to execute the query.





[jira] [Commented] (HIVE-15204) Hive-Hbase integration throws "java.lang.ClassNotFoundException: NULL::character varying" (Postgres)

2016-11-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667842#comment-15667842
 ] 

Sergey Shelukhin commented on HIVE-15204:
-

Probably dup of HIVE-14322

> Hive-Hbase integration throws "java.lang.ClassNotFoundException: 
> NULL::character varying" (Postgres)
> 
>
> Key: HIVE-15204
> URL: https://issues.apache.org/jira/browse/HIVE-15204
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
> Environment: apache-hive-2.1.0-bin
> hbase-1.1.1
>Reporter: Anshuman
>  Labels: Postgres
>
> When doing Hive to HBase integration, we have observed that current Apache 
> Hive 2.x is not able to recognise 'NULL::character varying' (a variant 
> representation of NULL in Postgres) properly and throws a 
> java.lang.ClassNotFoundException.
> Exception:
> ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: 
> NULL::character varying
> java.lang.RuntimeException: java.lang.ClassNotFoundException: NULL::character 
> varying
> 
> Caused by: java.lang.ClassNotFoundException: NULL::character varying
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> Reason:
> org.apache.hadoop.hive.ql.metadata.Table.java
> final public Class getInputFormatClass() {
> if (inputFormatClass == null) {
>   try {
> String className = tTable.getSd().getInputFormat();
> if (className == null) {  /* If the className is the Postgres 
> variant of NULL, i.e. 'NULL::character varying', control goes to the else 
> block and the error is thrown. */
>   if (getStorageHandler() == null) {
> return null;
>   }
>   inputFormatClass = getStorageHandler().getInputFormatClass();
> } else {
>   inputFormatClass = (Class)
> Class.forName(className, true, 
> Utilities.getSessionSpecifiedClassLoader());
> }
>   } catch (ClassNotFoundException e) {
> throw new RuntimeException(e);
>   }
> }
> return inputFormatClass;
>   }
> Steps to reproduce:
> Hive 2.x (e.g. apache-hive-2.1.0-bin) and HBase (e.g. hbase-1.1.1)
> 1. Install and configure Hive, if it is not already installed.
> 2. Install and configure HBase, if it is not already installed.
> 3. Configure the hive-site.xml File (as per recommended steps)
> 4. Provide necessary jars to Hive (as per recommended steps)
> 5. Create table in HBase as shown below -
> create 'hivehbase', 'ratings'
> put 'hivehbase', 'row1', 'ratings:userid', 'user1'
> put 'hivehbase', 'row1', 'ratings:bookid', 'book1'
> put 'hivehbase', 'row1', 'ratings:rating', '1'
>  
> put 'hivehbase', 'row2', 'ratings:userid', 'user2'
> put 'hivehbase', 'row2', 'ratings:bookid', 'book1'
> put 'hivehbase', 'row2', 'ratings:rating', '3'
>  
> put 'hivehbase', 'row3', 'ratings:userid', 'user2'
> put 'hivehbase', 'row3', 'ratings:bookid', 'book2'
> put 'hivehbase', 'row3', 'ratings:rating', '3'
>  
> put 'hivehbase', 'row4', 'ratings:userid', 'user2'
> put 'hivehbase', 'row4', 'ratings:bookid', 'book4'
> put 'hivehbase', 'row4', 'ratings:rating', '1'
> 6. Create external table as shown below 
> CREATE EXTERNAL TABLE hbasehive_table
> (key string, userid string,bookid string,rating int) 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES 
> ("hbase.columns.mapping" = 
> ":key,ratings:userid,ratings:bookid,ratings:rating")
> TBLPROPERTIES ("hbase.table.name" = "hivehbase");
> 7. select * from hbasehive_table;
> FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character 
> varying





[jira] [Commented] (HIVE-15209) Set hive.strict.checks.cartesian.product to false by default

2016-11-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667839#comment-15667839
 ] 

Jesus Camacho Rodriguez commented on HIVE-15209:


[~sershe], thank you for pointing that out; I did not remember that.

Then I agree with [~xuefuz] that, as he said, a conscious decision should be 
made so we do not flip this back and forth. Probably it all depends on the 
direction that we want to give to the project...

> Set hive.strict.checks.cartesian.product to false by default
> 
>
> Key: HIVE-15209
> URL: https://issues.apache.org/jira/browse/HIVE-15209
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15209.patch
>
>
> If we aim to make Hive compliant with SQL, we should disable this property by 
> default, as expressing a cartesian product, though inefficient, is perfectly 
> valid in SQL.
> Further, if we express complex predicates in the ON clause of a SQL query, we 
> might not be able to push these predicates to the join operator; however, we 
> should still be able to execute the query.





[jira] [Commented] (HIVE-15209) Set hive.strict.checks.cartesian.product to false by default

2016-11-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667837#comment-15667837
 ] 

Xuefu Zhang commented on HIVE-15209:


I don't think erring out and saying that "Cartesian products are disabled for 
safety reasons" violates compliance. Further, I'd argue that being compliant is 
not our ultimate goal.

As to b/c, the default has been true since HIVE-12727, which was released in 
2.0. I don't think we should make existing users suffer just to make new users 
happier.

> Set hive.strict.checks.cartesian.product to false by default
> 
>
> Key: HIVE-15209
> URL: https://issues.apache.org/jira/browse/HIVE-15209
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15209.patch
>
>
> If we aim to make Hive compliant with SQL, we should disable this property by 
> default, as expressing a cartesian product, though inefficient, is perfectly 
> valid in SQL.
> Further, if we express complex predicates in the ON clause of a SQL query, we 
> might not be able to push these predicates to the join operator; however, we 
> should still be able to execute the query.




