[jira] [Updated] (HIVE-11278) Partition.setOutputFormatClass should not do toString for Class object

2015-08-07 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-11278:

Attachment: (was: HIVE-11278_2015-07-20_16:06:40.patch)

> Partition.setOutputFormatClass should not do toString for Class object 
> ---
>
> Key: HIVE-11278
> URL: https://issues.apache.org/jira/browse/HIVE-11278
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java#L286
> inside setInputFormatClass, we're doing:
> {noformat}
>   public void setInputFormatClass(Class<? extends InputFormat> inputFormatClass) {
>     this.inputFormatClass = inputFormatClass;
>     tPartition.getSd().setInputFormat(inputFormatClass.getName());
>   }
> {noformat}
> But inside setOutputFormatClass, we call toString() on the class instead of 
> getName():
> {noformat}
>   public void setOutputFormatClass(Class<? extends HiveOutputFormat> outputFormatClass) {
>     this.outputFormatClass = outputFormatClass;
>     tPartition.getSd().setOutputFormat(HiveFileFormatUtils
>         .getOutputFormatSubstitute(outputFormatClass).toString());
>   }
> {noformat}
> The difference is that for a class {{A}}, {{toString()}} returns "class A" 
> while {{getName()}} returns "A". So {{Class.forName(cls.getName())}} succeeds, 
> but {{Class.forName(cls.toString())}} is not a valid lookup.
> So if you get a partition, set its output format, and make an alter call, then 
> get the partition again and call getOutputFormatClass on that object, it 
> throws a ClassNotFoundException at 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java#L316,
> because it effectively calls Class.forName("class a.b.c.ClassName"), which is 
> invalid.
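The difference is easy to check in isolation; the following standalone snippet (an illustration only, not Hive code) shows that Class.forName accepts getName() output but rejects toString() output:

```java
public class ClassNameDemo {
    public static void main(String[] args) throws Exception {
        Class<?> cls = java.util.ArrayList.class;

        String name = cls.getName();  // binary name: "java.util.ArrayList"
        String str = cls.toString();  // kind-prefixed: "class java.util.ArrayList"

        // The binary name round-trips through Class.forName ...
        Class.forName(name);

        // ... but the toString() form does not: it is not a valid class name.
        try {
            Class.forName(str);
            throw new AssertionError("expected ClassNotFoundException");
        } catch (ClassNotFoundException expected) {
            System.out.println("Class.forName(toString()) failed as expected");
        }
    }
}
```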



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11278) Partition.setOutputFormatClass should not do toString for Class object

2015-08-07 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-11278:

Attachment: HIVE-11278.01.patch

> Partition.setOutputFormatClass should not do toString for Class object 
> ---
>
> Key: HIVE-11278
> URL: https://issues.apache.org/jira/browse/HIVE-11278
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-11278.01.patch
>
>





[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661434#comment-14661434
 ] 

Hive QA commented on HIVE-11304:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748981/HIVE-11304.7.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9312 tests executed
*Failed tests:*
{noformat}
TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file
TestSSL - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4855/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4855/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4855/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748981 - PreCommit-HIVE-TRUNK-Build

> Migrate to Log4j2 from Log4j 1.x
> 
>
> Key: HIVE-11304
> URL: https://issues.apache.org/jira/browse/HIVE-11304
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11304.2.patch, HIVE-11304.3.patch, 
> HIVE-11304.4.patch, HIVE-11304.5.patch, HIVE-11304.6.patch, 
> HIVE-11304.7.patch, HIVE-11304.patch
>
>
> Log4j 2 has some great features that would benefit Hive significantly. Notable 
> features include:
> 1) Performance: parameterized logging, low overhead when logging is disabled, 
> etc. More details can be found here: 
> https://logging.apache.org/log4j/2.x/performance.html
> 2) RoutingAppender: route logs to different log files based on MDC context 
> (useful for HS2, LLAP, etc.)
> 3) Asynchronous logging
> This is an umbrella jira to track changes related to the Log4j2 migration.





[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-08-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661437#comment-14661437
 ] 

Gopal V commented on HIVE-11304:


Added to my nightly builds.

> Migrate to Log4j2 from Log4j 1.x
> 
>
> Key: HIVE-11304
> URL: https://issues.apache.org/jira/browse/HIVE-11304
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11304.2.patch, HIVE-11304.3.patch, 
> HIVE-11304.4.patch, HIVE-11304.5.patch, HIVE-11304.6.patch, 
> HIVE-11304.7.patch, HIVE-11304.patch
>
>





[jira] [Updated] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-07 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-11498:
--
Affects Version/s: 1.2.0

> HIVE Authorization v2 should not check permission for dummy entity
> --
>
> Key: HIVE-11498
> URL: https://issues.apache.org/jira/browse/HIVE-11498
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
> HIVE-11498.003.patch
>
>
> For queries like {{SELECT 1+1;}}, the target database and table are set to 
> {{_dummy_database}} and {{_dummy_table}}; authorization should skip these 
> kinds of databases and tables.
> Authz v1 already skips them:
> eg1. [Source code at 
> github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600]
> {noformat}
> for (WriteEntity write : outputs) {
>   if (write.isDummy() || write.isPathType()) {
>     continue;
>   }
>   ...
> }
> {noformat}
> eg2. [Source code at 
> github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633]
> {noformat}
> for (ReadEntity read : inputs) {
>   if (read.isDummy() || read.isPathType()) {
>     continue;
>   }
>   ...
> }
> {noformat}
> This patch fixes authz v2 to do the same.
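A minimal sketch of what the v2-side fix amounts to, assuming the same isDummy()/isPathType() checks as v1 (the class and method names below are illustrative, not the actual Hive types):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hive's read/write entities, with the two flags that matter here.
class Entity {
    private final boolean dummy;
    private final boolean pathType;
    Entity(boolean dummy, boolean pathType) {
        this.dummy = dummy;
        this.pathType = pathType;
    }
    boolean isDummy() { return dummy; }
    boolean isPathType() { return pathType; }
}

class AuthzV2Filter {
    // Drop dummy and path-type entities before handing the list to the
    // authorization plugin, mirroring what the v1 loops already do.
    static List<Entity> filterAuthorizable(List<Entity> entities) {
        List<Entity> result = new ArrayList<>();
        for (Entity e : entities) {
            if (e.isDummy() || e.isPathType()) {
                continue;
            }
            result.add(e);
        }
        return result;
    }
}
```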





[jira] [Updated] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-07 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-11498:
--
Fix Version/s: 2.0.0
   1.3.0

> HIVE Authorization v2 should not check permission for dummy entity
> --
>
> Key: HIVE-11498
> URL: https://issues.apache.org/jira/browse/HIVE-11498
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
> HIVE-11498.003.patch
>
>





[jira] [Updated] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-07 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-11498:
--
Component/s: Authorization

> HIVE Authorization v2 should not check permission for dummy entity
> --
>
> Key: HIVE-11498
> URL: https://issues.apache.org/jira/browse/HIVE-11498
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.0, 1.3.0, 2.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
> HIVE-11498.003.patch
>
>





[jira] [Updated] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-07 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-11498:
--
Affects Version/s: 2.0.0
   1.3.0

> HIVE Authorization v2 should not check permission for dummy entity
> --
>
> Key: HIVE-11498
> URL: https://issues.apache.org/jira/browse/HIVE-11498
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.0, 1.3.0, 2.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
> HIVE-11498.003.patch
>
>





[jira] [Updated] (HIVE-3843) .q test case for HIVE-3446

2015-08-07 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated HIVE-3843:
--
Assignee: (was: Sam Tunnicliffe)

> .q test case for HIVE-3446
> --
>
> Key: HIVE-3843
> URL: https://issues.apache.org/jira/browse/HIVE-3843
> Project: Hive
>  Issue Type: Test
>  Components: Tests, Types
>Affects Versions: 0.11.0
>Reporter: Ashutosh Chauhan
>






[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-07 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661515#comment-14661515
 ] 

Dong Chen commented on HIVE-11498:
--

[~dapengsun], thanks for the patch.

LGTM, pending tests.

[~thejas], would you like to take a look at this auth v2 change and add any 
further comments? Thanks.

> HIVE Authorization v2 should not check permission for dummy entity
> --
>
> Key: HIVE-11498
> URL: https://issues.apache.org/jira/browse/HIVE-11498
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.0, 1.3.0, 2.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
> HIVE-11498.003.patch
>
>





[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661527#comment-14661527
 ] 

Hive QA commented on HIVE-11391:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749038/HIVE-11391.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9340 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4856/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4856/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4856/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749038 - PreCommit-HIVE-TRUNK-Build

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch, HIVE-11391.patch, HIVE-11391.patch, 
> HIVE-11391.patch
>
>






[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661616#comment-14661616
 ] 

Hive QA commented on HIVE-11398:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749043/HIVE-11398.2.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 9327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_deep_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_17
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4857/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4857/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4857/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749043 - PreCommit-HIVE-TRUNK-Build

> Parse wide OR and wide AND trees to flat OR/AND trees
> -
>
> Key: HIVE-11398
> URL: https://issues.apache.org/jira/browse/HIVE-11398
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, UDF
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11398.2.patch, HIVE-11398.patch
>
>
> Deep trees of AND/OR are hard to traverse, particularly when they are merely a 
> nested encoding of what is logically a single operator taking an arbitrary 
> number of arguments.
> One potential way to convert the DFS searches into a simpler BFS search is to 
> introduce a new operator pair named ALL and ANY.
> ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
> ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)
> The SemanticAnalyzer would be responsible for generating these operators, 
> which would make the depth and complexity of traversals trivial for the 
> simplest case of wide AND/OR trees.
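A flattening pass of the kind described can be sketched as follows (a standalone toy model; the node types are illustrative, not the actual Hive AST classes):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal expression tree: a leaf, or an n-ary AND/OR node.
class Expr {
    final String op;           // "AND", "OR", or a leaf label
    final List<Expr> children;
    Expr(String op, List<Expr> children) {
        this.op = op;
        this.children = children;
    }
    static Expr leaf(String name) {
        return new Expr(name, new ArrayList<>());
    }
}

class Flattener {
    // Collect all operands of a nested AND(AND(A, B), C) chain into one flat
    // list, turning a deep two-argument tree into a single wide node.
    static Expr flatten(Expr e) {
        if (!e.op.equals("AND") && !e.op.equals("OR")) {
            return e;
        }
        List<Expr> flat = new ArrayList<>();
        collect(e, e.op, flat);
        return new Expr(e.op, flat);
    }

    private static void collect(Expr e, String op, List<Expr> out) {
        if (e.op.equals(op)) {
            for (Expr c : e.children) {
                collect(c, op, out);
            }
        } else {
            out.add(flatten(e));
        }
    }
}
```

Running flatten over AND(AND(A, B), C) yields a single AND node with children [A, B, C], so a traversal sees the whole conjunction at one level instead of recursing through the nesting.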





[jira] [Commented] (HIVE-11436) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with empty char

2015-08-07 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661646#comment-14661646
 ] 

Jesus Camacho Rodriguez commented on HIVE-11436:


+1

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with 
> empty char
> --
>
> Key: HIVE-11436
> URL: https://issues.apache.org/jira/browse/HIVE-11436
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11436.01.patch, HIVE-11436.02.patch, 
> HIVE-11436.03.patch, HIVE-11436.04.patch
>
>
> BaseCharUtils checks whether the length of a char is within [1, 255]. This 
> causes the return path to throw an error when the length of a char is 0.





[jira] [Commented] (HIVE-11437) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with insert into

2015-08-07 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661653#comment-14661653
 ] 

Jesus Camacho Rodriguez commented on HIVE-11437:


+1 pending QA run

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with 
> insert into
> ---
>
> Key: HIVE-11437
> URL: https://issues.apache.org/jira/browse/HIVE-11437
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11437.01.patch, HIVE-11437.02.patch, 
> HIVE-11437.03.patch, HIVE-11437.04.patch
>
>






[jira] [Updated] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-08-07 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11397:
---
Attachment: HIVE-11397.2.patch

> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11397.1.patch, HIVE-11397.2.patch, 
> HIVE-11397.2.patch, HIVE-11397.patch
>
>
> When parsing A OR B OR C, Hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
> expressions.add(previous);
> expressions.add(current);
> {code}
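The quoted snippet pairwise-combines the accumulated expression with each new disjunct, which is why the result is a deep two-argument chain rather than a flat n-ary OR. A toy model of that fold (illustrative only, not the actual parser code; the ticket notes the real parser also ends up reordering operands, which this sketch leaves out):

```java
import java.util.List;

class OrFoldDemo {
    // Model the parser's fold: combine the running expression ("previous")
    // with each new disjunct ("current"), one pair at a time.
    static String fold(List<String> disjuncts) {
        String previous = disjuncts.get(0);
        for (String current : disjuncts.subList(1, disjuncts.size())) {
            previous = "(" + previous + " OR " + current + ")";
        }
        return previous;
    }
}
```

Each iteration nests the whole previous result one level deeper, so n disjuncts produce a chain of depth n-1 instead of one OR node with n children.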





[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-08-07 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661663#comment-14661663
 ] 

Jesus Camacho Rodriguez commented on HIVE-11391:


The failure is not related to the patch.

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch, HIVE-11391.patch, HIVE-11391.patch, 
> HIVE-11391.patch
>
>






[jira] [Updated] (HIVE-11461) Transform flat AND/OR into IN struct clause

2015-08-07 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11461:
---
Attachment: HIVE-11461.2.patch

> Transform flat AND/OR into IN struct clause
> ---
>
> Key: HIVE-11461
> URL: https://issues.apache.org/jira/browse/HIVE-11461
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch
>
>






[jira] [Updated] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees

2015-08-07 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11398:
---
Attachment: HIVE-11398.3.patch

> Parse wide OR and wide AND trees to flat OR/AND trees
> -
>
> Key: HIVE-11398
> URL: https://issues.apache.org/jira/browse/HIVE-11398
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, UDF
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, HIVE-11398.patch
>
>





[jira] [Commented] (HIVE-11480) CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661814#comment-14661814
 ] 

Hive QA commented on HIVE-11480:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749095/HIVE-11480.01.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9326 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4858/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4858/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4858/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749095 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as 
> input to GenericUDAF 
> ---
>
> Key: HIVE-11480
> URL: https://issues.apache.org/jira/browse/HIVE-11480
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11480.01.patch
>
>
> Some UDAFs cannot deal with char/varchar correctly when the return path is 
> on, for example udaf_number_format.q.





[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-07 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661854#comment-14661854
 ] 

Yongzhi Chen commented on HIVE-10880:
-

Thanks [~xuefuz] for reviewing the code.

> The bucket number is not respected in insert overwrite.
> ---
>
> Key: HIVE-10880
> URL: https://issues.apache.org/jira/browse/HIVE-10880
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Critical
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, 
> HIVE-10880.3.patch, HIVE-10880.4.patch
>
>
> When hive.enforce.bucketing is true, the bucket number defined in the table 
> is no longer respected in current master and 1.2. 
> Reproduce:
> {code:sql}
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> {code}
> Then I inserted the following data into the "buckettestinput" table:
> {noformat}
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8
> {noformat}
> {code:sql}
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true;
> insert overwrite table buckettestoutput1 
> select * from buckettestinput where data like 'first%';
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> {code}
> {noformat}
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
> (state=42000,code=10141)
> {noformat}
> The related debug information related to insert overwrite:
> {noformat}
> 0: jdbc:hive2://localhost:1> insert overwrite table buckettestoutput1 
> select * from buckettestinput where data like 'first%';
> INFO  : Number of reduce tasks determined at compile time: 2
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapred.reduce.tasks=<number>
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
> INFO  : Ended Job = job_local107155352_0001
> INFO  : Loading data to table default.buckettestoutput1 from 
> file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
> INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
> totalSize=52, rawDataSize=48]
> No rows affected (1.692 seconds)
> {noformat}
> Inserts using dynamic partitions do not have this issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7594) Hive JDBC client: "out of sequence response" on large long running query

2015-08-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661855#comment-14661855
 ] 

Csaba Németi commented on HIVE-7594:


Hi. I got the same error running a simple query that doesn't take minutes.
What I discovered (I'm one month into Hadoop & Hive, so please take that into 
account) is the following: I think SQL processing is asynchronous, and this 
raises serious problems if you try to run two queries in parallel. The way I 
fixed the problem was to synchronize the methods that use the Hive connection.
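The synchronization workaround described above can be sketched as follows. This is a minimal illustration of the pattern only, not Hive code: `runQuery` is a hypothetical stand-in for any method that talks over a shared HiveServer2 connection, and the lock simply guarantees two threads never interleave requests on the same Thrift transport (which is what produces "out of sequence response").

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: serialize all statements that share one connection, since the
// underlying Thrift transport is not thread-safe.
public class SerializedConnection {
    private final Object connectionLock = new Object();
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    // Hypothetical stand-in for Statement.executeQuery() on the shared connection.
    public void runQuery(String sql) {
        synchronized (connectionLock) {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try { Thread.sleep(5); }                 // simulate a server round trip
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            active.decrementAndGet();
        }
    }

    public int observedMaxConcurrency() { return maxActive.get(); }

    public static void main(String[] args) throws Exception {
        SerializedConnection conn = new SerializedConnection();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> conn.runQuery("select 1"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // With the lock in place, at most one query is ever in flight.
        System.out.println("max concurrent queries: " + conn.observedMaxConcurrency());
    }
}
```

A cleaner alternative to synchronizing every call site is to give each thread its own connection, which avoids the shared transport entirely.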

> Hive JDBC client: "out of sequence response" on large long running query
> 
>
> Key: HIVE-7594
> URL: https://issues.apache.org/jira/browse/HIVE-7594
> Project: Hive
>  Issue Type: Bug
>  Components: Clients, HiveServer2
>Affects Versions: 0.13.0
> Environment: HDP2.1
>Reporter: Hari Sekhon
>
> When executing a long running query in a JDBC client (Squirrel) to 
> HiveServer2 after several minutes I get this error in the client:
> {code}
> Error: org.apache.thrift.TApplicationException: ExecuteStatement failed: out 
> of sequence response
> SQLState:  08S01
> ErrorCode: 0
> {code}
> I've seen this before, IIRC when running 2 queries in 1 session, but here I 
> closed the client and ran only this single query in a new session each time. 
> I did a search and saw HIVE-6893 referring to a Metastore exception, which I 
> have in some older logs but not in these recent instances; the error seems 
> different in this case but may be related.
> The query to reproduce is "select count(*) from myTable" where myTable is a 
> 1TB table of 620 million rows. This happens in both MR and Tez execution 
> engines running on Yarn.
> Here are all the jars I've added to the classpath (taken from Hortonworks doc 
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_dataintegration/content/ch_using-hive-2.html,
>  plus added hadoop-common, hive-exec and slf4j-api to solve class not found 
> issues on top of that):
> commons-codec-1.4.jar
> commons-logging-1.1.3.jar
> hadoop-common-2.4.0.2.1.3.0-563.jar
> hive-exec-0.13.0.2.1.3.0-563.jar
> hive-jdbc-0.13.0.2.1.3.0-563.jar
> hive-service-0.13.0.2.1.3.0-563.jar
> httpclient-4.2.5.jar
> httpcore-4.2.5.jar
> libthrift-0.9.0.jar
> slf4j-api-1.7.5.jar
> I am seeing errors like this in the hiveserver2.log:
> {code}
> 2014-08-01 15:04:31,358 ERROR [pool-5-thread-3]: server.TThreadPoolServer 
> (TThreadPoolServer.java:run(215)) - Error occurred during processing of 
> message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
> at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> ... 4 more
> ...
> 2014-08-01 15:06:31,520 ERROR [pool-5-thread-3]: server.TThreadPoolServer 
> (TThreadPoolServer.java:run(215)) - Error occurred during processing of 
> message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTra

[jira] [Commented] (HIVE-11340) Create ORC based table using like clause doesn't copy compression property

2015-08-07 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661866#comment-14661866
 ] 

Yongzhi Chen commented on HIVE-11340:
-

Add to RB:
https://reviews.apache.org/r/37218/

> Create ORC based table using like clause doesn't copy compression property
> --
>
> Key: HIVE-11340
> URL: https://issues.apache.org/jira/browse/HIVE-11340
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Gaurav Kohli
>Assignee: Yongzhi Chen
>Priority: Minor
> Attachments: HIVE-11340.1.patch
>
>
> I found an issue with the “create table like” clause: it does not copy the 
> table properties from an ORC-based table.
> Steps to reproduce:
> Step1 :
> create table orc_table (
> time string)
> stored as ORC tblproperties ("orc.compress"="SNAPPY");
> Step 2: 
> create table orc_table_using_like like orc_table;
> Step 3:
> show create table orc_table_using_like;  
> Result:
> createtab_stmt
> CREATE TABLE `orc_table_using_like`(
>   `time` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1437578939')
> Issue:  'orc.compress'='SNAPPY' property is missing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4897) Hive should handle AlreadyExists on retries when creating tables/partitions

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661883#comment-14661883
 ] 

Hive QA commented on HIVE-4897:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749104/HIVE-4897.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4859/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4859/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4859/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
java.io.IOException: Error writing to 
/data/hive-ptest/working/scratch/hiveptest-TestErrorMsg.sh
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749104 - PreCommit-HIVE-TRUNK-Build

> Hive should handle AlreadyExists on retries when creating tables/partitions
> ---
>
> Key: HIVE-4897
> URL: https://issues.apache.org/jira/browse/HIVE-4897
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Aihua Xu
> Attachments: HIVE-4897.patch, hive-snippet.log
>
>
> Creating new tables/partitions may fail with an AlreadyExistsException if 
> there is an error part way through the creation and the HMS tries again 
> without properly cleaning up or checking if this is a retry.
> While partitioning a new table via a script on distributed hive (MetaStore on 
> the same machine) there was a long timeout and then:
> {code}
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> AlreadyExistsException(message:Partition already exists:Partition( ...
> {code}
> I am assuming this is due to retry. Perhaps already-exists on retry could be 
> handled better.
> A similar error occurred while creating a table through Impala, which issued 
> a single createTable call that failed with an AlreadyExistsException. See the 
> logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the 
> attached hive-snippet.log
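Handling already-exists on retry could look like the following sketch: treat AlreadyExists as success when the existing object matches what the retry was about to create. Everything here is hypothetical illustration (the in-memory "metastore" map and `createPartition` are not Hive code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a retry-safe create: "already exists" counts as success when the
// existing entry matches what this attempt would have written.
public class IdempotentCreate {
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // Returns true if the partition exists with the expected definition after
    // the call, whether this attempt created it or an earlier attempt did.
    static boolean createPartition(String name, String definition) {
        String prev = store.putIfAbsent(name, definition);
        if (prev == null) return true;       // we created it
        return prev.equals(definition);      // a retry finding its own earlier write is OK
    }

    public static void main(String[] args) {
        System.out.println(createPartition("p=1", "loc=/data/p1")); // first attempt: true
        System.out.println(createPartition("p=1", "loc=/data/p1")); // retry: still true
        System.out.println(createPartition("p=1", "loc=/data/p2")); // genuine conflict: false
    }
}
```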



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0

2015-08-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661928#comment-14661928
 ] 

Sergio Peña commented on HIVE-10975:


Looks good [~Ferd]
+1

> Parquet: Bump the parquet version up to 1.8.0
> -
>
> Key: HIVE-10975
> URL: https://issues.apache.org/jira/browse/HIVE-10975
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>Priority: Minor
> Attachments: HIVE-10975-parquet.patch, HIVE-10975.1-parquet.patch, 
> HIVE-10975.1.patch, HIVE-10975.2.patch, HIVE-10975.2.patch, 
> HIVE-10975.2.patch, HIVE-10975.patch
>
>
> There are lots of changes since parquet's graduation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662026#comment-14662026
 ] 

Hive QA commented on HIVE-11341:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749107/HIVE-11341.6.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4860/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4860/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4860/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4860/succeeded/TestJdbcWithMiniHS2,
 remoteFile=/home/hiveptest/54.204.187.100-hiveptest-2/logs/, getExitCode()=12, 
getException()=null, getUser()=hiveptest, getHost()=54.204.187.100, 
getInstance()=2]: 'Address 54.204.187.100 maps to 
ec2-54-204-187-100.compute-1.amazonaws.com, but this does not map back to the 
address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
TEST-TestJdbcWithMiniHS2-TEST-org.apache.hive.jdbc.TestJdbcWithMiniHS2.xml
   0   0%0.00kB/s0:00:00
5801 100%5.53MB/s0:00:00 (xfer#1, to-check=3/5)
hive.log
   [rsync progress lines omitted; the transfer log is truncated at roughly 20%]

[jira] [Updated] (HIVE-11340) Create ORC based table using like clause doesn't copy compression property

2015-08-07 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-11340:

Description: 
I found a issue in “create table like” clause, as it is not copying the table 
properties from ORC File format based table.

Steps to reproduce:
Step1 :
{code}
create table orc_table (
time string)
stored as ORC tblproperties ("orc.compress"="SNAPPY");
{code}

Step 2:
{code} 
create table orc_table_using_like like orc_table;
{code}

Step 3:
{code}
show create table orc_table_using_like;  
{code}

Result:

{code}
createtab_stmt
CREATE TABLE `orc_table_using_like`(
  `time` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
TBLPROPERTIES (
  'transient_lastDdlTime'='1437578939')
{code}

Issue:  'orc.compress'='SNAPPY' property is missing




  was:
I found a issue in “create table like” clause, as it is not copying the table 
properties from ORC File format based table.

Steps to reproduce:
Step1 :
create table orc_table (
time string)
stored as ORC tblproperties ("orc.compress"="SNAPPY");

Step 2: 
create table orc_table_using_like like orc_table;

Step 3:
show create table orc_table_using_like;  
Result:

createtab_stmt
CREATE TABLE `orc_table_using_like`(
  `time` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
TBLPROPERTIES (
  'transient_lastDdlTime'='1437578939')

Issue:  'orc.compress'='SNAPPY' property is missing





> Create ORC based table using like clause doesn't copy compression property
> --
>
> Key: HIVE-11340
> URL: https://issues.apache.org/jira/browse/HIVE-11340
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Gaurav Kohli
>Assignee: Yongzhi Chen
>Priority: Minor
> Attachments: HIVE-11340.1.patch
>
>
> I found an issue with the “create table like” clause: it does not copy the 
> table properties from an ORC-based table.
> Steps to reproduce:
> Step1 :
> {code}
> create table orc_table (
> time string)
> stored as ORC tblproperties ("orc.compress"="SNAPPY");
> {code}
> Step 2:
> {code} 
> create table orc_table_using_like like orc_table;
> {code}
> Step 3:
> {code}
> show create table orc_table_using_like;  
> {code}
> Result:
> {code}
> createtab_stmt
> CREATE TABLE `orc_table_using_like`(
>   `time` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1437578939')
> {code}
> Issue:  'orc.compress'='SNAPPY' property is missing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11469) InstanceCache does not have proper implementation of equals or hashcode

2015-08-07 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-11469:

Attachment: HIVE-11469.1.patch.txt

Turned out to be simpler than I expected. Patch attached.
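The failure mode described in this issue is the generic one for objects used as hash keys: without `equals()`/`hashCode()`, lookups fall back to identity comparison, so two logically identical keys occupy separate entries. A minimal sketch (the `Key` class is hypothetical, not Hive's `InstanceCache`):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyDemo {
    // Without these overrides, HashMap would compare keys by identity and a
    // second, logically equal key would miss the cached entry.
    static final class Key {
        final String schemaName;
        Key(String schemaName) { this.schemaName = schemaName; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).schemaName.equals(schemaName);
        }
        @Override public int hashCode() { return Objects.hash(schemaName); }
    }

    public static void main(String[] args) {
        Map<Key, String> cache = new HashMap<>();
        cache.put(new Key("record_v1"), "cached instance");
        // The lookup succeeds only because equals/hashCode are implemented.
        System.out.println(cache.get(new Key("record_v1"))); // prints "cached instance"
        System.out.println(cache.size());                    // prints 1
    }
}
```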

> InstanceCache does not have proper implementation of equals or hashcode
> ---
>
> Key: HIVE-11469
> URL: https://issues.apache.org/jira/browse/HIVE-11469
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11469.1.patch.txt
>
>
> With HIVE-11288, we started using InstanceCache as a key. However it doesn't 
> seem like the class actually implements the equals or hashcode methods which 
> can potentially lead to inaccurate results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11469) InstanceCache does not have proper implementation of equals or hashcode

2015-08-07 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662074#comment-14662074
 ] 

Swarnim Kulkarni commented on HIVE-11469:
-

Review board: https://reviews.apache.org/r/37223/

> InstanceCache does not have proper implementation of equals or hashcode
> ---
>
> Key: HIVE-11469
> URL: https://issues.apache.org/jira/browse/HIVE-11469
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11469.1.patch.txt
>
>
> With HIVE-11288, we started using InstanceCache as a key. However it doesn't 
> seem like the class actually implements the equals or hashcode methods which 
> can potentially lead to inaccurate results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8532) return code of "source xxx" clause is missing

2015-08-07 Thread Jonathan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662103#comment-14662103
 ] 

Jonathan Kelly commented on HIVE-8532:
--

The fix version is currently 1.0.0, but this change doesn't actually seem to be 
in the official 1.0.0 release. Am I looking at the correct tag? 
release-1.0.0 (Git commit hash 697aecadc3ba62bc11f3ba0a6c8522daeec7b53f, 
git-svn-id: https://svn.apache.org/repos/asf/hive/tags/release-1.0.0@1656909) 
does not contain this change.

> return code of "source xxx" clause is missing
> -
>
> Key: HIVE-8532
> URL: https://issues.apache.org/jira/browse/HIVE-8532
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 0.12.0, 0.13.1
>Reporter: Gordon Wang
>Assignee: vitthal (Suhas) Gogate
> Fix For: 1.0.0
>
> Attachments: HIVE-8532.patch
>
>
> When executing "source "  clause, hive client driver does not catch 
> the return code of this command.
> This behaviour causes an issue when running hive query in Oozie workflow.
> When the "source" clause is put into a Oozie workflow, Oozie can not get the 
> return code of this command. Thus, Oozie consider the "source" clause as 
> successful all the time. 
> So, when the "source" clause fails, the hive query does not abort and the 
> oozie workflow does not abort either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11356) SMB join on tez fails when one of the tables is empty

2015-08-07 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11356:
--
Attachment: HIVE-11356.4.patch

Changed the test file name and added a few more tests.

> SMB join on tez fails when one of the tables is empty
> -
>
> Key: HIVE-11356
> URL: https://issues.apache.org/jira/browse/HIVE-11356
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11356.1.patch, HIVE-11356.3.patch, 
> HIVE-11356.4.patch
>
>
> {code}
> :java.lang.IllegalStateException: Unexpected event. All physical sources 
> already initialized 
> at com.google.common.base.Preconditions.checkState(Preconditions.java:145) 
> at 
> org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:142)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:610)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:90)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.run(LogicalIOProcessorRuntimeTask.java:673)
>  
> at java.lang.Thread.run(Thread.java:745) 
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
> vertex_1437168420060_17787_1_01 [Map 4] killed/failed due to:null] 
> Vertex killed, vertexName=Reducer 5, 
> vertexId=vertex_1437168420060_17787_1_02, diagnostics=[Vertex received Kill 
> while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, 
> Vertex vertex_1437168420060_17787_1_02 [Reducer 5] killed/failed due to:null] 
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask 
> HQL-FAILED 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662168#comment-14662168
 ] 

Xuefu Zhang commented on HIVE-11466:


[~thejas], in the spark branch itself, it seems that some change in HIVE-9152 
exposed this problem. Besides the thrift version change for code generation, 
there doesn't seem to be anything that could cause this. I guess some 
investigation is needed. In the meantime, a second pair of eyes helps. Maybe 
you can take a look at the patch there as well to see if you spot anything 
suspicious.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> An issue with the HIVE-10166 patch is that it increases the size of hive.log, 
> causing Jenkins to fail because it runs out of disk space.
> Here are hive.log sizes from running TestJdbcWithMiniHS2 before the patch, 
> with the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G of size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11490) Lazily call ASTNode::toStringTree() after tree modification

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662171#comment-14662171
 ] 

Hive QA commented on HIVE-11490:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749124/HIVE-11490.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4861/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4861/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4861/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4861/succeeded/TestJdbcWithMiniHS2,
 remoteFile=/home/hiveptest/54.145.175.19-hiveptest-1/logs/, getExitCode()=12, 
getException()=null, getUser()=hiveptest, getHost()=54.145.175.19, 
getInstance()=1]: 'Address 54.145.175.19 maps to 
ec2-54-145-175-19.compute-1.amazonaws.com, but this does not map back to the 
address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
TEST-TestJdbcWithMiniHS2-TEST-org.apache.hive.jdbc.TestJdbcWithMiniHS2.xml
   0   0%0.00kB/s0:00:00
5802 100%5.53MB/s0:00:00 (xfer#1, to-check=3/5)
hive.log
   [rsync progress lines omitted; the transfer log is truncated at roughly 13%]

[jira] [Updated] (HIVE-11438) Join a ACID table with non-ACID table fail with MR on 1.0.0

2015-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11438:
--
Attachment: HIVE-11438.2-branch-1.0.patch

Discussed offline with [~jdere]: check the local fs first before going to 
HDFS. The utility method might be invoked on the backend, which should use the 
local fs.
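The "local fs first, then HDFS" approach can be sketched generically as an ordered fallback over sources: probe each filesystem in order and return the first hit. This is an illustration of the pattern only; `resolve` and the supplier lambdas are hypothetical, not the Hive utility method or the Hadoop `FileSystem` API.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Generic "try local first, fall back to remote" lookup.
public class OrderedLookup {
    @SafeVarargs
    static <T> Optional<T> resolve(Supplier<Optional<T>>... sources) {
        for (Supplier<Optional<T>> source : sources) {
            Optional<T> hit = source.get();
            if (hit.isPresent()) return hit;   // first source that answers wins
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The local fs misses, so the remote (HDFS-like) source answers.
        Optional<String> result = resolve(
            () -> Optional.empty(),                  // local fs: file not found
            () -> Optional.of("hdfs:///tmp/file"));  // remote fs: found
        System.out.println(result.get());
    }
}
```

Ordering the probes this way matters on the backend, where task-local files exist only on the local filesystem and a remote check would either miss or be needlessly expensive.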

> Join a ACID table with non-ACID table fail with MR on 1.0.0
> ---
>
> Key: HIVE-11438
> URL: https://issues.apache.org/jira/browse/HIVE-11438
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 1.0.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.0.1
>
> Attachments: HIVE-11438.1-branch-1.0.patch, HIVE-11438.1.patch, 
> HIVE-11438.2-branch-1.0.patch
>
>
> The following script fails in MR mode:
> Preparation:
> {code}
> CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING) 
> CLUSTERED BY (k1) INTO 2 BUCKETS 
> STORED AS ORC TBLPROPERTIES("transactional"="true"); 
> INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I');
> CREATE TABLE orc_table (k1 INT, f1 STRING) 
> CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS 
> STORED AS ORC; 
> INSERT OVERWRITE TABLE orc_table VALUES (1, 'x');
> {code}
> Then run the following script:
> {code}
> SET hive.execution.engine=mr; 
> SET hive.auto.convert.join=false; 
> SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> SELECT t1.*, t2.* FROM orc_table t1 
> JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1;
> {code}
> Stack:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:585)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:580)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:580)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1606)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1367)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1179)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1006)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:996)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Job Submission failed with exception 'java.lang.NullPoi

[jira] [Commented] (HIVE-11438) Join a ACID table with non-ACID table fail with MR on 1.0.0

2015-08-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662217#comment-14662217
 ] 

Jason Dere commented on HIVE-11438:
---

+1

> Join a ACID table with non-ACID table fail with MR on 1.0.0
> ---
>
> Key: HIVE-11438
> URL: https://issues.apache.org/jira/browse/HIVE-11438
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 1.0.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.0.1
>
> Attachments: HIVE-11438.1-branch-1.0.patch, HIVE-11438.1.patch, 
> HIVE-11438.2-branch-1.0.patch
>
>
> The following script fails in MR mode:
> Preparation:
> {code}
> CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING) 
> CLUSTERED BY (k1) INTO 2 BUCKETS 
> STORED AS ORC TBLPROPERTIES("transactional"="true"); 
> INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I');
> CREATE TABLE orc_table (k1 INT, f1 STRING) 
> CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS 
> STORED AS ORC; 
> INSERT OVERWRITE TABLE orc_table VALUES (1, 'x');
> {code}
> Then run the following script:
> {code}
> SET hive.execution.engine=mr; 
> SET hive.auto.convert.join=false; 
> SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> SELECT t1.*, t2.* FROM orc_table t1 
> JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1;
> {code}
> Stack:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:585)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:580)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:580)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1606)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1367)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1179)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1006)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:996)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Job Submission failed with exception 'java.lang.NullPointerException(null)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> {code}
> Note the 

[jira] [Updated] (HIVE-11356) SMB join on tez fails when one of the tables is empty

2015-08-07 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11356:
--
Attachment: HIVE-11356.5.patch

> SMB join on tez fails when one of the tables is empty
> -
>
> Key: HIVE-11356
> URL: https://issues.apache.org/jira/browse/HIVE-11356
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11356.1.patch, HIVE-11356.3.patch, 
> HIVE-11356.4.patch, HIVE-11356.5.patch
>
>
> {code}
> :java.lang.IllegalStateException: Unexpected event. All physical sources 
> already initialized 
> at com.google.common.base.Preconditions.checkState(Preconditions.java:145) 
> at 
> org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:142)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:610)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:90)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.run(LogicalIOProcessorRuntimeTask.java:673)
>  
> at java.lang.Thread.run(Thread.java:745) 
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
> vertex_1437168420060_17787_1_01 [Map 4] killed/failed due to:null] 
> Vertex killed, vertexName=Reducer 5, 
> vertexId=vertex_1437168420060_17787_1_02, diagnostics=[Vertex received Kill 
> while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, 
> Vertex vertex_1437168420060_17787_1_02 [Reducer 5] killed/failed due to:null] 
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask 
> HQL-FAILED 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11340) Create ORC based table using like clause doesn't copy compression property

2015-08-07 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662232#comment-14662232
 ] 

Yongzhi Chen commented on HIVE-11340:
-

Changed the test to use describe formatted.
It seems that, in order not to be masked in the output, the property has to be 
used explicitly. Added fetching of the compress info in the OrcSerde initialize 
method. Also fixed a bug in OrcConf.

> Create ORC based table using like clause doesn't copy compression property
> --
>
> Key: HIVE-11340
> URL: https://issues.apache.org/jira/browse/HIVE-11340
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Gaurav Kohli
>Assignee: Yongzhi Chen
>Priority: Minor
> Attachments: HIVE-11340.1.patch
>
>
> I found an issue in the “create table like” clause: it does not copy the table 
> properties from an ORC file format based table.
> Steps to reproduce:
> Step1 :
> {code}
> create table orc_table (
> time string)
> stored as ORC tblproperties ("orc.compress"="SNAPPY");
> {code}
> Step 2:
> {code} 
> create table orc_table_using_like like orc_table;
> {code}
> Step 3:
> {code}
> show create table orc_table_using_like;  
> {code}
> Result:
> {code}
> createtab_stmt
> CREATE TABLE `orc_table_using_like`(
>   `time` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1437578939')
> {code}
> Issue:  'orc.compress'='SNAPPY' property is missing
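The behavior a fix would need — carrying user table properties such as 'orc.compress' over to the new table while skipping transient ones — can be illustrated with a minimal sketch (names are illustrative only, not Hive's actual internals):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the HIVE-11340 symptom: a "create table like"
// implementation that copies only the storage descriptor silently drops
// format-specific settings such as "orc.compress". A correct copy carries
// the properties over, filtering out transient ones.
public class TablePropsCopy {

    static Map<String, String> copyForLike(Map<String, String> src) {
        Map<String, String> dst = new HashMap<>(src);
        dst.remove("transient_lastDdlTime"); // transient props are regenerated
        return dst;
    }

    public static void main(String[] args) {
        Map<String, String> src = new HashMap<>();
        src.put("orc.compress", "SNAPPY");
        src.put("transient_lastDdlTime", "1437578939");

        Map<String, String> dst = copyForLike(src);
        System.out.println(dst.get("orc.compress"));                  // SNAPPY
        System.out.println(dst.containsKey("transient_lastDdlTime")); // false
    }
}
```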





[jira] [Updated] (HIVE-11340) Create ORC based table using like clause doesn't copy compression property

2015-08-07 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11340:

Attachment: HIVE-11340.2.patch

> Create ORC based table using like clause doesn't copy compression property
> --
>
> Key: HIVE-11340
> URL: https://issues.apache.org/jira/browse/HIVE-11340
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Gaurav Kohli
>Assignee: Yongzhi Chen
>Priority: Minor
> Attachments: HIVE-11340.1.patch, HIVE-11340.2.patch
>
>
> I found an issue in the “create table like” clause: it does not copy the table 
> properties from an ORC file format based table.
> Steps to reproduce:
> Step1 :
> {code}
> create table orc_table (
> time string)
> stored as ORC tblproperties ("orc.compress"="SNAPPY");
> {code}
> Step 2:
> {code} 
> create table orc_table_using_like like orc_table;
> {code}
> Step 3:
> {code}
> show create table orc_table_using_like;  
> {code}
> Result:
> {code}
> createtab_stmt
> CREATE TABLE `orc_table_using_like`(
>   `time` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1437578939')
> {code}
> Issue:  'orc.compress'='SNAPPY' property is missing





[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-08-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Attachment: (was: HIVE-11387.05.patch)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, 
> HIVE-11387.06.patch
>
>
> {noformat}
> The main problem is that, due to the return path, we may now have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map-side aggregation. In the 
> non-return path, however, it is treated as (RS1)-(GBY2-RS3-GBY4), and the 
> optimization does not take this setting into account.
> {noformat}





[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-08-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Attachment: HIVE-11387.06.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, 
> HIVE-11387.06.patch
>
>
> {noformat}
> The main problem is that, due to the return path, we may now have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map-side aggregation. In the 
> non-return path, however, it is treated as (RS1)-(GBY2-RS3-GBY4), and the 
> optimization does not take this setting into account.
> {noformat}





[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-08-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Attachment: HIVE-11387.05.patch

trigger QA run

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, 
> HIVE-11387.06.patch
>
>
> {noformat}
> The main problem is that, due to the return path, we may now have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map-side aggregation. In the 
> non-return path, however, it is treated as (RS1)-(GBY2-RS3-GBY4), and the 
> optimization does not take this setting into account.
> {noformat}





[jira] [Commented] (HIVE-11496) Better tests for evaluating ORC predicate pushdown

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662236#comment-14662236
 ] 

Sergey Shelukhin commented on HIVE-11496:
-

+1

> Better tests for evaluating ORC predicate pushdown
> --
>
> Key: HIVE-11496
> URL: https://issues.apache.org/jira/browse/HIVE-11496
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11496.1.patch
>
>
> There have been many recent regressions wrt ORC predicate pushdown, and we don't 
> have system tests to capture them. Currently there are only junit 
> tests for the ORC predicate pushdown feature. Since hive counters are not 
> available during qfile test execution, there is no easy way to verify whether the 
> ORC PPD feature worked or not. This jira adds a post-execution hook that prints 
> hive counters (esp. the number of input records) to the error stream so that they 
> appear in the qfile test output. This way we can verify ORC SARG evaluation and 
> avoid future regressions.





[jira] [Updated] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11467:
-
Attachment: HIVE-11467.03.patch

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it is 
> first rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory-consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution is to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.
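The rounding in the constructor quoted above, and the upstream fix of choosing a power of 2 before calling it (e.g. by rounding down, one way to satisfy the description; the exact policy in the patch is not shown here), can be demonstrated standalone as a sketch of the arithmetic only:

```java
public class WbSizeRounding {

    // Mirrors the rounding in the WriteBuffers constructor quoted above:
    // a non-power-of-2 size is silently rounded UP to the next power of 2.
    static int roundUp(int wbSize) {
        return Integer.bitCount(wbSize) == 1
            ? wbSize
            : (Integer.highestOneBit(wbSize) << 1);
    }

    // Sketch of an upstream fix: round DOWN to a power of 2 before passing
    // wbSize, so WriteBuffers never expands the allocation.
    static int roundDown(int wbSize) {
        return Integer.highestOneBit(wbSize);
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;
        // 96 MB is not a power of 2; the constructor would allocate 128 MB,
        // a 33% overshoot that can break mapjoin memory estimates.
        System.out.println(roundUp(96 * mb) / mb);   // 128
        System.out.println(roundDown(96 * mb) / mb); // 64
    }
}
```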





[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662247#comment-14662247
 ] 

Sergey Shelukhin commented on HIVE-11467:
-

nit: it's probably better to either change this in WBs or, if we now always 
pass the correct value, change WBs to throw an error, instead of having different 
adjustments in two places.

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it is 
> first rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory-consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution is to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.





[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662249#comment-14662249
 ] 

Chao Sun commented on HIVE-11466:
-

I'm still investigating this. The strange thing is that the issue happens even 
after switching back to 0.9.0.
It looks like something unrelated to thrift in HIVE-9152 caused this issue. More 
later.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing jenkins to fail 
> because it runs out of disk space.
> Here are the hive.log sizes from running TestJdbcWithMiniHS2 before the patch, 
> with the patch, and after other commits:
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.





[jira] [Updated] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-08-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11416:
---
Attachment: HIVE-11416.05.patch

[~jcamachorodriguez], thanks for your valuable comments to improve the patch. I 
have modified the patch accordingly after our discussion. Could you please take 
another look? Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby 
> Optimizer assumes the schema can match after removing RS and GBY
> --
>
> Key: HIVE-11416
> URL: https://issues.apache.org/jira/browse/HIVE-11416
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11416.01.patch, HIVE-11416.02.patch, 
> HIVE-11416.03.patch, HIVE-11416.04.patch, HIVE-11416.05.patch
>
>






[jira] [Comment Edited] (HIVE-11385) LLAP: clean up ORC dependencies - move encoded reader path into a cloned ReaderImpl

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662255#comment-14662255
 ] 

Sergey Shelukhin edited comment on HIVE-11385 at 8/7/15 6:50 PM:
-

note: EncodedOrcFile, and new encoded.Reader and encoded.ReaderImpl are added 
(they are, respectively, a 1-method factory, and sub-classes of corresponding 
Reader/Impl in ORC that add one method). The rest of the files are just moved, 
so you can ignore the deletes/additions with respect to content, as well as any 
import changes; moved to either the new "orc.encoded" package (renamed from 
"orc.llap"), or the storage-api module.
I also moved DiskRange to storage-api from common, since ORC depends on it, not 
sure why it wasn't moved on master.
[~prasanth_j]



> LLAP: clean up ORC dependencies - move encoded reader path into a cloned 
> ReaderImpl
> ---
>
> Key: HIVE-11385
> URL: https://issues.apache.org/jira/browse/HIVE-11385
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11385.01.patch, HIVE-11385.patch
>
>
> Before there's storage handler module, we can clean some things up
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-11385) LLAP: clean up ORC dependencies - move encoded reader path into a cloned ReaderImpl

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662255#comment-14662255
 ] 

Sergey Shelukhin commented on HIVE-11385:
-

note: EncodedOrcFile, and new encoded.Reader and encoded.ReaderImpl are added 
(they are, respectively, a 1-method factory, and sub-classes of corresponding 
> Reader/Impl in ORC that add one method). The rest of the files are just moved, 
so you can ignore the deletes/additions with respect to content, as well as any 
import changes; moved to either the new "orc.encoded" package (renamed from 
"orc.llap"), or the storage-api module.
I also moved DiskRange to storage-api from common, since ORC depends on it, not 
sure why it wasn't moved on master.

> LLAP: clean up ORC dependencies - move encoded reader path into a cloned 
> ReaderImpl
> ---
>
> Key: HIVE-11385
> URL: https://issues.apache.org/jira/browse/HIVE-11385
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11385.01.patch, HIVE-11385.patch
>
>
> Before there's storage handler module, we can clean some things up
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-11480) CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF

2015-08-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11480:
---
Attachment: HIVE-11480.02.patch

The failed tests all pass on my mac. Resubmitting the patch for another run.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as 
> input to GenericUDAF 
> ---
>
> Key: HIVE-11480
> URL: https://issues.apache.org/jira/browse/HIVE-11480
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11480.01.patch, HIVE-11480.02.patch
>
>
> Some of the UDAF can not deal with char/varchar correctly when return path is 
> on, for example udaf_number_format.q.





[jira] [Updated] (HIVE-7476) CTAS does not work properly for s3

2015-08-07 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7476:

Component/s: (was: Query Processor)

> CTAS does not work properly for s3
> --
>
> Key: HIVE-7476
> URL: https://issues.apache.org/jira/browse/HIVE-7476
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.1.0
> Environment: Linux
>Reporter: Jian Fang
>Assignee: Szehon Ho
> Fix For: 2.0.0
>
> Attachments: HIVE-7476.1.patch, HIVE-7476.2.patch, HIVE-7476.3.patch
>
>
> When we use CTAS to create a new table in s3, the table location is not set 
> correctly. As a result, the data from the existing table cannot be inserted 
> into the newly created table.
> We can use the following example to reproduce this issue.
> set hive.metastore.warehouse.dir=OUTPUT_PATH;
> drop table s3_dir_test;
> drop table s3_1;
> drop table s3_2;
> create external table s3_dir_test(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location 'INPUT_PATH';
> create table s3_1(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' ';
> insert overwrite table s3_1 select * from s3_dir_test;
> select * from s3_1;
> create table s3_2 as select * from s3_1;
> select * from s3_1;
> select * from s3_2;
> The data could be as follows.
> 1 abc 10.5
> 2 def 11.5
> 3 ajss 90.23232
> 4 djns 89.02002
> 5 random 2.99
> 6 data 3.002
> 7 ne 71.9084
> The root cause is that the SemanticAnalyzer class does not handle the s3 location 
> properly for CTAS.
> A patch will be provided shortly.





[jira] [Commented] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662313#comment-14662313
 ] 

Sergey Shelukhin commented on HIVE-11355:
-

The patch needs to be rebased... I can review for now (the conflicts are small)

> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662319#comment-14662319
 ] 

Sergey Shelukhin commented on HIVE-11355:
-

in GBO - why isn't it checking for flushes anymore if memoryNeeded is specified?
createEdgeProperty nit - memoryManagerEnabled is only used in one case block 
but it is retrieved for all of them.
dummyStoreOp.setConf(new DummyStoreDesc()); - what is this for?
{noformat}
Invalid configuration could cause OutOfMemory issues at runtime. Configuration "
    + HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD + " = "
    + mapJoinsTotalAvailableMemory + " is greater than configured container size: "
    + totalAvailableMemory + ". Check your tez/yarn settings");
{noformat}
Will this happen to many people? When the threshold is set to some arbitrary 
high value to force map join, this may happen, even though this value will not 
actually be the table size in reality. IIRC I saw it in some tests.
To be continued


> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662329#comment-14662329
 ] 

Sergey Shelukhin commented on HIVE-11355:
-

Some comments in evaluateUnionWork? Input size is computed from a subset of 
output edges, but no input edges are taken into account.
A comment for "forAll" would help. Actually I don't understand the difference 
between using the regex rule for mapjoins and the default rule that checks for 
mapjoins. Aren't they supposed to do the same thing?
{noformat}
throw new SemanticException("Memory shortage of " + -(totalAvailableMemory)
    + ". Please modify the container size to be greater than "
    + HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD);
{noformat}
isn't it supposed to already be addressed by the above exception? At least, the 
message seems to indicate so.
pctx.conf.getLongVar(HiveConf.ConfVars.MAPREDMAXSPLITSIZE) - can this be 0?
if (edge.getDataFlowSize() > TEN_MB) { nit - indentation.
To be continued
Actually RB would help :)


> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11437) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with insert into

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662331#comment-14662331
 ] 

Hive QA commented on HIVE-11437:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749135/HIVE-11437.04.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4862/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4862/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4862/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4862/succeeded/TestJdbcWithMiniHS2,
 remoteFile=/home/hiveptest/54.159.224.15-hiveptest-1/logs/, getExitCode()=12, 
getException()=null, getUser()=hiveptest, getHost()=54.159.224.15, 
getInstance()=1]: 'Address 54.159.224.15 maps to 
ec2-54-159-224-15.compute-1.amazonaws.com, but this does not map back to the 
address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
TEST-TestJdbcWithMiniHS2-TEST-org.apache.hive.jdbc.TestJdbcWithMiniHS2.xml
   0   0%0.00kB/s0:00:00
 5798 100%5.53MB/s0:00:00 (xfer#1, to-check=3/5)
hive.log
[... rsync progress output truncated ...]

[jira] [Commented] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662338#comment-14662338
 ] 

Sergey Shelukhin commented on HIVE-11355:
-

What is the magic with TEN_MB, should it be configurable?
remainingMemory = totalAvailableMemory / 2 nit - can be assigned in one 
statement in the line above
Why is totalRequiredInputOutputMem required? Is Tez supposed to hold entire 
input and entire output in memory?
if (remainingMemory < 0) { - log?

Should any of this be in explain (extended?) output?
Would any other q tests be affected by the addition of "Dummy Store" line to 
explain?

> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11467:
-
Attachment: (was: HIVE-11467.03.patch)

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
> will first be rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution will be to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.
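The effect of the round-up above can be seen in a small standalone sketch. The helper names here are mine; only roundUp mirrors the constructor's expression, while roundDown shows the lower-rounding alternative discussed in the comments:

```java
// Sketch of the rounding behavior: rounding a non-power-of-2 wbSize UP via
// Integer.highestOneBit(wbSize) << 1 can nearly double the buffer's footprint,
// while rounding DOWN keeps consumption at or below what the caller budgeted.
public class WbSizeRounding {
    static int roundUp(int wbSize) {   // what the constructor above does
        return Integer.bitCount(wbSize) == 1 ? wbSize : Integer.highestOneBit(wbSize) << 1;
    }

    static int roundDown(int wbSize) { // a lower-rounding alternative
        return Integer.highestOneBit(wbSize);
    }

    public static void main(String[] args) {
        int requested = 5 * 1024 * 1024;          // 5 MB, not a power of 2
        System.out.println(roundUp(requested));   // 8388608 (8 MB): 60% over budget
        System.out.println(roundDown(requested)); // 4194304 (4 MB): within budget
    }
}
```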



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11467:
-
Attachment: HIVE-11467.03.patch

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
> will first be rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution will be to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662400#comment-14662400
 ] 

Wei Zheng commented on HIVE-11467:
--

Makes sense. I changed the rounding logic in WriteBuffers cstr to an assertion.

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
> will first be rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution will be to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662417#comment-14662417
 ] 

Sergey Shelukhin commented on HIVE-11467:
-

The latest patch again changes it from lower to higher power of two... should 
it be lower?
Otherwise looks good

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
> will first be rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution will be to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11385) LLAP: clean up ORC dependencies - move encoded reader path into a cloned ReaderImpl

2015-08-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662432#comment-14662432
 ] 

Prasanth Jayachandran commented on HIVE-11385:
--

+1

> LLAP: clean up ORC dependencies - move encoded reader path into a cloned 
> ReaderImpl
> ---
>
> Key: HIVE-11385
> URL: https://issues.apache.org/jira/browse/HIVE-11385
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11385.01.patch, HIVE-11385.patch
>
>
> Before there's storage handler module, we can clean some things up
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662435#comment-14662435
 ] 

Wei Zheng commented on HIVE-11467:
--

Patch 3 is already rounding lower, right? Please correct me if I'm wrong.

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
> will first be rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution will be to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662448#comment-14662448
 ] 

Jason Dere commented on HIVE-11466:
---

Looks like there was a bad merge of ThriftBinaryCLIService.java, and 
server.serve() is called twice. I commented out the first invocation and it 
looks like the log size isn't huge anymore.

{noformat}
// TCP Server
server = new TThreadPoolServer(sargs);
+ server.setServerEventHandler(serverEventHandler);
+ server.serve();
- String msg = "Started " + ThriftBinaryCLIService.class.getSimpleName() + " on port "
+ String msg = "Starting " + ThriftBinaryCLIService.class.getSimpleName() + " on port "
    + portNum + " with " + minWorkerThreads + "..." + maxWorkerThreads + " worker threads";
LOG.info(msg);
+ server.serve();
} catch (Throwable t) {
{noformat}

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to fail 
> because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with the 
> patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11462) GenericUDFStruct should constant fold at compile time

2015-08-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-11462:
--
Assignee: Gopal V

> GenericUDFStruct should constant fold at compile time
> -
>
> Key: HIVE-11462
> URL: https://issues.apache.org/jira/browse/HIVE-11462
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-11462.1.patch, HIVE-11462.WIP.patch
>
>
> HIVE-11428 introduces a constant Struct Object, which is available for the 
> runtime operators to assume as a constant parameter.
> This operator isn't constant folded during compilation since the UDF returns 
> a complex type, which is logged as a warning by the constant propagation layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662450#comment-14662450
 ] 

Jason Dere commented on HIVE-11466:
---

eh, spoke too soon... still seem to be getting errors in my test. I'll take a 
look when the test finishes.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to fail 
> because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with the 
> patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662454#comment-14662454
 ] 

Sergey Shelukhin commented on HIVE-11467:
-

Sorry, looked at the wrong patch or something. +1

> WriteBuffers rounding wbSize to next power of 2 may cause OOM
> -
>
> Key: HIVE-11467
> URL: https://issues.apache.org/jira/browse/HIVE-11467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
> HIVE-11467.03.patch
>
>
> If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
> will first be rounded up to the next power of 2:
> {code}
>   public WriteBuffers(int wbSize, long maxSize) {
> this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
> (Integer.highestOneBit(wbSize) << 1);
> this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
> this.offsetMask = this.wbSize - 1;
> this.maxSize = maxSize;
> writePos.bufferIndex = -1;
> nextBufferToWrite();
>   }
> {code}
> That may break existing memory consumption assumptions for mapjoin, and 
> potentially cause OOM.
> The solution will be to pass a power-of-2 number as wbSize from upstream 
> during hashtable creation, to avoid this late expansion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662456#comment-14662456
 ] 

Thejas M Nair commented on HIVE-11498:
--

+1 
Thanks for the patch [~dapengsun]
Thanks for pointing me to this change [~dongc]

> HIVE Authorization v2 should not check permission for dummy entity
> --
>
> Key: HIVE-11498
> URL: https://issues.apache.org/jira/browse/HIVE-11498
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.0, 1.3.0, 2.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
> HIVE-11498.003.patch
>
>
> For queries like {{SELECT 1+1;}}, the target table and database will be set to 
> {{_dummy_database}} and {{_dummy_table}}; authorization should skip these kinds 
> of databases and tables.
> For authz v1, it already skips them.
> eg1. [Source code at 
> github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600]
> {noformat}
> for (WriteEntity write : outputs) {
> if (write.isDummy() || write.isPathType()) {
>   continue;
> }
> {noformat}
> eg2. [Source code at 
> github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633]
> {noformat}
> for (ReadEntity read : inputs) {
> if (read.isDummy() || read.isPathType()) {
>   continue;
> }
>...
> }
> {noformat}
> ...
> This patch will fix authz v2.
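A minimal sketch of the skip described above, using simplified stand-in types rather than Hive's actual ReadEntity/WriteEntity classes:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of filtering dummy entities before authorization runs: entities
// created for queries like "SELECT 1+1" carry a dummy flag and are dropped,
// mirroring the isDummy() skip that authz v1 already performs.
public class AuthzFilterSketch {
    static class Entity {
        final String name;
        final boolean dummy;
        Entity(String name, boolean dummy) { this.name = name; this.dummy = dummy; }
    }

    static List<Entity> filterForAuthz(List<Entity> entities) {
        List<Entity> result = new ArrayList<>();
        for (Entity e : entities) {
            if (e.dummy) continue; // skip _dummy_database/_dummy_table entities
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> inputs = new ArrayList<>();
        inputs.add(new Entity("_dummy_table", true));
        inputs.add(new Entity("real_table", false));
        System.out.println(filterForAuthz(inputs).size()); // 1
    }
}
```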



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11499) Using embedded metastore with HiveServer2 leaks classloaders when used with UDFs

2015-08-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11499:

Attachment: HS2-NucleusCache-Leak.tiff

> Using embedded metastore with HiveServer2 leaks classloaders when used with 
> UDFs
> 
>
> Key: HIVE-11499
> URL: https://issues.apache.org/jira/browse/HIVE-11499
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HS2-NucleusCache-Leak.tiff
>
>
> When UDFs are used, we create a new classloader to add the UDF jar. Similar 
> to what hadoop's reflection utils does(HIVE-11408), datanucleus caches the 
> classloaders 
> (https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
>  JDOPersistanceManager factory (1 per JVM) holds on to a NucleusContext 
> reference 
> (https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
>  Until we call  NucleusContext#close, the classloader cache is not cleared. 
> In case of UDFs this can lead as shows in the attached screenshot,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11499) Using embedded metastore with HiveServer2 leaks classloaders when used with UDFs

2015-08-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11499:

Description: When UDFs are used, we create a new classloader to add the UDF 
jar. Similar to what hadoop's reflection utils does(HIVE-11408), datanucleus 
caches the classloaders 
(https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
 JDOPersistanceManager factory (1 per JVM) holds on to a NucleusContext 
reference 
(https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
 Until we call  NucleusContext#close, the classloader cache is not cleared. In 
case of UDFs this can lead as shows in the attached screenshot, where 
NucleusContext holds on to several URLClassloader objects.  (was: When UDFs are 
used, we create a new classloader to add the UDF jar. Similar to what hadoop's 
reflection utils does(HIVE-11408), datanucleus caches the classloaders 
(https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
 JDOPersistanceManager factory (1 per JVM) holds on to a NucleusContext 
reference 
(https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
 Until we call  NucleusContext#close, the classloader cache is not cleared. In 
case of UDFs this can lead as shows in the attached screenshot,)

> Using embedded metastore with HiveServer2 leaks classloaders when used with 
> UDFs
> 
>
> Key: HIVE-11499
> URL: https://issues.apache.org/jira/browse/HIVE-11499
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HS2-NucleusCache-Leak.tiff
>
>
> When UDFs are used, we create a new classloader to add the UDF jar. Similar 
> to what hadoop's reflection utils does(HIVE-11408), datanucleus caches the 
> classloaders 
> (https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
>  JDOPersistanceManager factory (1 per JVM) holds on to a NucleusContext 
> reference 
> (https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
>  Until we call  NucleusContext#close, the classloader cache is not cleared. 
> In case of UDFs this can lead as shows in the attached screenshot, where 
> NucleusContext holds on to several URLClassloader objects.
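The leak pattern can be illustrated generically (hypothetical names; this is not the actual DataNucleus code):

```java
import java.util.HashMap;
import java.util.Map;

// Generic illustration of the leak: a long-lived static cache keyed by
// classloader keeps every registered loader, and all classes it defined,
// strongly reachable until the cache is explicitly cleared. This mirrors how
// NucleusContext's classloader cache pins UDF URLClassLoaders until
// NucleusContext#close is called.
public class ClassLoaderCacheSketch {
    // Lives as long as the JVM, like the one-per-JVM JDOPersistenceManagerFactory.
    static final Map<ClassLoader, String> CACHE = new HashMap<>();

    static void register(ClassLoader loader) {
        CACHE.put(loader, "resolver-state-for-" + loader);
    }

    public static void main(String[] args) {
        // Each UDF jar gets a fresh loader; every one is pinned by the cache.
        for (int i = 0; i < 3; i++) {
            register(new ClassLoader() {});
        }
        System.out.println(CACHE.size()); // 3 loaders held, none collectible
        CACHE.clear(); // the analogue of NucleusContext#close
        System.out.println(CACHE.size()); // 0
    }
}
```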



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11499) Datanucleus leaks classloaders when used using embedded metastore with HiveServer2 with UDFs

2015-08-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11499:

Summary: Datanucleus leaks classloaders when used using embedded metastore 
with HiveServer2 with UDFs  (was: Using embedded metastore with HiveServer2 
leaks classloaders when used with UDFs)

> Datanucleus leaks classloaders when used using embedded metastore with 
> HiveServer2 with UDFs
> 
>
> Key: HIVE-11499
> URL: https://issues.apache.org/jira/browse/HIVE-11499
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HS2-NucleusCache-Leak.tiff
>
>
> When UDFs are used, we create a new classloader to add the UDF jar. Similar 
> to what hadoop's reflection utils does (HIVE-11408), datanucleus caches the 
> classloaders 
> (https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
>  JDOPersistenceManagerFactory (1 per JVM) holds on to a NucleusContext 
> reference 
> (https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
>  Until we call NucleusContext#close, the classloader cache is not cleared. 
> In case of UDFs this can lead to a permgen leak, as shown in the attached 
> screenshot, where NucleusContext holds on to several URLClassloader objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11499) Datanucleus leaks classloaders when used using embedded metastore with HiveServer2 with UDFs

2015-08-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11499:

Description: When UDFs are used, we create a new classloader to add the UDF 
jar. Similar to what hadoop's reflection utils does (HIVE-11408), datanucleus 
caches the classloaders 
(https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
 JDOPersistenceManagerFactory (1 per JVM) holds on to a NucleusContext 
reference 
(https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
 Until we call NucleusContext#close, the classloader cache is not cleared. In 
case of UDFs this can lead to a permgen leak, as shown in the attached 
screenshot, where NucleusContext holds on to several URLClassloader objects.  
(was: When UDFs are used, we create a new classloader to add the UDF jar. 
Similar to what hadoop's reflection utils does(HIVE-11408), datanucleus caches 
the classloaders 
(https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
 JDOPersistanceManager factory (1 per JVM) holds on to a NucleusContext 
reference 
(https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
 Until we call  NucleusContext#close, the classloader cache is not cleared. In 
case of UDFs this can lead as shows in the attached screenshot, where 
NucleusContext holds on to several URLClassloader objects.)

> Datanucleus leaks classloaders when used using embedded metastore with 
> HiveServer2 with UDFs
> 
>
> Key: HIVE-11499
> URL: https://issues.apache.org/jira/browse/HIVE-11499
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HS2-NucleusCache-Leak.tiff
>
>
> When UDFs are used, we create a new classloader to add the UDF jar. Similar 
> to what hadoop's reflection utils does (HIVE-11408), datanucleus caches the 
> classloaders 
> (https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
>  JDOPersistenceManagerFactory (1 per JVM) holds on to a NucleusContext 
> reference 
> (https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
>  Until we call NucleusContext#close, the classloader cache is not cleared. 
> In case of UDFs this can lead to a permgen leak, as shown in the attached 
> screenshot, where NucleusContext holds on to several URLClassloader objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11499) Datanucleus leaks classloaders when used using embedded metastore with HiveServer2 with UDFs

2015-08-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662478#comment-14662478
 ] 

Vaibhav Gumashta commented on HIVE-11499:
-

cc [~thejas]

> Datanucleus leaks classloaders when used using embedded metastore with 
> HiveServer2 with UDFs
> 
>
> Key: HIVE-11499
> URL: https://issues.apache.org/jira/browse/HIVE-11499
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HS2-NucleusCache-Leak.tiff
>
>
> When UDFs are used, we create a new classloader to add the UDF jar. Similar 
> to what hadoop's reflection utils does (HIVE-11408), datanucleus caches the 
> classloaders 
> (https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
>  JDOPersistenceManagerFactory (1 per JVM) holds on to a NucleusContext 
> reference 
> (https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
>  Until we call NucleusContext#close, the classloader cache is not cleared. 
> In case of UDFs this can lead to a permgen leak, as shown in the attached 
> screenshot, where NucleusContext holds on to several URLClassloader objects.
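The leak pattern described above can be sketched with JDK-only classes (all names below are hypothetical illustrations, not Hive or DataNucleus code): a JVM-lifetime singleton, like the JDOPersistenceManagerFactory holding a NucleusContext, keeps a cache keyed by classloader, so every per-UDF URLClassLoader stays strongly reachable until the cache is explicitly cleared.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the caching pattern that causes the leak: a
// JVM-lifetime singleton holds a map keyed by classloader, so each per-UDF
// URLClassLoader (and every class it defined) stays strongly reachable
// until the cache is cleared.
public class ClassloaderCacheLeak {
  // Simulates a classloader-keyed cache like NucleusContext's resolver cache.
  static final Map<ClassLoader, Object> RESOLVER_CACHE = new HashMap<>();

  static void registerUdfLoader() {
    // Each UDF registration creates a fresh classloader for the UDF jar.
    URLClassLoader udfLoader = new URLClassLoader(new URL[0]);
    RESOLVER_CACHE.put(udfLoader, new Object()); // cached, never evicted
  }

  public static void main(String[] args) {
    for (int i = 0; i < 100; i++) {
      registerUdfLoader();
    }
    // All 100 loaders are still strongly reachable through the cache.
    System.out.println("cached loaders: " + RESOLVER_CACHE.size());

    // The rough equivalent of NucleusContext#close: only after clearing the
    // cache can the loaders be garbage-collected.
    RESOLVER_CACHE.clear();
    System.out.println("after close: " + RESOLVER_CACHE.size());
  }
}
```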



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11472) ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row

2015-08-07 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-11472:
--

Assignee: Gopal V

> ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per 
> row
> ---
>
> Key: HIVE-11472
> URL: https://issues.apache.org/jira/browse/HIVE-11472
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
>  Labels: Performance
> Fix For: 1.3.0, 2.0.0
>
>
> For every row x column
> {code}
> int len = (int) lengths.next();
> int offset = 0;
> byte[] bytes = new byte[len];
> while (len > 0) {
>   int written = stream.read(bytes, offset, len);
>   if (written < 0) {
> throw new EOFException("Can't finish byte read from " + stream);
>   }
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/TreeReaderFactory.java#L1552
> This is not a big issue until it misses the GC TLAB.
> From hadoop-2.6.x (HADOOP-10855) you can read into a Text directly. 
> Possibly can create a different TreeReader from the factory for 2.6.x & use a 
> DataInputStream per stream and prevent an allocation in the inner loop.
> {code}
> int len = (int) lengths.next();
> result.readWithKnownLength(datastream, len);
> {code}
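The buffer-reuse idea can be sketched with JDK-only classes (the real fix would use ORC's stream and Hadoop's Text#readWithKnownLength; the class and method names below are made up for illustration): allocate one growable buffer outside the row loop instead of a new byte[] per row.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the fix: one reusable buffer for all rows instead of
// "new byte[len]" per row x column, which thrashes the GC once the allocation
// misses the TLAB.
public class ReusableRowBuffer {
  private byte[] buf = new byte[64]; // grown on demand, reused across rows

  String readValue(InputStream stream, int len) throws IOException {
    if (buf.length < len) {
      buf = new byte[Math.max(len, buf.length * 2)]; // amortized growth
    }
    int offset = 0;
    int remaining = len;
    while (remaining > 0) {
      int written = stream.read(buf, offset, remaining);
      if (written < 0) {
        throw new EOFException("Can't finish byte read from " + stream);
      }
      offset += written;
      remaining -= written;
    }
    return new String(buf, 0, len, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "helloworld".getBytes(StandardCharsets.UTF_8);
    ReusableRowBuffer reader = new ReusableRowBuffer();
    InputStream in = new ByteArrayInputStream(data);
    // Two "rows" of known length 5 each, read without a per-row byte[] allocation.
    System.out.println(reader.readValue(in, 5)); // hello
    System.out.println(reader.readValue(in, 5)); // world
  }
}
```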



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11356) SMB join on tez fails when one of the tables is empty

2015-08-07 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662493#comment-14662493
 ] 

Vikram Dixit K commented on HIVE-11356:
---

This test will fail until there is a tez release that includes TEZ-2636.

> SMB join on tez fails when one of the tables is empty
> -
>
> Key: HIVE-11356
> URL: https://issues.apache.org/jira/browse/HIVE-11356
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11356.1.patch, HIVE-11356.3.patch, 
> HIVE-11356.4.patch, HIVE-11356.5.patch
>
>
> {code}
> :java.lang.IllegalStateException: Unexpected event. All physical sources 
> already initialized 
> at com.google.common.base.Preconditions.checkState(Preconditions.java:145) 
> at 
> org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:142)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:610)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:90)
>  
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.run(LogicalIOProcessorRuntimeTask.java:673)
>  
> at java.lang.Thread.run(Thread.java:745) 
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
> vertex_1437168420060_17787_1_01 [Map 4] killed/failed due to:null] 
> Vertex killed, vertexName=Reducer 5, 
> vertexId=vertex_1437168420060_17787_1_02, diagnostics=[Vertex received Kill 
> while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, 
> Vertex vertex_1437168420060_17787_1_02 [Reducer 5] killed/failed due to:null] 
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask 
> HQL-FAILED 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11477) CBO inserts a UDF cast for integer type promotion (only for negative numbers)

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662497#comment-14662497
 ] 

Hive QA commented on HIVE-11477:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749150/HIVE-11477.01.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4863/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4863/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4863/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4863/succeeded/TestJdbcWithMiniHS2,
 remoteFile=/home/hiveptest/54.159.210.106-hiveptest-1/logs/, getExitCode()=12, 
getException()=null, getUser()=hiveptest, getHost()=54.159.210.106, 
getInstance()=1]: 'Address 54.159.210.106 maps to 
ec2-54-159-210-106.compute-1.amazonaws.com, but this does not map back to the 
address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
TEST-TestJdbcWithMiniHS2-TEST-org.apache.hive.jdbc.TestJdbcWithMiniHS2.xml
hive.log
   [several dozen rsync transfer-progress lines elided; transfer of hive.log stalled around 11%]
rsync: write failed on 
"/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4863/succeeded/TestJdbcWithMiniHS2/hive.log":
 No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6]
rsync: connection unexpectedly closed (198 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[generator=3.0.6]
Address 54.159.210.106 maps to ec2-54-159-210-106.compute-1.amazonaws.com, but 
this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
hive.log
   0   0%0.00kB/s0:00:00
rsync: write failed on 
"/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4863/succeeded/TestJdbcWithMiniHS2/hive.log":
 No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6]
rsync: connection unexpectedly closed (198 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[generator=3.0.6]
Address 54.159.210.106 maps to ec2-54-159-210-106.compute-1.amazonaws.com, but 
this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
hive.log
   0   0%0.00kB/s0:00:00
rsync: write failed on 
"/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4863/succeeded/TestJdbcWithMiniHS2/hive.log":
 No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6]
rsync: connection unexpectedly closed (198 bytes received so far) [ge

[jira] [Commented] (HIVE-11494) Some positive constant double predicates gets rounded off while negative constants are not

2015-08-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662496#comment-14662496
 ] 

Pengcheng Xiong commented on HIVE-11494:


After the patches in HIVE-11477 and HIVE-11493, the constants will not get 
rounded off in either the positive or the negative case.

> Some positive constant double predicates gets rounded off while negative 
> constants are not
> --
>
> Key: HIVE-11494
> URL: https://issues.apache.org/jira/browse/HIVE-11494
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> Check the predicates in the filter expressions for the following queries. This 
> looks closely related to HIVE-11477 and HIVE-11493.
> {code:title=explain select * from orc_ppd where f = -0.0799821186066;}
> OK
> Stage-0
>Fetch Operator
>   limit:-1
>   Select Operator [SEL_2]
>  
> outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
>  Filter Operator [FIL_4]
> predicate:(f = -0.0799821186066) (type: boolean)
> TableScan [TS_0]
>alias:orc_ppd
> {code}
> {code:title=explain select * from orc_ppd where f = 0.0799821186066;}
> OK
> Stage-0
>Fetch Operator
>   limit:-1
>   Select Operator [SEL_2]
>  
> outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
>  Filter Operator [FIL_4]
> predicate:(f = 0.08) (type: boolean)
> TableScan [TS_0]
>alias:orc_ppd
> {code}
> Negative string constants get rounded off.
> {code:title=explain select * from orc_ppd where f = "-0.0799821186066";}
> OK
> Stage-0
>Fetch Operator
>   limit:-1
>   Select Operator [SEL_2]
>  
> outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
>  Filter Operator [FIL_4]
> predicate:(f = -0.08) (type: boolean)
> TableScan [TS_0]
>alias:orc_ppd
> {code}
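One plausible explanation for the rounding, sketched with plain Java (this is an assumption about the mechanism, not Hive's actual constant-folding code): a double literal compared against a float column can only ever match after being rounded through float, so the folded constant must be the float-rounded value.

```java
// Hypothetical illustration of why a double literal compared against a float
// column must be adjusted: the literal is not exactly representable as a
// float, so "f = 0.0799821186066" can never be true against the stored float
// value unless the constant is first rounded to the nearest float.
public class FloatPredicateRounding {
  public static void main(String[] args) {
    double literal = 0.0799821186066;
    float column = (float) literal; // the value actually stored in a float column

    // Promoting the float back to double does not recover the original
    // literal, so an exact comparison against the literal always fails.
    System.out.println("exact match: " + ((double) column == literal));

    // Comparing against the literal rounded through float does match; this
    // float-rounded value is what the folded constant represents.
    System.out.println("rounded match: " + (column == (float) literal));
  }
}
```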



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662537#comment-14662537
 ] 

Thejas M Nair commented on HIVE-11466:
--

[~csun] What do you think of reverting the HIVE-9152 change temporarily while 
the issue with the change is investigated, so that we can get the precommit 
tests going again?


> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to fail 
> because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11448) Support vectorization of Multi-OR and Multi-AND

2015-08-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662547#comment-14662547
 ] 

Gopal V commented on HIVE-11448:


Tested against latest from HIVE-11398, LGTM - +1.

> Support vectorization of Multi-OR and Multi-AND
> ---
>
> Key: HIVE-11448
> URL: https://issues.apache.org/jira/browse/HIVE-11448
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11448.01.patch, HIVE-11448.02.patch, 
> HIVE-11448.03.patch
>
>
> Support more than 2 children for OR and AND when all children are expressions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11436) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with empty char

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662632#comment-14662632
 ] 

Hive QA commented on HIVE-11436:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749153/HIVE-11436.04.patch

{color:green}SUCCESS:{color} +1 9342 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4864/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4864/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4864/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749153 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : dealing with 
> empty char
> --
>
> Key: HIVE-11436
> URL: https://issues.apache.org/jira/browse/HIVE-11436
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11436.01.patch, HIVE-11436.02.patch, 
> HIVE-11436.03.patch, HIVE-11436.04.patch
>
>
> BaseCharUtils checks whether the length of a char is within [1,255]. This 
> causes the return path to throw an error when the length of a char is 0. 
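A minimal sketch of the described check (hypothetical standalone code; the real validation lives in Hive's BaseCharUtils): any char length outside [1, 255] is rejected, so the zero-length char produced by the Calcite return path fails validation.

```java
// Hypothetical sketch of the validation described above: a char length
// outside [1, 255] is rejected, so an empty char type (length 0) coming out
// of the Calcite return path triggers an error.
public class CharLengthCheck {
  static final int MAX_CHAR_LENGTH = 255;

  static void validateCharParameter(int length) {
    if (length < 1 || length > MAX_CHAR_LENGTH) {
      throw new RuntimeException(
          "Char length " + length + " out of allowed range [1, " + MAX_CHAR_LENGTH + "]");
    }
  }

  public static void main(String[] args) {
    validateCharParameter(10); // a normal char(10): accepted
    try {
      validateCharParameter(0); // what the return path produces for empty char
      System.out.println("no error");
    } catch (RuntimeException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```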



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662666#comment-14662666
 ] 

Chao Sun commented on HIVE-11466:
-

Yes, I'm OK with it. 

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to fail 
> because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11449) "Capacity must be a power of two" error when HybridHashTableContainer memory threshold is too low

2015-08-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11449:
--
Summary: "Capacity must be a power of two" error when 
HybridHashTableContainer memory threshold is too low  (was: 
HybridHashTableContainer should throw exception if not enough memory to create 
the hash tables)

> "Capacity must be a power of two" error when HybridHashTableContainer memory 
> threshold is too low
> -
>
> Key: HIVE-11449
> URL: https://issues.apache.org/jira/browse/HIVE-11449
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11449.1.patch, HIVE-11449.2.patch
>
>
> Currently it only logs a warning message:
> {code}
>   public static int calcNumPartitions(long memoryThreshold, long dataSize, 
> int minNumParts,
>   int minWbSize, HybridHashTableConf nwayConf) throws IOException {
> int numPartitions = minNumParts;
> if (memoryThreshold < minNumParts * minWbSize) {
>   LOG.warn("Available memory is not enough to create a 
> HybridHashTableContainer!");
> }
> {code}
> Because we only log a warning, processing continues and hits a 
> hard-to-diagnose error (log below also includes extra logging I added to help 
> track this down). We should probably just fail the query with a useful error 
> message instead.
> {noformat}
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] WARN 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer: 
> Available memory is not enough to create HybridHashTableContainers 
> consistently!
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> initialCapacity 1: 10
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> initialCapacity 2: 131072
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> maxCapacity: 0
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> initialCapacity 3: 0
> 2015-07-30 18:49:29,699 
> [TezTaskRunner_attempt_1437197396589_0685_1_49_00_2(attempt_1437197396589_0685_1_49_00_2)]
>  ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:258)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:168)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:157)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async 
> initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:419)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:389)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:514)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:467)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:379)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:243)
>   ... 15 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: 
> Capacity must be a power of two
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.con
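The fail-fast behavior proposed above could look roughly like this (a hypothetical standalone sketch, not the actual HIVE-11449 patch): throw when the memory threshold cannot hold even the minimum number of write-buffer-sized partitions, instead of warning and continuing into the "Capacity must be a power of two" failure later.

```java
import java.io.IOException;

// Hypothetical sketch of failing fast instead of only logging a warning: if
// the memory threshold cannot hold even minNumParts write buffers, throw
// immediately rather than let hash-table initialization fail later with a
// confusing "Capacity must be a power of two" assertion.
public class HybridHashMemoryCheck {
  static int calcNumPartitions(long memoryThreshold, int minNumParts, int minWbSize)
      throws IOException {
    if (memoryThreshold < (long) minNumParts * minWbSize) {
      throw new IOException("Not enough memory (" + memoryThreshold
          + " bytes) to create a HybridHashTableContainer with " + minNumParts
          + " partitions of " + minWbSize + " bytes each");
    }
    return minNumParts;
  }

  public static void main(String[] args) throws IOException {
    // 1 MB available, 16 partitions x 1 KB needed: fine.
    System.out.println("ok: " + calcNumPartitions(1 << 20, 16, 1 << 10));
    try {
      // Only 1 KB available but 16 KB needed: fail fast with a clear message.
      calcNumPartitions(1024, 16, 1 << 10);
      System.out.println("no error");
    } catch (IOException e) {
      System.out.println("failed fast");
    }
  }
}
```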

[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662655#comment-14662655
 ] 

Xuefu Zhang commented on HIVE-11466:


I think this could be it. When I debugged it, it seemed that the errors started 
showing up after the service was stopped. It appeared that the "service" didn't 
stop completely, but now I realize that there might be two services and only 
one was stopped.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to fail 
> because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11453) Create PostExecutionHook for ORC file dump

2015-08-07 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11453:
-
Attachment: HIVE-11453.1.patch

> Create PostExecutionHook for ORC file dump
> --
>
> Key: HIVE-11453
> URL: https://issues.apache.org/jira/browse/HIVE-11453
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11453.1.patch
>
>
> To catch regressions in ORC table properties, similar to HIVE-11452, it would 
> be good to print out ORC metadata as part of the qfile tests. This can be done 
> using post-execution hooks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11293) HiveConnection.setAutoCommit(true) throws exception

2015-08-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Węgrzyn updated HIVE-11293:
--
Attachment: HIVE-11293.patch

> HiveConnection.setAutoCommit(true) throws exception
> ---
>
> Key: HIVE-11293
> URL: https://issues.apache.org/jira/browse/HIVE-11293
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Andriy Shumylo
>Assignee: Michał Węgrzyn
>Priority: Minor
> Attachments: HIVE-11293.patch
>
>
> Effectively, autoCommit is always true for HiveConnection; however, 
> setAutoCommit(true) throws an exception, causing problems in existing JDBC code.
> Should be 
> {code}
>   @Override
>   public void setAutoCommit(boolean autoCommit) throws SQLException {
> if (!autoCommit) {
>   throw new SQLException("disabling autocommit is not supported");
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11453) Create PostExecutionHook for ORC file dump

2015-08-07 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11453:
-
Affects Version/s: 1.3.0

> Create PostExecutionHook for ORC file dump
> --
>
> Key: HIVE-11453
> URL: https://issues.apache.org/jira/browse/HIVE-11453
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11453.1.patch
>
>
> To catch regressions in ORC table properties, similar to HIVE-11452, it would 
> be good to print out ORC metadata as part of the qfile tests. This can be done 
> using post-execution hooks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662684#comment-14662684
 ] 

Xuefu Zhang commented on HIVE-11466:


My test shows that removing one of the duplicate "server.serve()" lines makes 
the problem go away. Thanks to Jason for spotting that. How is your test coming along?

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to fail 
> because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log is more than 13G in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11499) Datanucleus leaks classloaders when used using embedded metastore with HiveServer2 with UDFs

2015-08-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11499:

Attachment: HIVE-11499.1.patch

> Datanucleus leaks classloaders when used using embedded metastore with 
> HiveServer2 with UDFs
> 
>
> Key: HIVE-11499
> URL: https://issues.apache.org/jira/browse/HIVE-11499
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11499.1.patch, HS2-NucleusCache-Leak.tiff
>
>
> When UDFs are used, we create a new classloader to add the UDF jar. Similar 
> to what hadoop's reflection utils does (HIVE-11408), datanucleus caches the 
> classloaders 
> (https://github.com/datanucleus/datanucleus-core/blob/3.2/src/java/org/datanucleus/NucleusContext.java#L161).
>  JDOPersistenceManagerFactory (1 per JVM) holds on to a NucleusContext 
> reference 
> (https://github.com/datanucleus/datanucleus-api-jdo/blob/3.2/src/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L115).
>  Until we call NucleusContext#close, the classloader cache is not cleared. 
> In the case of UDFs this can lead to a permgen leak, as shown in the attached 
> screenshot, where NucleusContext holds on to several URLClassLoader objects.
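The leak pattern described above can be sketched in miniature: a long-lived object holding a map keyed by ClassLoader keeps every loaded class alive until an explicit close. The names below (`LoaderCache`, `register`) are hypothetical stand-ins, not DataNucleus internals.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the leak: a cache that lives as long as the JVM (like the
// JDOPersistenceManagerFactory holding a NucleusContext) pins ClassLoader
// references, so the loaders and their classes are never collected...
public class LoaderCache {
    private final Map<ClassLoader, String> cache = new HashMap<>();

    public void register(ClassLoader cl) {
        cache.put(cl, "resolver-state");
    }

    // ...until an explicit close() clears the cache, which is what
    // NucleusContext#close does for its classloader resolver cache.
    public void close() {
        cache.clear();
    }

    public int size() {
        return cache.size();
    }
}
```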



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662685#comment-14662685
 ] 

Chao Sun commented on HIVE-11466:
-

I removed HIVE-9152 on my local machine, but the exception messages are still 
in the log. I'm not quite sure if HIVE-9152 caused the problem.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to 
> fail because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log grows to more than 13G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11453) Create PostExecutionHook for ORC file dump

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662695#comment-14662695
 ] 

Sergey Shelukhin commented on HIVE-11453:
-

+1. Is only the first column being passed to rowindex by design? 

> Create PostExecutionHook for ORC file dump
> --
>
> Key: HIVE-11453
> URL: https://issues.apache.org/jira/browse/HIVE-11453
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11453.1.patch
>
>
> To catch regressions in ORC table properties similar to HIVE-11452 it will be 
> good to print out ORC metadata as part of qfile test. This can be done using 
> post execution hooks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11449) "Capacity must be a power of two" error when HybridHashTableContainer memory threshold is too low

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662698#comment-14662698
 ] 

Sergey Shelukhin commented on HIVE-11449:
-

+1

> "Capacity must be a power of two" error when HybridHashTableContainer memory 
> threshold is too low
> -
>
> Key: HIVE-11449
> URL: https://issues.apache.org/jira/browse/HIVE-11449
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11449.1.patch, HIVE-11449.2.patch
>
>
> Currently it only logs a warning message:
> {code}
>   public static int calcNumPartitions(long memoryThreshold, long dataSize, 
> int minNumParts,
>   int minWbSize, HybridHashTableConf nwayConf) throws IOException {
> int numPartitions = minNumParts;
> if (memoryThreshold < minNumParts * minWbSize) {
>   LOG.warn("Available memory is not enough to create a 
> HybridHashTableContainer!");
> }
> {code}
> Because we only log a warning, processing continues and hits a 
> hard-to-diagnose error (the log below also includes extra logging I added to 
> help track this down). We should probably just fail the query with a useful 
> error message instead.
> {noformat}
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] WARN 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer: 
> Available memory is not enough to create HybridHashTableContainers 
> consistently!
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> initialCapacity 1: 10
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> initialCapacity 2: 131072
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> maxCapacity: 0
> 2015-07-30 18:49:29,696 [pool-1269-thread-8()] ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: *** 
> initialCapacity 3: 0
> 2015-07-30 18:49:29,699 
> [TezTaskRunner_attempt_1437197396589_0685_1_49_00_2(attempt_1437197396589_0685_1_49_00_2)]
>  ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:258)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:168)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:157)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async 
> initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:419)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:389)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:514)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:467)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:379)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:243)
>   ... 15 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: 
> Capacity must be a power of two
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:409)
>   
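The fail-fast behavior proposed above might look like the sketch below. This is an illustration of the idea, not the actual HIVE-11449 patch; the remainder of the partition-count calculation is elided.

```java
import java.io.IOException;

public class PartitionCalc {
    // Sketch: throw instead of only warning when memory can't hold the
    // minimum number of partitions (minNumParts write buffers of minWbSize).
    public static int calcNumPartitions(long memoryThreshold, long dataSize,
                                        int minNumParts, int minWbSize)
            throws IOException {
        if (memoryThreshold < (long) minNumParts * minWbSize) {
            // Fail the query with a useful message instead of continuing into
            // a hard-to-diagnose "Capacity must be a power of two" error.
            throw new IOException("Available memory (" + memoryThreshold
                + ") is not enough to create a HybridHashTableContainer; need at least "
                + ((long) minNumParts * minWbSize));
        }
        int numPartitions = minNumParts;
        // ... rest of the partition-count calculation elided ...
        return numPartitions;
    }
}
```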

[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662699#comment-14662699
 ] 

Xuefu Zhang commented on HIVE-11466:


Yeah, it might not be related to HIVE-9152. I'm not sure why I saw the behavior 
difference previously. To me, HIVE-10330 now seems to be the problem. I'm going 
to submit a patch to fix that. We can probably commit this w/o a precommit run, 
but a +1 from someone would be good.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to 
> fail because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log grows to more than 13G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11453) Create PostExecutionHook for ORC file dump

2015-08-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662700#comment-14662700
 ] 

Prasanth Jayachandran commented on HIVE-11453:
--

That's added just to make sure the bloom filter fpp works as expected through 
table properties and will not be broken in the future :) I can't pass parameters 
to the post-exec hook, so it's just hard-coded in there. Ideally we should have 
SHOW INDEX for this.

> Create PostExecutionHook for ORC file dump
> --
>
> Key: HIVE-11453
> URL: https://issues.apache.org/jira/browse/HIVE-11453
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11453.1.patch
>
>
> To catch regressions in ORC table properties similar to HIVE-11452 it will be 
> good to print out ORC metadata as part of qfile test. This can be done using 
> post execution hooks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-11466:
---
Attachment: HIVE-11466.1.patch

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.1.patch, HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to 
> fail because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log grows to more than 13G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662706#comment-14662706
 ] 

Prasanth Jayachandran commented on HIVE-11466:
--

+1 for .1 patch.

> HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
> disk.
> 
>
> Key: HIVE-11466
> URL: https://issues.apache.org/jira/browse/HIVE-11466
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergio Peña
>Assignee: Xuefu Zhang
> Attachments: HIVE-11466.1.patch, HIVE-11466.patch
>
>
> The HIVE-10166 patch increases the size of hive.log, causing Jenkins to 
> fail because it runs out of disk space.
> Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with 
> the patch, and after other commits.
> {noformat}
> BEFORE HIVE-10166
> 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
> WITH HIVE-10166
> 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
> CURRENT HEAD
> 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
> {noformat}
> This is just a single test, but on Jenkins, hive.log grows to more than 13G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4897) Hive should handle AlreadyExists on retries when creating tables/partitions

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662705#comment-14662705
 ] 

Sergey Shelukhin commented on HIVE-4897:


Hmm... wouldn't it just retry after the first exception and then ignore the 
repeated exception on the retry?
I think it may need to check, on retry, that the object in question exists for 
each API, and maybe that it matches the request.
I am not sure there's a good way to add a repeatable test for this...

> Hive should handle AlreadyExists on retries when creating tables/partitions
> ---
>
> Key: HIVE-4897
> URL: https://issues.apache.org/jira/browse/HIVE-4897
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Aihua Xu
> Attachments: HIVE-4897.patch, hive-snippet.log
>
>
> Creating new tables/partitions may fail with an AlreadyExistsException if 
> there is an error part way through the creation and the HMS tries again 
> without properly cleaning up or checking if this is a retry.
> While partitioning a new table via a script on distributed hive (MetaStore on 
> the same machine) there was a long timeout and then:
> {code}
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> AlreadyExistsException(message:Partition already exists:Partition( ...
> {code}
> I am assuming this is due to retry. Perhaps already-exists on retry could be 
> handled better.
> A similar error occurred while creating a table through Impala, which issued 
> a single createTable call that failed with an AlreadyExistsException. See the 
> logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the 
> attached hive-snippet.log
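The retry-idempotence idea from the comment above can be sketched as follows: on a retried create, treat "already exists" as success only if the existing object matches the request. All names here (`IdempotentCreator`, `createPartition`, the string-valued spec) are hypothetical simplifications, not HMS code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentCreator {
    // Stand-in for the metastore's partition store: name -> partition spec.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /**
     * Returns true if, after the call, the partition exists with the requested
     * spec. A retry that finds a matching partition succeeds; a conflicting
     * pre-existing partition is a genuine AlreadyExists failure.
     */
    public boolean createPartition(String name, String spec) {
        String prev = store.putIfAbsent(name, spec);
        if (prev == null) {
            return true;              // fresh create
        }
        return prev.equals(spec);     // retry: ok only if it matches the request
    }
}
```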



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf

2015-08-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11358:

Assignee: (was: Sergey Shelukhin)

> LLAP: move LlapConfiguration into HiveConf
> --
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-08-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662714#comment-14662714
 ] 

Gopal V commented on HIVE-11306:


LGTM - +1.

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch
>
>
> HIVE-9277 implemented spillable joins for Tez, which suffer from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user-info table joined against an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would be a big performance win for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.
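A bloom-1 ("blocked bloom") filter of the kind described above tests membership with a single word load: the hash picks one 64-bit word and a couple of bits within it. The sketch below is illustrative only, not the HIVE-11306 implementation, and the bit-derivation scheme is an assumption.

```java
public class Bloom1 {
    private final long[] words;

    public Bloom1(int numWords) {
        // numWords should be a power of two so the word index is a cheap mask.
        this.words = new long[numWords];
    }

    private int wordIndex(long hash) {
        return (int) (hash & (words.length - 1));
    }

    private long mask(long hash) {
        // Derive two bit positions within the word from different hash parts.
        int b1 = (int) ((hash >>> 16) & 63);
        int b2 = (int) ((hash >>> 32) & 63);
        return (1L << b1) | (1L << b2);
    }

    public void add(long hash) {
        words[wordIndex(hash)] |= mask(hash);
    }

    // May return false positives, never false negatives — exactly what is
    // needed to skip spill IO for keys that cannot be in the small table.
    public boolean mightContain(long hash) {
        long m = mask(hash);
        return (words[wordIndex(hash)] & m) == m;
    }
}
```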



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662715#comment-14662715
 ] 

Sergey Shelukhin commented on HIVE-11500:
-

I will attach a short design doc early next week.

> implement file footer / splits cache in HBase metastore
> ---
>
> Key: HIVE-11500
> URL: https://issues.apache.org/jira/browse/HIVE-11500
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> We need to cache file metadata (e.g. ORC file footers) for split generation 
> (which, on FSes that support fileId, will be valid permanently and only needs 
> to be removed lazily when ORC file is erased or compacted), and potentially 
> even some information about splits (e.g. grouping based on location that 
> would be good for some short time), in HBase metastore.
> It should be queryable by table. Partition predicate pushdown should be 
> supported. If bucket pruning is added, that too. 
> In later phases, it would be nice to save the (first category above) results 
> of expensive work done by jobs, e.g. data size after decompression/decoding 
> per column, etc. to avoid surprises when ORC encoding is very good, or very 
> bad. Perhaps it can even be lazily generated. Here's a pony: 🐴



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11500:

Description: 
We need to cache file metadata (e.g. ORC file footers) for split generation 
(which, on FSes that support fileId, will be valid permanently and only needs 
to be removed lazily when ORC file is erased or compacted), and potentially 
even some information about splits (e.g. grouping based on location that would 
be good for some short time), in HBase metastore.
It should be queryable by table. Partition predicate pushdown should be 
supported. If bucket pruning is added, that too. 

In later phases, it would be nice to save the (first category above) results of 
expensive work done by jobs, e.g. data size after decompression/decoding per 
column, etc. to avoid surprises when ORC encoding is very good, or very bad. 
Perhaps it can even be lazily generated. Here's a pony: 🐴

  was:
We need to cache footer data for split generation (which, on FSes that support 
fileId, will be valid permanently and only needs to be removed lazily when ORC 
file is erased or compacted), and potentially even some information about 
splits (e.g. grouping based on location that would be good for some short 
time), in HBase metastore.
It should be queryable by table. Partition predicate pushdown should be 
supported. If bucket pruning is added, that too. 

In later phases, it would be nice to save the (first category above) results of 
expensive work done by jobs, e.g. data size after decompression/decoding per 
column, etc. to avoid surprises when ORC encoding is very good, or very bad. 
Perhaps it can even be lazily generated. Here's a pony: 🐴


> implement file footer / splits cache in HBase metastore
> ---
>
> Key: HIVE-11500
> URL: https://issues.apache.org/jira/browse/HIVE-11500
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> We need to cache file metadata (e.g. ORC file footers) for split generation 
> (which, on FSes that support fileId, will be valid permanently and only needs 
> to be removed lazily when ORC file is erased or compacted), and potentially 
> even some information about splits (e.g. grouping based on location that 
> would be good for some short time), in HBase metastore.
> It should be queryable by table. Partition predicate pushdown should be 
> supported. If bucket pruning is added, that too. 
> In later phases, it would be nice to save the (first category above) results 
> of expensive work done by jobs, e.g. data size after decompression/decoding 
> per column, etc. to avoid surprises when ORC encoding is very good, or very 
> bad. Perhaps it can even be lazily generated. Here's a pony: 🐴
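The fileId-keyed caching described above can be sketched minimally: entries stay valid until the file is erased or compacted, at which point they are dropped lazily. Class and method names below are hypothetical, and the real cache would live in the HBase metastore rather than in-process.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FooterCache {
    // fileId -> serialized footer; on FSes with stable fileIds an entry
    // remains valid for the lifetime of the file.
    private final Map<Long, byte[]> footers = new ConcurrentHashMap<>();

    public void put(long fileId, byte[] serializedFooter) {
        footers.put(fileId, serializedFooter);
    }

    public byte[] get(long fileId) {
        // null means a miss: the caller reads the footer from the filesystem.
        return footers.get(fileId);
    }

    public void invalidate(long fileId) {
        // Called lazily once the file is found to be erased or compacted.
        footers.remove(fileId);
    }
}
```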



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11496) Better tests for evaluating ORC predicate pushdown

2015-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662718#comment-14662718
 ] 

Hive QA commented on HIVE-11496:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749152/HIVE-11496.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_char
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4865/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4865/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4865/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749152 - PreCommit-HIVE-TRUNK-Build

> Better tests for evaluating ORC predicate pushdown
> --
>
> Key: HIVE-11496
> URL: https://issues.apache.org/jira/browse/HIVE-11496
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11496.1.patch
>
>
> There were many regressions recently wrt ORC predicate pushdown. We don't 
> have system tests to capture these regressions; currently there are only 
> junit tests for the ORC predicate pushdown feature. Since hive counters are 
> not available during qfile test execution, there is no easy way to verify 
> whether ORC PPD worked. This jira is to add a post-execution hook that prints 
> hive counters (esp. the number of input records) to the error stream so that 
> they appear in the qfile test output. This way we can verify ORC SARG 
> evaluation and avoid future regressions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11468) Vectorize: Struct IN() clauses

2015-08-07 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11468:
---
Attachment: HIVE-11468.03.patch

Rebased after dependencies

> Vectorize: Struct IN() clauses
> --
>
> Key: HIVE-11468
> URL: https://issues.apache.org/jira/browse/HIVE-11468
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11468.01.patch, HIVE-11468.02.patch, 
> HIVE-11468.03.patch
>
>
> Improve performance by vectorizing Struct IN() clauses.  Related to 
> HIVE-11428.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11295) LLAP: clean up ORC dependencies on object pools

2015-08-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662725#comment-14662725
 ] 

Prasanth Jayachandran commented on HIVE-11295:
--

Why does the object pool have to worry about resetting the data before 
offering? Isn't it the caller's responsibility to do the reset before invoking offer()?

> LLAP: clean up ORC dependencies on object pools
> ---
>
> Key: HIVE-11295
> URL: https://issues.apache.org/jira/browse/HIVE-11295
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11295.patch
>
>
> Before there's a storage API module, we can clean some things up



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

