[jira] [Commented] (HIVE-14160) Reduce-task costs a long time to finish on the condition that the certain sql "select a,distinct(b) group by a" has been executed on the data which has skew distribution

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367377#comment-15367377
 ] 

ASF GitHub Bot commented on HIVE-14160:
---

GitHub user zhongdeyin opened a pull request:

https://github.com/apache/hive/pull/85

[HIVE-14160] Optimized group by within distinct data skew problem

Jira issue: https://issues.apache.org/jira/browse/HIVE-14160

Modify the GroupByOperator.endGroup() method: when keysCurrentGroup.size() is 
greater than a threshold (hive.distinct.setsize.max), allocate a new HashSet 
instead of clearing the existing one. The option hive.distinct.newset.max is the 
maximum number of times a new HashSet may be allocated; beyond that threshold, 
hashset.clear() is executed again. This ensures task stability. Users can set 
both client parameters themselves.
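The idea can be sketched outside Hive as a minimal illustration. The class, field, and constant names below (DistinctSetReuse, SETSIZE_MAX, NEWSET_MAX) are stand-ins for hive.distinct.setsize.max and hive.distinct.newset.max, not Hive's actual code, and the threshold values are assumed:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the patch's idea: for very large per-group sets,
// dropping the old HashSet and allocating a fresh one avoids clear(), which
// walks every bucket; the allocation cap bounds GC churn for task stability.
public class DistinctSetReuse {
    static final int SETSIZE_MAX = 10_000; // stands in for hive.distinct.setsize.max
    static final int NEWSET_MAX = 100;     // stands in for hive.distinct.newset.max

    Set<Object> keysCurrentGroup = new HashSet<>();
    int newSetCount = 0;

    // Called at the end of each group in this sketch.
    void endGroup() {
        if (keysCurrentGroup.size() > SETSIZE_MAX && newSetCount < NEWSET_MAX) {
            keysCurrentGroup = new HashSet<>(); // cheap: old set is left to the GC
            newSetCount++;
        } else {
            keysCurrentGroup.clear(); // small set, or allocation cap reached
        }
    }
}
```

With a skewed group (here 20,000 distinct keys) the first endGroup() reallocates; once the set stays small, clear() is used again.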



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhongdeyin/hive branch-1.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/85.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #85


commit 390fb2e30d5043cd262ad73d3c1678a268c591e3
Author: zhongdeyin 
Date:   2016-07-08T07:38:14Z

Optimized group by within distinct data skew problem




> Reduce-task costs a long time to finish on the condition that the certain sql 
> "select a,distinct(b) group by a" has been executed on the data which has 
> skew distribution
> -
>
> Key: HIVE-14160
> URL: https://issues.apache.org/jira/browse/HIVE-14160
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Affects Versions: 1.1.0
>Reporter: marymwu
>
> Reduce-task costs a long time to finish on the condition that the certain sql 
> "select a,distinct(b) group by a" has been executed on the data which has 
> skew distribution
> data scale: 64G



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14124) Spark app name should be in line with MapReduce app name when using Hive On Spark

2016-07-08 Thread Thomas Scott (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367381#comment-15367381
 ] 

Thomas Scott commented on HIVE-14124:
-

[~xuefuz] Understood, this should be taken in conjunction with HIVE-14162 where 
the long running AM is not suitable in some cases.

> Spark app name should be in line with MapReduce app name when using Hive On 
> Spark
> -
>
> Key: HIVE-14124
> URL: https://issues.apache.org/jira/browse/HIVE-14124
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Thomas Scott
>Priority: Minor
>
> When using the Spark execution engine, jobs submitted to YARN are given the 
> name "Hive On Spark", whereas with the mr execution engine the name contains 
> the query executed. This is overridable via spark.app.name, but the query 
> executed should be filled in automatically, in line with the mr engine.
> Example:
> set hive.execution.engine=spark;
> Select count(*) from sometable; 
>  
> -> Launched YARN Job description: Hive On Spark
> set hive.execution.engine=mr;
> Select count(*) from sometable; 
>  
> -> Launched YARN Job description: Select count(*) from sometable
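The requested behavior can be sketched as deriving the application name from the query text instead of the fixed string. This is a hypothetical illustration, not Hive's actual logic; the SparkAppName class and the truncation cap are assumptions:

```java
// Hypothetical sketch: build the YARN application name from the query text
// (as the mr engine does) instead of the fixed "Hive On Spark".
public class SparkAppName {
    static final int MAX_LEN = 128; // assumed cap on app-name length

    static String appNameFor(String query) {
        // Collapse whitespace so multi-line queries yield a readable one-line name.
        String q = query.trim().replaceAll("\\s+", " ");
        return q.length() <= MAX_LEN ? q : q.substring(0, MAX_LEN) + "...";
    }
}
```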





[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Description: 
Problem Statement:

When working with complex data structures such as Avro, we often encounter an 
array that contains multiple tuples, where each tuple has a struct schema.

Suppose the struct schema looks like this:
{noformat}
{
"name": "employee",
"type": [{
"type": "record",
"name": "Employee",
"namespace": "com.company.Employee",
"fields": [{
"name": "empId",
"type": "int"
}, {
"name": "empName",
"type": "string"
}, {
"name": "age",
"type": "int"
}, {
"name": "salary",
"type": "double"
}]
}]
}

{noformat}
Then, while running our Hive query, the complex array looks like an array of 
Employee objects.
{noformat}
Example: 
//(array<struct<empId:int,empName:string,age:int,salary:double>>)

Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]

{noformat}
When implementing day-to-day business use cases, we often run into problems 
like sorting a tuple array by one or more specific fields such as empId, 
empName, or salary, in ASC or DESC order.


Proposal:

I have developed a UDF 'sort_array_by' that sorts a tuple array by one or more 
fields, in ASC or DESC order as specified by the user; the default is ascending 
order.
{noformat}
Example:
1.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
output: 
array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]

2.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
output: 
array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

3.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
output: 
array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
{noformat}
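Outside Hive, the multi-field ASC/DESC sort the examples describe can be sketched in plain Java. This is a hedged illustration: the Employee class and the byField/sortArrayBy names below are stand-ins, not the UDF's actual implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortArrayBySketch {
    // Stand-in for the struct<empId:int,empName:string,age:int,salary:double> element.
    static class Employee {
        final int empId; final String empName; final int age; final double salary;
        Employee(int empId, String empName, int age, double salary) {
            this.empId = empId; this.empName = empName; this.age = age; this.salary = salary;
        }
    }

    // Map a field name to a comparator on that field.
    static Comparator<Employee> byField(String field) {
        switch (field) {
            case "empId":   return Comparator.comparingInt((Employee e) -> e.empId);
            case "empName": return Comparator.comparing((Employee e) -> e.empName);
            case "age":     return Comparator.comparingInt((Employee e) -> e.age);
            case "salary":  return Comparator.comparingDouble((Employee e) -> e.salary);
            default: throw new IllegalArgumentException("unknown field: " + field);
        }
    }

    // Fields are applied in order: the first is the primary sort key and each
    // later field breaks ties; "DESC" reverses the whole ordering.
    static List<Employee> sortArrayBy(List<Employee> input, List<String> fields, String order) {
        Comparator<Employee> cmp = byField(fields.get(0));
        for (int i = 1; i < fields.size(); i++) {
            cmp = cmp.thenComparing(byField(fields.get(i)));
        }
        if ("DESC".equalsIgnoreCase(order)) cmp = cmp.reversed();
        List<Employee> out = new ArrayList<>(input); // sort a copy, leave input untouched
        out.sort(cmp);
        return out;
    }
}
```

Sorting the four example rows by "salary" ascending reproduces the ordering of example 1 above (Foo, Harry, Boo, Tom).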

  was:
Problem Statement:

When working with complex data structures such as Avro, we often encounter an 
array that contains multiple tuples, where each tuple has a struct schema.

Suppose here struct schema is like below:
{noformat}
{
"name": "employee",
"type": [{
"type": "record",
"name": "Employee",
"namespace": "com.company.Employee",
"fields": [{
"name": "empId",
"type": "int"
}, {
"name": "empName",
"type": "string"
}, {
"name": "age",
"type": "int"
}, {
"name": "salary",
"type": "double"
}]
}]
}

{noformat}
Then, while running our Hive query, the complex array looks like an array of 
Employee objects.
{noformat}
Example: 
//(array<struct<empId:int,empName:string,age:int,salary:double>>)

Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]

{noformat}
When implementing day-to-day business use cases, we often run into problems 
like sorting a tuple array by one or more specific fields such as empId, 
empName, or salary.


Proposal:

I have developed a UDF 'sort_array_field' that sorts a tuple array by one or 
more fields in natural order.
{noformat}
Example:
1.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
output: 
array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]

2.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
output: 
array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

3.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
output: 
array[struct(50

[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Status: Patch Available  (was: Open)

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch
>
>
> Problem Statement:
> When working with complex data structures such as Avro, we often encounter an 
> array that contains multiple tuples, where each tuple has a struct schema.
> Suppose the struct schema looks like this:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
>   //(array<struct<empId:int,empName:string,age:int,salary:double>>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we often run into problems 
> like sorting a tuple array by one or more specific fields such as empId, 
> empName, or salary, in ASC or DESC order.
> Proposal:
> I have developed a UDF 'sort_array_by' that sorts a tuple array by one or more 
> fields, in ASC or DESC order as specified by the user; the default is ascending 
> order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}





[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Attachment: HIVE-14159.3.patch

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch
>
>





[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Attachment: (was: HIVE-14159.3.patch)

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch
>
>





[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Status: Open  (was: Patch Available)

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch
>
>





[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Attachment: HIVE-14159.3.patch

Renamed the UDF to sort_array_by.

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch
>
>





[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-14159:
-
Status: Patch Available  (was: Open)

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch
>
>





[jira] [Commented] (HIVE-14173) NPE was thrown after enabling directsql in the middle of session

2016-07-08 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367611#comment-15367611
 ] 

Chaoyu Tang commented on HIVE-14173:


Looks like the build infrastructure had the problem.


> NPE was thrown after enabling directsql in the middle of session
> 
>
> Key: HIVE-14173
> URL: https://issues.apache.org/jira/browse/HIVE-14173
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14173.patch, HIVE-14173.patch
>
>
> hive.metastore.try.direct.sql is initially set to false in the HMS hive-site.xml 
> and then changed to true via the set metaconf command in the middle of a session; 
> a query run afterwards throws an NPE with the following error message:
> {code}
> 2016-07-06T17:44:41,489 ERROR [pool-5-thread-2]: metastore.RetryingHMSHandler 
> (RetryingHMSHandler.java:invokeInternal(192)) - 
> MetaException(message:java.lang.NullPointerException)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5741)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4771)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4754)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>   at com.sun.proxy.$Proxy18.get_partitions_by_expr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12048)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12032)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.&lt;init&gt;(ObjectStore.java:2667)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetListHelper.&lt;init&gt;(ObjectStore.java:2825)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$4.&lt;init&gt;(ObjectStore.java:2410)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:2410)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:2400)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
>   at com.sun.proxy.$Proxy17.getPartitionsByExpr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4749)
>   ... 20 more
> {code}
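The failure mode is consistent with a direct-SQL helper that is only instantiated when the flag is already true at store initialization, so flipping the flag mid-session routes calls through a null reference. A minimal stand-alone illustration (class and field names are hypothetical, not the actual ObjectStore code):

```java
public class DirectSqlToggleDemo {
    static class Store {
        private boolean tryDirectSql;
        private Object directSqlHelper;          // only built when the flag is true at init

        Store(boolean tryDirectSql) {
            this.tryDirectSql = tryDirectSql;
            if (tryDirectSql) {
                directSqlHelper = new Object();  // stands in for the direct-SQL helper
            }
        }

        void setTryDirectSql(boolean v) { tryDirectSql = v; }  // mid-session toggle

        String query() {
            if (tryDirectSql) {
                return directSqlHelper.toString();  // NPE when the helper was never built
            }
            return "jdo path";
        }
    }

    public static void main(String[] args) {
        Store s = new Store(false);   // hive.metastore.try.direct.sql=false at startup
        s.setTryDirectSql(true);      // set metaconf ... = true mid-session
        try {
            s.query();
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the stack trace above");
        }
    }
}
```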



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Simanchal Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367614#comment-15367614
 ] 

Simanchal Das commented on HIVE-14159:
--

Hi Carl,

Thank you for reviewing this ticket and the RB.
As per your suggestions, I have addressed the review comments.
I have also added one extra optional parameter for the sorting order (ASC/DESC);
it must be the last parameter of the UDF.
If the user does not provide a sorting order, we sort in ascending order.

Thanks,
Simanchal

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch
>
>
> Problem Statement:
> When working with complex data structures such as Avro, we often encounter 
> arrays that contain multiple tuples, where each tuple has a struct schema.
> Suppose here struct schema is like below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
>   //(array<struct<empId:int,empName:string,age:int,salary:double>>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> In day-to-day business use cases we often encounter problems like sorting a 
> tuple array by specific field[s] such as empId, empName, or salary, in ASC or 
> DESC order.
> Proposal:
> I have developed a UDF 'sort_array_by' which sorts a tuple array by one or 
> more fields, in ASC or DESC order as provided by the user; the default is 
> ascending order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}
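The multi-field ASC/DESC ordering described above boils down to a chained comparator. A plain-Java sketch of that comparator logic (the Employee class and method names here are illustrative, not the actual UDF code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortArrayByDemo {
    static class Employee {
        final int empId; final String empName; final int age; final double salary;
        Employee(int empId, String empName, int age, double salary) {
            this.empId = empId; this.empName = empName; this.age = age; this.salary = salary;
        }
    }

    // Sort by name, then salary; 'asc' flips the whole ordering, mirroring the
    // optional last ASC/DESC parameter of the proposed UDF.
    static List<Employee> sortByNameThenSalary(List<Employee> in, boolean asc) {
        Comparator<Employee> cmp = Comparator
                .comparing((Employee e) -> e.empName)
                .thenComparingDouble(e -> e.salary);
        if (!asc) {
            cmp = cmp.reversed();
        }
        List<Employee> out = new ArrayList<>(in);  // leave the input untouched
        out.sort(cmp);
        return out;
    }

    public static void main(String[] args) {
        List<Employee> a = new ArrayList<>();
        a.add(new Employee(100, "Foo", 20, 20990));
        a.add(new Employee(500, "Boo", 30, 50990));
        a.add(new Employee(700, "Harry", 25, 40990));
        System.out.println(sortByNameThenSalary(a, true).get(0).empName); // Boo
    }
}
```

A Hive GenericUDF would build such a comparator dynamically from the field-name arguments; the chaining and the optional reversal are the same idea.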



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14146) Column comments with "\n" character "corrupts" table metadata

2016-07-08 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14146:
--
Attachment: HIVE-14146.4.patch

As discussed, I merged the escapeCommand with HiveStringUtils.escapeJava.

I created a new function in HiveStringUtils so it can be reused.

> Column comments with "\n" character "corrupts" table metadata
> -
>
> Key: HIVE-14146
> URL: https://issues.apache.org/jira/browse/HIVE-14146
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14146.2.patch, HIVE-14146.3.patch, 
> HIVE-14146.4.patch, HIVE-14146.patch
>
>
> Create a table with the following(noting the \n in the COMMENT):
> {noformat}
> CREATE TABLE commtest(first_nm string COMMENT 'Indicates First name\nof an 
> individual');
> {noformat}
> Describe shows that now the metadata is messed up:
> {noformat}
> beeline> describe commtest;
> +-------------------+------------+-----------------------+--+
> | col_name          | data_type  | comment               |
> +-------------------+------------+-----------------------+--+
> | first_nm          | string     | Indicates First name  |
> | of an individual  | NULL       | NULL                  |
> +-------------------+------------+-----------------------+--+
> {noformat}
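Escaping the control characters before the comment is stored, which is the direction the attached patches take via HiveStringUtils, can be sketched as follows (the method below is a simplified stand-in, not the actual escapeJava implementation):

```java
public class CommentEscapeDemo {
    // Replace raw control characters with their two-character escape
    // sequences so a multi-line comment stays on one metadata row.
    static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The comment from the reproduction case above, kept on a single line.
        System.out.println(escapeControlChars("Indicates First name\nof an individual"));
    }
}
```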



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14185) Join query fails if the left table is empty and where condition searches in a list containing null

2016-07-08 Thread Frantisek Mantlik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367689#comment-15367689
 ] 

Frantisek Mantlik commented on HIVE-14185:
--

Gopal V, thank you for the explanation, but that is not the matter at issue. The 
problem is that if you run the first query with hive.auto.convert.join=false, 
you get the following exception:
{noformat}
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
 at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.&lt;init&gt;(HadoopShimsSecure.java:213)
 at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
 at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
 at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.&lt;init&gt;(MapTask.java:169)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
 ... 11 more
 Caused by: java.lang.NullPointerException
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.boxLiteral(SearchArgumentImpl.java:446)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.getLiteralList(SearchArgumentImpl.java:489)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.createLeaf(SearchArgumentImpl.java:518)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.parse(SearchArgumentImpl.java:648)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.addChildren(SearchArgumentImpl.java:598)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.parse(SearchArgumentImpl.java:624)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionBuilder.expression(SearchArgumentImpl.java:916)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl.&lt;init&gt;(SearchArgumentImpl.java:953)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.create(SearchArgumentFactory.java:36)
 at 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.createFromConf(SearchArgumentFactory.java:50)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:312)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:229)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.&lt;init&gt;(OrcInputFormat.java:163)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1104)
 at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.&lt;init&gt;(CombineHiveRecordReader.java:67)
 ... 16 more
{noformat}
All map attempts fail with the same exception.
If hive.auto.convert.join=true, the message below is returned:
{noformat}
Execution failed with exit status: 2
 Obtaining error information

Task failed!
 Task ID:
 Stage-4

Logs:

/tmp/fmantli/hive.log
 FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
{noformat}
 The hive.log contains the following:
{noformat}
2016-07-08 09:33:03,526 ERROR [main]: hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
 2016-07-08 09:33:10,589 ERROR [main]: exec.Task (SessionState.java:printError(960)) - Execution failed with exit status: 2
 2016-07-08 09:33:10,590 ERROR [main]: exec.Task (SessionState.java:printError(960)) - Obtaining error information
 2016-07-08 09:33:10,590 ERROR [main]: exec.Task (SessionState.java:printError(960)) -
 Task failed!
 Task ID:
 Stage-4

Logs:

2016-07-08 09:33:10,590 ERROR [m
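The root-cause frame in the first stack trace, SearchArgumentImpl$ExpressionBuilder.boxLiteral, is consistent with a NULL literal inside the pushed-down IN list. A minimal stand-in for that code path (method bodies are illustrative, not the actual SearchArgumentImpl code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SargNullLiteralDemo {
    // Stand-in for boxLiteral(): it dispatches on the literal's runtime
    // class, so a null literal dereferences immediately.
    static Object boxLiteral(Object literal) {
        if (literal instanceof Integer) {
            return ((Integer) literal).longValue();
        }
        return literal.toString();   // NPE here when literal == null
    }

    // Stand-in for getLiteralList(): boxes every literal of an IN predicate.
    static List<Object> getLiteralList(List<Object> literals) {
        List<Object> out = new ArrayList<>();
        for (Object l : literals) {
            out.add(boxLiteral(l));
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            // e.g. WHERE col IN (1, NULL, 3) pushed down to the ORC reader
            getLiteralList(Arrays.<Object>asList(1, null, 3));
        } catch (NullPointerException e) {
            System.out.println("NPE while boxing a NULL IN-list literal");
        }
    }
}
```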

[jira] [Updated] (HIVE-13749) Memory leak in Hive Metastore

2016-07-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13749:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Naveen for the contribution.

> Memory leak in Hive Metastore
> -
>
> Key: HIVE-13749
> URL: https://issues.apache.org/jira/browse/HIVE-13749
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 2.2.0
>
> Attachments: HIVE-13749.1.patch, HIVE-13749.patch, Top_Consumers7.html
>
>
> Looking at a 10GB heap dump, a large number of Configuration objects (>66k 
> instances) are being retained. These objects, along with their retained set, 
> occupy about 95% of the heap space. This leads to HMS crashes every few days.
> I will attach an exported snapshot from the Eclipse MAT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14146) Column comments with "\n" character "corrupts" table metadata

2016-07-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367757#comment-15367757
 ] 

Hive QA commented on HIVE-14146:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12816825/HIVE-14146.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/426/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/426/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-426/

Messages:
{noformat}
 This message was trimmed, see log for full details 
 [copy] Copying 15 files to 
/data/hive-ptest/working/apache-github-source-source/llap-tez/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-llap-tez ---
[INFO] Compiling 2 source files to 
/data/hive-ptest/working/apache-github-source-source/llap-tez/target/test-classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskCommunicator.java:
 
/data/hive-ptest/working/apache-github-source-source/llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskCommunicator.java
 uses or overrides a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskCommunicator.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskCommunicator.java:
 
/data/hive-ptest/working/apache-github-source-source/llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskCommunicator.java
 uses unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskCommunicator.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-llap-tez ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-llap-tez ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/llap-tez/target/hive-llap-tez-2.2.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-llap-tez ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-llap-tez 
---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/llap-tez/target/hive-llap-tez-2.2.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-llap-tez/2.2.0-SNAPSHOT/hive-llap-tez-2.2.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/llap-tez/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-llap-tez/2.2.0-SNAPSHOT/hive-llap-tez-2.2.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Spark Remote Client 2.2.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-client ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/spark-client/target
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/spark-client (includes = 
[datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
spark-client ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-client 
---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
spark-client ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ spark-client ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ spark-client 
---
[INFO] Compiling 28 source files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java:
 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java
 uses or overrides a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/sp

[jira] [Updated] (HIVE-14146) Column comments with "\n" character "corrupts" table metadata

2016-07-08 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14146:
--
Attachment: HIVE-14146.5.patch

Cat on the computer - some extra characters in the patch

> Column comments with "\n" character "corrupts" table metadata
> -
>
> Key: HIVE-14146
> URL: https://issues.apache.org/jira/browse/HIVE-14146
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14146.2.patch, HIVE-14146.3.patch, 
> HIVE-14146.4.patch, HIVE-14146.5.patch, HIVE-14146.patch
>
>
> Create a table with the following(noting the \n in the COMMENT):
> {noformat}
> CREATE TABLE commtest(first_nm string COMMENT 'Indicates First name\nof an 
> individual');
> {noformat}
> Describe shows that now the metadata is messed up:
> {noformat}
> beeline> describe commtest;
> +-------------------+------------+-----------------------+--+
> | col_name          | data_type  | comment               |
> +-------------------+------------+-----------------------+--+
> | first_nm          | string     | Indicates First name  |
> | of an individual  | NULL       | NULL                  |
> +-------------------+------------+-----------------------+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-08 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14195:
--
Attachment: HIVE-14195.patch

NoSuchObjectException is handled differently from other exceptions

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.
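The workaround the description refers to, parsing the MetaException message text on the client side, looks roughly like this (a hypothetical sketch of the caller's position, not the attached patch):

```java
public class GetFunctionDemo {
    static class MetaException extends Exception {
        MetaException(String msg) { super(msg); }
    }
    // The typed exception that ideally should be thrown instead.
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String msg) { super(msg); }
    }

    // Stand-in for HiveMetaStoreClient.getFunction(): the server folds the
    // missing-object case into a generic MetaException message.
    static String getFunction(String db, String fn) throws MetaException {
        throw new MetaException("NoSuchObjectException(message:Function " + fn + " does not exist)");
    }

    // What callers are forced to do today: sniff the message text.
    static boolean functionExists(String db, String fn) throws MetaException {
        try {
            getFunction(db, fn);
            return true;
        } catch (MetaException e) {
            if (e.getMessage() != null && e.getMessage().contains("NoSuchObjectException")) {
                return false;   // should have been a typed NoSuchObjectException
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(functionExists("default", "no_such_fn"));  // false
    }
}
```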



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-08 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14195:
--
Status: Patch Available  (was: Open)

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-08 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14195:
--
Priority: Minor  (was: Major)

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14190) Investigate "Getting log thread is interrupted, since query is done"

2016-07-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved HIVE-14190.
-
Resolution: Not A Problem
  Assignee: (was: Aihua Xu)

It seems this is expected for debug logging.

> Investigate "Getting log thread is interrupted, since query is done"
> 
>
> Key: HIVE-14190
> URL: https://issues.apache.org/jira/browse/HIVE-14190
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Aihua Xu
>
> Notice the error "Getting log thread is interrupted, since query is done". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14192:
--
Status: Open  (was: Patch Available)

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 2.1.0, 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is that even if a LockComponent is created by an old 
> version of the client which doesn't have the operationType field, 
> isSetOperationType() will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.
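The asymmetry is easy to reproduce outside Thrift: a null-checked optional with a constructor default always reports itself as set, while a bitfield-tracked optional does not. A compilable stand-in mirroring the generated snippet above (the class itself is illustrative):

```java
public class LockComponentDemo {
    enum DataOperationType { UNSET, SELECT, INSERT }

    private DataOperationType operationType;
    private boolean isAcid;
    private boolean isAcidSet;      // stands in for the __isset_bitfield bit

    public LockComponentDemo() {
        // The constructor applies the IDL default, so the field is never null...
        this.operationType = DataOperationType.UNSET;
        this.isAcid = false;
    }

    public boolean isSetOperationType() {
        return this.operationType != null;   // ...and this is always true
    }

    public boolean isSetIsAcid() {
        return this.isAcidSet;               // bit is only flipped by a real setter
    }

    public void setIsAcid(boolean v) { this.isAcid = v; this.isAcidSet = true; }

    public static void main(String[] args) {
        LockComponentDemo lc = new LockComponentDemo();  // "old client" sets nothing
        System.out.println(lc.isSetOperationType());     // true - the false positive
        System.out.println(lc.isSetIsAcid());            // false - tracked correctly
    }
}
```

This is why the server cannot distinguish an old client that never sent operationType from a new client that explicitly sent UNSET.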



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14192:
--
Attachment: HIVE-14192.2.patch

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is that even if a LockComponent is created by an old 
> version of the client which doesn't have the operationType field, 
> isSetOperationType() will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14192:
--
Status: Patch Available  (was: Open)

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 2.1.0, 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is that even if a LockComponent is created by an old 
> version of the client which doesn't have the operationType field, 
> isSetOperationType() will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367980#comment-15367980
 ] 

Eugene Koifman commented on HIVE-14192:
---

[~wzheng] could you review please

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is that even if a LockComponent is created by an old 
> version of the client which doesn't have the operationType field, 
> isSetOperationType() will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14114) Ensure RecordWriter in streaming API is using the same UserGroupInformation as StreamingConnection

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14114:
--
Target Version/s: 2.1.0, 1.3.0, 2.2.0  (was: 1.3.0, 2.1.0, 2.2.0)
  Status: Open  (was: Patch Available)

> Ensure RecordWriter in streaming API is using the same UserGroupInformation 
> as StreamingConnection
> --
>
> Key: HIVE-14114
> URL: https://issues.apache.org/jira/browse/HIVE-14114
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14114.2.patch, HIVE-14114.3.patch, HIVE-14114.patch
>
>
> Currently both DelimitedInputWriter and StrictJsonWriter perform some 
> Metastore access operations without using the UGI created by the caller for 
> the Metastore operations made by the matching StreamingConnection & 
> TransactionBatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14114) Ensure RecordWriter in streaming API is using the same UserGroupInformation as StreamingConnection

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14114:
--
Target Version/s: 2.1.0, 1.3.0, 2.2.0  (was: 1.3.0, 2.1.0, 2.2.0)
  Status: Patch Available  (was: Open)

> Ensure RecordWriter in streaming API is using the same UserGroupInformation 
> as StreamingConnection
> --
>
> Key: HIVE-14114
> URL: https://issues.apache.org/jira/browse/HIVE-14114
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14114.2.patch, HIVE-14114.3.patch, 
> HIVE-14114.4.patch, HIVE-14114.patch
>
>
> Currently both DelimitedInputWriter and StrictJsonWriter perform some 
> Metastore access operations, but without using the UGI created by the caller 
> for Metastore operations made by the matching StreamingConnection & 
> TransactionBatch.





[jira] [Updated] (HIVE-14114) Ensure RecordWriter in streaming API is using the same UserGroupInformation as StreamingConnection

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14114:
--
Attachment: HIVE-14114.4.patch

Patch 4 is identical to patch 3 - re-uploaded to trigger a bot run.

> Ensure RecordWriter in streaming API is using the same UserGroupInformation 
> as StreamingConnection
> --
>
> Key: HIVE-14114
> URL: https://issues.apache.org/jira/browse/HIVE-14114
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14114.2.patch, HIVE-14114.3.patch, 
> HIVE-14114.4.patch, HIVE-14114.patch
>
>
> Currently both DelimitedInputWriter and StrictJsonWriter perform some 
> Metastore access operations, but without using the UGI created by the caller 
> for Metastore operations made by the matching StreamingConnection & 
> TransactionBatch.





[jira] [Commented] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367999#comment-15367999
 ] 

Wei Zheng commented on HIVE-14192:
--

Patch looks good. Just one question: why do we need a default value for 
operationType?

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is: even if a LockComponent is created by an old version 
> of the client that doesn't have the operationType field, isSetOperationType() 
> will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.





[jira] [Commented] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368020#comment-15368020
 ] 

Hive QA commented on HIVE-14159:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12816819/HIVE-14159.3.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10308 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/427/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/427/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-427/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12816819 - PreCommit-HIVE-MASTER-Build

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch
>
>
> Problem Statement:
> When we are working with complex data structures like Avro, we often 
> encounter an array that contains multiple tuples, where each tuple has a 
> struct schema.
> Suppose here struct schema is like below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
> //(array<struct<empId:int,empName:string,age:int,salary:double>>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we encounter problems like 
> sorting a tuple array by specific field[s] such as empId, empName, salary, 
> etc., in ASC or DESC order.
> Proposal:
> I have developed a UDF 'sort_array_by' which sorts a tuple array by one or 
> more fields, in an ASC or DESC order provided by the user; the default is 
> ascending order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}




[jira] [Commented] (HIVE-14114) Ensure RecordWriter in streaming API is using the same UserGroupInformation as StreamingConnection

2016-07-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368026#comment-15368026
 ] 

Wei Zheng commented on HIVE-14114:
--

patch 4 looks good. +1

> Ensure RecordWriter in streaming API is using the same UserGroupInformation 
> as StreamingConnection
> --
>
> Key: HIVE-14114
> URL: https://issues.apache.org/jira/browse/HIVE-14114
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14114.2.patch, HIVE-14114.3.patch, 
> HIVE-14114.4.patch, HIVE-14114.patch
>
>
> Currently both DelimitedInputWriter and StrictJsonWriter perform some 
> Metastore access operations, but without using the UGI created by the caller 
> for Metastore operations made by the matching StreamingConnection & 
> TransactionBatch.





[jira] [Commented] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368027#comment-15368027
 ] 

Eugene Koifman commented on HIVE-14192:
---

The idea was that the "default" value is never the right value.  It's set when 
the object (in 2.1) is created, and the caller is expected to set the 
appropriate value for the given context.  The assert on the server side then 
checks whether it got an object with the "default" value, in which case it 
knows there is a bug somewhere.  I wanted to rely on isSetOperationType() to 
know whether the message was sent by a buggy 2.1 client or by an older client 
(which doesn't set this at all).  Unfortunately, that is not how 
isSetOperationType() behaves, so there is no way to tell a message from an old 
client apart from one sent by a buggy 2.1 client.  That's why I changed the 
assert to only apply in test mode, where we know there is only one version of 
the Thrift objects involved.
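The behavior being discussed can be reproduced with a minimal hand-written sketch (this is not the actual Thrift-generated LockComponent, just the same pattern): for an optional object-typed field with a declared default, the generated no-arg constructor eagerly assigns the default, and isSet is a null check, so "never set" and "explicitly set" are indistinguishable.

```java
// Minimal sketch of the Thrift-generated pattern described in the issue.
public class LockComponentSketch {
    public enum DataOperationType { UNSET, SELECT, INSERT }

    private DataOperationType operationType;

    public LockComponentSketch() {
        // Thrift emits this assignment because the IDL declares
        // "operationType = DataOperationType.UNSET" as the default.
        this.operationType = DataOperationType.UNSET;
    }

    // For object-typed optional fields, isSet is just a null check,
    // so it is already true immediately after construction.
    public boolean isSetOperationType() {
        return this.operationType != null;
    }

    public static void main(String[] args) {
        LockComponentSketch lc = new LockComponentSketch();
        // Even a client that never touches operationType produces an object
        // for which the server sees isSetOperationType() == true.
        System.out.println(lc.isSetOperationType()); // prints "true"
    }
}
```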

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is: even if a LockComponent is created by an old version 
> of the client that doesn't have the operationType field, isSetOperationType() 
> will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.





[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Target Version/s: 1.3.0, 2.2.0, 2.1.1  (was: 1.3.0, 2.2.0, 2.1.1, 2.0.2)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen in auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it has the result of txn 16.  So it 
> should choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid < 17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in the LM for table A).  
> The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment the read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been locked 
> in by some multi-stmt txn that started some time ago, or, at least in theory, 
> by an auto-commit one due to a long GC pause, for example).  It acquires locks 
> after the Cleaner checks LM state and calls getAcidState().  This request will 
> choose base_5 but won't see delta_16_16 and delta_17_17, and thus will return 
> the snapshot without the modifications made by those txns.
> This is a subtle race condition, but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other
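Option 1 above can be sketched as follows. All names (`chooseBase`, `exceptions`) are illustrative assumptions, not AcidUtils' actual API: pick the newest base_x visible under the HWM, then fail fast if the snapshot's exception list contains a txn <= x, since the compaction that produced the base did not preserve the history needed to exclude it.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

public class BestBaseSketch {
    // Returns the txn id x of the chosen base_x, or throws if no base is
    // usable under this snapshot.
    public static long chooseBase(List<Long> baseTxnIds, long hwm, Set<Long> exceptions) {
        long best = baseTxnIds.stream()
            .filter(b -> b <= hwm)                 // base must be visible under the HWM
            .max(Long::compare)
            .orElseThrow(() -> new NoSuchElementException("no usable base"));
        for (long e : exceptions) {
            if (e <= best) {
                // The base may contain data from an excluded txn; history is gone.
                throw new IllegalStateException(
                    "base_" + best + " may contain excluded txn " + e);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // ValidTxnList(20:16) from the example: HWM=20, exception list = {16}.
        // base_17 covers txn 16, so option 1 fails fast instead of silently
        // falling back to base_5 and racing the Cleaner.
        try {
            chooseBase(List.of(5L, 17L), 20L, Set.of(16L));
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```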





[jira] [Commented] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368039#comment-15368039
 ] 

Wei Zheng commented on HIVE-14192:
--

OK, thanks for the explanation. +1

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is: even if a LockComponent is created by an old version 
> of the client that doesn't have the operationType field, isSetOperationType() 
> will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.





[jira] [Resolved] (HIVE-14038) miscellaneous acid improvements

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-14038.
---
Resolution: Fixed

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.8.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on start up to help record values of 
> properties
> 3. add some tests





[jira] [Assigned] (HIVE-14197) LLAP service driver precondition failure should include the values

2016-07-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-14197:


Assignee: Prasanth Jayachandran

> LLAP service driver precondition failure should include the values
> --
>
> Key: HIVE-14197
> URL: https://issues.apache.org/jira/browse/HIVE-14197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> The LLAP service driver's precondition failure messages look like the one below:
> {code}
> Working memory + cache has to be smaller than the container sizing
> {code}
> It will be better to include the actual values for the sizes in the 
> precondition failure message.
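A sketch of the requested improvement: format the offending values into the message instead of only restating the rule. The method and parameter names here are illustrative, not the LLAP service driver's own.

```java
public class PreconditionMessageSketch {
    public static void checkContainerSizing(long workingMemBytes, long cacheBytes,
                                            long containerBytes) {
        if (workingMemBytes + cacheBytes >= containerBytes) {
            // Include the actual sizes so the failure is diagnosable without
            // re-running with extra logging.
            throw new IllegalArgumentException(String.format(
                "Working memory (%d) + cache (%d) has to be smaller than the container sizing (%d)",
                workingMemBytes, cacheBytes, containerBytes));
        }
    }

    public static void main(String[] args) {
        try {
            // 3 GiB working memory + 2 GiB cache against a 4 GiB container.
            checkContainerSizing(3L << 30, 2L << 30, 4L << 30);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```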





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Status: Open  (was: Patch Available)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: HIVE-13934.6.patch

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: (was: HIVE-13934.6.patch)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Status: Patch Available  (was: Open)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Updated] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13901:

Status: Open  (was: Patch Available)

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, 
> HIVE-13901.6.patch, HIVE-13901.7.patch, HIVE-13901.8.patch, HIVE-13901.9.patch
>
>
> Depending on the FS, creating external tables & adding partitions can be 
> expensive (e.g. msck, which adds all partitions).





[jira] [Updated] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13901:

Status: Patch Available  (was: Open)

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, 
> HIVE-13901.6.patch, HIVE-13901.7.patch, HIVE-13901.8.patch, HIVE-13901.9.patch
>
>
> Depending on the FS, creating external tables & adding partitions can be 
> expensive (e.g. msck, which adds all partitions).





[jira] [Updated] (HIVE-14158) deal with derived column names

2016-07-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14158:
---
Attachment: HIVE-14158.03.patch

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch
>
>






[jira] [Updated] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13901:

Attachment: HIVE-13901.9.patch

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, 
> HIVE-13901.6.patch, HIVE-13901.7.patch, HIVE-13901.8.patch, HIVE-13901.9.patch
>
>
> Depending on the FS, creating external tables & adding partitions can be 
> expensive (e.g. msck, which adds all partitions).





[jira] [Updated] (HIVE-14158) deal with derived column names

2016-07-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14158:
---
Status: Open  (was: Patch Available)

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch
>
>






[jira] [Updated] (HIVE-14158) deal with derived column names

2016-07-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14158:
---
Status: Patch Available  (was: Open)

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch
>
>






[jira] [Updated] (HIVE-14178) Hive::needsToCopy should reuse FileUtils::equalsFileSystem

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14178:

Status: Open  (was: Patch Available)

> Hive::needsToCopy should reuse FileUtils::equalsFileSystem
> --
>
> Key: HIVE-14178
> URL: https://issues.apache.org/jira/browse/HIVE-14178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0, 1.2.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14178.1.patch
>
>
> A clear bug triggered by missing FS checks in Hive.java:
> {code}
> // Check if different FileSystems
> if (!srcFs.getClass().equals(destFs.getClass())) {
>   return true;
> }
> {code}
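The FileUtils::equalsFileSystem idea referenced in the title can be sketched as below: comparing `Class` objects treats two distinct HDFS clusters as "the same" filesystem, whereas comparing the URIs' scheme and authority does not. The method body is illustrative, not Hive's exact implementation.

```java
import java.net.URI;
import java.util.Objects;

public class FsEqualitySketch {
    // Two FileSystem instances are "equal" for copy-avoidance purposes only
    // if both the scheme (hdfs, s3a, ...) and the authority (namenode
    // host:port) match; class identity alone is insufficient.
    public static boolean sameFileSystem(URI src, URI dest) {
        return Objects.equals(src.getScheme(), dest.getScheme())
            && Objects.equals(src.getAuthority(), dest.getAuthority());
    }

    public static void main(String[] args) {
        URI a = URI.create("hdfs://nn1:8020/warehouse");
        URI b = URI.create("hdfs://nn2:8020/staging");
        // Same FileSystem class (HDFS), different clusters: a copy IS needed,
        // which the class-equality check above would have missed.
        System.out.println(sameFileSystem(a, b)); // prints "false"
    }
}
```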





[jira] [Updated] (HIVE-14178) Hive::needsToCopy should reuse FileUtils::equalsFileSystem

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14178:

Attachment: HIVE-14178.2.patch

> Hive::needsToCopy should reuse FileUtils::equalsFileSystem
> --
>
> Key: HIVE-14178
> URL: https://issues.apache.org/jira/browse/HIVE-14178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1, 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14178.1.patch, HIVE-14178.2.patch
>
>
> A clear bug triggered by missing FS checks in Hive.java:
> {code}
> // Check if different FileSystems
> if (!srcFs.getClass().equals(destFs.getClass())) {
>   return true;
> }
> {code}





[jira] [Updated] (HIVE-14178) Hive::needsToCopy should reuse FileUtils::equalsFileSystem

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14178:

Status: Patch Available  (was: Open)

> Hive::needsToCopy should reuse FileUtils::equalsFileSystem
> --
>
> Key: HIVE-14178
> URL: https://issues.apache.org/jira/browse/HIVE-14178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0, 1.2.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14178.1.patch, HIVE-14178.2.patch
>
>
> A clear bug triggered by missing FS checks in Hive.java:
> {code}
> // Check if different FileSystems
> if (!srcFs.getClass().equals(destFs.getClass())) {
>   return true;
> }
> {code}





[jira] [Commented] (HIVE-14196) Exclude LLAP IO complex types test

2016-07-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368135#comment-15368135
 ] 

Sergey Shelukhin commented on HIVE-14196:
-

Rather, we should disable IO for complex types until it's fixed.

> Exclude LLAP IO complex types test
> --
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently 
> broken and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368139#comment-15368139
 ] 

Sergey Shelukhin commented on HIVE-14195:
-

+1 pending tests

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.
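The caller-side workaround described above can be sketched as a small helper. This is illustrative only: the exception here is a plain java.lang.Exception standing in for the Thrift-generated MetaException, and the message text mirrors the report:

```java
public class FunctionLookup {
    // Sketch of the workaround the report describes: because the
    // metastore surfaces a generic MetaException instead of throwing
    // NoSuchObjectException, the only way to detect "function not
    // found" is to search the exception message text.
    static boolean isMissingFunction(Exception metaException) {
        String msg = metaException.getMessage();
        return msg != null && msg.contains("NoSuchObjectException");
    }

    public static void main(String[] args) {
        Exception notFound = new Exception(
                "NoSuchObjectException(message:Function default.foo does not exist)");
        Exception other = new Exception("connection reset");
        System.out.println(isMissingFunction(notFound)); // true
        System.out.println(isMissingFunction(other));    // false
    }
}
```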



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14158) deal with derived column names

2016-07-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14158:
---
Attachment: (was: HIVE-14158.03.patch)

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Exclude LLAP IO complex types test

2016-07-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368151#comment-15368151
 ] 

Prasanth Jayachandran commented on HIVE-14196:
--

Yeah. That's better. Will put up patch.

> Exclude LLAP IO complex types test
> --
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently 
> broken and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-07-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368157#comment-15368157
 ] 

Ashutosh Chauhan commented on HIVE-13930:
-

[~sershe] TestEncryptedHDFSCliDriver failures do look legit.

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14170:

Status: Patch Available  (was: In Progress)

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable, with a 
> default of 1000).
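A minimal sketch of the proposed batching scheme, with illustrative names rather than Beeline's actual IncrementalRows/TableOutputFormat classes: rows accumulate in a batch, and the column widths used for formatting are recomputed from that batch every batchSize rows:

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalWidths {
    // Illustrative sketch: buffer rows and periodically recompute the
    // per-column display widths, instead of either a one-row-at-a-time
    // guess (IncrementalRows) or a full global pass (BufferedRows).
    private final int batchSize;
    private int[] widths = new int[0];
    private final List<String[]> buffer = new ArrayList<>();

    IncrementalWidths(int batchSize) {
        this.batchSize = batchSize;
    }

    // Add a row; when a full batch has accumulated, refresh the widths
    // from that batch and clear it. Returns the current width estimate.
    int[] add(String[] row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            recompute();
            buffer.clear();
        }
        return widths;
    }

    private void recompute() {
        widths = new int[buffer.get(0).length];
        for (String[] r : buffer)
            for (int i = 0; i < r.length; i++)
                widths[i] = Math.max(widths[i], r[i].length());
    }

    public static void main(String[] args) {
        IncrementalWidths w = new IncrementalWidths(2);
        w.add(new String[]{"a", "bb"});
        int[] ws = w.add(new String[]{"ccc", "d"});
        System.out.println(ws[0] + "," + ws[1]); // 3,2
    }
}
```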



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14170 started by Sahil Takiar.
---
> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable, with a 
> default of 1000).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14137:

Attachment: HIVE-14137.4.patch

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty 
> tables
> ---
>
> Key: HIVE-14137
> URL: https://issues.apache.org/jira/browse/HIVE-14137
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, 
> HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.patch
>
>
> The following queries:
> {code}
> -- Setup
> drop table if exists empty1;
> create table empty1 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> drop table if exists empty2;
> create table empty2 (col1 bigint, col2 bigint) stored as parquet 
> tblproperties ('parquet.compress'='snappy');
> drop table if exists empty3;
> create table empty3 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> -- All empty HDFS directories.
> -- Fails with [08S01]: Error while processing statement: FAILED: Execution 
> Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- Two empty HDFS directories.
> -- Create an empty file in HDFS.
> insert into empty1 select * from empty1 where false;
> -- Same query fails with [08S01]: Error while processing statement: FAILED: 
> Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- One empty HDFS directory.
> -- Create an empty file in HDFS.
> insert into empty2 select * from empty2 where false;
> -- Same query succeeds.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> {code}
> Will result in the following exception:
> {code}
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /tmp/hive/hive/1f3837aa-9407-4780-92b1-42a66d205139/hive_2016-06-24_15-45-23_206_79177714958655528-2/-mr-10004/0/emptyFile
>  for client 172.26.14.151 already exists
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2561)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:593)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:111)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:393)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1902)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1738)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1663)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:405)
> 
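The collision above happens because several empty input tables resolve to the same scratch-dir emptyFile path. A toy sketch of the idea of keying the dummy-file path per table so the creates cannot collide; the path layout and names are illustrative, not Hive's actual fix:

```java
import java.util.HashSet;
import java.util.Set;

public class EmptyFilePaths {
    // Illustrative only: derive the dummy "emptyFile" path from a
    // per-table index so two empty tables in the same job never map
    // to the same HDFS path (the cause of FileAlreadyExistsException).
    static String emptyFilePath(String scratchDir, int tableIndex) {
        return scratchDir + "/-mr-10004/" + tableIndex + "/emptyFile";
    }

    public static void main(String[] args) {
        // Three empty tables -> three distinct paths, no collision.
        Set<String> paths = new HashSet<>();
        for (int i = 0; i < 3; i++) {
            paths.add(emptyFilePath("/tmp/hive/scratch", i));
        }
        System.out.println(paths.size()); // 3
    }
}
```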

[jira] [Commented] (HIVE-14184) Adding test for limit pushdown in presence of grouping sets

2016-07-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368166#comment-15368166
 ] 

Ashutosh Chauhan commented on HIVE-14184:
-

+1

> Adding test for limit pushdown in presence of grouping sets
> ---
>
> Key: HIVE-14184
> URL: https://issues.apache.org/jira/browse/HIVE-14184
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14184.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-07-08 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368177#comment-15368177
 ] 

Sergio Peña commented on HIVE-13930:


I am investigating the jar and how to build it.

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14147) Hive PPD might remove predicates when they are defined as a simple node e.g. "WHERE pred"

2016-07-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368178#comment-15368178
 ] 

Ashutosh Chauhan commented on HIVE-14147:
-

+1

> Hive PPD might remove predicates when they are defined as a simple node e.g. 
> "WHERE pred"
> -
>
> Key: HIVE-14147
> URL: https://issues.apache.org/jira/browse/HIVE-14147
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-14147.01.patch, HIVE-14147.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2016-07-08 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368188#comment-15368188
 ] 

Aihua Xu commented on HIVE-11402:
-

Just worried about how useful that configuration will be, since most users will 
use a mix of Hue, Beeline, etc., I guess. So we may have to keep the default. But 
this seems safe to add.

+1.

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11402.01.patch, HIVE-11402.02.patch, 
> HIVE-11402.patch
>
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has  an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, ie it is not thread safe.
> There are many places where SesssionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel for single session is not 
> straightforward  with jdbc, you need to spawn another thread as the 
> Statement.execute calls are blocking. I believe ODBC has non blocking query 
> execution API, and Hue is another well known application that shares sessions 
> for all queries that a user runs.
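The remark about blocking Statement.execute calls can be sketched with an ExecutorService: each blocking call runs on its own thread, so two queries are in flight concurrently from the client's point of view. runQuery here is a stand-in for the real JDBC call:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStatements {
    // Stand-in for the blocking Statement.execute() round trip.
    static String runQuery(String sql) {
        try {
            Thread.sleep(10); // simulate server-side work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ok:" + sql;
    }

    public static void main(String[] args) throws Exception {
        // Dispatch each blocking call to its own thread; the futures
        // let the caller collect results once both complete.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<String> f1 = pool.submit(() -> runQuery("q1"));
        Future<String> f2 = pool.submit(() -> runQuery("q2"));
        System.out.println(f1.get() + " " + f2.get()); // ok:q1 ok:q2
        pool.shutdown();
    }
}
```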



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14173) NPE was thrown after enabling directsql in the middle of session

2016-07-08 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14173:
---
Attachment: HIVE-14173.patch

To kick off a new precommit build

> NPE was thrown after enabling directsql in the middle of session
> 
>
> Key: HIVE-14173
> URL: https://issues.apache.org/jira/browse/HIVE-14173
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14173.patch, HIVE-14173.patch, HIVE-14173.patch
>
>
> hive.metastore.try.direct.sql is initially set to false in the HMS hive-site.xml, 
> then changed to true using the set metaconf command in the middle of a session; 
> running a query then throws an NPE with the following error message:
> {code}
> 2016-07-06T17:44:41,489 ERROR [pool-5-thread-2]: metastore.RetryingHMSHandler 
> (RetryingHMSHandler.java:invokeInternal(192)) - 
> MetaException(message:java.lang.NullPointerException)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5741)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4771)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4754)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>   at com.sun.proxy.$Proxy18.get_partitions_by_expr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12048)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12032)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.<init>(ObjectStore.java:2667)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetListHelper.<init>(ObjectStore.java:2825)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$4.<init>(ObjectStore.java:2410)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:2410)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:2400)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
>   at com.sun.proxy.$Proxy17.getPartitionsByExpr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4749)
>   ... 20 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14146) Column comments with "\n" character "corrupts" table metadata

2016-07-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368241#comment-15368241
 ] 

Hive QA commented on HIVE-14146:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12816835/HIVE-14146.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10293 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/428/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/428/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-428/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12816835 - PreCommit-HIVE-MASTER-Build

> Column comments with "\n" character "corrupts" table metadata
> -
>
> Key: HIVE-14146
> URL: https://issues.apache.org/jira/browse/HIVE-14146
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14146.2.patch, HIVE-14146.3.patch, 
> HIVE-14146.4.patch, HIVE-14146.5.patch, HIVE-14146.patch
>
>
> Create a table with the following(noting the \n in the COMMENT):
> {noformat}
> CREATE TABLE commtest(first_nm string COMMENT 'Indicates First name\nof an 
> individual');
> {noformat}
> Describe shows that now the metadata is messed up:
> {noformat}
> beeline> describe commtest;
> +-------------------+------------+-----------------------+
> | col_name          | data_type  | comment               |
> +-------------------+------------+-----------------------+
> | first_nm          | string     | Indicates First name  |
> | of an individual  | NULL       | NULL                  |
> +-------------------+------------+-----------------------+
> {noformat}
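One possible direction, sketched here purely for illustration (this is not the attached patch), is to escape embedded newlines before the comment reaches the tabular DESCRIBE output, so a multi-line COMMENT occupies a single row:

```java
public class CommentSanitizer {
    // Sketch: replace literal newlines with an escaped form so the
    // comment cannot break the row layout of tabular metadata output.
    static String escapeNewlines(String comment) {
        return comment == null ? null : comment.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // prints: Indicates First name\nof an individual  (one line)
        System.out.println(escapeNewlines("Indicates First name\nof an individual"));
    }
}
```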



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14089) complex type support in LLAP IO is broken

2016-07-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-14089:
---

Assignee: Sergey Shelukhin  (was: Prasanth Jayachandran)

> complex type support in LLAP IO is broken 
> --
>
> Key: HIVE-14089
> URL: https://issues.apache.org/jira/browse/HIVE-14089
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14089.WIP.2.patch, HIVE-14089.WIP.patch
>
>
> HIVE-13617 is causing the following MiniLlapCliDriver test failures:
> {code}
> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14172) LLAP: force evict blocks by size to handle memory fragmentation

2016-07-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14172:

Attachment: HIVE-14172.01.patch

The same patch for HiveQA

> LLAP: force evict blocks by size to handle memory fragmentation
> ---
>
> Key: HIVE-14172
> URL: https://issues.apache.org/jira/browse/HIVE-14172
> Project: Hive
>  Issue Type: Bug
>Reporter: Nita Dembla
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14172.01.patch, HIVE-14172.patch
>
>
> In the long run, we should replace the buddy allocator with a better scheme. For 
> now, do a workaround for fragmentation that cannot be easily resolved. It's 
> still not perfect, but it works for practical ORC cases, where we have the 
> default size and smaller blocks, rather than large allocations running into trouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14111) better concurrency handling for TezSessionState - part I

2016-07-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14111:

Attachment: HIVE-14111.05.patch

The same patch for HiveQA

> better concurrency handling for TezSessionState - part I
> 
>
> Key: HIVE-14111
> URL: https://issues.apache.org/jira/browse/HIVE-14111
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, 
> HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, 
> HIVE-14111.patch, sessionPoolNotes.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14189) backport HIVE-13945 to branch-1

2016-07-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14189:

Attachment: HIVE-14189.02-branch-1.patch

Same patch...

> backport HIVE-13945 to branch-1
> ---
>
> Key: HIVE-14189
> URL: https://issues.apache.org/jira/browse/HIVE-14189
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14189.01-branch-1.patch, 
> HIVE-14189.02-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14188) LLAPIF: wrong user field is used from the token

2016-07-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14188:

Attachment: HIVE-14188.patch

Same patch...

> LLAPIF: wrong user field is used from the token
> ---
>
> Key: HIVE-14188
> URL: https://issues.apache.org/jira/browse/HIVE-14188
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14188.patch, HIVE-14188.patch
>
>
> realUser is not usually set in all cases for delegation tokens; we should use 
> the owner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11402) HS2 - add an option to disallow parallel query execution within a single Session

2016-07-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11402:

Summary: HS2 - add an option to disallow parallel query execution within a 
single Session  (was: HS2 - disallow parallel query execution within a single 
Session)

> HS2 - add an option to disallow parallel query execution within a single 
> Session
> 
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11402.01.patch, HIVE-11402.02.patch, 
> HIVE-11402.patch
>
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e. it is not thread-safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC; you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14147) Hive PPD might remove predicates when they are defined as a simple node e.g. WHERE pred

2016-07-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14147:
---
Summary: Hive PPD might remove predicates when they are defined as a simple 
node e.g. WHERE pred  (was: Hive PPD might remove predicates when they are 
defined as a simple node e.g. "WHERE pred")

> Hive PPD might remove predicates when they are defined as a simple node e.g. 
> WHERE pred
> ---
>
> Key: HIVE-14147
> URL: https://issues.apache.org/jira/browse/HIVE-14147
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-14147.01.patch, HIVE-14147.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14147) Hive PPD might remove predicates when they are defined as a simple expr e.g. WHERE 'a'

2016-07-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14147:
---
Summary: Hive PPD might remove predicates when they are defined as a simple 
expr e.g. WHERE 'a'  (was: Hive PPD might remove predicates when they are 
defined as a simple node e.g. WHERE pred)

> Hive PPD might remove predicates when they are defined as a simple expr e.g. 
> WHERE 'a'
> --
>
> Key: HIVE-14147
> URL: https://issues.apache.org/jira/browse/HIVE-14147
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-14147.01.patch, HIVE-14147.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14147) Hive PPD might remove predicates when they are defined as a simple expr e.g. WHERE 'a'

2016-07-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14147:
---
    Resolution: Fixed
 Fix Version/s: 2.1.1
                2.2.0
        Status: Resolved  (was: Patch Available)

Pushed to master, branch-2.1. Thanks for reviewing [~ashutoshc]!

> Hive PPD might remove predicates when they are defined as a simple expr e.g. 
> WHERE 'a'
> --
>
> Key: HIVE-14147
> URL: https://issues.apache.org/jira/browse/HIVE-14147
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14147.01.patch, HIVE-14147.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14147) Hive PPD might remove predicates when they are defined as a simple expr e.g. WHERE 'a'

2016-07-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14147:
---
Fix Version/s: 1.3.0

> Hive PPD might remove predicates when they are defined as a simple expr e.g. 
> WHERE 'a'
> --
>
> Key: HIVE-14147
> URL: https://issues.apache.org/jira/browse/HIVE-14147
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14147.01.patch, HIVE-14147.patch
>
>






[jira] [Reopened] (HIVE-13392) disable speculative execution for ACID Compactor

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reopened HIVE-13392:
---

This needs to go into 2.1.1 as well.

> disable speculative execution for ACID Compactor
> 
>
> Key: HIVE-13392
> URL: https://issues.apache.org/jira/browse/HIVE-13392
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, 
> HIVE-13392.4.patch, HIVE-13392.patch
>
>
> https://developer.yahoo.com/hadoop/tutorial/module4.html
> Speculative execution is enabled by default. You can disable speculative 
> execution for the mappers and reducers by setting the 
> mapred.map.tasks.speculative.execution and 
> mapred.reduce.tasks.speculative.execution JobConf options to false, 
> respectively.
> CompactorMR is currently not set up to handle speculative execution and may 
> lead to something like
> {code}
> 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to CREATE_FILE 
> /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4
>  for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on 
> 172.18.129.12 because this file lease is currently owned by 
> DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on 
> 172.18.129.18
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> {code}
> Short term: disable speculative execution for this job
> Longer term perhaps make each task write to dir with UUID...
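The short-term fix above amounts to forcing both speculative-execution flags off before the compactor job is submitted. A minimal sketch, using a plain key-value map in place of an actual Hadoop JobConf (the real code sets these same keys on the job configuration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the short-term fix: force both speculative-execution flags to
// "false" in the job configuration before submitting the compactor job.
// A plain Map stands in here for an actual Hadoop JobConf.
public class DisableSpeculation {
    static final String MAP_SPEC = "mapred.map.tasks.speculative.execution";
    static final String REDUCE_SPEC = "mapred.reduce.tasks.speculative.execution";

    public static Map<String, String> disableSpeculativeExecution(Map<String, String> conf) {
        conf.put(MAP_SPEC, "false");
        conf.put(REDUCE_SPEC, "false");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = disableSpeculativeExecution(new HashMap<>());
        System.out.println(conf);
    }
}
```

With both flags off, only one task attempt ever holds the HDFS lease on a given output file, which avoids the AlreadyBeingCreatedException shown in the stack trace.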





[jira] [Resolved] (HIVE-12393) Simplify ColumnPruner when CBO optimizes the query

2016-07-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-12393.

Resolution: Duplicate

> Simplify ColumnPruner when CBO optimizes the query
> --
>
> Key: HIVE-12393
> URL: https://issues.apache.org/jira/browse/HIVE-12393
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The plan for any given query optimized by CBO will always contain a Project 
> operator on top of the TS that prunes the columns that are not needed.
> Thus, there is no need for the Hive optimizer to traverse the whole plan to 
> check which columns can be pruned. In fact, the Hive ColumnPruner optimizer 
> only needs to match TS operators when CBO has optimized the plan.





[jira] [Updated] (HIVE-14169) Beeline Row printing should only calculate the width if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14169:

Attachment: HIVE-14169.1.patch

> Beeline Row printing should only calculate the width if TableOutputFormat is 
> used
> -
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> * The {{Rows}} class will calculate the optimal width that each row in the 
> {{ResultSet}} should be displayed with
> * However, this width is only relevant / used by {{TableOutputFormat}}
> We should modify the logic so that the width is only calculated if 
> {{TableOutputFormat}} is used. This will save CPU cycles when printing 
> records out to the user.





[jira] [Work started] (HIVE-14169) Beeline Row printing should only calculate the width if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14169 started by Sahil Takiar.
---
> Beeline Row printing should only calculate the width if TableOutputFormat is 
> used
> -
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> * The {{Rows}} class will calculate the optimal width that each row in the 
> {{ResultSet}} should be displayed with
> * However, this width is only relevant / used by {{TableOutputFormat}}
> We should modify the logic so that the width is only calculated if 
> {{TableOutputFormat}} is used. This will save CPU cycles when printing 
> records out to the user.





[jira] [Updated] (HIVE-14169) Beeline Row printing should only calculate the width if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14169:

Status: Patch Available  (was: In Progress)

> Beeline Row printing should only calculate the width if TableOutputFormat is 
> used
> -
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> * The {{Rows}} class will calculate the optimal width that each row in the 
> {{ResultSet}} should be displayed with
> * However, this width is only relevant / used by {{TableOutputFormat}}
> We should modify the logic so that the width is only calculated if 
> {{TableOutputFormat}} is used. This will save CPU cycles when printing 
> records out to the user.





[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistency

2016-07-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14198:

Attachment: HIVE-14198.1.patch

> Refactor aux jar related code to make them more consistency
> ---
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch
>
>
> There is some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths, and also between MR and Spark. 
> Refactor so that both settings and both engines share the same code.





[jira] [Updated] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14169:

Summary: Honor --incremental flag only if TableOutputFormat is used  (was: 
Beeline Row printing should only calculate the width if TableOutputFormat is 
used)

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> * The {{Rows}} class will calculate the optimal width that each row in the 
> {{ResultSet}} should be displayed with
> * However, this width is only relevant / used by {{TableOutputFormat}}
> We should modify the logic so that the width is only calculated if 
> {{TableOutputFormat}} is used. This will save CPU cycles when printing 
> records out to the user.





[jira] [Commented] (HIVE-14169) Beeline Row printing should only calculate the width if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368452#comment-15368452
 ] 

Sahil Takiar commented on HIVE-14169:
-

[~thejas] I checked and it looks like the column width is only calculated if 
{{TableOutputFormat}} is used. {{BufferedRows}} has a method called 
{{normalizeWidths}} that is only invoked by {{TableOutputFormat}}. Thus, I am 
changing the goal of this JIRA: the code will now only honor the 
{{--incremental}} flag if {{TableOutputFormat}} is used. If a different 
{{OutputFormat}} is used then {{IncrementalRows}} is always used.
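The decision the comment describes can be sketched as a single predicate: rows are buffered (so that column widths can be normalized globally) only when TableOutputFormat is in use and --incremental was not passed. The names below are illustrative, not Beeline's actual API:

```java
// Sketch of the proposed behavior: buffer rows only when the table output
// format needs a global column-width calculation and --incremental was not
// requested; every other output format streams rows incrementally.
public class RowsModeChooser {
    public static boolean useBufferedRows(String outputFormat, boolean incrementalFlag) {
        boolean isTable = "table".equals(outputFormat);
        // --incremental is only honored for the table format; other formats
        // never need the width calculation, so buffering buys nothing.
        return isTable && !incrementalFlag;
    }

    public static void main(String[] args) {
        System.out.println(useBufferedRows("table", false)); // buffered
        System.out.println(useBufferedRows("csv", false));   // incremental
    }
}
```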

> Beeline Row printing should only calculate the width if TableOutputFormat is 
> used
> -
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> * The {{Rows}} class will calculate the optimal width that each row in the 
> {{ResultSet}} should be displayed with
> * However, this width is only relevant / used by {{TableOutputFormat}}
> We should modify the logic so that the width is only calculated if 
> {{TableOutputFormat}} is used. This will save CPU cycles when printing 
> records out to the user.





[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14198:

Summary: Refactor aux jar related code to make them more consistent  (was: 
Refactor aux jar related code to make them more consistency)

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch
>
>
> There is some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths, and also between MR and Spark. 
> Refactor so that both settings and both engines share the same code.





[jira] [Commented] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-08 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368454#comment-15368454
 ] 

Aihua Xu commented on HIVE-14198:
-

Attached patch-1: refactored the code so that hive.aux.jar.paths is initialized 
with the same function call as hive.reloadable.aux.jar.paths, so both will 
support folders and files. Also made the change to share the same code between 
MR and Spark.
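The shared initialization described above can be sketched as one routine that expands a comma-separated setting whose entries may be either plain jar files or directories. The directory listing is injected as a map so the sketch stays self-contained; Hive's actual helper reads the filesystem, and the names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a single expansion routine usable for both aux-jar settings:
// a directory entry contributes every jar inside it, a plain entry is
// taken as a single jar file.
public class AuxJarExpander {
    public static List<String> expand(String setting, Map<String, List<String>> dirContents) {
        List<String> jars = new ArrayList<>();
        for (String entry : setting.split(",")) {
            entry = entry.trim();
            if (entry.isEmpty()) continue;
            List<String> inDir = dirContents.get(entry);
            if (inDir != null) {
                for (String f : inDir) {
                    if (f.endsWith(".jar")) jars.add(f); // directory: take contained jars
                }
            } else {
                jars.add(entry);                         // plain file entry
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        System.out.println(expand("a.jar, /aux",
            Map.of("/aux", List.of("/aux/x.jar", "/aux/notes.txt"))));
    }
}
```

Using the same expansion for both settings, and for both MR and Spark, removes the inconsistency the JIRA describes.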

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch
>
>
> There is some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths, and also between MR and Spark. 
> Refactor so that both settings and both engines share the same code.





[jira] [Updated] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14169:

Description: 
* When Beeline prints out a {{ResultSet}} to stdout it uses the 
{{BeeLine.print}} method
* This method takes the {{ResultSet}} from the completed query and uses a 
specified {{OutputFormat}} to print the rows (by default it uses 
{{TableOutputFormat}})
* The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
(either a {{IncrementalRows}} or a {{BufferedRows}} class)

The advantage of {{BufferedRows}} is that it can do a global calculation of the 
column widths; however, this is only useful for {{TableOutputFormat}}, so there 
is no need to buffer all the rows if a different {{OutputFormat}} is used. This 
JIRA will change the behavior of the {{--incremental}} flag so that it is only 
honored if {{TableOutputFormat}} is used.

  was:
* When Beeline prints out a {{ResultSet}} to stdout it uses the 
{{BeeLine.print}} method
* This method takes the {{ResultSet}} from the completed query and uses a 
specified {{OutputFormat}} to print the rows (by default it uses 
{{TableOutputFormat}})
* The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
(either a {{IncrementalRows}} or a {{BufferedRows}} class)
* The {{Rows}} class will calculate the optimal width that each row in the 
{{ResultSet}} should be displayed with
* However, this width is only relevant / used by {{TableOutputFormat}}

We should modify the logic so that the width is only calculated if 
{{TableOutputFormat}} is used. This will save CPU cycles when printing records 
out to the user.




> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column widths; however, this is only useful for {{TableOutputFormat}}, so 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.





[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14198:

Affects Version/s: 2.2.0
   Status: Patch Available  (was: Open)

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch
>
>
> There are some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths and also between MR and spark. 
> Refactor the code to reuse the same code.





[jira] [Updated] (HIVE-14197) LLAP service driver precondition failure should include the values

2016-07-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14197:
-
Description: 
LLAP service driver's precondition failure messages look like the following:

{code}
Working memory + cache has to be smaller than the container sizing
{code}

It would be better to include the actual size values in the precondition 
failure message.

NO PRECOMMIT TESTS

  was:
LLAP service driver's precondition failure messages look like the following:

{code}
Working memory + cache has to be smaller than the container sizing
{code}

It would be better to include the actual size values in the precondition 
failure message.
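The improvement can be sketched as a precondition check whose message carries the offending values. The method and parameter names are illustrative, not the LLAP service driver's actual code:

```java
// Sketch of the fix: fail with the same precondition sentence, but with the
// actual sizes embedded so the operator can see how far off the config is.
public class LlapSizeCheck {
    public static void checkMemoryFitsContainer(long workingMemBytes, long cacheBytes,
                                                long containerBytes) {
        long needed = workingMemBytes + cacheBytes;
        if (needed >= containerBytes) {
            throw new IllegalArgumentException(String.format(
                "Working memory (%d) + cache (%d) has to be smaller than the container sizing (%d)",
                workingMemBytes, cacheBytes, containerBytes));
        }
    }

    public static void main(String[] args) {
        checkMemoryFitsContainer(1L, 1L, 4L); // passes silently
    }
}
```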


> LLAP service driver precondition failure should include the values
> --
>
> Key: HIVE-14197
> URL: https://issues.apache.org/jira/browse/HIVE-14197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14197.1.patch
>
>
> LLAP service driver's precondition failure messages look like the following:
> {code}
> Working memory + cache has to be smaller than the container sizing
> {code}
> It would be better to include the actual size values in the precondition 
> failure message.
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-14197) LLAP service driver precondition failure should include the values

2016-07-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14197:
-
Attachment: HIVE-14197.1.patch

> LLAP service driver precondition failure should include the values
> --
>
> Key: HIVE-14197
> URL: https://issues.apache.org/jira/browse/HIVE-14197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14197.1.patch
>
>
> LLAP service driver's precondition failure messages look like the following:
> {code}
> Working memory + cache has to be smaller than the container sizing
> {code}
> It would be better to include the actual size values in the precondition 
> failure message.





[jira] [Updated] (HIVE-14197) LLAP service driver precondition failure should include the values

2016-07-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14197:
-
Status: Patch Available  (was: Open)

> LLAP service driver precondition failure should include the values
> --
>
> Key: HIVE-14197
> URL: https://issues.apache.org/jira/browse/HIVE-14197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14197.1.patch
>
>
> LLAP service driver's precondition failure messages look like the following:
> {code}
> Working memory + cache has to be smaller than the container sizing
> {code}
> It would be better to include the actual size values in the precondition 
> failure message.
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: HIVE-13934.7.patch

> Configure Tez to make noconditional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, HIVE-13934.7.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size 
> or the reservations made in the container by Tez for Inputs/Outputs, etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.
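The compile-time check the description proposes can be sketched as follows. The reservation model and all names are illustrative assumptions, not Hive's actual accounting:

```java
// Sketch of the validation: the no-conditional-task size must fit in the
// container after subtracting what Tez reserves for its Inputs/Outputs.
// If it does not fit, the planner would either reject the setting or
// configure the vertex to reserve extra memory for the Processor.
public class NoconditionalTaskSizeCheck {
    public static boolean fitsInContainer(long noconditionalTaskSize,
                                          long containerSize,
                                          long tezIoReservation) {
        long availableForProcessor = containerSize - tezIoReservation;
        return noconditionalTaskSize <= availableForProcessor;
    }

    public static void main(String[] args) {
        System.out.println(fitsInContainer(200, 1024, 256));
    }
}
```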





[jira] [Commented] (HIVE-14197) LLAP service driver precondition failure should include the values

2016-07-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368460#comment-15368460
 ] 

Siddharth Seth commented on HIVE-14197:
---

+1

> LLAP service driver precondition failure should include the values
> --
>
> Key: HIVE-14197
> URL: https://issues.apache.org/jira/browse/HIVE-14197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14197.1.patch
>
>
> LLAP service driver's precondition failure messages look like the following:
> {code}
> Working memory + cache has to be smaller than the container sizing
> {code}
> It would be better to include the actual size values in the precondition 
> failure message.
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-14197) LLAP service driver precondition failure should include the values

2016-07-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14197:
-
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to branch-2.1 and master. Thanks [~sseth] for the review!

> LLAP service driver precondition failure should include the values
> --
>
> Key: HIVE-14197
> URL: https://issues.apache.org/jira/browse/HIVE-14197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14197.1.patch
>
>
> LLAP service driver's precondition failure messages look like the following:
> {code}
> Working memory + cache has to be smaller than the container sizing
> {code}
> It would be better to include the actual size values in the precondition 
> failure message.
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368514#comment-15368514
 ] 

Hive QA commented on HIVE-14195:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12816848/HIVE-14195.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10293 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testSimpleFunction
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testSimpleFunction
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testSimpleFunction
org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyClient.testSimpleFunction
org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer.testSimpleFunction
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/429/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/429/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-429/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12816848 - PreCommit-HIVE-MASTER-Build

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.
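The workaround the report describes, inspecting the MetaException message for the embedded exception name, can be sketched like this; a plain Exception stands in for org.apache.hadoop.hive.metastore.api.MetaException so the sketch stays self-contained:

```java
// Sketch of the caller-side workaround: since getFunction() surfaces a
// generic MetaException rather than NoSuchObjectException, the only way to
// tell "function is missing" apart from other failures is string matching
// on the message. The fix proposed by this JIRA makes this unnecessary.
public class GetFunctionWorkaround {
    public static boolean isFunctionMissing(Exception metaException) {
        String msg = metaException.getMessage();
        return msg != null && msg.contains("NoSuchObjectException");
    }

    public static void main(String[] args) {
        Exception e = new Exception("NoSuchObjectException: Function default.foo does not exist");
        System.out.println(isFunctionMissing(e));
    }
}
```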





[jira] [Resolved] (HIVE-13392) disable speculative execution for ACID Compactor

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-13392.
---
   Resolution: Fixed
Fix Version/s: 2.1.1

Committed to 2.1 as well: 
https://github.com/apache/hive/commit/39ecc205e64cd1808bebec3ae1dc448e01c48680

> disable speculative execution for ACID Compactor
> 
>
> Key: HIVE-13392
> URL: https://issues.apache.org/jira/browse/HIVE-13392
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, 
> HIVE-13392.4.patch, HIVE-13392.patch
>
>
> https://developer.yahoo.com/hadoop/tutorial/module4.html
> Speculative execution is enabled by default. You can disable speculative 
> execution for the mappers and reducers by setting the 
> mapred.map.tasks.speculative.execution and 
> mapred.reduce.tasks.speculative.execution JobConf options to false, 
> respectively.
> CompactorMR is currently not set up to handle speculative execution and may 
> lead to something like
> {code}
> 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to CREATE_FILE 
> /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4
>  for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on 
> 172.18.129.12 because this file lease is currently owned by 
> DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on 
> 172.18.129.18
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> {code}
> Short term: disable speculative execution for this job
> Longer term perhaps make each task write to dir with UUID...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: HIVE-13934.7.patch

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, HIVE-13934.7.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size or 
> the reservations made in the container by Tez for Inputs / Outputs, etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.
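The compile-time check described above might look roughly like the following. The method names and numbers are illustrative assumptions, not actual Hive/Tez config keys:

```java
public class NoConditionalSizeCheck {
    // Memory left for the Processor after Tez's own Input/Output
    // reservations are subtracted from the container.
    static long processorBudgetMb(long containerMb, long tezReservedMb) {
        return containerMb - tezReservedMb;
    }

    // The map-join's noconditional task size must fit in that budget,
    // or the vertex should reserve additional memory for the Processor.
    static boolean fits(long noConditionalTaskSizeMb, long containerMb, long tezReservedMb) {
        return noConditionalTaskSizeMb <= processorBudgetMb(containerMb, tezReservedMb);
    }

    public static void main(String[] args) {
        System.out.println(fits(800, 1024, 300)); // false: only 724 MB left
        System.out.println(fits(600, 1024, 300)); // true
    }
}
```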



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: (was: HIVE-13934.7.patch)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, HIVE-13934.7.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14168) Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances

2016-07-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368644#comment-15368644
 ] 

Siddharth Seth commented on HIVE-14168:
---

Any thoughts on this ?

> Avoid serializing all parameters from HiveConf.java into in-memory HiveConf 
> instances
> -
>
> Key: HIVE-14168
> URL: https://issues.apache.org/jira/browse/HIVE-14168
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Priority: Critical
>
> All non-null parameters from HiveConf.java are explicitly set in each 
> HiveConf instance.
> {code}
> // Overlay the ConfVars. Note that this ignores ConfVars with null values
> addResource(getConfVarInputStream());
> {code}
> This unnecessarily bloats each Configuration object - 400+ conf variables 
> being set instead of probably <30 which would exist in hive-site.xml.
> Looking at an HS2 heap dump, HiveConf is almost always the largest component 
> by a long way. Conf objects are also serialized very often, transmitting 
> lots of unneeded variables (a serialized HiveConf typically carries 1000+ 
> variables, due to Hadoop injecting its configs into every config instance).
> As long as HiveConf.get() is the approach used to read from a config, this 
> is avoidable. Hive code itself should be doing this.
> This would be a potentially incompatible change for UDFs and other plugins 
> which have access to a Configuration object.
> I'd suggest turning off the insert by default, and adding a flag to control 
> this.
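A minimal sketch of the suggested direction, assuming defaults are resolved at read time rather than overlaid into every instance. SlimConf and all names in it are hypothetical, not Hive's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class SlimConf {
    // The ~400 compiled-in defaults live here once, process-wide,
    // instead of being copied into each Configuration instance.
    private static final Map<String, String> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put("hive.exec.parallel", "false");
        DEFAULTS.put("hive.exec.reducers.max", "1009");
    }

    // Only site-specified values (hive-site.xml, set commands) are stored
    // per instance; only these need to be serialized and transmitted.
    private final Map<String, String> overlay = new HashMap<>();

    public void set(String key, String value) { overlay.put(key, value); }

    public String get(String key) {
        String v = overlay.get(key);
        return v != null ? v : DEFAULTS.get(key); // fall back to the default table
    }

    public int serializedSize() { return overlay.size(); } // only overrides travel
}
```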



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368677#comment-15368677
 ] 

Hive QA commented on HIVE-14192:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12816856/HIVE-14192.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10293 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/430/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/430/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12816856 - PreCommit-HIVE-MASTER-Build

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>
> Given Thrift definition like this
> {noformat}
> struct LockComponent {
> 1: required LockType type,
> 2: required LockLevel level,
> 3: required string dbname,
> 4: optional string tablename,
> 5: optional string partitionname,
> 6: optional DataOperationType operationType = DataOperationType.UNSET,
> 7: optional bool isAcid = false
> }
> {noformat}
> The generated LockComponent has 
> {noformat}
>   public LockComponent() {
> this.operationType = 
> org.apache.hadoop.hive.metastore.api.DataOperationType.UNSET;
> this.isAcid = false;
>   }
>   public boolean isSetOperationType() {
> return this.operationType != null;
>   }
>   public boolean isSetIsAcid() {
> return EncodingUtils.testBit(__isset_bitfield, __ISACID_ISSET_ID);
>   }
> {noformat}
> So the bottom line is that even if a LockComponent is created by an old version 
> of the client which doesn't have the operationType field, isSetOperationType() 
> will still return true on the server.
> This causes a false positive exception in TxnHandler.enqueueLockWithRetry() 
> during Rolling Upgrade scenarios.
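A self-contained stand-in for the generated class above illustrates the asymmetry; the bitfield handling is simplified relative to real Thrift output, but the field and method names follow the snippet:

```java
public class LockComponentDemo {
    enum DataOperationType { UNSET, SELECT, INSERT }

    static class LockComponent {
        DataOperationType operationType;
        boolean isAcid;
        byte issetBitfield = 0; // tracks primitive optionals only; never set here

        LockComponent() {
            // The generated constructor applies IDL defaults unconditionally,
            // so the server cannot distinguish "old client never sent
            // operationType" from "client explicitly set UNSET".
            this.operationType = DataOperationType.UNSET;
            this.isAcid = false;
        }

        boolean isSetOperationType() { return this.operationType != null; } // object: null check
        boolean isSetIsAcid() { return (issetBitfield & 1) != 0; }          // primitive: bitfield
    }

    public static void main(String[] args) {
        LockComponent lc = new LockComponent();      // as if deserialized from an old client
        System.out.println(lc.isSetOperationType()); // true - the false positive
        System.out.println(lc.isSetIsAcid());        // false - bitfield was never set
    }
}
```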



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14178) Hive::needsToCopy should reuse FileUtils::equalsFileSystem

2016-07-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14178:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master & branch-2.1. Thanks, Gopal!

> Hive::needsToCopy should reuse FileUtils::equalsFileSystem
> --
>
> Key: HIVE-14178
> URL: https://issues.apache.org/jira/browse/HIVE-14178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1, 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14178.1.patch, HIVE-14178.2.patch
>
>
> A clear bug triggered by missing FS checks in Hive.java:
> {code}
> // Check if different FileSystems
> if (!srcFs.getClass().equals(destFs.getClass())) {
>   return true;
> }
> {code}
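A sketch of the FileUtils::equalsFileSystem idea the fix reuses: compare the filesystems' URIs (scheme + authority) rather than their Java classes, since two different HDFS clusters share a class but differ in authority. java.net.URI stands in for FileSystem here, and the comparison details are an assumption about the helper:

```java
import java.net.URI;

public class FsEquals {
    // Two filesystems are "equal" when scheme and authority match,
    // regardless of the implementing Java class.
    static boolean equalsFileSystem(URI src, URI dest) {
        return eq(src.getScheme(), dest.getScheme())
            && eq(src.getAuthority(), dest.getAuthority());
    }

    private static boolean eq(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // Same class (hdfs) but different clusters: the buggy class
        // comparison would say "same FS"; the URI comparison does not.
        URI a = URI.create("hdfs://nn1:8020/");
        URI b = URI.create("hdfs://nn2:8020/");
        System.out.println(equalsFileSystem(a, b));                                 // false
        System.out.println(equalsFileSystem(a, URI.create("hdfs://nn1:8020/tmp"))); // true
    }
}
```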



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14192) False positive error due to thrift

2016-07-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368704#comment-15368704
 ] 

Eugene Koifman commented on HIVE-14192:
---

all failures have age > 1, i.e. they predate this patch

> False positive error due to thrift
> --
>
> Key: HIVE-14192
> URL: https://issues.apache.org/jira/browse/HIVE-14192
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14192.2.patch, HIVE-14192.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14129) Execute move tasks in parallel

2016-07-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368756#comment-15368756
 ] 

Ashutosh Chauhan commented on HIVE-14129:
-

[~thejas] You pointed out a couple of issues on HIVE-9665. Can you comment on 
whether they are resolved, and thus whether enabling this will be safe now, or 
whether they are still unresolved?

> Execute move tasks in parallel
> --
>
> Key: HIVE-14129
> URL: https://issues.apache.org/jira/browse/HIVE-14129
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-14129.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

