[jira] [Updated] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread wangmeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangmeng updated HIVE-10971:

Attachment: HIVE-10971.01.patch

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng
 Attachments: HIVE-10971.01.patch


 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that only one MapReduce job is generated instead of the 
 two that hive.groupby.skewindata=true should, in theory, produce.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10903) Add hive.in.test for HoS tests

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578621#comment-14578621
 ] 

Hive QA commented on HIVE-10903:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738527/HIVE-10903.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_cast_constant
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4222/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4222/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4222/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738527 - PreCommit-HIVE-TRUNK-Build

 Add hive.in.test for HoS tests
 --

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch


 Missing the property can make CBO fail to run during UT. There may be other 
 effects still to be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread wangmeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangmeng updated HIVE-10971:

Component/s: (was: Hive)
 Logical Optimizer

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng

 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that only one MapReduce job is generated instead of the 
 two that hive.groupby.skewindata=true should, in theory, produce.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread wangmeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangmeng updated HIVE-10971:

Attachment: HIVE-10971.01.patch

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng
 Attachments: HIVE-10971.01.patch, HIVE-10971.01.patch


 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that only one MapReduce job is generated instead of the 
 two that hive.groupby.skewindata=true should, in theory, produce.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread wangmeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangmeng updated HIVE-10971:

Attachment: (was: HIVE-10971.01.patch)

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng
 Attachments: HIVE-10971.01.patch


 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that only one MapReduce job is generated instead of the 
 two that hive.groupby.skewindata=true should, in theory, produce.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10933) Hive 0.13 returns precision 0 for varchar(32) from DatabaseMetadata.getColumns()

2015-06-09 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10933:

Component/s: (was: API)
 JDBC

 Hive 0.13 returns precision 0 for varchar(32) from 
 DatabaseMetadata.getColumns()
 

 Key: HIVE-10933
 URL: https://issues.apache.org/jira/browse/HIVE-10933
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.13.0
Reporter: Son Nguyen
Assignee: Chaoyu Tang

 DatabaseMetadata.getColumns() returns COLUMN_SIZE as 0 for a column defined 
 as varchar(32) or char(32), while ResultSetMetaData.getPrecision() returns the 
 correct value of 32.
 Here is a program segment that reproduces the issue.
 {code}
 try {
   statement = connection.createStatement();

   statement.execute("drop table if exists son_table");

   statement.execute("create table son_table( col1 varchar(32) )");

   statement.close();

 } catch ( Exception e) {
   return;
 }

 // get column info using metadata
 try {
   DatabaseMetaData dmd = null;
   ResultSet resultSet = null;

   dmd = connection.getMetaData();

   resultSet = dmd.getColumns(null, null, "son_table", "col1");

   if ( resultSet.next() ) {
     String tabName = resultSet.getString("TABLE_NAME");
     String colName = resultSet.getString("COLUMN_NAME");
     String dataType = resultSet.getString("DATA_TYPE");
     String typeName = resultSet.getString("TYPE_NAME");
     int precision = resultSet.getInt("COLUMN_SIZE");

     // output is: colName = col1, dataType = 12, typeName = VARCHAR, precision = 0.
     System.out.format("colName = %s, dataType = %s, typeName = %s, precision = %d.",
         colName, dataType, typeName, precision);
   }
 } catch ( Exception e) {
   return;
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10933) Hive 0.13 returns precision 0 for varchar(32) from DatabaseMetadata.getColumns()

2015-06-09 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10933:

Description: 
DatabaseMetadata.getColumns() returns COLUMN_SIZE as 0 for a column defined 
as varchar(32) or char(32), while ResultSetMetaData.getPrecision() returns the 
correct value of 32.

Here is a program segment that reproduces the issue.
{code}
try {
  statement = connection.createStatement();

  statement.execute("drop table if exists son_table");

  statement.execute("create table son_table( col1 varchar(32) )");

  statement.close();

} catch ( Exception e) {
  return;
}

// get column info using metadata
try {
  DatabaseMetaData dmd = null;
  ResultSet resultSet = null;

  dmd = connection.getMetaData();

  resultSet = dmd.getColumns(null, null, "son_table", "col1");

  if ( resultSet.next() ) {
    String tabName = resultSet.getString("TABLE_NAME");
    String colName = resultSet.getString("COLUMN_NAME");
    String dataType = resultSet.getString("DATA_TYPE");
    String typeName = resultSet.getString("TYPE_NAME");
    int precision = resultSet.getInt("COLUMN_SIZE");

    // output is: colName = col1, dataType = 12, typeName = VARCHAR, precision = 0.
    System.out.format("colName = %s, dataType = %s, typeName = %s, precision = %d.",
        colName, dataType, typeName, precision);
  }

} catch ( Exception e) {
  return;
}
{code}

  was:
DatabaseMetadata.getColumns() returns COLUMN_SIZE as 0 for a column defined 
as varchar(32) or char(32), while ResultSetMetaData.getPrecision() returns the 
correct value of 32.

Here is a program segment that reproduces the issue.

try {
  statement = connection.createStatement();

  statement.execute("drop table if exists son_table");

  statement.execute("create table son_table( col1 varchar(32) )");

  statement.close();

} catch ( Exception e) {
  return;
}

// get column info using metadata
try {
  DatabaseMetaData dmd = null;
  ResultSet resultSet = null;

  dmd = connection.getMetaData();

  resultSet = dmd.getColumns(null, null, "son_table", "col1");

  if ( resultSet.next() ) {
    String tabName = resultSet.getString("TABLE_NAME");
    String colName = resultSet.getString("COLUMN_NAME");
    String dataType = resultSet.getString("DATA_TYPE");
    String typeName = resultSet.getString("TYPE_NAME");
    int precision = resultSet.getInt("COLUMN_SIZE");

    // output is: colName = col1, dataType = 12, typeName = VARCHAR, precision = 0.
    System.out.format("colName = %s, dataType = %s, typeName = %s, precision = %d.",
        colName, dataType, typeName, precision);
  }

} catch ( Exception e) {
  return;
}




 Hive 0.13 returns precision 0 for varchar(32) from 
 DatabaseMetadata.getColumns()
 

 Key: HIVE-10933
 URL: https://issues.apache.org/jira/browse/HIVE-10933
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.13.0
Reporter: Son Nguyen
Assignee: Chaoyu Tang

 DatabaseMetadata.getColumns() returns COLUMN_SIZE as 0 for a column defined 
 as varchar(32) or char(32), while ResultSetMetaData.getPrecision() returns the 
 correct value of 32.
 Here is a program segment that reproduces the issue.
 {code}
 try {
   statement = connection.createStatement();

   statement.execute("drop table if exists son_table");

   statement.execute("create table son_table( col1 varchar(32) )");

   statement.close();

 } catch ( Exception e) {
   return;
 }

 // get column info using metadata
 try {
   DatabaseMetaData dmd = null;
   ResultSet resultSet = null;

   dmd = connection.getMetaData();

   resultSet = dmd.getColumns(null, null, "son_table", "col1");

   if ( resultSet.next() ) {
     String tabName = resultSet.getString("TABLE_NAME");
     String colName = resultSet.getString("COLUMN_NAME");
   

[jira] [Commented] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread wangmeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578602#comment-14578602
 ] 

wangmeng commented on HIVE-10971:
-

{code}
hive> set hive.groupby.skewindata=true;
hive> explain select l_returnflag, count(*), count(distinct l_linestatus) from 
lineitem group by l_returnflag limit 10;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: lineitem
Statistics: Num rows: 1008537518 Data size: 201707503616 Basic 
stats: COMPLETE Column stats: NONE
Select Operator
  expressions: l_returnflag (type: string), l_linestatus (type: 
string)
  outputColumnNames: l_returnflag, l_linestatus
  Statistics: Num rows: 1008537518 Data size: 201707503616 Basic 
stats: COMPLETE Column stats: NONE
  Group By Operator
aggregations: count(), count(DISTINCT l_linestatus)
keys: l_returnflag (type: string), l_linestatus (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1008537518 Data size: 201707503616 Basic 
stats: COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: string), _col1 (type: string)
  sort order: ++
  Map-reduce partition columns: _col0 (type: string)
  Statistics: Num rows: 1008537518 Data size: 201707503616 
Basic stats: COMPLETE Column stats: NONE
  value expressions: _col2 (type: bigint)
  Reduce Operator Tree:
Group By Operator
  aggregations: count(VALUE._col0), count(DISTINCT KEY._col1:0._col0)
  keys: KEY._col0 (type: string)
  mode: complete
  outputColumnNames: _col0, _col1, _col2
  Statistics: Num rows: 504268759 Data size: 100853751808 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: string), _col1 (type: bigint), _col2 
(type: bigint)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 504268759 Data size: 100853751808 Basic 
stats: COMPLETE Column stats: NONE
Limit
  Number of rows: 10
  Statistics: Num rows: 10 Data size: 2000 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: true
Statistics: Num rows: 10 Data size: 2000 Basic stats: COMPLETE 
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: 10
{code}

When hive.groupby.skewindata=false, the Group By operator has mode 
mergepartial, which gives the correct results.
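As a side note, until the plan generation is fixed, the distinct count can be split into a 
two-level aggregation so that the skew rewrite only ever sees plain counts. The sketch below 
is not part of this ticket or its patch; it assumes a HiveServer2 endpoint at 
jdbc:hive2://localhost:10000/default (hypothetical) and the TPC-H lineitem table, and runs 
the rewritten query over JDBC.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SkewindataWorkaround {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // Hypothetical HiveServer2 endpoint; adjust host, port and database.
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Inner query: one row per (l_returnflag, l_linestatus) with its row count.
      // Outer query: sum(cnt) reproduces count(*), and counting the inner groups
      // reproduces count(distinct l_linestatus) (assuming l_linestatus is never
      // NULL), so count(*) and count(distinct) never meet in one Group By operator.
      String sql =
          "select t.l_returnflag, sum(t.cnt) as cnt_all, count(*) as distinct_linestatus "
        + "from (select l_returnflag, l_linestatus, count(*) as cnt "
        + "      from lineitem group by l_returnflag, l_linestatus) t "
        + "group by t.l_returnflag";
      try (ResultSet rs = stmt.executeQuery(sql)) {
        while (rs.next()) {
          System.out.printf("%s\t%d\t%d%n",
              rs.getString(1), rs.getLong(2), rs.getLong(3));
        }
      }
    }
  }
}
{code}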

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng

 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that only one MapReduce job is generated instead of the 
 two that hive.groupby.skewindata=true should, in theory, produce.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-686) add UDF substring_index

2015-06-09 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-686:
--
Description: 
SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter 
delim. If count is positive, everything to the left of the final delimiter 
(counting from the left) is returned. If count is negative, everything to the 
right of the final delimiter (counting from the right) is returned. 
SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.
Examples:
{code:sql}
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
--www.mysql.com
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
--www.mysql
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
--www
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
--''
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
--com
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
--mysql.com
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
--www.mysql.com
{code}
{code:sql}
--#delim does not exist in str
SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
--www.mysql.com

--#delim is 2 chars
SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
--www||mysql

--#delim is empty string
SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
--''

--#str is empty string
SELECT SUBSTRING_INDEX('', '.', 2);
--''
{code}
{code:sql}
--#null params
SELECT SUBSTRING_INDEX(null, '.', 1);
--null
SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
--null
SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
--null
{code}

  was:
SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter 
delim. If count is positive, everything to the left of the final delimiter 
(counting from the left) is returned. If count is negative, everything to the 
right of the final delimiter (counting from the right) is returned. 
SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.
Examples:
{code}
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
--www.mysql.com
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
--www.mysql
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
--www
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
--''
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
--com
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
--mysql.com
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
--www.mysql.com
{code}
{code}
--#delim does not exist in str
SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
--www.mysql.com

--#delim is 2 chars
SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
--www||mysql

--#delim is empty string
SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
--''

--#str is empty string
SELECT SUBSTRING_INDEX('', '.', 2);
--''
{code}
{code}
--#null params
SELECT SUBSTRING_INDEX(null, '.', 1);
--null
SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
--null
SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
--null
{code}


 add UDF substring_index
 ---

 Key: HIVE-686
 URL: https://issues.apache.org/jira/browse/HIVE-686
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Namit Jain
Assignee: Alexander Pivovarov
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-686.1.patch, HIVE-686.1.patch, HIVE-686.patch, 
 HIVE-686.patch


 SUBSTRING_INDEX(str,delim,count)
 Returns the substring from string str before count occurrences of the 
 delimiter delim. If count is positive, everything to the left of the final 
 delimiter (counting from the left) is returned. If count is negative, 
 everything to the right of the final delimiter (counting from the right) is 
 returned. SUBSTRING_INDEX() performs a case-sensitive match when searching 
 for delim.
 Examples:
 {code:sql}
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
 --www.mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
 --www.mysql
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
 --www
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
 --''
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
 --com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
 --mysql.com
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
 --www.mysql.com
 {code}
 {code:sql}
 --#delim does not exist in str
 SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
 --www.mysql.com
 --#delim is 2 chars
 SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
 --www||mysql
 --#delim is empty string
 SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
 --''
 --#str is empty string
 SELECT SUBSTRING_INDEX('', '.', 2);
 --''
 {code}
 {code:sql}
 --#null params
 SELECT SUBSTRING_INDEX(null, '.', 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
 --null
 SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
 --null
 {code}
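 For readers comparing against the examples above, here is a small, self-contained Java 
 sketch of the described semantics (an illustration only, not the GenericUDF implementation 
 in the attached patches):
 {code}
public class SubstringIndexSketch {
  // Positive count: keep everything left of the count-th delimiter (from the left).
  // Negative count: keep everything right of the |count|-th delimiter (from the right).
  // Matching is case-sensitive; a missing delimiter (or |count| too large) returns str.
  static String substringIndex(String str, String delim, Integer count) {
    if (str == null || delim == null || count == null) {
      return null;                          // null in, null out
    }
    if (str.isEmpty() || delim.isEmpty() || count == 0) {
      return "";                            // matches the '' results above
    }
    if (count > 0) {
      int idx = -1;
      for (int i = 0; i < count; i++) {
        idx = str.indexOf(delim, idx + 1);
        if (idx < 0) {
          return str;                       // fewer than count delimiters
        }
      }
      return str.substring(0, idx);
    } else {
      int idx = str.length();
      for (int i = 0; i < -count; i++) {
        idx = str.lastIndexOf(delim, idx - 1);
        if (idx < 0) {
          return str;
        }
      }
      return str.substring(idx + delim.length());
    }
  }

  public static void main(String[] args) {
    System.out.println(substringIndex("www.mysql.com", ".", 2));    // www.mysql
    System.out.println(substringIndex("www.mysql.com", ".", -2));   // mysql.com
    System.out.println(substringIndex("www||mysql||com", "||", 2)); // www||mysql
  }
}
 {code}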



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread wangmeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangmeng updated HIVE-10971:

Description: 
When hive.groupby.skewindata=true, the following query based on TPC-H gives 
wrong results:

{code}
set hive.groupby.skewindata=true;

select l_returnflag, count(*), count(distinct l_linestatus)
from lineitem
group by l_returnflag
limit 10;
{code}

The query plan shows that only one MapReduce job is generated instead of the 
two that hive.groupby.skewindata=true should, in theory, produce.

The problem arises only when {noformat}count(*){noformat} and 
{noformat}count(distinct){noformat} exist together.

  was:
When hive.groupby.skewindata=true, the following query based on TPC-H gives 
wrong results:

{code}
set hive.groupby.skewindata=true;

select l_returnflag, count(*), count(distinct l_linestatus)
from lineitem
group by l_returnflag
limit 10;
{code}

The query plan shows that it generates only one MapReduce job instead of two, 
which is dictated by hive.groupby.skewindata=true.

The problem arises only when {noformat}count(*){noformat} and 
{noformat}count(distinct){noformat} exist together.


 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng

 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that only one MapReduce job is generated instead of the 
 two that hive.groupby.skewindata=true should, in theory, produce.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10903) Add hive.in.test for HoS tests [Spark Branch]

2015-06-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10903:
--
Attachment: (was: HIVE-10903.1-spark.patch)

 Add hive.in.test for HoS tests [Spark Branch]
 -

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch


 Missing the property can make CBO fail to run during UT. There may be other 
 effects still to be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-10453) HS2 leaking open file descriptors when using UDFs

2015-06-09 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta reopened HIVE-10453:
-

[~ychena] In our internal testing, we found that closing the classloader 
removes certain jars from the classpath, resulting in ClassNotFoundException. 
I'm reopening the jira and reverting the patch from the respective branches via 
the linked jira.
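For background, here is a minimal illustration of the trade-off (an assumption, not HS2's 
actual session code): an open URLClassLoader pins the UDF jar's file descriptor, and 
URLClassLoader.close() releases it but breaks later class loading through that loader.
{code}
import java.net.URL;
import java.net.URLClassLoader;

public class UdfJarHandleDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical local copy of the UDF jar from the repro steps below.
    URL jar = new URL("file:///tmp/myudf.jar");
    URLClassLoader loader = new URLClassLoader(new URL[] { jar });
    // Reading any entry makes the loader open (and cache) the jar file; that open
    // file is what lsof reports against the HS2 process in the repro.
    loader.getResourceAsStream("META-INF/MANIFEST.MF");
    // close() releases the descriptor, but classes not yet loaded can no longer be
    // resolved through this loader afterwards -- the ClassNotFoundException
    // trade-off described above.
    loader.close();
  }
}
{code}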

 HS2 leaking open file descriptors when using UDFs
 -

 Key: HIVE-10453
 URL: https://issues.apache.org/jira/browse/HIVE-10453
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10453.1.patch, HIVE-10453.2.patch


 1. Create a custom function:
 CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar';
 2. Create a simple JDBC client that just connects and runs a simple query 
 using the function, such as:
 select myfunc(col1) from sometable
 3. Disconnect.
 Check open files for HiveServer2 with:
 lsof -p HSProcID | grep myudf.jar
 You will see the leak as:
 {noformat}
 java  28718 ychen  txt  REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 java  28718 ychen  330r REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10959) Templeton launcher job should reconnect to the running child job on task retry

2015-06-09 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated HIVE-10959:
--
Attachment: HIVE-10959.2.patch

Attaching updated patch based on above comments and additional testing.

 Templeton launcher job should reconnect to the running child job on task retry
 --

 Key: HIVE-10959
 URL: https://issues.apache.org/jira/browse/HIVE-10959
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.15.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HIVE-10959.2.patch, HIVE-10959.patch


 Currently, Templeton launcher kills all child jobs (jobs tagged with the 
 parent job's id) upon task retry. 
 Upon templeton launcher task retry, templeton should reconnect to the running 
 job and continue tracking its progress that way. 
 This logic cannot be used for all job kinds (e.g. for jobs that are driven by 
 the client side like regular hive). However, for MapReduceV2, and possibly 
 Tez and HiveOnTez, this should be the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10903) Add hive.in.test for HoS tests

2015-06-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10903:
--
Summary: Add hive.in.test for HoS tests  (was: Add hive.in.test for HoS 
tests [Spark Branch])

 Add hive.in.test for HoS tests
 --

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch


 Missing the property can make CBO fail to run during UT. There may be other 
 effects still to be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10903) Add hive.in.test for HoS tests [Spark Branch]

2015-06-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10903:
--
Attachment: HIVE-10903.2.patch

Verified that the changes to golden files are in line with the MR version.
This patch also makes {{cbo_subq_in.q}} and 
{{groupby_complex_types_multi_single_reducer.q}} deterministic, so it targets the 
master branch.

 Add hive.in.test for HoS tests [Spark Branch]
 -

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch


 Missing the property can make CBO fail to run during UT. There may be other 
 effects still to be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-09 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578474#comment-14578474
 ] 

Laljo John Pullokkaran commented on HIVE-10841:
---

[~a.semyannikov]
#1. We need both: the SemanticAnalyzer (JoinTree) change for predicate push down 
and the OpProc factory change to prevent illegal push downs.
#2. Yes, the failed tests need to be analyzed.



 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10841.1.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10      122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10      122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator

[jira] [Commented] (HIVE-10866) Throw error when client try to insert into bucketed table

2015-06-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579110#comment-14579110
 ] 

Yongzhi Chen commented on HIVE-10866:
-

Insert into a bucketed table that already contains data should not succeed when 
hive.enforce.bucketing is true. Fixed the unit test.

 Throw error when client try to insert into bucketed table
 -

 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch


 Currently, Hive does not support appends (insert into) to a bucketed table; see 
 open JIRA HIVE-3608. When inserting into such a table, the data will be 
 corrupted and no longer fit for bucketmapjoin. 
 We need to find a way to prevent clients from inserting into such tables.
 Reproduce:
 {noformat}
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert into table buckettestoutput1 select code from sample_07 where 
 total_emp  134354250 limit 10;
 After this first insert, I did:
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 set hive.auto.convert.sortmerge.join.noconditionaltask=true;
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 +---+---+
 | data  | data  |
 +---+---+
 +---+---+
 So select works fine. 
 Second insert:
 0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select 
 code from sample_07 where total_emp = 134354250 limit 10;
 No rows affected (61.235 seconds)
 Then select:
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
 (state=42000,code=10141)
 0: jdbc:hive2://localhost:1
 {noformat}
 Inserting into an empty table or partition is fine, but after inserting into a 
 non-empty one (the second insert in the reproduce steps), the bucketmapjoin 
 throws an error. We should not let the second insert succeed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10963) Hive throws NPE rather than meaningful error message when window is missing

2015-06-09 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10963:

Attachment: (was: HIVE-10963.patch)

 Hive throws NPE rather than meaningful error message when window is missing
 ---

 Key: HIVE-10963
 URL: https://issues.apache.org/jira/browse/HIVE-10963
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 1.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10963.patch


 {{select sum(salary) over w1 from emp;}} throws an NPE rather than a meaningful 
 error message such as "missing window definition".
 Once the NPE issue is fixed, the error message should also give the actual 
 window name rather than the class name.
 {noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: Window Spec 
 org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec@7954e1de refers to 
 an unknown source
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10903) Add hive.in.test for HoS tests

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579073#comment-14579073
 ] 

Hive QA commented on HIVE-10903:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738582/HIVE-10903.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9004 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4225/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4225/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4225/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738582 - PreCommit-HIVE-TRUNK-Build

 Add hive.in.test for HoS tests
 --

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch, 
 HIVE-10903.3.patch


 Missing the property can make CBO fail to run during UT. There may be other 
 effects still to be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579119#comment-14579119
 ] 

Yongzhi Chen commented on HIVE-10880:
-

[~xuefuz], I agree with you; there is something more serious than the missing 
files. I think the bucketing algorithm is broken. I just tried an insert 
overwrite from a very big table, and all the data went to one bucket as well. It 
seems the hash partitioning is no longer working. I will try to figure out why. 
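For reference, this is the distribution I would expect from the documented bucketing rule. 
The sketch below is a simplified stand-in for illustration (not the actual 
ObjectInspectorUtils code), assuming a single string clustering column:
{code}
public class BucketAssignmentSketch {
  // hash(clustering value) mod numBuckets decides the target bucket, so with two
  // buckets the rows should spread across both files instead of landing in one.
  static int bucketFor(String clusteringValue, int numBuckets) {
    int hash = clusteringValue == null ? 0 : clusteringValue.hashCode();
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    String[] rows = { "firstinsert1", "firstinsert2", "firstinsert3", "firstinsert4" };
    for (String r : rows) {
      System.out.println(r + " -> bucket " + bucketFor(r, 2));
    }
  }
}
{code}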

 The bucket number is not respected in insert overwrite.
 ---

 Key: HIVE-10880
 URL: https://issues.apache.org/jira/browse/HIVE-10880
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Blocker
 Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, 
 HIVE-10880.3.patch


 When hive.enforce.bucketing is true, the bucket number defined in the table 
 is no longer respected in current master and 1.2. This is a regression.
 Reproduce:
 {noformat}
 CREATE TABLE IF NOT EXISTS buckettestinput( 
 data string 
 ) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 Then I inserted the following data into the buckettestinput table
 firstinsert1 
 firstinsert2 
 firstinsert3 
 firstinsert4 
 firstinsert5 
 firstinsert6 
 firstinsert7 
 firstinsert8 
 secondinsert1 
 secondinsert2 
 secondinsert3 
 secondinsert4 
 secondinsert5 
 secondinsert6 
 secondinsert7 
 secondinsert8
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%';
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
 (state=42000,code=10141)
 {noformat}
 The related debug information related to insert overwrite:
 {noformat}
 0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%';
 INFO  : Number of reduce tasks determined at compile time: 2
 INFO  : In order to change the average load for a reducer (in bytes):
 INFO  :   set hive.exec.reducers.bytes.per.reducer=number
 INFO  : In order to limit the maximum number of reducers:
 INFO  :   set hive.exec.reducers.max=number
 INFO  : In order to set a constant number of reducers:
 INFO  :   set mapred.reduce.tasks=number
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
 INFO  : Ended Job = job_local107155352_0001
 INFO  : Loading data to table default.buckettestoutput1 from 
 file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
 INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
 totalSize=52, rawDataSize=48]
 No rows affected (1.692 seconds)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10943) Beeline-cli: Enable precommit for beelie-cli branch

2015-06-09 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579076#comment-14579076
 ] 

Sergio Peña commented on HIVE-10943:


The variable name BEELINE-CLI_URL is not valid for shell scripts because hyphens 
are not allowed in variable names. Could you try BEELINE_CLI_URL instead?

{code}
$ BEELINE-CLI_URL=url
BEELINE-CLI_URL=url: command not found

$ BEELINE_CLI_URL=url
$
{code}

 Beeline-cli: Enable precommit for beelie-cli branch 
 

 Key: HIVE-10943
 URL: https://issues.apache.org/jira/browse/HIVE-10943
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-10943.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10866) Throw error when client try to insert into bucketed table

2015-06-09 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-10866:

Attachment: HIVE-10866.2.patch

 Throw error when client try to insert into bucketed table
 -

 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch


 Currently, Hive does not support appends (insert into) to a bucketed table; see 
 open JIRA HIVE-3608. When inserting into such a table, the data will be 
 corrupted and no longer fit for bucketmapjoin. 
 We need to find a way to prevent clients from inserting into such tables.
 Reproduce:
 {noformat}
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert into table buckettestoutput1 select code from sample_07 where 
 total_emp  134354250 limit 10;
 After this first insert, I did:
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 set hive.auto.convert.sortmerge.join.noconditionaltask=true;
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 +---+---+
 | data  | data  |
 +---+---+
 +---+---+
 So select works fine. 
 Second insert:
 0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select 
 code from sample_07 where total_emp = 134354250 limit 10;
 No rows affected (61.235 seconds)
 Then select:
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
 (state=42000,code=10141)
 0: jdbc:hive2://localhost:1
 {noformat}
 Inserting into an empty table or partition is fine, but after inserting into a 
 non-empty one (the second insert in the reproduce steps), the bucketmapjoin 
 throws an error. We should not let the second insert succeed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10963) Hive throws NPE rather than meaningful error message when window is missing

2015-06-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579011#comment-14579011
 ] 

Ashutosh Chauhan commented on HIVE-10963:
-

+1

 Hive throws NPE rather than meaningful error message when window is missing
 ---

 Key: HIVE-10963
 URL: https://issues.apache.org/jira/browse/HIVE-10963
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 1.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10963.patch


 {{select sum(salary) over w1 from emp;}} throws an NPE rather than a meaningful 
 error message such as "missing window definition".
 Once the NPE issue is fixed, the error message should also give the actual 
 window name rather than the class name.
 {noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: Window Spec 
 org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec@7954e1de refers to 
 an unknown source
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10963) Hive throws NPE rather than meaningful error message when window is missing

2015-06-09 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10963:

Attachment: HIVE-10963.patch

 Hive throws NPE rather than meaningful error message when window is missing
 ---

 Key: HIVE-10963
 URL: https://issues.apache.org/jira/browse/HIVE-10963
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 1.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10963.patch


 {{select sum(salary) over w1 from emp;}} throws an NPE rather than a meaningful 
 error message such as "missing window definition".
 Once the NPE issue is fixed, the error message should also give the actual 
 window name rather than the class name.
 {noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: Window Spec 
 org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec@7954e1de refers to 
 an unknown source
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2015-06-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-7180:
-
Description: 
Here is related code:
{code}
  BufferedReader bfReader =
new BufferedReader(new FileReader(upgradeListFile));
  String currSchemaVersion;
  while ((currSchemaVersion = bfReader.readLine()) != null) {
upgradeOrderList.add(currSchemaVersion.trim());
{code}
BufferedReader / FileReader should be closed upon return from ctor.

  was:
Here is related code:
{code}
  BufferedReader bfReader =
new BufferedReader(new FileReader(upgradeListFile));
  String currSchemaVersion;
  while ((currSchemaVersion = bfReader.readLine()) != null) {
upgradeOrderList.add(currSchemaVersion.trim());
{code}

BufferedReader / FileReader should be closed upon return from ctor.


 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
  Labels: patch
 Attachments: HIVE-7180.patch, HIVE-7180_001.patch


 Here is related code:
 {code}
   BufferedReader bfReader =
 new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
 upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.
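 A minimal sketch of the kind of fix being asked for (an assumption, not the attached 
 patch): try-with-resources closes the reader, and with it the underlying FileReader, on 
 every exit path.
 {code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class UpgradeListReader {
  static List<String> readUpgradeOrder(String upgradeListFile) throws IOException {
    List<String> upgradeOrderList = new ArrayList<>();
    // The reader (and the wrapped FileReader) is closed whether readLine()
    // finishes normally or throws.
    try (BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile))) {
      String currSchemaVersion;
      while ((currSchemaVersion = bfReader.readLine()) != null) {
        upgradeOrderList.add(currSchemaVersion.trim());
      }
    }
    return upgradeOrderList;
  }
}
 {code}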



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()

2015-06-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-7172:
-
Description: 
{code}
  ResultSet res = stmt.executeQuery(versionQuery);
  if (!res.next()) {
    throw new HiveMetaException("Didn't find version data in metastore");
  }
  String currentSchemaVersion = res.getString(1);
  metastoreConn.close();
{code}

When HiveMetaException is thrown, metastoreConn.close() would be skipped.
stmt is not closed upon return from the method.

  was:
{code}
  ResultSet res = stmt.executeQuery(versionQuery);
  if (!res.next()) {
    throw new HiveMetaException("Didn't find version data in metastore");
  }
  String currentSchemaVersion = res.getString(1);
  metastoreConn.close();
{code}
When HiveMetaException is thrown, metastoreConn.close() would be skipped.
stmt is not closed upon return from the method.


 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
 -

 Key: HIVE-7172
 URL: https://issues.apache.org/jira/browse/HIVE-7172
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7172.patch


 {code}
   ResultSet res = stmt.executeQuery(versionQuery);
   if (!res.next()) {
     throw new HiveMetaException("Didn't find version data in metastore");
   }
   String currentSchemaVersion = res.getString(1);
   metastoreConn.close();
 {code}
 When HiveMetaException is thrown, metastoreConn.close() would be skipped.
 stmt is not closed upon return from the method.
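 A minimal sketch of one way to close both resources (an assumption, not the attached 
 patch), assuming the HiveMetaException import from the metastore module: release the 
 statement and the connection in a finally block so they are closed even when 
 HiveMetaException is thrown.
 {code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hadoop.hive.metastore.HiveMetaException;

public class SchemaVersionReader {
  static String getMetaStoreSchemaVersion(Connection metastoreConn, String versionQuery)
      throws SQLException, HiveMetaException {
    Statement stmt = metastoreConn.createStatement();
    try {
      ResultSet res = stmt.executeQuery(versionQuery);
      if (!res.next()) {
        throw new HiveMetaException("Didn't find version data in metastore");
      }
      return res.getString(1);
    } finally {
      stmt.close();           // always release the statement ...
      metastoreConn.close();  // ... and the metastore connection
    }
  }
}
 {code}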



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()

2015-06-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-7305:
-
Description: 
{code}
  long readLongLE(InputStream in) throws IOException {
in.read(readBuffer, 0, 8);
    return (((readBuffer[0] & 0xff) << 0)
        + ((readBuffer[1] & 0xff) << 8)
{code}
Return value from read() may indicate fewer than 8 bytes read.
The return value should be checked.

  was:
{code}
  long readLongLE(InputStream in) throws IOException {
in.read(readBuffer, 0, 8);
    return (((readBuffer[0] & 0xff) << 0)
        + ((readBuffer[1] & 0xff) << 8)
{code}
Return value from read() may indicate fewer than 8 bytes read.

The return value should be checked.


 Return value from in.read() is ignored in SerializationUtils#readLongLE()
 -

 Key: HIVE-7305
 URL: https://issues.apache.org/jira/browse/HIVE-7305
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
 Attachments: HIVE-7305_001.patch


 {code}
   long readLongLE(InputStream in) throws IOException {
 in.read(readBuffer, 0, 8);
     return (((readBuffer[0] & 0xff) << 0)
         + ((readBuffer[1] & 0xff) << 8)
 {code}
 Return value from read() may indicate fewer than 8 bytes read.
 The return value should be checked.
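 A minimal sketch of the requested check (an assumption, not the attached patch): keep 
 reading until all 8 bytes arrive, or fail with EOFException on a short read, then assemble 
 the little-endian value.
 {code}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadLongLESketch {
  private final byte[] readBuffer = new byte[8];

  long readLongLE(InputStream in) throws IOException {
    // read() may return fewer than 8 bytes; loop until the buffer is full
    // instead of silently reusing stale buffer contents.
    int off = 0;
    while (off < 8) {
      int n = in.read(readBuffer, off, 8 - off);
      if (n < 0) {
        throw new EOFException("Expected 8 bytes, got " + off);
      }
      off += n;
    }
    long result = 0;
    for (int i = 7; i >= 0; i--) {
      result = (result << 8) | (readBuffer[i] & 0xff);  // byte 0 is least significant
    }
    return result;
  }
}
 {code}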



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10956) HS2 leaks HMS connections

2015-06-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10956:
---
Attachment: HIVE-10956.2.patch

Attached patch v2. It is also on RB: https://reviews.apache.org/r/35256/
The new patch closes the connection when the Hive session is closed.

 HS2 leaks HMS connections
 -

 Key: HIVE-10956
 URL: https://issues.apache.org/jira/browse/HIVE-10956
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10956.1.patch, HIVE-10956.2.patch


 HS2 uses a ThreadLocal to cache the HMS client in class Hive. When the thread 
 dies, the HMS client is not closed, so the connection to the HMS is leaked.
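 A generic illustration of the leak pattern and the fix direction (an assumption, not the 
 actual class Hive code): a ThreadLocal-cached client has to be closed explicitly at a 
 well-defined point, such as session close, because nothing closes it when the owning 
 thread dies.
 {code}
public class ThreadLocalClientCache {
  // Stand-in for the HMS client; the real one wraps a Thrift connection.
  interface MetaStoreClient extends AutoCloseable {}

  private static final ThreadLocal<MetaStoreClient> CACHE = new ThreadLocal<>();

  static MetaStoreClient get(java.util.function.Supplier<MetaStoreClient> factory) {
    MetaStoreClient client = CACHE.get();
    if (client == null) {
      client = factory.get();   // opens a connection to the metastore
      CACHE.set(client);
    }
    return client;
  }

  // Nothing ever calls close() on behalf of a dead worker thread, so without an
  // explicit hook like this (invoked when the Hive session closes), the metastore
  // connection behind the cached client is leaked.
  static void closeCurrent() throws Exception {
    MetaStoreClient client = CACHE.get();
    if (client != null) {
      client.close();
      CACHE.remove();
    }
  }
}
 {code}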



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10963) Hive throws NPE rather than meaningful error message when window is missing

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579215#comment-14579215
 ] 

Hive QA commented on HIVE-10963:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738596/HIVE-10963.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9005 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4226/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4226/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4226/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738596 - PreCommit-HIVE-TRUNK-Build

 Hive throws NPE rather than meaningful error message when window is missing
 ---

 Key: HIVE-10963
 URL: https://issues.apache.org/jira/browse/HIVE-10963
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 1.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10963.patch


 {{select sum(salary) over w1 from emp;}} throws an NPE rather than a meaningful 
 error message such as "missing window definition".
 Once the NPE issue is fixed, the error message should also give the actual 
 window name rather than the class name.
 {noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: Window Spec 
 org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec@7954e1de refers to 
 an unknown source
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10882) CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap of join operator causes NPE exception

2015-06-09 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579242#comment-14579242
 ] 

Pengcheng Xiong commented on HIVE-10882:


[~jcamachorodriguez], I have started but I have not figured out a solution yet. 
Thus, please go ahead and take it as I am busy with UT failures these days. 
Also ccing [~jpullokkaran]. Thanks.

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap 
 of join operator causes NPE exception
 --

 Key: HIVE-10882
 URL: https://issues.apache.org/jira/browse/HIVE-10882
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

 The CBO return path creates a join operator with empty filters. However, 
 vectorization checks the filters of the big table in the join, which causes an 
 NPE. To reproduce, run vector_outer_join2.q with the return path turned on.
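A hedged, self-contained sketch of one possible guard (hypothetical structures, not Hive's JoinDesc or Vectorizer): treat a null or empty filter map as "no filters" instead of dereferencing it.
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterMapGuardSketch {
  // Returns the filters for the given position, or an empty list when none exist.
  static List<String> filtersForBigTable(Map<Integer, List<String>> filterMap, int bigTablePos) {
    if (filterMap == null || filterMap.isEmpty()) {
      return Collections.emptyList();                 // nothing to validate, no NPE
    }
    return filterMap.getOrDefault(bigTablePos, Collections.emptyList());
  }

  public static void main(String[] args) {
    System.out.println(filtersForBigTable(new HashMap<>(), 0));  // prints [] instead of failing
  }
}
{code}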



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10882) CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap of join operator causes NPE exception

2015-06-09 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10882:
---
Assignee: Jesus Camacho Rodriguez  (was: Pengcheng Xiong)

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap 
 of join operator causes NPE exception
 --

 Key: HIVE-10882
 URL: https://issues.apache.org/jira/browse/HIVE-10882
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Jesus Camacho Rodriguez

 The CBO return path creates a join operator with empty filters. However, 
 vectorization checks the filters of the big table in the join, which causes an 
 NPE. To reproduce, run vector_outer_join2.q with the return path turned on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()

2015-06-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-7598:
-
Description: 
Call to Utilities.mvFileToFinalPath() passes null as second last parameter, 
conf.
null gets passed to createEmptyBuckets() which dereferences conf directly:
{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}

  was:
Call to Utilities.mvFileToFinalPath() passes null as second last parameter, 
conf.

null gets passed to createEmptyBuckets() which dereferences conf directly:
{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}


 Potential null pointer dereference in MergeTask#closeJob()
 --

 Key: HIVE-7598
 URL: https://issues.apache.org/jira/browse/HIVE-7598
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor
 Attachments: HIVE-7598.patch


 Call to Utilities.mvFileToFinalPath() passes null as second last parameter, 
 conf.
 null gets passed to createEmptyBuckets() which dereferences conf directly:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}
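A self-contained sketch of the defensive idea (a hypothetical JobSettings type, not the actual descriptor Hive passes): require a non-null configuration before reading the compression and table settings, so the caller's mistake surfaces as an explicit error rather than an NPE inside createEmptyBuckets().
{code}
import java.util.Objects;

public class NullConfGuardSketch {
  static class JobSettings {              // stand-in for the 'conf' argument
    boolean compressed = true;
    String tableInfo = "demo_table";
  }

  static void createEmptyBuckets(JobSettings conf) {
    Objects.requireNonNull(conf, "conf must not be null when creating empty buckets");
    boolean isCompressed = conf.compressed;   // safe: conf was validated above
    String tableInfo = conf.tableInfo;
    System.out.println("compressed=" + isCompressed + ", table=" + tableInfo);
  }

  public static void main(String[] args) {
    createEmptyBuckets(new JobSettings());    // fine
    // createEmptyBuckets(null);              // would fail fast with a clear message
  }
}
{code}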



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()

2015-06-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-7150:
-
Description: 
Here is related code:

{code}
sslTrustStore.load(new FileInputStream(sslTrustStorePath),
sslTrustStorePassword.toCharArray());
{code}
The FileInputStream is not closed upon returning from the method.

  was:
Here is related code:
{code}
sslTrustStore.load(new FileInputStream(sslTrustStorePath),
sslTrustStorePassword.toCharArray());
{code}
The FileInputStream is not closed upon returning from the method.


 FileInputStream is not closed in HiveConnection#getHttpClient()
 ---

 Key: HIVE-7150
 URL: https://issues.apache.org/jira/browse/HIVE-7150
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
  Labels: jdbc
 Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch


 Here is related code:
 {code}
 sslTrustStore.load(new FileInputStream(sslTrustStorePath),
 sslTrustStorePassword.toCharArray());
 {code}
 The FileInputStream is not closed upon returning from the method.
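A hedged sketch of the usual remedy, independent of the surrounding HiveConnection code: try-with-resources closes the stream even if KeyStore.load() throws. The path and password below are placeholders.
{code}
import java.io.FileInputStream;
import java.security.KeyStore;

public class TrustStoreLoadSketch {
  static KeyStore loadTrustStore(String sslTrustStorePath, char[] sslTrustStorePassword)
      throws Exception {
    KeyStore sslTrustStore = KeyStore.getInstance(KeyStore.getDefaultType());
    try (FileInputStream in = new FileInputStream(sslTrustStorePath)) {
      sslTrustStore.load(in, sslTrustStorePassword);   // stream auto-closed on exit
    }
    return sslTrustStore;
  }

  public static void main(String[] args) throws Exception {
    // Placeholder path and password; point at a real truststore to run this.
    loadTrustStore("/path/to/truststore.jks", "changeit".toCharArray());
  }
}
{code}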



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10943) Beeline-cli: Enable precommit for beelie-cli branch

2015-06-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579420#comment-14579420
 ] 

Sergio Peña commented on HIVE-10943:


+1

I created the job in Jenkins and added the new properties file to the Jenkins 
instance.
I think I need to restart Jenkins; I'll wait until there are no more jobs 
running.

 Beeline-cli: Enable precommit for beelie-cli branch 
 

 Key: HIVE-10943
 URL: https://issues.apache.org/jira/browse/HIVE-10943
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-10943.patch, HIVE-10943.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-09 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-10841:
-

Assignee: Laljo John Pullokkaran  (was: Alexander Pivovarov)

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10841.1.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   

[jira] [Commented] (HIVE-10968) Windows: analyze json table via beeline failed throwing Class org.apache.hive.hcatalog.data.JsonSerDe not found

2015-06-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579286#comment-14579286
 ] 

Thejas M Nair commented on HIVE-10968:
--

+1
Thanks for the patch Hari!


 Windows: analyze json table via beeline failed throwing Class 
 org.apache.hive.hcatalog.data.JsonSerDe not found
 ---

 Key: HIVE-10968
 URL: https://issues.apache.org/jira/browse/HIVE-10968
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
 Environment: Windows
Reporter: Takahiko Saito
Assignee: Hari Sankar Sivarama Subramaniyan
 Fix For: 1.2.1

 Attachments: HIVE-10968.1.patch


 NO PRECOMMIT TESTS
 Run the following via beeline:
 {noformat}0: jdbc:hive2://localhost:10001 analyze table all100kjson compute 
 statistics;
 15/06/05 20:44:11 INFO log.PerfLogger: PERFLOG method=parse 
 from=org.apache.hadoop.hive.ql.Driver
 15/06/05 20:44:11 INFO parse.ParseDriver: Parsing command: analyze table 
 all100kjson compute statistics
 15/06/05 20:44:11 INFO parse.ParseDriver: Parse Completed
 15/06/05 20:44:11 INFO log.PerfLogger: /PERFLOG method=parse 
 start=1433537051075 end=1433537051077 duration=2 from=org.
 apache.hadoop.hive.ql.Driver
 15/06/05 20:44:11 INFO log.PerfLogger: PERFLOG method=semanticAnalyze 
 from=org.apache.hadoop.hive.ql.Driver
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Invoking analyze on 
 original query
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Starting Semantic 
 Analysis
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Completed phase 1 
 of Semantic Analysis
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Get metadata for 
 source tables
 15/06/05 20:44:11 INFO metastore.HiveMetaStore: 5: get_table : db=default 
 tbl=all100kjson
 15/06/05 20:44:11 INFO HiveMetaStore.audit: ugi=hadoopqa
 ip=unknown-ip-addr  cmd=get_table : db=default tbl=a
 ll100kjson
 15/06/05 20:44:11 INFO metastore.HiveMetaStore: 5: get_table : db=default 
 tbl=all100kjson
 15/06/05 20:44:11 INFO HiveMetaStore.audit: ugi=hadoopqa
 ip=unknown-ip-addr  cmd=get_table : db=default tbl=a
 ll100kjson
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Get metadata for 
 subqueries
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Get metadata for 
 destination tables
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Completed getting 
 MetaData in Semantic Analysis
 15/06/05 20:44:11 INFO common.FileUtils: Creating directory if it doesn't 
 exist: hdfs://dal-hs211:8020/user/hcat/tests/d
 ata/all100kjson/.hive-staging_hive_2015-06-05_20-44-11_075_4520028480897676073-5
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Set stats 
 collection dir : hdfs://dal-hs211:8020/user/hcat/tes
 ts/data/all100kjson/.hive-staging_hive_2015-06-05_20-44-11_075_4520028480897676073-5/-ext-1
 15/06/05 20:44:11 INFO ppd.OpProcFactory: Processing for TS(5)
 15/06/05 20:44:11 INFO log.PerfLogger: PERFLOG method=partition-retrieving 
 from=org.apache.hadoop.hive.ql.optimizer.ppr
 .PartitionPruner
 15/06/05 20:44:11 INFO log.PerfLogger: /PERFLOG method=partition-retrieving 
 start=1433537051345 end=1433537051345 durat
 ion=0 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner
 15/06/05 20:44:11 INFO metastore.HiveMetaStore: 5: get_indexes : db=default 
 tbl=all100kjson
 15/06/05 20:44:11 INFO HiveMetaStore.audit: ugi=hadoopqa
 ip=unknown-ip-addr  cmd=get_indexes : db=default tbl
 =all100kjson
 15/06/05 20:44:11 INFO metastore.HiveMetaStore: 5: get_indexes : db=default 
 tbl=all100kjson
 15/06/05 20:44:11 INFO HiveMetaStore.audit: ugi=hadoopqa
 ip=unknown-ip-addr  cmd=get_indexes : db=default tbl
 =all100kjson
 15/06/05 20:44:11 INFO physical.NullScanTaskDispatcher: Looking for table 
 scans where optimization is applicable
 15/06/05 20:44:11 INFO physical.NullScanTaskDispatcher: Found 0 null table 
 scans
 15/06/05 20:44:11 INFO physical.NullScanTaskDispatcher: Looking for table 
 scans where optimization is applicable
 15/06/05 20:44:11 INFO physical.NullScanTaskDispatcher: Found 0 null table 
 scans
 15/06/05 20:44:11 INFO physical.NullScanTaskDispatcher: Looking for table 
 scans where optimization is applicable
 15/06/05 20:44:11 INFO physical.NullScanTaskDispatcher: Found 0 null table 
 scans
 15/06/05 20:44:11 INFO physical.Vectorizer: Validating MapWork...
 15/06/05 20:44:11 INFO physical.Vectorizer: Input format: 
 org.apache.hadoop.mapred.TextInputFormat, doesn't provide vect
 orized input
 15/06/05 20:44:11 INFO parse.ColumnStatsSemanticAnalyzer: Completed plan 
 generation
 15/06/05 20:44:11 INFO ql.Driver: Semantic Analysis Completed
 15/06/05 20:44:11 INFO log.PerfLogger: /PERFLOG 

[jira] [Updated] (HIVE-10966) direct SQL for stats has a cast exception on some databases

2015-06-09 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10966:
-
Attachment: HIVE-10966.patch

Uploading patch for kicking off another test run.


 direct SQL for stats has a cast exception on some databases
 ---

 Key: HIVE-10966
 URL: https://issues.apache.org/jira/browse/HIVE-10966
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10966.patch, HIVE-10966.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10866) Throw error when client try to insert into bucketed table

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579366#comment-14579366
 ] 

Hive QA commented on HIVE-10866:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738599/HIVE-10866.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9005 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4227/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4227/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4227/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738599 - PreCommit-HIVE-TRUNK-Build

 Throw error when client try to insert into bucketed table
 -

 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch


 Currently, Hive does not support appending (insert into) to a bucketed table; see 
 open jira HIVE-3608. When inserting into such a table, the data will be 
 corrupted and no longer fit for bucketmapjoin. 
 We need to find a way to prevent clients from inserting into such a table.
 Reproduce:
 {noformat}
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert into table buckettestoutput1 select code from sample_07 where 
 total_emp  134354250 limit 10;
 After this first insert, I did:
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 set hive.auto.convert.sortmerge.join.noconditionaltask=true;
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 +---+---+
 | data  | data  |
 +---+---+
 +---+---+
 So select works fine. 
 Second insert:
 0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select 
 code from sample_07 where total_emp = 134354250 limit 10;
 No rows affected (61.235 seconds)
 Then select:
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
 (state=42000,code=10141)
 0: jdbc:hive2://localhost:1
 {noformat}
 Insert into empty table or partition will be fine, but insert into the 
 non-empty one (after second insert in the reproduce), the bucketmapjoin will 
 throw an error. We should not let second insert succeed. 
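A hedged sketch of the guard idea (a hypothetical check, not the attached patch): reject INSERT INTO a bucketed table that already holds data, since appended files would no longer line up with the declared bucket count.
{code}
public class BucketedInsertGuardSketch {
  static void checkAppendToBucketedTable(boolean tableIsBucketed, long existingFileCount) {
    if (tableIsBucketed && existingFileCount > 0) {
      throw new IllegalStateException(
          "INSERT INTO a non-empty bucketed table is not supported; use INSERT OVERWRITE instead");
    }
  }

  public static void main(String[] args) {
    checkAppendToBucketedTable(true, 0);   // first insert into an empty table: allowed
    checkAppendToBucketedTable(true, 2);   // later insert into a populated table: fails fast
  }
}
{code}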



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10943) Beeline-cli: Enable precommit for beelie-cli branch

2015-06-09 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10943:
---
Attachment: HIVE-10943.patch

Fixed the typo in the patch and reattached.

 Beeline-cli: Enable precommit for beelie-cli branch 
 

 Key: HIVE-10943
 URL: https://issues.apache.org/jira/browse/HIVE-10943
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-10943.patch, HIVE-10943.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-09 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579397#comment-14579397
 ] 

Laljo John Pullokkaran commented on HIVE-10841:
---

[~a.semyannikov] I hope you don't mind; I have assigned the bug to myself.

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10841.1.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 

[jira] [Commented] (HIVE-10929) In Tez mode,dynamic partitioning query with union all fails at moveTask,Invalid partition key values

2015-06-09 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579525#comment-14579525
 ] 

Vaibhav Gumashta commented on HIVE-10929:
-

Also committed to branch-1.

 In Tez mode,dynamic partitioning query with union all fails at 
 moveTask,Invalid partition key  values
 --

 Key: HIVE-10929
 URL: https://issues.apache.org/jira/browse/HIVE-10929
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.2.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10929.1.patch, HIVE-10929.2.patch, 
 HIVE-10929.3.patch, HIVE-10929.4.patch


 {code}
 create table dummy(i int);
 insert into table dummy values (1);
 select * from dummy;
 create table partunion1(id1 int) partitioned by (part1 string);
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.execution.engine=tez;
 explain insert into table partunion1 partition(part1)
 select temps.* from (
 select 1 as id1, '2014' as part1 from dummy 
 union all 
 select 2 as id1, '2014' as part1 from dummy ) temps;
 insert into table partunion1 partition(part1)
 select temps.* from (
 select 1 as id1, '2014' as part1 from dummy 
 union all 
 select 2 as id1, '2014' as part1 from dummy ) temps;
 select * from partunion1;
 {code}
 fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10974) Use Configuration::getRaw() for the Base64 data

2015-06-09 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10974:
---
Attachment: HIVE-10974.1.patch

 Use Configuration::getRaw() for the Base64 data
 ---

 Key: HIVE-10974
 URL: https://issues.apache.org/jira/browse/HIVE-10974
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-10974.1.patch


 Inspired by the Twitter HadoopSummit talk
 {code}
if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) {
   LOG.debug("Loading plan from string: " + path.toUri().getPath());
   String planString = conf.get(path.toUri().getPath());
 {code}
 Use getRaw() in other places where Base64 data is present.
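A small sketch of the difference, assuming hadoop-common on the classpath and an example key name: Configuration.get() runs variable substitution over the whole (possibly very large) value, while getRaw() returns it exactly as stored, which avoids that work for Base64 payloads.
{code}
import org.apache.hadoop.conf.Configuration;

public class RawConfReadSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("example.plan.base64", "SGVsbG8sIHdvcmxkIQ==");   // example key and value only
    String expanded = conf.get("example.plan.base64");         // applies ${var} substitution
    String raw = conf.getRaw("example.plan.base64");           // returned exactly as stored
    System.out.println(expanded.equals(raw));                  // true here, but get() did extra work
  }
}
{code}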



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10974) Use Configuration::getRaw() for the Base64 data

2015-06-09 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10974:
---
Summary: Use Configuration::getRaw() for the Base64 data  (was: Use 
Cofiguration::getRaw() for the Base64 data)

 Use Configuration::getRaw() for the Base64 data
 ---

 Key: HIVE-10974
 URL: https://issues.apache.org/jira/browse/HIVE-10974
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-10974.1.patch


 Inspired by the Twitter HadoopSummit talk
 {code}
if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) {
   LOG.debug("Loading plan from string: " + path.toUri().getPath());
   String planString = conf.get(path.toUri().getPath());
 {code}
 Use getRaw() in other places where Base64 data is present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6991) History not able to disable/enable after session started

2015-06-09 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579438#comment-14579438
 ] 

Chinna Rao Lalam commented on HIVE-6991:


Hi [~jxiang],

Yes. I have tested it in my live cluster by disabling and enabling this property 
in the CLI and HiveServer2. It is working as expected.
Thanks for the review.

 History not able to disable/enable after session started
 

 Key: HIVE-6991
 URL: https://issues.apache.org/jira/browse/HIVE-6991
 Project: Hive
  Issue Type: Bug
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-6991.1.patch, HIVE-6991.2.patch, HIVE-6991.patch


 By default history is disabled. After the session has started, enabling history 
 through the command set hive.session.history.enabled=true does not work.
 I think fixing this will help with this user query:
 http://mail-archives.apache.org/mod_mbox/hive-user/201404.mbox/%3ccajqy7afapa_pjs6buon0o8zyt2qwfn2wt-mtznwfmurav_8...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-09 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579530#comment-14579530
 ] 

Alexander Pivovarov commented on HIVE-10841:


Sure, np. Btw, my apache id is apivovarov

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10841.1.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
 

[jira] [Updated] (HIVE-10961) LLAP: ShuffleHandler + Submit work init race condition

2015-06-09 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10961:
--
Attachment: HIVE-10961.1.txt

 LLAP: ShuffleHandler + Submit work init race condition
 --

 Key: HIVE-10961
 URL: https://issues.apache.org/jira/browse/HIVE-10961
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10961.1.txt


 When flexing in a new node, it accepts DAG requests before the shuffle 
 handler is set up, causing fatal errors:
 {code}
 DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
 vertexId=vertex_1433459966952_0729_1_00, diagnostics=[Task failed, 
 taskId=task_1t
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hive.llap.shufflehandler.ShuffleHandler.get(ShuffleHandler.java:353)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:192)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:301)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.submitWork(LlapDaemonProtocolServerImpl.java:75)
 at 
 org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:12094)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2085)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1654)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2081)
 ], TaskAttempt 1 failed, 
 info=[org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): 
 ShuffleHandler must be started before invoking get
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hive.llap.shufflehandler.ShuffleHandler.get(ShuffleHandler.java:353)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:192)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:301)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.submitWork(LlapDaemonProtocolServerImpl.java:75)
 at 
 org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:12094)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2085)
 {code}
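A self-contained sketch of the ordering idea (hypothetical classes, not LlapDaemon or ShuffleHandler): make work submission wait until the shuffle handler has finished starting, so a newly flexed-in node cannot hit the state-check failure shown above.
{code}
import java.util.concurrent.CountDownLatch;

public class StartupOrderSketch {
  private final CountDownLatch shuffleReady = new CountDownLatch(1);

  void startShuffleHandler() {
    // ... bind ports, initialize state ...
    shuffleReady.countDown();                // only now may work be accepted
  }

  void submitWork(Runnable task) throws InterruptedException {
    shuffleReady.await();                    // wait instead of failing a state check
    task.run();
  }

  public static void main(String[] args) throws InterruptedException {
    StartupOrderSketch daemon = new StartupOrderSketch();
    daemon.startShuffleHandler();
    daemon.submitWork(() -> System.out.println("task ran after shuffle was ready"));
  }
}
{code}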



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-09 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10972:

Description: 
In DummyTxnManager [line 163 | 
http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
 it always locks the current database. 

That is not correct since the current database can be db1, and the query can 
be select * from db2.tb1, which will lock db1 unnecessarily.

  was:
In DummyTxnManager [line 163 | 
http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
 it always locks the current database. 

That is not correct since the current database can be db1, and the query can 
be select * from db2.tb1, which will lock db1 unnessary.


 DummyTxnManager always locks the current database in shared mode, which is 
 incorrect.
 -

 Key: HIVE-10972
 URL: https://issues.apache.org/jira/browse/HIVE-10972
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu

 In DummyTxnManager [line 163 | 
 http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
  it always locks the current database. 
 That is not correct since the current database can be db1, and the query 
 can be select * from db2.tb1, which will lock db1 unnecessarily.
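A self-contained sketch of the idea (a hypothetical helper, not DummyTxnManager): derive the databases to lock from the tables the query actually references, instead of always locking the session's current database.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LockTargetSketch {
  // Takes qualified table names such as "db2.tb1" and returns the databases to lock.
  static List<String> databasesToLock(List<String> qualifiedTables) {
    List<String> dbs = new ArrayList<>();
    for (String table : qualifiedTables) {
      int dot = table.indexOf('.');
      String db = dot > 0 ? table.substring(0, dot) : "default";  // unqualified -> default db
      if (!dbs.contains(db)) {
        dbs.add(db);
      }
    }
    return dbs;
  }

  public static void main(String[] args) {
    // A query over db2.tb1 should lock db2, not whatever database the session is using.
    System.out.println(databasesToLock(Arrays.asList("db2.tb1")));
  }
}
{code}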



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10956) HS2 leaks HMS connections

2015-06-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10956:
---
Attachment: HIVE-10956.3.patch

 HS2 leaks HMS connections
 -

 Key: HIVE-10956
 URL: https://issues.apache.org/jira/browse/HIVE-10956
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10956.1.patch, HIVE-10956.2.patch, 
 HIVE-10956.3.patch


 HS2 uses a ThreadLocal to cache the HMS client in class Hive. When the thread 
 dies, the HMS client is not closed, so the connection to the HMS is leaked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect

2015-06-09 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579446#comment-14579446
 ] 

Chinna Rao Lalam commented on HIVE-10415:
-

Hi [~jxiang],

Yes.. By default this setting is off.
Thanks for the review.

 hive.start.cleanup.scratchdir configuration is not taking effect
 

 Key: HIVE-10415
 URL: https://issues.apache.org/jira/browse/HIVE-10415
 Project: Hive
  Issue Type: Bug
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10415.patch


 This configuration hive.start.cleanup.scratchdir is not taking effect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10956) HS2 leaks HMS connections

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579491#comment-14579491
 ] 

Hive QA commented on HIVE-10956:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738615/HIVE-10956.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8990 tests executed
*Failed tests:*
{noformat}
TestEncryptedHDFSCliDriver - did not produce a TEST-*.xml file
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4228/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4228/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4228/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738615 - PreCommit-HIVE-TRUNK-Build

 HS2 leaks HMS connections
 -

 Key: HIVE-10956
 URL: https://issues.apache.org/jira/browse/HIVE-10956
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10956.1.patch, HIVE-10956.2.patch


 HS2 uses a ThreadLocal to cache the HMS client in class Hive. When the thread 
 dies, the HMS client is not closed, so the connection to the HMS is leaked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10857) Accumulo storage handler fail throwing java.lang.IllegalArgumentException: Cannot determine SASL mechanism for token class: class org.apache.accumulo.core.client.securi

2015-06-09 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579452#comment-14579452
 ] 

Sushanth Sowmyan commented on HIVE-10857:
-

+1.

I don't know enough about Accumulo to confirm that the patch fixes the issue, 
but the code makes sense if it works the way it reads, and I can see that it 
does not negatively impact the backward-compatible case or Hive itself. 
Also, the relevant .q and unit tests all succeed.

 Accumulo storage handler fail throwing java.lang.IllegalArgumentException: 
 Cannot determine SASL mechanism for token class: class 
 org.apache.accumulo.core.client.security.tokens.PasswordToken
 ---

 Key: HIVE-10857
 URL: https://issues.apache.org/jira/browse/HIVE-10857
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Takahiko Saito
Assignee: Josh Elser
 Fix For: 1.2.1

 Attachments: HIVE-10857.2.patch, HIVE-10857.patch


 Creating a table with the Accumulo storage handler fails due to 
 ACCUMULO-2815.
 {noformat}
 create table accumulo_1(key string, age int) stored by 
 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler' with serdeproperties 
 ("accumulo.columns.mapping" = ":rowid,info:age");
 {noformat}
 The error shows:
 {noformat}
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. 
 MetaException(message:org.apache.accumulo.core.client.AccumuloException: 
 java.lang.IllegalArgumentException: Cannot determine SASL mechanism for token 
 class: class org.apache.accumulo.core.client.security.tokens.PasswordToken
   at 
 org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:67)
   at 
 org.apache.accumulo.core.client.impl.ConnectorImpl.init(ConnectorImpl.java:67)
   at 
 org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloConnectionParameters.getConnector(AccumuloConnectionParameters.java:125)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloConnectionParameters.getConnector(AccumuloConnectionParameters.java:111)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloStorageHandler.preCreateTable(AccumuloStorageHandler.java:245)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:664)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
   at com.sun.proxy.$Proxy5.createTable(Unknown Source)
   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4135)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1650)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1409)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1192)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 

[jira] [Commented] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579482#comment-14579482
 ] 

Ashutosh Chauhan commented on HIVE-10971:
-

LGTM +1 need to update test files though.

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng
 Attachments: HIVE-10971.01.patch


 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that it generates only one MapReduce job instead of two 
 theoretically, which is dictated by hive.groupby.skewindata=true.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10961) LLAP: ShuffleHandler + Submit work init race condition

2015-06-09 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-10961.
---
Resolution: Fixed

 LLAP: ShuffleHandler + Submit work init race condition
 --

 Key: HIVE-10961
 URL: https://issues.apache.org/jira/browse/HIVE-10961
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10961.1.txt


 When flexing in a new node, it accepts DAG requests before the shuffle 
 handler is set up, causing fatal errors:
 {code}
 DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
 vertexId=vertex_1433459966952_0729_1_00, diagnostics=[Task failed, 
 taskId=task_1t
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hive.llap.shufflehandler.ShuffleHandler.get(ShuffleHandler.java:353)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:192)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:301)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.submitWork(LlapDaemonProtocolServerImpl.java:75)
 at 
 org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:12094)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2085)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1654)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2081)
 ], TaskAttempt 1 failed, 
 info=[org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): 
 ShuffleHandler must be started before invoking get
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hive.llap.shufflehandler.ShuffleHandler.get(ShuffleHandler.java:353)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:192)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:301)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.submitWork(LlapDaemonProtocolServerImpl.java:75)
 at 
 org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:12094)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2085)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10956) HS2 leaks HMS connections

2015-06-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579580#comment-14579580
 ] 

Xuefu Zhang commented on HIVE-10956:


+1

 HS2 leaks HMS connections
 -

 Key: HIVE-10956
 URL: https://issues.apache.org/jira/browse/HIVE-10956
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10956.1.patch, HIVE-10956.2.patch, 
 HIVE-10956.3.patch


 HS2 uses a ThreadLocal to cache the HMS client in class Hive. When the thread 
 dies, the HMS client is not closed, so the connection to the HMS is leaked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10943) Beeline-cli: Enable precommit for beelie-cli branch

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579728#comment-14579728
 ] 

Hive QA commented on HIVE-10943:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738633/HIVE-10943.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9006 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4230/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4230/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4230/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738633 - PreCommit-HIVE-TRUNK-Build

 Beeline-cli: Enable precommit for beelie-cli branch 
 

 Key: HIVE-10943
 URL: https://issues.apache.org/jira/browse/HIVE-10943
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-10943.patch, HIVE-10943.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10857) Accumulo storage handler fail throwing java.lang.IllegalArgumentException: Cannot determine SASL mechanism for token class: class org.apache.accumulo.core.client.securi

2015-06-09 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579747#comment-14579747
 ] 

Josh Elser commented on HIVE-10857:
---

Thanks so much for the review [~sushanth]! I know these changes are a little 
obtuse for Hive -- I greatly appreciate the effort.

 Accumulo storage handler fail throwing java.lang.IllegalArgumentException: 
 Cannot determine SASL mechanism for token class: class 
 org.apache.accumulo.core.client.security.tokens.PasswordToken
 ---

 Key: HIVE-10857
 URL: https://issues.apache.org/jira/browse/HIVE-10857
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Takahiko Saito
Assignee: Josh Elser
 Fix For: 1.2.1

 Attachments: HIVE-10857.2.patch, HIVE-10857.patch


 Creating a table with the Accumulo storage handler fails due to 
 ACCUMULO-2815.
 {noformat}
 create table accumulo_1(key string, age int) stored by 
 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler' with serdeproperties 
 ("accumulo.columns.mapping" = ":rowid,info:age");
 {noformat}
 The error shows:
 {noformat}
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. 
 MetaException(message:org.apache.accumulo.core.client.AccumuloException: 
 java.lang.IllegalArgumentException: Cannot determine SASL mechanism for token 
 class: class org.apache.accumulo.core.client.security.tokens.PasswordToken
   at 
 org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:67)
   at 
 org.apache.accumulo.core.client.impl.ConnectorImpl.init(ConnectorImpl.java:67)
   at 
 org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloConnectionParameters.getConnector(AccumuloConnectionParameters.java:125)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloConnectionParameters.getConnector(AccumuloConnectionParameters.java:111)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloStorageHandler.preCreateTable(AccumuloStorageHandler.java:245)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:664)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
   at com.sun.proxy.$Proxy5.createTable(Unknown Source)
   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4135)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1650)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1409)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1192)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at 

[jira] [Commented] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect

2015-06-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579754#comment-14579754
 ] 

Lefty Leverenz commented on HIVE-10415:
---

Doc note:  *hive.start.cleanup.scratchdir* is documented in the wiki here:

* [Configuration Properties -- hive.start.cleanup.scratchdir | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.start.cleanup.scratchdir]

I added a Fixed In line with a link to this issue.

 hive.start.cleanup.scratchdir configuration is not taking effect
 

 Key: HIVE-10415
 URL: https://issues.apache.org/jira/browse/HIVE-10415
 Project: Hive
  Issue Type: Bug
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10415.patch


 This configuration hive.start.cleanup.scratchdir is not taking effect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2015-06-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579749#comment-14579749
 ] 

Lefty Leverenz commented on HIVE-2181:
--

Doc note:  This issue added the configuration parameter *hive.start.cleanup.scratchdir* 
to HiveConf.java in release 0.8.1 (not 0.8.0 as indicated in the Fix Version).

HIVE-10415 fixed a bug in release 1.3.0.

*hive.start.cleanup.scratchdir* is documented in the wiki here:

* [Configuration Properties -- hive.start.cleanup.scratchdir | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.start.cleanup.scratchdir]
 

  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, 
 HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.6.patch, HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution. If the 
 hive server is stopped, we do not need to keep the stopped server's map outputs, so 
 we can clear scratch.dir while starting the server. This helps improve disk usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10959) Templeton launcher job should reconnect to the running child job on task retry

2015-06-09 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated HIVE-10959:
--
Attachment: HIVE-10959.3.patch

Attaching an updated patch based on offline feedback from [~thejas]. I introduced 
a user arg which allows specifying whether Templeton should attempt to reconnect 
to a running job or not. This is because the user jar might be doing additional 
work after the MR job itself, and by reconnecting Templeton would lose track of 
this work. 

 Templeton launcher job should reconnect to the running child job on task retry
 --

 Key: HIVE-10959
 URL: https://issues.apache.org/jira/browse/HIVE-10959
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.15.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.patch


 Currently, Templeton launcher kills all child jobs (jobs tagged with the 
 parent job's id) upon task retry. 
 Upon templeton launcher task retry, templeton should reconnect to the running 
 job and continue tracking its progress that way. 
 This logic cannot be used for all job kinds (e.g. for jobs that are driven by 
 the client side like regular hive). However, for MapReduceV2, and possibly 
 Tez and HiveOnTez, this should be the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM

2015-06-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579784#comment-14579784
 ] 

Xuefu Zhang commented on HIVE-10816:


Yeah. I think it makes sense to commit it to branch-1 as well.

 NPE in ExecDriver::handleSampling when submitted via child JVM
 --

 Key: HIVE-10816
 URL: https://issues.apache.org/jira/browse/HIVE-10816
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 1.3.0

 Attachments: HIVE-10816.1.patch, HIVE-10816.1.patch


 When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE 
 and falls back to single-reducer mode. Stack trace:
 {noformat}
 2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver 
 (ExecDriver.java:execute(386)) - Sampling error
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}
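 A minimal reproduction sketch, assuming hive.optimize.sampling.orderby is the switch that 
 triggers the sampling path in handleSampling(), and using a hypothetical table 
 src(key string, value string):
 {code}
 -- hedged sketch: submit the MR job from a child JVM
 set hive.exec.submitviachild=true;
 -- assumed to enable sampling-based parallel order by, which calls handleSampling()
 set hive.optimize.sampling.orderby=true;
 set mapred.reduce.tasks=4;
 -- hypothetical table; any table large enough for multiple reducers will do
 select key, value from src order by key;
 {code}
 With the bug, the job logs the Sampling error above and falls back to a single reducer 
 instead of running the parallel order by.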



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-06-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10965:

Attachment: HIVE-10965.01.patch

 direct SQL for stats fails in 0-column case
 ---

 Key: HIVE-10965
 URL: https://issues.apache.org/jira/browse/HIVE-10965
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10965.01.patch, HIVE-10965.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10967) add mapreduce.job.tags to sql std authorization config whitelist

2015-06-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579639#comment-14579639
 ] 

Lefty Leverenz commented on HIVE-10967:
---

Doc note:  Updated the description of 
*hive.security.authorization.sqlstd.confwhitelist* in the wiki to include this 
jira.

* [Configuration Properties -- hive.security.authorization.sqlstd.confwhitelist 
| 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.security.authorization.sqlstd.confwhitelist]
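For context, the whitelist controls which parameters a client may set at runtime when SQL 
standard authorization is enabled. With this fix, an Oozie-style tag assignment such as the 
following should be accepted (the tag value here is a hypothetical example):
{code}
-- allowed only if mapreduce.job.tags matches
-- hive.security.authorization.sqlstd.confwhitelist
set mapreduce.job.tags=oozie-launcher-00001;
{code}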

 add mapreduce.job.tags to sql std authorization config whitelist
 

 Key: HIVE-10967
 URL: https://issues.apache.org/jira/browse/HIVE-10967
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 1.2.1

 Attachments: HIVE-10967.1.patch


 mapreduce.job.tags is set by oozie for HiveServer2 actions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM

2015-06-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579672#comment-14579672
 ] 

Lefty Leverenz commented on HIVE-10816:
---

Version question:  Since this was committed to master shouldn't it say Fix 
Version 2.0.0, or will it also be committed to branch-1 for 1.3.0?

 NPE in ExecDriver::handleSampling when submitted via child JVM
 --

 Key: HIVE-10816
 URL: https://issues.apache.org/jira/browse/HIVE-10816
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 1.3.0

 Attachments: HIVE-10816.1.patch, HIVE-10816.1.patch


 When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE 
 and falls back to single-reducer mode. Stack trace:
 {noformat}
 2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver 
 (ExecDriver.java:execute(386)) - Sampling error
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10947) LLAP: preemption appears to count against failure count for the task

2015-06-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579604#comment-14579604
 ] 

Siddharth Seth commented on HIVE-10947:
---

If this happens again, please capture the logs. I'm not sure these tasks were 
actually preempted; they may have failed for other reasons. There are 20 
additional attempts, most of which were KILLED (likely due to preemption) 
before the 2 FAILED attempts, which caused the task to fail.

 LLAP: preemption appears to count against failure count for the task
 

 Key: HIVE-10947
 URL: https://issues.apache.org/jira/browse/HIVE-10947
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

 Looks like the following stack trace, in a highly parallel workload, counts as a task error 
 and the DAG fails:
 {noformat}
 : Error while processing statement: FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
 vertexName=Map 1, vertexId=vertex_1433459966952_0482_4_03, diagnostics=[Task 
 failed, taskId=task_1433459966952_0482_4_03_22, diagnostics=[TaskAttempt 
 0 killed, TaskAttempt 1 killed, TaskAttempt 2 killed, TaskAttempt 3 killed, 
 TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 
 7 killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, 
 TaskAttempt 11 killed, TaskAttempt 12 killed, TaskAttempt 13 killed, 
 TaskAttempt 14 killed, TaskAttempt 15 killed, TaskAttempt 16 killed, 
 TaskAttempt 17 killed, TaskAttempt 18 killed, TaskAttempt 19 failed, 
 info=[Error: Failure while running task: 
 attempt_1433459966952_0482_4_03_22_19:java.lang.RuntimeException: 
 java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1654)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
   at 
 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:256)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157)
   ... 14 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async 
 initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:416)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:388)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:511)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:464)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:378)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:241)
   ... 15 more
 Caused by: java.util.concurrent.CancellationException
   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:408)
   ... 20 more
 ], TaskAttempt 20 failed, info=[Error: Failure while running task: 
 attempt_1433459966952_0482_4_03_22_20:java.lang.RuntimeException: 
 java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146)
   at 
 

[jira] [Commented] (HIVE-10966) direct SQL for stats has a cast exception on some databases

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579606#comment-14579606
 ] 

Hive QA commented on HIVE-10966:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738624/HIVE-10966.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9006 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4229/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4229/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4229/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738624 - PreCommit-HIVE-TRUNK-Build

 direct SQL for stats has a cast exception on some databases
 ---

 Key: HIVE-10966
 URL: https://issues.apache.org/jira/browse/HIVE-10966
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10966.patch, HIVE-10966.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10685) Alter table concatenate operator will cause duplicate data

2015-06-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10685:
-
Priority: Critical  (was: Major)

 Alter table concatenate operator will cause duplicate data
 --

 Key: HIVE-10685
 URL: https://issues.apache.org/jira/browse/HIVE-10685
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1
Reporter: guoliming
Assignee: guoliming
Priority: Critical
 Fix For: 1.2.0, 1.1.0

 Attachments: HIVE-10685.1.patch, HIVE-10685.patch


 The orders table has 15 rows and is stored as ORC.
 {noformat}
 hive select count(*) from orders;
 OK
 15
 Time taken: 37.692 seconds, Fetched: 1 row(s)
 {noformat}
 The table contains 14 files; the size of each file is about 2.1 ~ 3.2 GB.
 After executing the command ALTER TABLE orders CONCATENATE;,
 the table now has 1530115000 rows.
 My hive version is 1.1.0.
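 A simple check for the duplication, as a sketch using the orders table from the report:
 {code}
 -- row count before merging files
 select count(*) from orders;
 alter table orders concatenate;
 -- the two counts should match; with this bug the second count is inflated
 select count(*) from orders;
 {code}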



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10970) Revert HIVE-10453: HS2 leaking open file descriptors when using UDFs

2015-06-09 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579597#comment-14579597
 ] 

Vaibhav Gumashta commented on HIVE-10970:
-

[~xuefuz] I'm trying to reproduce it locally. Internally, we saw 
TestJdbcDriver2 fail with several ClassNotFoundExceptions. As a quick fix, I 
tried reverting this and it seems to fix the issue. However, before reverting 
on Apache, I'm going to try to get a repro and also work out why that was 
happening.

 Revert HIVE-10453: HS2 leaking open file descriptors when using UDFs
 

 Key: HIVE-10970
 URL: https://issues.apache.org/jira/browse/HIVE-10970
 Project: Hive
  Issue Type: Bug
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10453) HS2 leaking open file descriptors when using UDFs

2015-06-09 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579593#comment-14579593
 ] 

Vaibhav Gumashta commented on HIVE-10453:
-

I'm trying to reproduce this locally before I revert this (will add more 
comments on HIVE-10970). 

 HS2 leaking open file descriptors when using UDFs
 -

 Key: HIVE-10453
 URL: https://issues.apache.org/jira/browse/HIVE-10453
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10453.1.patch, HIVE-10453.2.patch


 1. Create a custom function:
 CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar';
 2. Create a simple JDBC client that just connects and runs a simple query 
 which uses the function, such as:
 select myfunc(col1) from sometable
 3. Disconnect.
 Check open files for HiveServer2 with:
 lsof -p HSProcID | grep myudf.jar
 You will see the leak as:
 {noformat}
 java  28718 ychen  txt  REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 java  28718 ychen  330r REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-06-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10533:
---
Attachment: HIVE-10533.02.patch

 CBO (Calcite Return Path): Join to MultiJoin support for outer joins
 

 Key: HIVE-10533
 URL: https://issues.apache.org/jira/browse/HIVE-10533
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, 
 HIVE-10533.02.patch, HIVE-10533.patch


 CBO return path: auto_join7.q can be used to reproduce the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578802#comment-14578802
 ] 

Hive QA commented on HIVE-10971:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738553/HIVE-10971.01.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4223/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4223/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4223/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738553 - PreCommit-HIVE-TRUNK-Build

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng
 Attachments: HIVE-10971.01.patch


 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that it generates only one MapReduce job instead of two 
 theoretically, which is dictated by hive.groupby.skewindata=true.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.
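 One way to see the missing second stage, as a sketch against the same TPC-H lineitem table:
 {code}
 set hive.groupby.skewindata=true;
 -- with skew handling enabled the plan should contain two MapReduce stages
 -- for this group by; the report indicates only one is generated
 explain
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag;
 {code}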



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10903) Add hive.in.test for HoS tests

2015-06-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10903:
--
Attachment: HIVE-10903.3.patch

Update more outputs.

 Add hive.in.test for HoS tests
 --

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch, 
 HIVE-10903.3.patch


 Missing the property can make CBO fail to run during UT. There may be 
 other effects that can be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-06-09 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578793#comment-14578793
 ] 

Jesus Camacho Rodriguez commented on HIVE-10533:


Fails are unrelated, reuploading the patch for another QA run.

 CBO (Calcite Return Path): Join to MultiJoin support for outer joins
 

 Key: HIVE-10533
 URL: https://issues.apache.org/jira/browse/HIVE-10533
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, 
 HIVE-10533.patch


 CBO return path: auto_join7.q can be used to reproduce the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10970) Revert HIVE-10453: HS2 leaking open file descriptors when using UDFs

2015-06-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578852#comment-14578852
 ] 

Xuefu Zhang commented on HIVE-10970:


Could you please add description giving the reasoning? Thanks.

 Revert HIVE-10453: HS2 leaking open file descriptors when using UDFs
 

 Key: HIVE-10970
 URL: https://issues.apache.org/jira/browse/HIVE-10970
 Project: Hive
  Issue Type: Bug
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10943) Beeline-cli: Enable precommit for beeline-cli branch

2015-06-09 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579799#comment-14579799
 ] 

Ferdinand Xu commented on HIVE-10943:
-

Thanks [~spena] for creating the instance.

 Beeline-cli: Enable precommit for beeline-cli branch 
 

 Key: HIVE-10943
 URL: https://issues.apache.org/jira/browse/HIVE-10943
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-10943.patch, HIVE-10943.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10857) Accumulo storage handler fail throwing java.lang.IllegalArgumentException: Cannot determine SASL mechanism for token class: class org.apache.accumulo.core.client.securi

2015-06-09 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579849#comment-14579849
 ] 

Sushanth Sowmyan commented on HIVE-10857:
-

(Also committed to branch-1, forgot earlier)

 Accumulo storage handler fail throwing java.lang.IllegalArgumentException: 
 Cannot determine SASL mechanism for token class: class 
 org.apache.accumulo.core.client.security.tokens.PasswordToken
 ---

 Key: HIVE-10857
 URL: https://issues.apache.org/jira/browse/HIVE-10857
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Takahiko Saito
Assignee: Josh Elser
 Fix For: 1.2.1

 Attachments: HIVE-10857.2.patch, HIVE-10857.patch


 Creating a table with the Accumulo storage handler fails due to ACCUMULO-2815.
 {noformat}
 create table accumulo_1(key string, age int) stored by 
 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler' with serdeproperties 
 ("accumulo.columns.mapping" = ":rowid,info:age");
 {noformat}
 The error shows:
 {noformat}
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. 
 MetaException(message:org.apache.accumulo.core.client.AccumuloException: 
 java.lang.IllegalArgumentException: Cannot determine SASL mechanism for token 
 class: class org.apache.accumulo.core.client.security.tokens.PasswordToken
   at 
 org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:67)
   at 
 org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:67)
   at 
 org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloConnectionParameters.getConnector(AccumuloConnectionParameters.java:125)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloConnectionParameters.getConnector(AccumuloConnectionParameters.java:111)
   at 
 org.apache.hadoop.hive.accumulo.AccumuloStorageHandler.preCreateTable(AccumuloStorageHandler.java:245)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:664)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
   at com.sun.proxy.$Proxy5.createTable(Unknown Source)
   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4135)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1650)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1409)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1192)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.lang.IllegalArgumentException: Cannot determine SASL 

[jira] [Commented] (HIVE-6791) Support variable substitution for Beeline shell command

2015-06-09 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579868#comment-14579868
 ] 

Ferdinand Xu commented on HIVE-6791:


Thanks [~xuefuz] for your reviews. I left some comments on the review board. 
Thank you!

 Support variable substitution for Beeline shell command
 -

 Key: HIVE-6791
 URL: https://issues.apache.org/jira/browse/HIVE-6791
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Clients
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu
 Attachments: HIVE-6791-beeline-cli.patch


 A follow-up task from HIVE-6694. Similar to HIVE-6570.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10974) Use Configuration::getRaw() for the Base64 data

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579879#comment-14579879
 ] 

Hive QA commented on HIVE-10974:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738665/HIVE-10974.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9006 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4232/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4232/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4232/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738665 - PreCommit-HIVE-TRUNK-Build

 Use Configuration::getRaw() for the Base64 data
 ---

 Key: HIVE-10974
 URL: https://issues.apache.org/jira/browse/HIVE-10974
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-10974.1.patch


 Inspired by the Twitter HadoopSummit talk
 {code}
if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) {
   LOG.debug("Loading plan from string: " + path.toUri().getPath());
   String planString = conf.get(path.toUri().getPath());
 {code}
 Use getRaw() in other places where Base64 data is present.
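 A minimal sketch of the intended change, assuming the plan value is Base64 text that must 
 not go through variable expansion (Configuration.getRaw() skips ${...} substitution, 
 unlike get()):
 {code}
 import org.apache.commons.codec.binary.Base64;
 import org.apache.hadoop.conf.Configuration;

 public class RawPlanReader {
   // Hedged sketch: read a Base64-encoded plan string without variable substitution.
   // The key name is whatever the plan was stored under (here, the plan path).
   public static byte[] readPlanBytes(Configuration conf, String planKey) {
     // getRaw() returns the stored value as-is; get() would attempt ${...} expansion,
     // which is both slower and unsafe for arbitrary Base64 payloads.
     String planString = conf.getRaw(planKey);
     if (planString == null) {
       return null;
     }
     return Base64.decodeBase64(planString);
   }
 }
 {code}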



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10958) Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails

2015-06-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579895#comment-14579895
 ] 

Thejas M Nair commented on HIVE-10958:
--

Is it also committed to branch-1 ? 
Everything in branch-1.2 should go into branch-1 as well, as branch-1 is the 
'trunk' for future 1.x releases.


 Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails
 --

 Key: HIVE-10958
 URL: https://issues.apache.org/jira/browse/HIVE-10958
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.1, 2.0.0

 Attachments: HIVE-10958.01.patch


 Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails due to the 
 statement "set mapred.reduce.tasks = 18;".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10956) HS2 leaks HMS connections

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579811#comment-14579811
 ] 

Hive QA commented on HIVE-10956:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738657/HIVE-10956.3.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9004 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4231/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4231/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4231/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738657 - PreCommit-HIVE-TRUNK-Build

 HS2 leaks HMS connections
 -

 Key: HIVE-10956
 URL: https://issues.apache.org/jira/browse/HIVE-10956
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10956.1.patch, HIVE-10956.2.patch, 
 HIVE-10956.3.patch


 HS2 uses a ThreadLocal to cache the HMS client in the Hive class. When the thread 
 dies, the HMS client is not closed, so the connection to the HMS is leaked.
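 An illustrative sketch of the leak pattern described above, with a made-up Client type 
 rather than the real HiveMetaStoreClient API:
 {code}
 import java.io.Closeable;
 import java.io.IOException;

 public class ThreadLocalCacheLeak {

   // Stand-in for the metastore client; closing it would release the HMS connection.
   static class Client implements Closeable {
     @Override
     public void close() throws IOException {
       // release the underlying connection
     }
   }

   // Each worker thread gets its own cached client.
   private static final ThreadLocal<Client> CACHE = ThreadLocal.withInitial(Client::new);

   static Client get() {
     return CACHE.get();
   }

   // The fix, conceptually: close and clear the cached client when the owning
   // thread is done, instead of letting the connection linger.
   static void releaseCurrent() throws IOException {
     Client c = CACHE.get();
     if (c != null) {
       c.close();
       CACHE.remove();
     }
   }
 }
 {code}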



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM

2015-06-09 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579865#comment-14579865
 ] 

Rui Li commented on HIVE-10816:
---

Thanks [~leftylev] and [~xuefuz] for catching this. I didn't realize master is 
on 2.0.0 now.
Committed to branch-1 as well.

 NPE in ExecDriver::handleSampling when submitted via child JVM
 --

 Key: HIVE-10816
 URL: https://issues.apache.org/jira/browse/HIVE-10816
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 1.3.0

 Attachments: HIVE-10816.1.patch, HIVE-10816.1.patch


 When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE 
 and falls back to single-reducer mode. Stack trace:
 {noformat}
 2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver 
 (ExecDriver.java:execute(386)) - Sampling error
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10959) Templeton launcher job should reconnect to the running child job on task retry

2015-06-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579845#comment-14579845
 ] 

Thejas M Nair commented on HIVE-10959:
--

+1

 Templeton launcher job should reconnect to the running child job on task retry
 --

 Key: HIVE-10959
 URL: https://issues.apache.org/jira/browse/HIVE-10959
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.15.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.patch


 Currently, Templeton launcher kills all child jobs (jobs tagged with the 
 parent job's id) upon task retry. 
 Upon templeton launcher task retry, templeton should reconnect to the running 
 job and continue tracking its progress that way. 
 This logic cannot be used for all job kinds (e.g. for jobs that are driven by 
 the client side like regular hive). However, for MapReduceV2, and possibly 
 Tez and HiveOnTez, this should be the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10903) Add hive.in.test for HoS tests

2015-06-09 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579846#comment-14579846
 ] 

Rui Li commented on HIVE-10903:
---

cc [~xuefuz]

 Add hive.in.test for HoS tests
 --

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch, 
 HIVE-10903.3.patch


 Missing the property can make CBO fail to run during UT. There may be 
 other effects that can be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM

2015-06-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10816:
--
Fix Version/s: 2.0.0

 NPE in ExecDriver::handleSampling when submitted via child JVM
 --

 Key: HIVE-10816
 URL: https://issues.apache.org/jira/browse/HIVE-10816
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10816.1.patch, HIVE-10816.1.patch


 When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE 
 and falls back to single-reducer mode. Stack trace:
 {noformat}
 2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver 
 (ExecDriver.java:execute(386)) - Sampling error
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578900#comment-14578900
 ] 

Hive QA commented on HIVE-10533:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738569/HIVE-10533.02.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9005 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4224/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4224/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4224/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738569 - PreCommit-HIVE-TRUNK-Build

 CBO (Calcite Return Path): Join to MultiJoin support for outer joins
 

 Key: HIVE-10533
 URL: https://issues.apache.org/jira/browse/HIVE-10533
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, 
 HIVE-10533.02.patch, HIVE-10533.patch


 CBO return path: auto_join7.q can be used to reproduce the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10933) Hive 0.13 returns precision 0 for varchar(32) from DatabaseMetadata.getColumns()

2015-06-09 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang resolved HIVE-10933.

Resolution: Cannot Reproduce

As I understand it, this has been resolved by HIVE-5847. If you see any related issue 
in the future, please feel free to reopen this JIRA or open a new one.

 Hive 0.13 returns precision 0 for varchar(32) from 
 DatabaseMetadata.getColumns()
 

 Key: HIVE-10933
 URL: https://issues.apache.org/jira/browse/HIVE-10933
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.13.0
Reporter: Son Nguyen
Assignee: Chaoyu Tang

 DatabaseMetadata.getColumns() returns COLUMN_SIZE as 0 for a column defined 
 as varchar(32) or char(32), while ResultSetMetaData.getPrecision() returns 
 the correct value 32.
 Here is a program segment that reproduces the issue.
 {code}
 try {
   statement = connection.createStatement();
   statement.execute("drop table if exists son_table");
   statement.execute("create table son_table( col1 varchar(32) )");
   statement.close();
 } catch (Exception e) {
   return;
 }

 // get column info using metadata
 try {
   DatabaseMetaData dmd = null;
   ResultSet resultSet = null;

   dmd = connection.getMetaData();

   resultSet = dmd.getColumns(null, null, "son_table", "col1");

   if (resultSet.next()) {
     String tabName = resultSet.getString("TABLE_NAME");
     String colName = resultSet.getString("COLUMN_NAME");
     String dataType = resultSet.getString("DATA_TYPE");
     String typeName = resultSet.getString("TYPE_NAME");
     int precision = resultSet.getInt("COLUMN_SIZE");

     // output is: colName = col1, dataType = 12, typeName = VARCHAR, precision = 0.
     System.out.format("colName = %s, dataType = %s, typeName = %s, precision = %d.",
         colName, dataType, typeName, precision);
   }
 } catch (Exception e) {
   return;
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580047#comment-14580047
 ] 

Hive QA commented on HIVE-10971:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738683/HIVE-10971.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9006 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby2
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4234/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4234/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4234/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738683 - PreCommit-HIVE-TRUNK-Build

 count(*) with count(distinct) gives wrong results when 
 hive.groupby.skewindata=true
 ---

 Key: HIVE-10971
 URL: https://issues.apache.org/jira/browse/HIVE-10971
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0
Reporter: wangmeng
Assignee: wangmeng
 Attachments: HIVE-10971.01.patch, HIVE-10971.1.patch


 When hive.groupby.skewindata=true, the following query based on TPC-H gives 
 wrong results:
 {code}
 set hive.groupby.skewindata=true;
 select l_returnflag, count(*), count(distinct l_linestatus)
 from lineitem
 group by l_returnflag
 limit 10;
 {code}
 The query plan shows that it generates only one MapReduce job instead of two 
 theoretically, which is dictated by hive.groupby.skewindata=true.
 The problem arises only when {noformat}count(*){noformat} and 
 {noformat}count(distinct){noformat} exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10959) Templeton launcher job should reconnect to the running child job on task retry

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580058#comment-14580058
 ] 

Hive QA commented on HIVE-10959:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738697/HIVE-10959.3.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4235/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4235/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4235/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ 
hive-hcatalog-server-extensions ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/tmp/conf
 [copy] Copying 11 files to 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-hcatalog-server-extensions ---
[INFO] Compiling 2 source files to 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
hive-hcatalog-server-extensions ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ 
hive-hcatalog-server-extensions ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/hive-hcatalog-server-extensions-2.0.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-hcatalog-server-extensions ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
hive-hcatalog-server-extensions ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/target/hive-hcatalog-server-extensions-2.0.0-SNAPSHOT.jar
 to 
/home/hiveptest/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-server-extensions/2.0.0-SNAPSHOT/hive-hcatalog-server-extensions-2.0.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/hcatalog/server-extensions/pom.xml
 to 
/home/hiveptest/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-server-extensions/2.0.0-SNAPSHOT/hive-hcatalog-server-extensions-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive HCatalog Webhcat Java Client 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
hive-webhcat-java-client ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/hcatalog/webhcat/java-client/target
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/hcatalog/webhcat/java-client
 (includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-webhcat-java-client ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-webhcat-java-client ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-webhcat-java-client ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/hcatalog/webhcat/java-client/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-webhcat-java-client ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-webhcat-java-client ---
[INFO] Compiling 36 source files to 
/data/hive-ptest/working/apache-github-source-source/hcatalog/webhcat/java-client/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClientHMSImpl.java:
 
/data/hive-ptest/working/apache-github-source-source/hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClientHMSImpl.java
 uses or overrides a deprecated API.
[WARNING] 

[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-09 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578856#comment-14578856
 ] 

Greg Senia commented on HIVE-10729:
---

Here is the query and the source table description that shows the array<string> column 
which seems to be the cause...

drop table debug.ct_gsd_events1_test;
create table debug.ct_gsd_events1_test
as select  a.*,
b.svcrqst_id,
b.svcrqct_cds,
b.svcrtyp_cd,
b.cmpltyp_cd,
b.sum_reason_cd as src,
b.cnctmd_cd,
b.notes
from ctm.ct_gsd_events a
inner join
mbr.gsd_service_request b
on a.contact_event_id = b.cnctevn_id;


hive> describe formatted ctm.ct_gsd_events;
OK
# col_name  data_type   comment 
 
hmoid   string  
cumb_id_no  int 
mbrind_id   string  
contact_event_idstring  
ce_create_dtstring  
ce_end_dt   string  
contact_typestring  
cnctevs_cd  string  
contact_modestring  
cntvnst_stts_cd string  
total_transfers int 
ce_notes                array<string>   
 
# Detailed Table Information 
Database:   ctm  
Owner:  LOAD_USER  
CreateTime: Fri May 29 09:41:58 EDT 2015 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   
hdfs://xhadnnm1p.example.com:8020/apps/hive/warehouse/ctm.db/ct_gsd_events  
   
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles154 
numRows 0   
rawDataSize 0   
totalSize   5464108 
transient_lastDdlTime   1432906919  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
serialization.format1   
Time taken: 2.968 seconds, Fetched: 42 row(s)
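
To make the failure mode concrete, here is a minimal, self-contained toy sketch of the cast 
mismatch quoted below (LazyBinaryArray cannot be cast to [Ljava.lang.Object;). The class names 
in the sketch are stand-ins for illustration only; this is not Hive code.

{code:java}
// Toy reproduction of the failure mode (illustration only, not Hive code):
// an inspector that assumes list data arrives as Object[] fails when the
// map join side instead hands it a lazily decoded list object.
public class CastMismatch {

  // Stand-in for LazyBinaryArray: list-like, but NOT an Object[].
  static class LazyListHolder {
    final int[] encoded = {1, 2};
  }

  // Stand-in for the cast reported at StandardListObjectInspector.getList().
  static Object[] getListAsArray(Object data) {
    return (Object[]) data;   // ClassCastException for LazyListHolder
  }

  public static void main(String[] args) {
    Object joinOutputColumn = new LazyListHolder();  // what the map join produces
    getListAsArray(joinOutputColumn);                // throws, as in the stack trace below
  }
}
{code}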

 Query failed when select complex columns from joinned table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch


 When map join happens, if projection columns include complex data types, 
 query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key   int
 value string
 hive> select * from src_int where key=2;
 2    val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at 

[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-09 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578874#comment-14578874
 ] 

Greg Senia commented on HIVE-10729:
---

Here is a sample of the data. I think the cause is that there is a null in the 
array<string> field of notes... This was not a problem with Hive 0.13; it 
definitely started with the Hive 0.14/1.x line.


Vertex failed, vertexName=Map 2, vertexId=vertex_1426958683478_216665_2_01, 
diagnostics=[Task failed, taskId=task_1426958683478_216665_2_01_000104, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cumb_id_no:31585,cumb_id_no_sub:31585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09
 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 
,plan_component:H 
,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX
 
,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX
 
,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-03-09 
11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 
11:24:51.315134,cntevsds_lupdt:2015-03-09 
11:25:13.429458,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cumb_id_no:31XXX585,cumb_id_no_sub:31XXX585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09
 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 
,plan_component:H 
,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX
 
,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX
 
,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-03-09 
11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 
11:24:51.315134,cntevsds_lupdt:2015-03-09 
11:25:13.429458,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{cumb_id_no:31585,cumb_id_no_sub:31585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09
 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 
,plan_component:H 
,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX
 
,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX
 
,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-03-09 
11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 
11:24:51.315134,cntevsds_lupdt:2015-03-09 

[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578871#comment-14578871
 ] 

Yongzhi Chen commented on HIVE-10880:
-

[~xuefuz], when I debugged the issue, I noticed that the right number of reducers is 
used. I also noticed that dynamic partition insert works fine because it adds 
the missing files. I think we should treat static partitions and ordinary tables 
the same way, so I fixed the issue by adding the missing buckets. Following is 
the code for the dynamic partition part:

{noformat}
taskIDToFile = removeTempOrDuplicateFiles(items, fs);
// if the table is bucketed and enforce bucketing, we should check and generate all buckets
if (dpCtx.getNumBuckets() > 0 && taskIDToFile != null) {
  // refresh the file list
  items = fs.listStatus(parts[i].getPath());
  // get the missing buckets and generate empty buckets
  String taskID1 = taskIDToFile.keySet().iterator().next();
  Path bucketPath = taskIDToFile.values().iterator().next().getPath();
  for (int j = 0; j < dpCtx.getNumBuckets(); ++j) {
    String taskID2 = replaceTaskId(taskID1, j);
    if (!taskIDToFile.containsKey(taskID2)) {
      // create empty bucket, file name should be derived from taskID2
      String path2 = replaceTaskIdFromFilename(bucketPath.toUri().getPath().toString(), j);
      result.add(path2);
    }
  }
}

{noformat}
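
For readers outside the Hive codebase, here is a minimal, self-contained sketch of the same 
idea: compute which bucket files are missing and create them as empty files. The %06d_0 
file-name pattern and the example in main() are assumptions for illustration; this is not 
the actual patch.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingBucketFiles {

  // Given the bucket files a job actually produced and the number of buckets
  // declared on the table, return the names of the empty bucket files that
  // still need to be created so every declared bucket has exactly one file.
  static List<String> missingBuckets(Set<String> existingFiles, int numBuckets) {
    List<String> toCreate = new ArrayList<>();
    for (int bucket = 0; bucket < numBuckets; bucket++) {
      String expected = String.format("%06d_0", bucket);   // assumed task-id naming pattern
      if (!existingFiles.contains(expected)) {
        toCreate.add(expected);                             // empty bucket file to generate
      }
    }
    return toCreate;
  }

  public static void main(String[] args) {
    Set<String> produced = new HashSet<>();
    produced.add("000000_0");   // only one reducer wrote output
    // With 2 declared buckets, one empty file ("000001_0") must be added.
    System.out.println(missingBuckets(produced, 2));
  }
}
{code}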

 The bucket number is not respected in insert overwrite.
 ---

 Key: HIVE-10880
 URL: https://issues.apache.org/jira/browse/HIVE-10880
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Blocker
 Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, 
 HIVE-10880.3.patch


 When hive.enforce.bucketing is true, the bucket number defined in the table 
 is no longer respected in current master and 1.2. This is a regression.
 Reproduce:
 {noformat}
 CREATE TABLE IF NOT EXISTS buckettestinput( 
 data string 
 ) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 Then I inserted the following data into the buckettestinput table
 firstinsert1 
 firstinsert2 
 firstinsert3 
 firstinsert4 
 firstinsert5 
 firstinsert6 
 firstinsert7 
 firstinsert8 
 secondinsert1 
 secondinsert2 
 secondinsert3 
 secondinsert4 
 secondinsert5 
 secondinsert6 
 secondinsert7 
 secondinsert8
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%';
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
 (state=42000,code=10141)
 {noformat}
 The related debug information related to insert overwrite:
 {noformat}
 0: jdbc:hive2://localhost:1> insert overwrite table buckettestoutput1
 . . . . . . . . . . . . . .> select * from buckettestinput where data like 'first%';
 INFO  : Number of reduce tasks determined at compile time: 2
 INFO  : In order to change the average load for a reducer (in bytes):
 INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
 INFO  : In order to limit the maximum number of reducers:
 INFO  :   set hive.exec.reducers.max=<number>
 INFO  : In order to set a constant number of reducers:
 INFO  :   set mapred.reduce.tasks=<number>
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
 INFO  : Ended Job = job_local107155352_0001
 INFO  : Loading data to table default.buckettestoutput1 from 
 file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
 INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
 totalSize=52, rawDataSize=48]
 No rows affected (1.692 seconds)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10903) Add hive.in.test for HoS tests

2015-06-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579985#comment-14579985
 ] 

Xuefu Zhang commented on HIVE-10903:


+1 LGTM

 Add hive.in.test for HoS tests
 --

 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10903.1.patch, HIVE-10903.2.patch, 
 HIVE-10903.3.patch


 Missing the property can make CBO fail to run during UT. There may be other 
 effects that can be identified here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM

2015-06-09 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580013#comment-14580013
 ] 

Navis commented on HIVE-10816:
--

[~lirui] I don't know why I've not been notified, but here is my late +1.

 NPE in ExecDriver::handleSampling when submitted via child JVM
 --

 Key: HIVE-10816
 URL: https://issues.apache.org/jira/browse/HIVE-10816
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10816.1.patch, HIVE-10816.1.patch


 When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE 
 and falls back to single-reducer mode. Stack trace:
 {noformat}
 2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver 
 (ExecDriver.java:execute(386)) - Sampling error
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}
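 The fallback behaviour itself is the usual catch-and-degrade pattern: if sampling fails, 
 log it and run with a single reducer. A minimal sketch of that pattern (illustration only, 
 not the actual ExecDriver code) follows.
 {code:java}
public class SamplingFallback {

  // Pick the reducer count; if sampling fails, degrade instead of failing the query.
  static int chooseReducers(int requested) {
    try {
      return sampleAndPartition(requested);        // may throw when run via a child JVM
    } catch (RuntimeException e) {
      System.err.println("Sampling error: " + e);  // mirrors the "Sampling error" log line
      return 1;                                    // fall back to single-reducer mode
    }
  }

  // Stand-in for the sampling step that hits the NPE in the report.
  static int sampleAndPartition(int requested) {
    throw new NullPointerException();
  }

  public static void main(String[] args) {
    System.out.println(chooseReducers(4));   // prints 1: the fallback path
  }
}
 {code}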



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10890) Provide implementable engine selector

2015-06-09 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580028#comment-14580028
 ] 

Navis commented on HIVE-10890:
--

Right, that should also be checked. The included implementation was just meant to 
show the intention. I'll think of a way to detect whether the engine is configured 
properly. Anyway, I don't know why I'm not getting notifications from the Hive 
community these days.
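
As a point of reference, a minimal sketch of one possible shape for such a selector follows. 
The interface name, method signature, config key and example heuristic are assumptions for 
illustration; they are not the API in the attached implementation.

{code:java}
import java.util.Map;

public interface EngineSelector {

  // Pick "mr", "tez" or "spark" for a query, or return null to keep the configured default.
  String select(Map<String, String> hiveConf, String queryText);

  // Example selector: only switch engines when the target engine is actually configured,
  // which is the check discussed in the comment above.
  class SimpleSelector implements EngineSelector {
    @Override
    public String select(Map<String, String> hiveConf, String queryText) {
      boolean sparkConfigured = hiveConf.containsKey("spark.master");
      return sparkConfigured ? "spark" : null;   // null = leave hive.execution.engine as-is
    }
  }
}
{code}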

 Provide implementable engine selector
 -

 Key: HIVE-10890
 URL: https://issues.apache.org/jira/browse/HIVE-10890
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial

 Now Hive supports three kinds of engines. It would be good to have an automatic 
 engine selector instead of having to set the execution engine explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10958) Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails

2015-06-09 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579898#comment-14579898
 ] 

Pengcheng Xiong commented on HIVE-10958:


I am not sure whether it has also been committed to branch-1. [~ashutoshc], could you 
please take a look at [~thejas]'s question? Thanks.

 Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails
 --

 Key: HIVE-10958
 URL: https://issues.apache.org/jira/browse/HIVE-10958
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.1, 2.0.0

 Attachments: HIVE-10958.01.patch


 Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails due to the 
 statement set mapred.reduce.tasks = 18;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10478) resolved

2015-06-09 Thread wangmeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579978#comment-14579978
 ] 

wangmeng commented on HIVE-10478:
-

Hi, I also encountered the same problem. How did you solve it? SET 
hive.exec.parallel=false? Thanks.

 resolved
 

 Key: HIVE-10478
 URL: https://issues.apache.org/jira/browse/HIVE-10478
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: anna ken
  Labels: hadoop, hive, hue, kryo

 resolved



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579993#comment-14579993
 ] 

Hive QA commented on HIVE-10965:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738677/HIVE-10965.01.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9006 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_stats
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4233/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4233/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4233/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738677 - PreCommit-HIVE-TRUNK-Build

 direct SQL for stats fails in 0-column case
 ---

 Key: HIVE-10965
 URL: https://issues.apache.org/jira/browse/HIVE-10965
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-10965.01.patch, HIVE-10965.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)