[jira] [Created] (HIVE-8570) Unexpected "IllegalArgumentException" from parseURL method in org.apache.hive.jdbc.Utils interrupts java.sql.DriverManager running before choosing the right driver when using multiple datasources

2014-10-22 Thread Shi Yuxiang (JIRA)
Shi Yuxiang created HIVE-8570:
-

 Summary: Unexpected "IllegalArgumentException" from parseURL 
method in org.apache.hive.jdbc.Utils interrupts java.sql.DriverManager running 
before choosing the right driver when using multiple datasources
 Key: HIVE-8570
 URL: https://issues.apache.org/jira/browse/HIVE-8570
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.10.0
 Environment: CentOS 6, CDH 4.3.1
Reporter: Shi Yuxiang


My JDK is 1.6. 
I checked the source code in hive-jdbc 0.10-cdh4.3.1 and found that the code in 
parseURL is not appropriate. When the uri does not start with URL_PREFIX, 
parseURL throws an exception. This exception propagates outside of hive-jdbc 
and interrupts logic outside of this driver, especially when I use multiple 
datasources besides Hive.

if (!uri.startsWith(URL_PREFIX)) {
  throw new IllegalArgumentException("Bad URL format");
}

For example, in my project I use mysql-connector-java and hive-jdbc to connect 
to MySQL and Hive respectively, and I use java.sql.DriverManager to get 
connections after initializing both JDBC drivers.
The java.sql.DriverManager.getConnection(...) method chooses the right driver 
according to the url: it scans the driver list and tries each driver in turn. 
The code looks like the following:
for (int i = 0; i < drivers.size(); i++) {
    DriverInfo di = (DriverInfo) drivers.elementAt(i);
    // If the caller does not have permission to load the driver then
    // skip it.
    if (getCallerClass(callerCL, di.driverClassName) != di.driverClass) {
        println("skipping: " + di);
        continue;
    }
    try {
        println("trying " + di);
        // ---- try to get connection here ----
        Connection result = di.driver.connect(url, info);
        // ---- if the connection is not null, return ----
        if (result != null) {
            // Success!
            println("getConnection returning " + di);
            return (result);
        }
    } catch (SQLException ex) {
        if (reason == null) {
            reason = ex;
        }
    }
}
In this way, if hive.jdbc.Utils.parseURL is used to parse a MySQL uri, parseURL 
throws an IllegalArgumentException("Bad URL format"). Because DriverManager 
only catches SQLException, the IllegalArgumentException escapes the loop and 
DriverManager stops trying other drivers.
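
For reference, the java.sql.Driver contract says connect(url, info) should 
return null rather than throw when the driver realizes it is the wrong kind of 
driver for the given URL; that is what lets DriverManager continue its scan. A 
minimal sketch of such a guard (openHiveConnection is a hypothetical stand-in 
for the real connect logic, not the actual Hive code):

{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

// Sketch only: treat a foreign URL as "not mine" instead of throwing, so
// DriverManager's loop can move on to the next registered driver.
final class UrlGuardSketch {
  private static final String URL_PREFIX = "jdbc:hive://";

  static Connection connect(String url, Properties info) throws SQLException {
    if (url == null || !url.startsWith(URL_PREFIX)) {
      return null; // not a Hive URL: let DriverManager try other drivers
    }
    return openHiveConnection(url, info); // hypothetical helper
  }

  private static Connection openHiveConnection(String url, Properties info)
      throws SQLException {
    throw new SQLException("sketch only");
  }
}
{code}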



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8568) Add HS2 API to fetch Job IDs for a given query

2014-10-22 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181065#comment-14181065
 ] 

Mohit Sabharwal commented on HIVE-8568:
---

The attached patch adds a Thrift call to fetch the Job IDs corresponding to all 
running MR tasks. The response includes a list of job IDs and an enum 
indicating the execution engine. It throws an exception if no job IDs were 
fetched. 

The client may need to make the call multiple times, since the jobs may not 
have started running. 

Currently it returns only MR job IDs. Support for Tez job IDs is left as a 
TODO item for a future commit.
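
Since the patch itself is not inlined here, the following is a hedged sketch of 
the retry pattern the comment describes; fetchJobIDs() is a hypothetical 
stand-in for the new Thrift call:

{code}
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hedged sketch of client-side polling: the call throws while no job IDs
// have been fetched yet (the jobs may not have started running), so the
// client retries with a small back-off.
final class JobIdPoller {
  interface Client {
    List<String> fetchJobIDs(String operationHandle) throws Exception;
  }

  static List<String> pollJobIDs(Client client, String opHandle, int attempts)
      throws Exception {
    Exception last = new IllegalStateException("no attempts made");
    for (int i = 0; i < attempts; i++) {
      try {
        return client.fetchJobIDs(opHandle);
      } catch (Exception e) {
        last = e;                  // not available yet; back off and retry
        TimeUnit.SECONDS.sleep(1);
      }
    }
    throw last;
  }
}
{code}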

> Add HS2 API to fetch Job IDs for a given query
> --
>
> Key: HIVE-8568
> URL: https://issues.apache.org/jira/browse/HIVE-8568
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Attachments: HIVE-8568.patch
>
>
> Fetching Job IDs corresponding to all running MR/Tez tasks is useful for 
> clients like Hue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8568) Add HS2 API to fetch Job IDs for a given query

2014-10-22 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-8568:
--
Status: Patch Available  (was: Open)

> Add HS2 API to fetch Job IDs for a given query
> --
>
> Key: HIVE-8568
> URL: https://issues.apache.org/jira/browse/HIVE-8568
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Attachments: HIVE-8568.patch
>
>
> Fetching Job IDs corresponding to all running MR/Tez tasks is useful for 
> clients like Hue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8568) Add HS2 API to fetch Job IDs for a given query

2014-10-22 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-8568:
--
Attachment: HIVE-8568.patch

> Add HS2 API to fetch Job IDs for a given query
> --
>
> Key: HIVE-8568
> URL: https://issues.apache.org/jira/browse/HIVE-8568
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Attachments: HIVE-8568.patch
>
>
> Fetching Job IDs corresponding to all running MR/Tez tasks is useful for 
> clients like Hue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 26988: HIVE-8568 : Add HS2 API to fetch Job IDs for a given query

2014-10-22 Thread Mohit Sabharwal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26988/
---

Review request for hive.


Bugs: HIVE-8568
https://issues.apache.org/jira/browse/HIVE-8568


Repository: hive-git


Description
---

HIVE-8568 : Add HS2 API to fetch Job IDs for a given query

This patch adds a Thrift call to fetch the Job IDs corresponding to all
running MR tasks. The response includes a list of job IDs and an
enum indicating the execution engine. It throws an exception if no
job IDs were fetched.

Currently it returns only MR job IDs. Support for Tez job IDs is left
as a TODO item for a future commit.

The client may need to make the call multiple times, since the jobs
may not have started running. Exposed the RunningJob associated with
ExecDriver as a public method, so that the Driver can access the
job IDs corresponding to all running tasks.


Diffs
-

  
itests/hive-unit-hadoop2/src/test/java/org/apache/hive/TestThriftGetJobIDs.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
e25450531a71ef4ae4c6d9ea1788e618189a17cb 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 
c7d3b6652f89cf7b6507f35176962ff3287d112d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 
4e3df75c614fe7232e670201f2560c7ccd1db41c 
  ql/src/java/org/apache/hadoop/hive/ql/thrift/JobIDSet.java PRE-CREATION 
  service/if/TCLIService.thrift 4024bb3f412440fb7533f2e2d8ebc9a7cdc0776d 
  service/src/gen/thrift/gen-cpp/TCLIService.h 
030475b25188c5d2494da4de0bd6edc1ae807eca 
  service/src/gen/thrift/gen-cpp/TCLIService.cpp 
209ce63ae1ffd593de81e8e0a8e73218afe3cd79 
  service/src/gen/thrift/gen-cpp/TCLIService_server.skeleton.cpp 
988bb4c11ddb717f585e0ba2fb4773ec5fff77e6 
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 
f32dc3c90caedba86d943a9295a2f246a7b0ec90 
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
326d25b8b7d814f7bbdfab7dde805be4834493dc 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TCLIService.java
 54851b8d513179e3618ee5a974941bb6a72378b6 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TExecutionEngine.java
 PRE-CREATION 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetJobIDsReq.java
 PRE-CREATION 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetJobIDsResp.java
 PRE-CREATION 
  service/src/gen/thrift/gen-php/TCLIService.php 
d2462967c4ee40c46bdeb6c8e24e22e63f3567e3 
  service/src/gen/thrift/gen-py/TCLIService/TCLIService-remote 
f6ff43f021524adb1a179595bfdd9260e98bff28 
  service/src/gen/thrift/gen-py/TCLIService/TCLIService.py 
ebc65746ccd7e58c0878f426b429274d4b59ed0b 
  service/src/gen/thrift/gen-py/TCLIService/ttypes.py 
6cd64d0386f1e7c73eee7d9868c387c4942a5f9f 
  service/src/gen/thrift/gen-rb/t_c_l_i_service.rb 
fd1ca9aa13f3db170caf310ebb1ee1bac9f70b63 
  service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
c731544888f7480b0b1af70440ce1697e8597c12 
  service/src/java/org/apache/hive/service/cli/CLIService.java 
f5751f1305d7dd4c1f74af5a3a4f94f018b7a38f 
  service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 
3155c238ff688bfea16b0aaeea950599bb659b5b 
  service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
9cad5be198c063115a8e90c67b1c2fd910ca8bc6 
  service/src/java/org/apache/hive/service/cli/ICLIService.java 
c9cc1f4da56f1cd10f6348ea2b9e17e203b87664 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 
acb95cb015395f4a1a9280c3b0c719228e584df7 
  service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
a57b6e5d322ac312636c19633cee44f711b653df 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
8cabf7ee2945296774d31925a2bce46a7320d668 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
6359a5b879928e8726017520f9a733d6b11decd4 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
fa28a6b6a4acb61d8b442ed13b0421e1e0f13368 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
a0a6e183bbd05cd61ba97f187d66b286c145969c 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java 
1af45398b895cd7616c5627d318422e14b81e734 
  service/src/test/org/apache/hive/service/cli/thrift/ThriftCLIServiceTest.java 
630cfc9124abf7a8871b613b967141d0447eb18e 

Diff: https://reviews.apache.org/r/26988/diff/


Testing
---

Added a unit test that issues async execute statements and follows
them up with getJobID calls.

Did not add it to ThriftCLIServiceTest since the test needs a miniMR cluster
to run jobs that generate job IDs.


Thanks,

Mohit Sabharwal



[jira] [Updated] (HIVE-8569) wrong result when Hive meets window function

2014-10-22 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-8569:

Assignee: (was: Navis)

> wrong result when Hive meets window function
> ---
>
> Key: HIVE-8569
> URL: https://issues.apache.org/jira/browse/HIVE-8569
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Yi Tian
>
> how to reproduce:
> {quote}
> drop table over10k;
> create table over10k(
>t tinyint,
>si smallint,
>i int,
>b bigint,
>f float,
>d double,
>bo boolean,
>s string,
>  ts timestamp, 
>dec decimal,  
>bin binary)
>row format delimited
>fields terminated by '|';
> load data local inpath '../data/files/over10k' into table over10k;
> select ts,s,i, sum(i) over(partition by ts order by s) from over10k where 
> s='ethan van buren' and ts='2013-03-01 09:11:58.703325';
> {quote}
> the result is:
> {quote}
> 2013-03-01 09:11:58.703325  ethan van buren 65644   131222
> 2013-03-01 09:11:58.703325  ethan van buren 65578   131222
> {quote}
> but the fourth field of the first line should be 65644.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8569) wrong result when Hive meets window function

2014-10-22 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reassigned HIVE-8569:
---

Assignee: Navis

> wrong result when Hive meets window function
> ---
>
> Key: HIVE-8569
> URL: https://issues.apache.org/jira/browse/HIVE-8569
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Yi Tian
>Assignee: Navis
>
> how to reproduce:
> {quote}
> drop table over10k;
> create table over10k(
>t tinyint,
>si smallint,
>i int,
>b bigint,
>f float,
>d double,
>bo boolean,
>s string,
>  ts timestamp, 
>dec decimal,  
>bin binary)
>row format delimited
>fields terminated by '|';
> load data local inpath '../data/files/over10k' into table over10k;
> select ts,s,i, sum(i) over(partition by ts order by s) from over10k where 
> s='ethan van buren' and ts='2013-03-01 09:11:58.703325';
> {quote}
> the result is:
> {quote}
> 2013-03-01 09:11:58.703325  ethan van buren 65644   131222
> 2013-03-01 09:11:58.703325  ethan van buren 65578   131222
> {quote}
> but the fourth field of the first line should be 65644.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8569) wrong result when Hive meets window function

2014-10-22 Thread Yi Tian (JIRA)
Yi Tian created HIVE-8569:
-

 Summary: wrong result when Hive meets window function
 Key: HIVE-8569
 URL: https://issues.apache.org/jira/browse/HIVE-8569
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.0, 0.12.0
Reporter: Yi Tian


how to reproduce:
{quote}
drop table over10k;

create table over10k(
   t tinyint,
   si smallint,
   i int,
   b bigint,
   f float,
   d double,
   bo boolean,
   s string,
   ts timestamp, 
   dec decimal,  
   bin binary)
   row format delimited
   fields terminated by '|';

load data local inpath '../data/files/over10k' into table over10k;

select ts,s,i, sum(i) over(partition by ts order by s) from over10k where 
s='ethan van buren' and ts='2013-03-01 09:11:58.703325';
{quote}
the result is:
{quote}
2013-03-01 09:11:58.703325  ethan van buren 65644   131222
2013-03-01 09:11:58.703325  ethan van buren 65578   131222
{quote}
but the fourth field of the first line should be 65644.
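
One detail worth checking when triaging: in standard SQL the default frame for 
a window with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, 
under which rows that tie on the ordering key (both rows here share s='ethan 
van buren') are peers and receive the same aggregate value, i.e. 131222 for 
both. The per-row running sum expected above corresponds to an explicit ROWS 
frame; a hedged variant of the query to compare against:
{quote}
select ts, s, i,
       sum(i) over(partition by ts order by s
                   rows between unbounded preceding and current row)
from over10k
where s='ethan van buren' and ts='2013-03-01 09:11:58.703325';
{quote}
If this variant returns 65644 and then 131222, the discrepancy comes down to 
frame semantics rather than a broken aggregation.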



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8563) Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181051#comment-14181051
 ] 

Hive QA commented on HIVE-8563:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676453/HIVE-8563.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6576 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1406/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1406/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1406/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676453 - PreCommit-HIVE-TRUNK-Build

> Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE
> -
>
> Key: HIVE-8563
> URL: https://issues.apache.org/jira/browse/HIVE-8563
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-8563.2.patch, HIVE-8563.WIP.patch
>
>
> I added a test case to annotate_stats_join_pkfk.q as part of HIVE-8549. This 
> test case fails with a NullPointerException when run using 
> TestMiniTezCliDriver. Here is the stack trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.PlanUtils.getFieldSchemasFromRowSchema(PlanUtils.java:548)
> at 
> org.apache.hadoop.hive.ql.optimizer.ReduceSinkMapJoinProc.process(ReduceSinkMapJoinProc.java:239)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:367)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10057)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1070)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1132)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1007)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8567) Vectorized queries output extra stuff for Binary columns

2014-10-22 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181033#comment-14181033
 ] 

Jitendra Nath Pandey commented on HIVE-8567:


+1

> Vectorized queries output extra stuff for Binary columns
> 
>
> Key: HIVE-8567
> URL: https://issues.apache.org/jira/browse/HIVE-8567
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8567.01.patch
>
>
> See vector_data_types.q query output. Non-vectorized output is shorter than 
> vectorized binary column output, which seems to include characters from 
> earlier rows.
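
For readers triaging this: the symptom (binary values picking up bytes from 
earlier rows) is the classic signature of storing a reference into a reused 
byte buffer instead of copying it. A hedged, illustrative sketch of the 
distinction on BytesColumnVector (the actual patch is not shown in this 
message):

{code}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;

// Illustrative only: setRef() stores a pointer into the caller's buffer,
// so if that buffer is recycled for the next row, earlier entries change
// under the reader's feet. setVal() copies the bytes out instead.
final class BinaryColumnCopySketch {
  static void fill(BytesColumnVector col, int row,
                   byte[] reusedBuf, int start, int len) {
    // Unsafe when reusedBuf is recycled between rows:
    // col.setRef(row, reusedBuf, start, len);

    // Safe: copies len bytes out of reusedBuf into vector-owned storage.
    col.setVal(row, reusedBuf, start, len);
  }
}
{code}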



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181032#comment-14181032
 ] 

Xuefu Zhang commented on HIVE-8561:
---

Hi @Na Yang, I took a brief look. With my limited understanding of the CBO 
code, the patch looks fine. My only concern is that this seems to expose some 
of Hive's internals to the outside world. If Hive makes any changes to this, it 
might inadvertently break other applications. What are your thoughts on this? 
Is there a different approach that achieves the same purpose, or has Hive 
exposed anything similar to this?

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and is hard for other SQL-on-Hadoop query engines such as 
> Apache Drill to use. To allow Drill to access the Hive optiq operator 
> tree, we need to add a public API to return the Hive optiq operator tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8567) Vectorized queries output extra stuff for Binary columns

2014-10-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8567:
---
Status: Patch Available  (was: Open)

> Vectorized queries output extra stuff for Binary columns
> 
>
> Key: HIVE-8567
> URL: https://issues.apache.org/jira/browse/HIVE-8567
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8567.01.patch
>
>
> See vector_data_types.q query output. Non-vectorized output is shorter than 
> vectorized binary column output, which seems to include characters from 
> earlier rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8567) Vectorized queries output extra stuff for Binary columns

2014-10-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8567:
---
Attachment: HIVE-8567.01.patch

> Vectorized queries output extra stuff for Binary columns
> 
>
> Key: HIVE-8567
> URL: https://issues.apache.org/jira/browse/HIVE-8567
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8567.01.patch
>
>
> See vector_data_types.q query output. Non-vectorized output is shorter than 
> vectorized binary column output, which seems to include characters from 
> earlier rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181030#comment-14181030
 ] 

Chao commented on HIVE-8545:


[~xuefuz] I agree. This would indeed be a better approach. Let me make the 
change and re-upload a patch.

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch, HIVE-8545.5-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> input RDD, query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181024#comment-14181024
 ] 

Xuefu Zhang commented on HIVE-8545:
---

[~csun], on second thought, I think we can still keep HiveCopyFunction 
where it used to be. Calling WritableUtils.clone() doesn't require 
Spark's JobConf; we can just create a default Configuration, conf = new 
Configuration(), and pass it to WritableUtils.clone(). That way, 
HiveCopyFunction can keep its old behavior and stay where it was, and we 
can keep the toCache variable in MapInput. This seems a little cleaner. What do 
you think?
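
A minimal sketch of that suggestion, assuming only the stock Hadoop API 
(WritableUtils.clone(Writable, Configuration) lives in org.apache.hadoop.io):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;

// Sketch of the comment above: WritableUtils.clone() only needs a
// Configuration to instantiate the copy, so a default one suffices --
// no Spark JobConf required.
final class CloneSketch {
  private static final Configuration CONF = new Configuration();

  static Text deepCopy(Text value) {
    // Serializes 'value' and deserializes it into a fresh instance,
    // breaking any aliasing with Hadoop's reused record objects.
    return WritableUtils.clone(value, CONF);
  }
}
{code}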

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch, HIVE-8545.5-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> input RDD, query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8435) Add identity project remover optimization

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181019#comment-14181019
 ] 

Hive QA commented on HIVE-8435:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676444/HIVE-8435.02.patch

{color:red}ERROR:{color} -1 due to 549 failed/errored test(s), 6561 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_predicate_pushdown
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.tes

[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181017#comment-14181017
 ] 

Xuefu Zhang commented on HIVE-6165:
---

[~lethum], could you please rebase the patch?

> Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
> -
>
> Key: HIVE-6165
> URL: https://issues.apache.org/jira/browse/HIVE-6165
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Reporter: Helmut Zechmann
>Priority: Minor
> Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt
>
>
> org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc 
> driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 
> jdbc drivers contain lots of duplicate code. 
> In particular, the hive HivePreparedStatement supports "setObject", while the 
> hive2 version does not.
> We should share more code between the two to avoid duplicate work and to make 
> sure that both support the broadest possible feature set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181008#comment-14181008
 ] 

Chao commented on HIVE-8565:


[~brocknoland] The issue cannot be reproduced after I reverted the commit.

> beeline may go into an infinite loop when using EOF
> ---
>
> Key: HIVE-8565
> URL: https://issues.apache.org/jira/browse/HIVE-8565
> Project: Hive
>  Issue Type: Bug
>Reporter: Chao
>Assignee: Chao
>
> The problem can be reproduced by a simple query:
> {noformat}
> $HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> << EOF
> show databases;
> EOF
> {noformat}
> Then, it goes into an infinite loop and keeps printing the command prompt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181000#comment-14181000
 ] 

Chao commented on HIVE-8565:


Yeah, I tried jline2, but there are a lot of compatibility issues.
I also filed an issue on GitHub, in the hope that they will notice it 
(although the project has not been active for a long time).

> beeline may go into an infinite loop when using EOF
> ---
>
> Key: HIVE-8565
> URL: https://issues.apache.org/jira/browse/HIVE-8565
> Project: Hive
>  Issue Type: Bug
>Reporter: Chao
>Assignee: Chao
>
> The problem can be reproduced by a simple query:
> {noformat}
> $HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> << EOF
> show databases;
> EOF
> {noformat}
> Then, it goes into an infinite loop and keeps printing the command prompt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180997#comment-14180997
 ] 

Hive QA commented on HIVE-8545:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676519/HIVE-8545.5-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6772 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/253/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/253/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-253/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676519 - PreCommit-HIVE-SPARK-Build

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch, HIVE-8545.5-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> input RDD, query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180996#comment-14180996
 ] 

Brock Noland commented on HIVE-8565:


Also this might be fixed in the latest version of jline

https://github.com/jline/jline2/blob/master/src/main/java/jline/console/ConsoleReader.java

> beeline may go into an infinite loop when using EOF
> ---
>
> Key: HIVE-8565
> URL: https://issues.apache.org/jira/browse/HIVE-8565
> Project: Hive
>  Issue Type: Bug
>Reporter: Chao
>Assignee: Chao
>
> The problem can be reproduced by a simple query:
> {noformat}
> $HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> << EOF
> show databases;
> EOF
> {noformat}
> Then, it goes into an infinite loop and keeps printing the command prompt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180993#comment-14180993
 ] 

Brock Noland commented on HIVE-8565:


Can you reproduce if you revert his commit?

https://github.com/apache/hive/commit/35094222c4af6e8b8df91e2314040e2c45415bd6

this method is only used when the terminal is not supported:

https://github.com/jline/jline/blob/jline-2.5/src/main/java/jline/console/ConsoleReader.java#L1076
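
For context, the usual defensive pattern against this class of bug (a hedged, 
generic sketch, not jline's actual internals) is to treat a null readLine() as 
end of input and stop prompting:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Generic sketch: readLine() returns null at EOF; a loop that ignores
// the null and re-prompts is exactly how EOF turns into an infinite loop.
public class PromptLoopSketch {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) { // null means EOF: stop prompting
      System.out.println("got: " + line);
    }
  }
}
{code}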

> beeline may go into an infinite loop when using EOF
> ---
>
> Key: HIVE-8565
> URL: https://issues.apache.org/jira/browse/HIVE-8565
> Project: Hive
>  Issue Type: Bug
>Reporter: Chao
>Assignee: Chao
>
> The problem can be reproduced by a simple query:
> {noformat}
> $HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> << EOF
> show databases;
> EOF
> {noformat}
> Then, it goes into an infinite loop and keeps printing the command prompt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180978#comment-14180978
 ] 

Na Yang commented on HIVE-8561:
---

Hi [~xuefuz], can you please help review this patch? This patch is not fixing a 
Hive bug, but rather provides support for other SQL-on-Hadoop query engines 
such as Apache Drill.

Thanks & Regards,
Na


> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and is hard for other SQL-on-Hadoop query engines such as 
> Apache Drill to use. To allow Drill to access the Hive optiq operator 
> tree, we need to add a public API to return the Hive optiq operator tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 27065: Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Na Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27065/
---

Review request for hive.


Bugs: Hive-8561
https://issues.apache.org/jira/browse/Hive-8561


Repository: hive-git


Description
---

Expose the Hive optiq operator tree to be able to support other SQL-on-Hadoop 
query engines such as Apache Drill.
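
Since the patch body is not included in this e-mail, here is a hedged sketch of 
what such a public accessor might look like; the class shape, field, and method 
names below are hypothetical, not the actual diff:

    import org.eigenbase.rel.RelNode; // optiq operator tree node

    // Hypothetical sketch only: a public accessor on the Hive Driver that
    // hands out the optiq plan captured during compilation, if CBO ran.
    public class DriverSketch {
      private RelNode optiqPlan; // set during compile() when CBO builds a plan

      /** Returns the optiq operator tree for the last compiled query, or null. */
      public RelNode getOptiqPlan() {
        return optiqPlan;
      }
    }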


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java e254505 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveLimitRel.java
 f8755d0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 511103b 

Diff: https://reviews.apache.org/r/27065/diff/


Testing
---


Thanks,

Na Yang



[jira] [Created] (HIVE-8568) Add HS2 API to fetch Job IDs for a given query

2014-10-22 Thread Mohit Sabharwal (JIRA)
Mohit Sabharwal created HIVE-8568:
-

 Summary: Add HS2 API to fetch Job IDs for a given query
 Key: HIVE-8568
 URL: https://issues.apache.org/jira/browse/HIVE-8568
 Project: Hive
  Issue Type: Bug
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal


Fetching Job IDs corresponding to all running MR/Tez tasks is useful for 
clients like Hue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57962
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java


This assertion is incorrect. It seems that we don't need toCache at all. 
Patch #5 fixes all of this.


- Xuefu Zhang


On Oct. 23, 2014, 12:42 a.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27046/
> ---
> 
> (Updated Oct. 23, 2014, 12:42 a.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: hive-8545
> https://issues.apache.org/jira/browse/hive-8545
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With the current multi-insertion implementation, when caching is enabled for 
> input RDD, query may fail with the following exception:
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  dc5d148 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 25a4515 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 0f21b46 
> 
> Diff: https://reviews.apache.org/r/27046/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180960#comment-14180960
 ] 

Chao commented on HIVE-8545:


[~xuefuz] Yeah, I think it's better this way - I also considered getting rid of 
{{toCache}}.

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch, HIVE-8545.5-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> input RDD, query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180954#comment-14180954
 ] 

Chao commented on HIVE-8545:


[~xuefuz] Oops, it should be {{copyFunction != null}}. Sorry for this silly 
mistake.

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch, HIVE-8545.5-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> input RDD, query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8545:
--
Attachment: HIVE-8545.5-spark.patch

Actually, there seems to be a problem in the precondition assertion. Patch #5 
fixes that, with a bit of code cleanup. [~csun], could you take a look at the 
patch?
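
For reference, one common way to express such a precondition is with Guava, which Hive already depends on. This is a sketch under that assumption, not necessarily what patch #5 does:

{code}
import com.google.common.base.Preconditions;

// Sketch: fail fast with a clear message instead of an obscure NPE later.
// The condition mirrors the earlier comment (copyFunction != null).
final class CopyPrecondition {
  static void checkCopyFunction(Object copyFunction, Class<?> valueClass) {
    Preconditions.checkState(copyFunction != null,
        "No copy function registered for value class %s", valueClass);
  }
}
{code}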

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch, HIVE-8545.5-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8534) sql std auth : update configuration whitelist for 0.14

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180949#comment-14180949
 ] 

Hive QA commented on HIVE-8534:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676443/HIVE-8534.5.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6578 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1404/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1404/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1404/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676443 - PreCommit-HIVE-TRUNK-Build

> sql std auth : update configuration whitelist for 0.14
> --
>
> Key: HIVE-8534
> URL: https://issues.apache.org/jira/browse/HIVE-8534
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, SQLStandardAuthorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8534.1.patch, HIVE-8534.2.patch, HIVE-8534.3.patch, 
> HIVE-8534.4.patch, HIVE-8534.5.patch
>
>
> New config parameters have been introduced in Hive 0.14. SQL standard 
> authorization needs to be updated to allow some new parameters to be set 
> when the authorization mode is enabled.
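
For intuition, a minimal sketch of what such a whitelist check can look like (hypothetical class and names, not Hive's actual authorization code): a parameter is settable at runtime only if it is explicitly allowed or matches an allowed regex.

{code}
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical whitelist check: a config parameter may be modified at runtime
// only if it is explicitly allowed or matches the allowed-pattern regex.
final class ConfigWhitelist {
  private final Set<String> allowedParams;
  private final Pattern allowedPattern;

  ConfigWhitelist(Set<String> allowedParams, Pattern allowedPattern) {
    this.allowedParams = allowedParams;
    this.allowedPattern = allowedPattern;
  }

  boolean isAllowed(String param) {
    return allowedParams.contains(param)
        || (allowedPattern != null && allowedPattern.matcher(param).matches());
  }
}
{code}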



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8549) NPE in PK-FK inference when one side of join is complex tree

2014-10-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8549:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch 0.14 and trunk.

> NPE in PK-FK inference when one side of join is complex tree
> 
>
> Key: HIVE-8549
> URL: https://issues.apache.org/jira/browse/HIVE-8549
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Statistics
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8549.1.patch, HIVE-8549.2.patch
>
>
> HIVE-8168 added PK-FK inference from column stats. But when one side of the 
> join is a complex tree that propagates the FK, relationship inference fails with an NPE.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.getSelectivity(StatsRulesProcFactory.java:1293)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.inferPKFKRelationship(StatsRulesProcFactory.java:1250)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.process(StatsRulesProcFactory.java:1067)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:248)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10039)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:415)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1067)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1129)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
> {code}
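
To make the failure mode concrete, here is a hedged sketch with hypothetical types (not the committed patch): when one join branch carries no usable stats, as with a complex subtree, the estimator falls back to a default instead of dereferencing null.

{code}
// Hypothetical illustration (not the committed patch): guard against missing
// stats on one join branch before computing selectivity, instead of an NPE.
final class SelectivityEstimator {
  /** Assume no reduction when stats are unavailable. */
  static final double DEFAULT_SELECTIVITY = 1.0;

  interface Stats {
    long numRows();
  }

  static double getSelectivity(Stats baseTableStats, Stats joinInputStats) {
    if (baseTableStats == null || joinInputStats == null
        || baseTableStats.numRows() <= 0) {
      // A complex subtree may propagate no usable stats; fall back gracefully.
      return DEFAULT_SELECTIVITY;
    }
    // Fraction of the base table's rows that survive to the join input.
    return Math.min(1.0, (double) joinInputStats.numRows() / baseTableStats.numRows());
  }
}
{code}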



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8549) NPE in PK-FK inference when one side of join is complex tree

2014-10-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8549:
-
Fix Version/s: 0.14.0

> NPE in PK-FK inference when one side of join is complex tree
> 
>
> Key: HIVE-8549
> URL: https://issues.apache.org/jira/browse/HIVE-8549
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Statistics
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8549.1.patch, HIVE-8549.2.patch
>
>
> HIVE-8168 added PK-FK inference from column stats. But when one side of the 
> join is a complex tree that propagates the FK, relationship inference fails with an NPE.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.getSelectivity(StatsRulesProcFactory.java:1293)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.inferPKFKRelationship(StatsRulesProcFactory.java:1250)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.process(StatsRulesProcFactory.java:1067)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:248)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10039)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:415)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1067)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1129)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8549) NPE in PK-FK inference when one side of join is complex tree

2014-10-22 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180942#comment-14180942
 ] 

Vikram Dixit K commented on HIVE-8549:
--

+1 for 0.14

> NPE in PK-FK inference when one side of join is complex tree
> 
>
> Key: HIVE-8549
> URL: https://issues.apache.org/jira/browse/HIVE-8549
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Statistics
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8549.1.patch, HIVE-8549.2.patch
>
>
> HIVE-8168 added PK-FK inference from column stats. But when one side of the 
> join is a complex tree that propagates the FK, relationship inference fails with an NPE.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.getSelectivity(StatsRulesProcFactory.java:1293)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.inferPKFKRelationship(StatsRulesProcFactory.java:1250)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.process(StatsRulesProcFactory.java:1067)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:248)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10039)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:415)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1067)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1129)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]

2014-10-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180931#comment-14180931
 ] 

Rui Li commented on HIVE-8528:
--

Yep, I see. Thanks for explaining :-)

> Add remote Spark client to Hive [Spark Branch]
> --
>
> Key: HIVE-8528
> URL: https://issues.apache.org/jira/browse/HIVE-8528
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, 
> HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch
>
>
> For the time being, at least, we've decided to build the Spark client (see 
> SPARK-3215) inside Hive. This task tracks merging the ongoing work into the 
> Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present

2014-10-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7985:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch 0.14 and trunk. Thanks [~jpullokkaran] and [~ashutoshc]!

> With CBO enabled cross product is generated when a subquery is present
> --
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Laljo John Pullokkaran
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7985.1.patch, HIVE-7985.2.patch, HIVE-7985.patch
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products.
> Query:
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, 
> date_dim, item where item.i_item_id in (select i.i_item_id from item i where 
> i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and 
> ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by 
> i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a 
> cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: (true and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic 
> stats: COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
>   keys:
> 0
> 1
>   outputColumnNames: _col0, _col1, _col25
>   input vertices:
> 1 Map 3
>   Statistics: Num rows: 254100 Data size: 365124192 Basic 
> stats: COMPLETE Column stats: NONE
>   Select Operator
> expressions: _col0 (type: int), _col1 (type: string), 
> _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
>   value expressions: _col0 (type: int), _col1 (type: 
> string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
>   alias: i
>   filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') 
> and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: ((i_color) IN ('purple', 'burlywood', 
> 'indian') and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 115500 Data size: 165965540 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: i_item_id (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 115500 Data size: 165965540 Basic 
> stats: COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540 
> Basic stats: 

[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180930#comment-14180930
 ] 

Xuefu Zhang commented on HIVE-8545:
---

+1

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8395) CBO: enable by default

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8395:
---
Attachment: HIVE-8395.10.patch

Update for the non-CLI driver test failure.

> CBO: enable by default
> --
>
> Key: HIVE-8395
> URL: https://issues.apache.org/jira/browse/HIVE-8395
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.15.0
>
> Attachments: HIVE-8395.01.patch, HIVE-8395.02.patch, 
> HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, 
> HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, 
> HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180921#comment-14180921
 ] 

Xuefu Zhang edited comment on HIVE-8561 at 10/23/14 2:22 AM:
-

-+1-


was (Author: xuefuz):
+1

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and is hard for other SQL-on-Hadoop query engines such as Apache 
> Drill to use. To allow Drill to access the Hive optiq operator tree, we need 
> to add a public API to return it.
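
A hedged sketch of the kind of public hook being proposed (hypothetical interface name, not the attached patch): return the Optiq {{RelNode}} plan that CBO builds so an external engine can consume it.

{code}
import org.eigenbase.rel.RelNode; // Optiq's relational operator node

// Hypothetical API shape for exposing the CBO plan; the real patch may differ.
public interface OptiqPlanProvider {
  /**
   * Returns the root of the optiq operator tree produced while planning the
   * given SELECT query, so engines like Drill can translate or execute it.
   */
  RelNode getOptiqOperatorTree(String selectQuery);
}
{code}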



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8395) CBO: enable by default

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8395:
---
Attachment: (was: HIVE-8395.10.patch)

> CBO: enable by default
> --
>
> Key: HIVE-8395
> URL: https://issues.apache.org/jira/browse/HIVE-8395
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.15.0
>
> Attachments: HIVE-8395.01.patch, HIVE-8395.02.patch, 
> HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, 
> HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, 
> HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180927#comment-14180927
 ] 

Xuefu Zhang commented on HIVE-8561:
---

Sorry. +1ed the wrong JIRA. I didn't review, but I'd like to if needed.

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and is hard for other SQL-on-Hadoop query engines such as Apache 
> Drill to use. To allow Drill to access the Hive optiq operator tree, we need 
> to add a public API to return it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180924#comment-14180924
 ] 

Sergey Shelukhin commented on HIVE-8561:


[~jpullokkaran] do you want to also take a look?

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and is hard for other SQL-on-Hadoop query engines such as Apache 
> Drill to use. To allow Drill to access the Hive optiq operator tree, we need 
> to add a public API to return it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180923#comment-14180923
 ] 

Xuefu Zhang commented on HIVE-8528:
---

It's mainly for HiveServer2, where there are concurrent client sessions, each 
having its own SparkContext object. SparkContext is heavy, putting memory 
pressure on HiveServer2 in proportion to the number of active client sessions. 
With a remote SparkContext, the pressure is transferred to a remote process.
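
For readers following along, a highly simplified sketch of the idea (hypothetical interfaces, not the actual spark-client module): HiveServer2 keeps only a thin per-session handle, and the heavyweight SparkContext lives in a separate process reached over RPC.

{code}
// Hypothetical illustration of the remote-context idea: the server-side handle
// is lightweight; job submission is forwarded to a remote process that owns
// the actual SparkContext.
interface RemoteSparkClient extends AutoCloseable {
  /** Submit a serialized job to the remote process; returns a job handle id. */
  String submit(byte[] serializedJob);

  /** Tear down the remote process (and its SparkContext) with the session. */
  @Override
  void close();
}

final class SparkSessionState {
  private final RemoteSparkClient client; // small footprint in HiveServer2

  SparkSessionState(RemoteSparkClient client) {
    this.client = client;
  }

  String run(byte[] job) {
    return client.submit(job);
  }
}
{code}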

> Add remote Spark client to Hive [Spark Branch]
> --
>
> Key: HIVE-8528
> URL: https://issues.apache.org/jira/browse/HIVE-8528
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, 
> HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch
>
>
> For the time being, at least, we've decided to build the Spark client (see 
> SPARK-3215) inside Hive. This task tracks merging the ongoing work into the 
> Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180921#comment-14180921
 ] 

Xuefu Zhang commented on HIVE-8561:
---

+1

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and is hard for other SQL-on-Hadoop query engines such as Apache 
> Drill to use. To allow Drill to access the Hive optiq operator tree, we need 
> to add a public API to return it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]

2014-10-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180919#comment-14180919
 ] 

Rui Li commented on HIVE-8528:
--

[~xuefuz] - thanks for pointing me to the detailed info. I guess the main use 
case for Hive is in HiveServer2 or when the client has limited resources?

> Add remote Spark client to Hive [Spark Branch]
> --
>
> Key: HIVE-8528
> URL: https://issues.apache.org/jira/browse/HIVE-8528
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, 
> HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch
>
>
> For the time being, at least, we've decided to build the Spark client (see 
> SPARK-3215) inside Hive. This task tracks merging the ongoing work into the 
> Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8395) CBO: enable by default

2014-10-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180912#comment-14180912
 ] 

Sergey Shelukhin edited comment on HIVE-8395 at 10/23/14 2:07 AM:
--

rinse, repeat


was (Author: sershe):
rince, repeat

> CBO: enable by default
> --
>
> Key: HIVE-8395
> URL: https://issues.apache.org/jira/browse/HIVE-8395
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.15.0
>
> Attachments: HIVE-8395.01.patch, HIVE-8395.02.patch, 
> HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, 
> HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, 
> HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8395) CBO: enable by default

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8395:
---
Attachment: HIVE-8395.10.patch

rince, repeat

> CBO: enable by default
> --
>
> Key: HIVE-8395
> URL: https://issues.apache.org/jira/browse/HIVE-8395
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.15.0
>
> Attachments: HIVE-8395.01.patch, HIVE-8395.02.patch, 
> HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, 
> HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, 
> HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8021) CBO: support CTAS and insert ... select

2014-10-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180909#comment-14180909
 ] 

Sergey Shelukhin commented on HIVE-8021:


annotate_stats_join_pkfk is caused by some other JIRA; the other two are 
unstable and unrelated.

> CBO: support CTAS and insert ... select
> ---
>
> Key: HIVE-8021
> URL: https://issues.apache.org/jira/browse/HIVE-8021
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8021.01.patch, HIVE-8021.01.patch, 
> HIVE-8021.02.patch, HIVE-8021.03.patch, HIVE-8021.04.patch, 
> HIVE-8021.05.patch, HIVE-8021.patch, HIVE-8021.preliminary.patch
>
>
> Need to send only the select part to CBO for now



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180904#comment-14180904
 ] 

Hive QA commented on HIVE-8545:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676491/HIVE-8545.4-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6772 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/252/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/252/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-252/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676491 - PreCommit-HIVE-SPARK-Build

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-10-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180902#comment-14180902
 ] 

Ferdinand Xu commented on HIVE-7390:


Hi [~thejas] and [~leftylev],
Release notes are added. Sorry for being late. 

> Make quote character optional and configurable in BeeLine CSV/TSV output
> 
>
> Key: HIVE-7390
> URL: https://issues.apache.org/jira/browse/HIVE-7390
> Project: Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.13.1
>Reporter: Jim Halfpenny
>Assignee: Ferdinand Xu
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
> HIVE-7390.4.patch, HIVE-7390.5.patch, HIVE-7390.6.patch, HIVE-7390.7.patch, 
> HIVE-7390.8.patch, HIVE-7390.9.patch, HIVE-7390.patch
>
>
> Currently when either the CSV or TSV output formats are used in beeline each 
> column is wrapped in single quotes. Quote wrapping of columns should be 
> optional and the user should be able to choose the character used to wrap the 
> columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-10-22 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-7390:
---
Release Note: 
--outputformat=[table/vertical/csv/tsv/dsv] 
Format mode for result display. Default is table.
Usage: beeline --outputformat=tsv

--delimiterForDSV=DELIMITER
Specify the delimiter for the delimiter-separated values output format (default: |).
Usage: beeline --outputformat=dsv --delimiterForDSV=,

beeline dsv and delimiterForDSV examples are as follows:
% bin/beeline
Hive version 0.11.0-SNAPSHOT by Apache
beeline> !connect jdbc:hive2://localhost:1 scott tiger 
org.apache.hive.jdbc.HiveDriver
!connect jdbc:hive2://localhost:1 scott tiger 
org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:1
Connected to: Hive (version 0.14.0-SNAPSHOT)
Driver: Hive (version 0.14.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
HiveServer2 Clients – dsv Example
0: jdbc:hive2://localhost:1> create table csv_table(id int, name string, 
info string) row format delimited fields terminated by '\t';
No rows affected (0.121 seconds)
0: jdbc:hive2://localhost:1> load data local inpath '/root/names' overwrite 
into table csv_table;   
No rows affected (0.245 seconds)
0: jdbc:hive2://localhost:1> select * from csv_table;   

+---------------+-----------------+-----------------+--+
| csv_table.id  | csv_table.name  | csv_table.info  |
+---------------+-----------------+-----------------+--+
| 19630001      | "john"          | lennon          |
| 19630002      | peter,paul      | mccartney       |
| 19630003      | george          | harrison        |
| 19630004      | ringo           | starr           |
+---------------+-----------------+-----------------+--+
4 rows selected (0.09 seconds)
0: jdbc:hive2://localhost:1> !outformat csv 
Unknown command: outformat csv
0: jdbc:hive2://localhost:1> !outputformat csv
0: jdbc:hive2://localhost:1> select * from csv_table;
csv_table.id,csv_table.name,csv_table.info
19630001,"""john""",lennon
19630002,"peter,paul",mccartney
19630003,george,harrison
19630004,ringo,starr
4 rows selected (0.105 seconds)
0: jdbc:hive2://localhost:1> !outputformat dsv   
0: jdbc:hive2://localhost:1> select * from csv_table;
csv_table.id|csv_table.name|csv_table.info
19630001|"""john"""|lennon
19630002|peter,paul|mccartney
19630003|george|harrison
19630004|ringo|starr
4 rows selected (0.123 seconds)
0: jdbc:hive2://localhost:1> !set delimiterForDSV ',';
0: jdbc:hive2://localhost:1> select * from csv_table; 
csv_table.id'csv_table.name'csv_table.info
19630001'"""john"""'lennon
19630002'peter,paul'mccartney
19630003'george'harrison
19630004'ringo'starr
4 rows selected (0.11 seconds)
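
To summarize the mechanics shown above, a minimal sketch (not BeeLine's actual implementation) of how a DSV row could be produced with a configurable delimiter; the main method mirrors the transcript after the delimiter was set to a single quote:

{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the DSV output path: join the column values of a
// row with a single configurable delimiter character (default '|').
final class DsvRowFormatter {
  private final char delimiter;

  DsvRowFormatter(char delimiter) {
    this.delimiter = delimiter;
  }

  String format(List<String> columns) {
    return String.join(String.valueOf(delimiter), columns);
  }

  public static void main(String[] args) {
    DsvRowFormatter dsv = new DsvRowFormatter('\'');
    // Mirrors the transcript above after "!set delimiterForDSV ','" took the
    // quote character literally: 19630002'peter,paul'mccartney
    System.out.println(dsv.format(Arrays.asList("19630002", "peter,paul", "mccartney")));
  }
}
{code}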

> Make quote character optional and configurable in BeeLine CSV/TSV output
> 
>
> Key: HIVE-7390
> URL: https://issues.apache.org/jira/browse/HIVE-7390
> Project: Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.13.1
>Reporter: Jim Halfpenny
>Assignee: Ferdinand Xu
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
> HIVE-7390.4.patch, HIVE-7390.5.patch, HIVE-7390.6.patch, HIVE-7390.7.patch, 
> HIVE-7390.8.patch, HIVE-7390.9.patch, HIVE-7390.patch
>
>
> Currently when either the CSV or TSV output formats are used in beeline each 
> column is wrapped in single quotes. Quote wrapping of columns should be 
> optional and the user should be able to choose the character used to wrap the 
> columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8555) Too many casts results in loss of original string representation for constant

2014-10-22 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180894#comment-14180894
 ] 

Vikram Dixit K commented on HIVE-8555:
--

+1 for 0.14

> Too many casts results in loss of original string representation for constant 
> --
>
> Key: HIVE-8555
> URL: https://issues.apache.org/jira/browse/HIVE-8555
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.15.0
>
> Attachments: HIVE-8555.1.patch, HIVE-8555.patch
>
>
> {code}
> SELECT key, value FROM src WHERE key = cast(86 as double);
> 86.0  val_86
> {code}
> With constant propagate off we get different and correct result.
> {code}
> set hive.optimize.constant.propagation=false;
> SELECT key, value FROM src WHERE key =  cast(86 as double);
> 86  val_86
> {code}
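
To see why the representation changes, a small standalone illustration in plain Java, outside Hive: folding the constant through a double and back to a string turns 86 into 86.0, which no longer matches the original string key.

{code}
// Standalone illustration of the reported behavior: when the constant is
// folded through a double and rendered back to a string, "86" becomes "86.0",
// so a string-typed key no longer matches it.
public class ConstantFoldDemo {
  public static void main(String[] args) {
    String key = "86";                            // string key stored in src
    double folded = Double.parseDouble("86");     // cast(86 as double) -> 86.0
    String refolded = String.valueOf(folded);     // constant folded back: "86.0"
    System.out.println(refolded);                 // prints 86.0
    System.out.println(key.equals(refolded));     // false: representation lost
  }
}
{code}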



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8320) Error in MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)

2014-10-22 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180895#comment-14180895
 ] 

Vikram Dixit K commented on HIVE-8320:
--

+1 for 0.14

> Error in MetaException(message:Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out)
> --
>
> Key: HIVE-8320
> URL: https://issues.apache.org/jira/browse/HIVE-8320
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.1
>Reporter: gavin kim
>Assignee: gavin kim
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: HIVE-8320.1.patch, HIVE-8320.2.patch
>
>
> I'm using Hive 0.13.1 in a CDH environment.
> Using Hue's Beeswax, HiveServer2 sometimes throws a MetaException.
> And after that, Hive metadata requests time out.
> The error log detail is below.
> 2014-09-29 12:05:44,829 ERROR hive.log: Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:826)
> at 
> org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:562)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:315)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> at com.sun.proxy.$Proxy13.getSchemas(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:273)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:402)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1429)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1414)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java

[jira] [Commented] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180893#comment-14180893
 ] 

Hive QA commented on HIVE-7985:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676432/HIVE-7985.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6576 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1402/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1402/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676432 - PreCommit-HIVE-TRUNK-Build

> With CBO enabled cross product is generated when a subquery is present
> --
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Laljo John Pullokkaran
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7985.1.patch, HIVE-7985.2.patch, HIVE-7985.patch
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products.
> Query:
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, 
> date_dim, item where item.i_item_id in (select i.i_item_id from item i where 
> i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and 
> ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by 
> i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a 
> cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: (true and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic 
> stats: COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
>   keys:
> 0
> 1
>   outputColumnNames: _col0, _col1, _col25
>   input vertices:
> 1 Map 3
>   Statistics: Num rows: 254100 Data size: 365124192 Basic 
> stats: COMPLETE Column stats: NONE
>   Select Operator
> expressions: _col0 (type: int), _col1 (type: string), 
> _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
>   value expressions: _col0 (type: int), _col1 (type: 
> string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
>

[jira] [Commented] (HIVE-8320) Error in MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)

2014-10-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180892#comment-14180892
 ] 

Thejas M Nair commented on HIVE-8320:
-

The precommit test failed because it was trying to apply the wrong file. I have 
deleted the wrong file, so it should hopefully work now. (the failed run 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1401/console
 )


> Error in MetaException(message:Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out)
> --
>
> Key: HIVE-8320
> URL: https://issues.apache.org/jira/browse/HIVE-8320
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.1
>Reporter: gavin kim
>Assignee: gavin kim
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: HIVE-8320.1.patch, HIVE-8320.2.patch
>
>
> I'm using Hive 0.13.1 in a CDH environment.
> Using Hue's Beeswax, HiveServer2 sometimes throws a MetaException.
> And after that, Hive metadata requests time out.
> The error log detail is below.
> 2014-09-29 12:05:44,829 ERROR hive.log: Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:826)
> at 
> org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:562)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:315)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> at com.sun.proxy.$Proxy13.getSchemas(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:273)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:402)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1429)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1414)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

[jira] [Updated] (HIVE-8320) Error in MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)

2014-10-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8320:

Attachment: (was: 
0001-make-to-synchronize-hiveserver2-session-s-metastore-.patch)

> Error in MetaException(message:Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out)
> --
>
> Key: HIVE-8320
> URL: https://issues.apache.org/jira/browse/HIVE-8320
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.1
>Reporter: gavin kim
>Assignee: gavin kim
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: HIVE-8320.1.patch, HIVE-8320.2.patch
>
>
> I'm using Hive 0.13.1 in a CDH environment.
> When using Hue's Beeswax, HiveServer2 sometimes throws a MetaException.
> After that, Hive metadata requests time out.
> The error log details are below.
> 2014-09-29 12:05:44,829 ERROR hive.log: Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:826)
> at 
> org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:562)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:315)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> at com.sun.proxy.$Proxy13.getSchemas(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:273)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:402)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1429)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1414)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(So
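
The attached patch's title points at the fix direction: give each HiveServer2 
handler thread its own metastore client instead of letting a session's threads 
share one Thrift connection, whose interleaved reads can produce exactly this 
kind of timeout. A minimal sketch of that ThreadLocal pattern (illustrative 
names, not the actual patch):

{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;

// Sketch only: one metastore client per handler thread, so two threads never
// interleave reads/writes on the same Thrift transport.
public class PerThreadMetaStoreClients {
  private final ThreadLocal<IMetaStoreClient> client =
      new ThreadLocal<IMetaStoreClient>() {
        @Override
        protected IMetaStoreClient initialValue() {
          try {
            return new HiveMetaStoreClient(new HiveConf());
          } catch (MetaException e) {
            throw new RuntimeException(e);
          }
        }
      };

  public IMetaStoreClient get() {
    return client.get();
  }
}
{code}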

[jira] [Commented] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present

2014-10-22 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180888#comment-14180888
 ] 

Vikram Dixit K commented on HIVE-7985:
--

+1 for 0.14


> With CBO enabled cross product is generated when a subquery is present
> --
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Laljo John Pullokkaran
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7985.1.patch, HIVE-7985.2.patch, HIVE-7985.patch
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products
> Query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, 
> date_dim, item where item.i_item_id in (select i.i_item_id from item i where 
> i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and 
> ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by 
> i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a 
> cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: (true and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic 
> stats: COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
>   keys:
> 0
> 1
>   outputColumnNames: _col0, _col1, _col25
>   input vertices:
> 1 Map 3
>   Statistics: Num rows: 254100 Data size: 365124192 Basic 
> stats: COMPLETE Column stats: NONE
>   Select Operator
> expressions: _col0 (type: int), _col1 (type: string), 
> _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
>   value expressions: _col0 (type: int), _col1 (type: 
> string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
>   alias: i
>   filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') 
> and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: ((i_color) IN ('purple', 'burlywood', 
> 'indian') and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 115500 Data size: 165965540 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: i_item_id (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 115500 Data size: 165965540 Basic 
> stats: COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540 
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
>   

[jira] [Commented] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180887#comment-14180887
 ] 

Chao commented on HIVE-8565:


It seems like this is a bug in jline: {{ConsoleReader::readLine}} is 
supposed to return {{null}} when the end of the input stream has been reached, 
but it returns an empty string instead. I noticed that in this method there's a line:

{noformat}
return new BufferedReader (new InputStreamReader (in)).readLine ();
{noformat}

which could be doing the right thing, but it's commented out. Not sure why.
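
For contrast, a minimal standalone sketch of the contract callers rely on: 
java.io.BufferedReader returns {{null}} at end of input, which is what lets a 
prompt loop terminate. A reader that returns an empty string at EOF instead 
leaves the same loop spinning forever, matching the behavior reported here:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Not beeline's actual code -- just a demonstration of why the null-at-EOF
// contract matters. This loop terminates on EOF; if readLine() returned ""
// instead of null, it would print the prompt forever.
public class PromptLoop {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println("executing: " + line);
    }
    System.out.println("EOF reached, exiting");
  }
}
{code}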

> beeline may go into an infinite loop when using EOF
> ---
>
> Key: HIVE-8565
> URL: https://issues.apache.org/jira/browse/HIVE-8565
> Project: Hive
>  Issue Type: Bug
>Reporter: Chao
>Assignee: Chao
>
> The problem can be reproduced by a simple query:
> {noformat}
> $HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> <<EOF
> > show databases;
> > EOF
> {noformat}
> Then, it will go into an infinite loop and keep printing command prompt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8567) Vectorized queries output extra stuff for Binary columns

2014-10-22 Thread Matt McCline (JIRA)
Matt McCline created HIVE-8567:
--

 Summary: Vectorized queries output extra stuff for Binary columns
 Key: HIVE-8567
 URL: https://issues.apache.org/jira/browse/HIVE-8567
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.0


See the vector_data_types.q query output. Non-vectorized output is shorter than 
the vectorized Binary column output, which seems to include characters from 
earlier rows.
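
That symptom is consistent with a scratch buffer being reused across rows and 
then emitted without honoring the current row's length; the cause here is an 
assumption, but the failure mode is easy to demonstrate standalone:

{code}
// Illustration of the suspected failure mode (not Hive's actual code): a
// buffer reused across rows leaks trailing bytes from an earlier, longer
// value if it is emitted without the current row's length.
public class BufferReuseDemo {
  public static void main(String[] args) {
    byte[] buf = new byte[16];

    int len = fill(buf, "longervalue");
    System.out.println(new String(buf, 0, len)); // "longervalue"

    len = fill(buf, "abc");
    System.out.println(new String(buf, 0, len)); // correct: "abc"
    System.out.println(new String(buf));         // buggy: "abcgervalue" residue
  }

  private static int fill(byte[] buf, String s) {
    byte[] b = s.getBytes();
    System.arraycopy(b, 0, buf, 0, b.length);
    return b.length;
  }
}
{code}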



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8454) Select Operator does not rename column stats properly in case of select star

2014-10-22 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8454:
-
Attachment: HIVE-8454.2.patch

This should fix the test cases.

> Select Operator does not rename column stats properly in case of select star
> 
>
> Key: HIVE-8454
> URL: https://issues.apache.org/jira/browse/HIVE-8454
> Project: Hive
>  Issue Type: Sub-task
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8454.1.patch, HIVE-8454.2.patch
>
>
> The estimated data size of some Select Operators is 0. BytesBytesHashMap uses 
> the data size to determine the estimated initial number of entries in the 
> hash map. If this data size is 0, an exception is thrown (see below).
> Query 
> {code}
> select count(*) from
>  store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
>   JOIN item ON store_sales.ss_item_sk = item.i_item_sk
>   JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and catalog_sales.cs_order_number = catalog_returns.cr_order_number
>   group by cs_item_sk
>   having 
> sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
>  cs_ui
> ON store_sales.ss_item_sk = cs_ui.cs_item_sk
>   WHERE  
>  cd1.cd_marital_status <> cd2.cd_marital_status and
>  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
> and
>  i_current_price between 35 and 35 + 10 and
>  i_current_price between 35 + 1 and 35 + 15
>and d1.d_year = 2001;
> {code}
> {code}
> ], TaskAttempt 3 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: java.lang.RuntimeException: 
> java.lang.AssertionError: Capacity must be a power of two
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity 
> must be a power of two
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:93)
>   at 
> org.apache.hadoop.
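
The assertion in the truncated trace above fires when a stats-derived estimate 
produces an invalid hash table capacity. A small sketch of the kind of guard 
that prevents it (illustrative, not the actual patch): clamp the estimate to a 
sane range, then round up to a power of two.

{code}
// Sketch of a defensive capacity computation: a 0 or negative row estimate
// can no longer trip the "Capacity must be a power of two" assertion.
static int validCapacity(long estimatedRows) {
  long clamped = Math.max(1L, Math.min(estimatedRows, 1L << 30));
  int cap = Integer.highestOneBit((int) clamped);
  if (cap < clamped) {
    cap <<= 1; // round up to the next power of two
  }
  return cap;
}
{code}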

[jira] [Commented] (HIVE-8320) Error in MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)

2014-10-22 Thread gavin kim (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180843#comment-14180843
 ] 

gavin kim commented on HIVE-8320:
-

[~thejas] Thank you for all the guidance. :)

Is the next step to wait and see whether this patch will be applied to master?

> Error in MetaException(message:Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out)
> --
>
> Key: HIVE-8320
> URL: https://issues.apache.org/jira/browse/HIVE-8320
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.1
>Reporter: gavin kim
>Assignee: gavin kim
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.0
>
> Attachments: 
> 0001-make-to-synchronize-hiveserver2-session-s-metastore-.patch, 
> HIVE-8320.1.patch, HIVE-8320.2.patch
>
>
> I'm using Hive 13.1 in cdh environment.
> Using hue's beeswax, sometimes hiveserver2 occur MetaException.
> And after that, hive meta data request timed out.
> error log's detail is below.
> 2014-09-29 12:05:44,829 ERROR hive.log: Got exception: 
> org.apache.thrift.transport.TTransportException 
> java.net.SocketTimeoutException: Read timed out
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:826)
> at 
> org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:562)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:315)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> at com.sun.proxy.$Proxy13.getSchemas(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:273)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:402)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1429)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1414)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ja

[jira] [Commented] (HIVE-8021) CBO: support CTAS and insert ... select

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180835#comment-14180835
 ] 

Hive QA commented on HIVE-8021:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676419/HIVE-8021.05.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6577 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1400/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1400/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1400/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676419 - PreCommit-HIVE-TRUNK-Build

> CBO: support CTAS and insert ... select
> ---
>
> Key: HIVE-8021
> URL: https://issues.apache.org/jira/browse/HIVE-8021
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8021.01.patch, HIVE-8021.01.patch, 
> HIVE-8021.02.patch, HIVE-8021.03.patch, HIVE-8021.04.patch, 
> HIVE-8021.05.patch, HIVE-8021.patch, HIVE-8021.preliminary.patch
>
>
> Need to send only the select part to CBO for now



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present

2014-10-22 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180829#comment-14180829
 ] 

Laljo John Pullokkaran commented on HIVE-7985:
--

These failures are not related to CBO.

> With CBO enabled cross product is generated when a subquery is present
> --
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Laljo John Pullokkaran
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7985.1.patch, HIVE-7985.2.patch, HIVE-7985.patch
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products
> Query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, 
> date_dim, item where item.i_item_id in (select i.i_item_id from item i where 
> i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and 
> ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by 
> i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a 
> cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: (true and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic 
> stats: COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
>   keys:
> 0
> 1
>   outputColumnNames: _col0, _col1, _col25
>   input vertices:
> 1 Map 3
>   Statistics: Num rows: 254100 Data size: 365124192 Basic 
> stats: COMPLETE Column stats: NONE
>   Select Operator
> expressions: _col0 (type: int), _col1 (type: string), 
> _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
>   value expressions: _col0 (type: int), _col1 (type: 
> string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
>   alias: i
>   filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') 
> and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: ((i_color) IN ('purple', 'burlywood', 
> 'indian') and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 115500 Data size: 165965540 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: i_item_id (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 115500 Data size: 165965540 Basic 
> stats: COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540 
> Basic stats: COMPLETE Column stats: NONE
>  

[jira] [Updated] (HIVE-8545) Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8545:
---
Attachment: HIVE-8545.4-spark.patch

Addressing RB comments.

> Exception when casting Text to BytesWritable [Spark Branch]
> ---
>
> Key: HIVE-8545
> URL: https://issues.apache.org/jira/browse/HIVE-8545
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-8545.1-spark.patch, HIVE-8545.2-spark.patch, 
> HIVE-8545.3-spark.patch, HIVE-8545.4-spark.patch
>
>
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> {noformat}
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.
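
The reviews later in this digest discuss fixing this with a copy function 
applied before caching: Hadoop RecordReaders reuse the same Writable instances, 
so without a deep copy every cached RDD element ends up aliasing the last 
record read. A hedged sketch of that idea (class and type parameters are 
illustrative, not the patch's actual code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableUtils;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Deep-copies each (key, value) pair so cached elements no longer share the
// Writable instances the RecordReader keeps reusing.
public class CopyPairFunction implements
    PairFunction<Tuple2<WritableComparable, Writable>, WritableComparable, Writable> {
  private transient Configuration conf;

  @Override
  public Tuple2<WritableComparable, Writable> call(
      Tuple2<WritableComparable, Writable> pair) {
    if (conf == null) {
      conf = new Configuration();
    }
    return new Tuple2<WritableComparable, Writable>(
        WritableUtils.clone(pair._1(), conf),
        WritableUtils.clone(pair._2(), conf));
  }
}
{code}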



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/
---

(Updated Oct. 23, 2014, 12:42 a.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Thanks Xuefu for the comments. I've changed my patch accordingly.


Bugs: hive-8545
https://issues.apache.org/jira/browse/hive-8545


Repository: hive-git


Description
---

With the current multi-insertion implementation, when caching is enabled for 
the input RDD, a query may fail with the following exception:
2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: scheduler.TaskSetManager 
(Logging.scala:logWarning(71)) - Lost task 0.0 in stage 1.0 (TID 1, localhost): 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
org.apache.hadoop.io.BytesWritable

org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)

org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)

org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)

org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:56)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
The fix should be easy. However, interestingly, this error doesn't show up when 
the caching is turned off. We need to find out why.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 dc5d148 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
25a4515 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0f21b46 

Diff: https://reviews.apache.org/r/27046/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao Sun


> On Oct. 22, 2014, 11:36 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java,
> >  line 77
> > 
> >
> > I think we should let this stay in SparkUtils, which would otherwise 
> > become an empty class.

OK. To make it consistent, I also moved copyHiveKey to SparkUtilities.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57929
---


On Oct. 22, 2014, 5:50 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27046/
> ---
> 
> (Updated Oct. 22, 2014, 5:50 p.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: hive-8545
> https://issues.apache.org/jira/browse/hive-8545
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  dc5d148 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 25a4515 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 0f21b46 
> 
> Diff: https://reviews.apache.org/r/27046/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Chao Sun


> On Oct. 22, 2014, 11:40 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java, line 25
> > 
> >
> > Why KO becomes Writable now? Should it be WritableComparable according 
> > to MapInput?

My mistake, it should be WritableComparable. Thanks for pointing it out.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57932
---


On Oct. 22, 2014, 5:50 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27046/
> ---
> 
> (Updated Oct. 22, 2014, 5:50 p.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: hive-8545
> https://issues.apache.org/jira/browse/hive-8545
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  dc5d148 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 25a4515 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 0f21b46 
> 
> Diff: https://reviews.apache.org/r/27046/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Re: Review Request 26805: HIVE-8320: edit hiveserver2 session's metastore client to use ThreadLocal client

2014-10-22 Thread Gavin Kim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26805/#review57944
---

Ship it!


Ship It!

- Gavin Kim


On Oct. 16, 2014, 7:15 a.m., Gavin Kim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26805/
> ---
> 
> (Updated Oct. 16, 2014, 7:15 a.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Bugs: HIVE-8320
> https://issues.apache.org/jira/browse/HIVE-8320
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-8320: edit hiveserver2 session's metastore client to use ThreadLocal 
> client
> 
> 
> Diffs
> -
> 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> a9d5902 
> 
> Diff: https://reviews.apache.org/r/26805/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Gavin Kim
> 
>



Re: Review Request 26854: HIVE-2573 Create per-session function registry

2014-10-22 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26854/
---

(Updated Oct. 23, 2014, 12:20 a.m.)


Review request for hive, Navis Ryu and Thejas Nair.


Changes
---

Updating with HIVE-2573.9.patch.txt from Navis


Bugs: HIVE-2573
https://issues.apache.org/jira/browse/HIVE-2573


Repository: hive-git


Description
---

Small updates to Navis' changes:
- the session registry doesn't look up the metastore for UDFs
- incorporated my feedback from Navis' original patch
- metastore UDFs should not be considered native; this allows them to be 
added to or removed from the registry (see the lookup sketch below)
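
A hedged sketch of the session-then-system lookup order these changes imply 
(illustrative names only, not the actual Registry class):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Resolve a function name in the per-session registry first, then fall back
// to the shared system registry; session registrations never mutate the
// shared map.
public class SessionFunctionResolver {
  private static final Map<String, Object> SYSTEM =
      new ConcurrentHashMap<String, Object>();
  private final Map<String, Object> session = new HashMap<String, Object>();

  public Object lookup(String name) {
    String key = name.toLowerCase();
    Object fn = session.get(key);             // session-local UDFs win
    return fn != null ? fn : SYSTEM.get(key); // fall back to built-ins
  }

  public void registerSessionFunction(String name, Object info) {
    session.put(name.toLowerCase(), info);
  }
}
{code}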


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9ac540e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonFunctionInfo.java 93c15c0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 074255b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 08e1136 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 569c125 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java efecb05 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
 31f906a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
e43d39f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 
22e5b47 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java af633cb 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 
46f8052 
  ql/src/test/queries/clientnegative/drop_native_udf.q ae047bb 
  ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out 
c7405ed 
  ql/src/test/results/clientnegative/create_function_nonudf_class.q.out d0dd50a 
  ql/src/test/results/clientnegative/drop_native_udf.q.out 9f0eaa5 
  service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java 
fd38907 

Diff: https://reviews.apache.org/r/26854/diff/


Testing
---


Thanks,

Jason Dere



[jira] [Commented] (HIVE-8256) Add SORT_QUERY_RESULTS for test that doesn't guarantee order #2

2014-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180773#comment-14180773
 ] 

Xuefu Zhang commented on HIVE-8256:
---

+1 on latest patch.

> Add SORT_QUERY_RESULTS for test that doesn't guarantee order #2
> ---
>
> Key: HIVE-8256
> URL: https://issues.apache.org/jira/browse/HIVE-8256
> Project: Hive
>  Issue Type: Test
>Reporter: Chao
>Assignee: Chao
>Priority: Minor
> Attachments: HIVE-8256.1-spark.patch, HIVE-8256.2.patch, 
> HIVE-8256.patch
>
>
> Following HIVE-8533, we need to further add {{SORT_QUERY_RESULTS}} to a few 
> more tests that don't guarantee output order.
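
For reference, the directive is a comment placed in the .q file; the test 
driver then sorts the query's output before diffing it against the golden 
file, so tests whose queries have no ORDER BY still compare stably:

{noformat}
-- SORT_QUERY_RESULTS

select key, count(*) from src group by key;
{noformat}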



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8550) Hive cannot load data into partitioned table with Unicode key

2014-10-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8550:
-
Assignee: Xiaobing Zhou

> Hive cannot load data into partitioned table with Unicode key
> -
>
> Key: HIVE-8550
> URL: https://issues.apache.org/jira/browse/HIVE-8550
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment: Windows
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
>Priority: Critical
> Attachments: CreatePartitionedTable.hql, 
> LoadIntoPartitionedTable.hql, partitioned.txt
>
>
> Steps to reproduce:
> 1) Copy the file partitioned.txt to the root of your HDFS filesystem. Copy 
> the two .hql files to your local directory.
> 2) Open the Hive CLI.
> 3) Run:
> hive> source <path to CreatePartitionedTable.hql>;
> 4) Run:
> hive> source <path to LoadIntoPartitionedTable.hql>;
> The following error will be shown:
> hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql;
> Loading data to table default.mypartitioned partition (tag=䶵)
> Failed with exception null
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7576) Add PartitionSpec support in HCatClient API

2014-10-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7576:
-
Status: Patch Available  (was: Open)

> Add PartitionSpec support in HCatClient API
> ---
>
> Key: HIVE-7576
> URL: https://issues.apache.org/jira/browse/HIVE-7576
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore
>Affects Versions: 0.13.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7576.1.patch
>
>
> HIVE-7223 adds support for PartitionSpecs in Hive Metastore. The HCatClient 
> API must add support to fetch partitions, add partitions, etc. using 
> PartitionSpec semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180756#comment-14180756
 ] 

Hive QA commented on HIVE-8561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676411/HIVE-8561.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6575 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1399/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1399/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1399/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676411 - PreCommit-HIVE-TRUNK-Build

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.patch
>
>
> Hive 0.14 added cost-based optimization, and an Optiq operator tree is 
> created for select queries. However, the Optiq operator tree is not visible 
> from outside and is hard for other SQL-on-Hadoop query engines, such as 
> Apache Drill, to use. To allow Drill to access the Hive Optiq operator tree, 
> we need to add a public API that returns it.
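
A hypothetical shape for such a hook; the class, the method, and even the 
RelNode import below are assumptions for illustration, not the attached patch:

{code}
import org.apache.hadoop.hive.conf.HiveConf;
// Assumed Optiq operator-tree node type; the package name may differ by version.
import org.eigenbase.rel.RelNode;

public final class OptiqPlanAccess {
  private OptiqPlanAccess() {}

  /**
   * Sketch only: a public entry point that would run the query through
   * Hive's parser and CBO planner and return the optimized Optiq operator
   * tree for an external engine such as Drill to consume.
   */
  public static RelNode getOptimizedOptiqPlan(String query, HiveConf conf) {
    throw new UnsupportedOperationException("illustrative sketch");
  }
}
{code}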



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57932
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java


Why KO becomes Writable now? Should it be WritableComparable according to 
MapInput?


- Xuefu Zhang


On Oct. 22, 2014, 5:50 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27046/
> ---
> 
> (Updated Oct. 22, 2014, 5:50 p.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: hive-8545
> https://issues.apache.org/jira/browse/hive-8545
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  dc5d148 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 25a4515 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 0f21b46 
> 
> Diff: https://reviews.apache.org/r/27046/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]

2014-10-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57929
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java


I think we should let this stay in SparkUtils, which would otherwise become an 
empty class.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java


Nit: It seems that we need to create the copy function and set it on 
MapInput only if caching is true. Can we avoid doing this work when caching 
is false?


- Xuefu Zhang


On Oct. 22, 2014, 5:50 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27046/
> ---
> 
> (Updated Oct. 22, 2014, 5:50 p.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: hive-8545
> https://issues.apache.org/jira/browse/hive-8545
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> With the current multi-insertion implementation, when caching is enabled for 
> the input RDD, a query may fail with the following exception:
> 2014-10-21 13:57:34,742 WARN  [task-result-getter-0]: 
> scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in 
> stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
> 
> org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
> 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> The fix should be easy. However, interestingly, this error doesn't show up 
> when the caching is turned off. We need to find out why.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  dc5d148 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 25a4515 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 0f21b46 
> 
> Diff: https://reviews.apache.org/r/27046/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chao Sun
> 
>
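
For context, a self-contained sketch of the failure mode described above and one 
tolerant way to copy records, assuming only the standard Hadoop Writable types on 
the classpath (an illustration, not the actual HIVE-8545 fix):

{code}
import java.util.Arrays;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// The cached path received a Text where the copy function cast blindly to
// BytesWritable. Copying via the concrete runtime type avoids the
// ClassCastException for either input type.
public final class WritableCopySketch {
  static Writable copy(Writable value) {
    if (value instanceof BytesWritable) {
      BytesWritable bw = (BytesWritable) value;
      return new BytesWritable(Arrays.copyOf(bw.getBytes(), bw.getLength()));
    }
    if (value instanceof Text) {
      return new Text((Text) value);
    }
    return value; // sketch only: other Writable types pass through uncopied
  }
}
{code}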



Re: Review Request 26721: HIVE-8433 CBO loses a column during AST conversion

2014-10-22 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26721/
---

(Updated Oct. 22, 2014, 11:18 p.m.)


Review request for hive, Ashutosh Chauhan and John Pullokkaran.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java
 0428263 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/PlanModifierForASTConv.java
 4f96d02 
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 10ac4b2 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 
  ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 
  ql/src/test/queries/clientpositive/select_same_col.q PRE-CREATION 
  ql/src/test/results/clientpositive/cbo_correctness.q.out 7c25e1f 
  ql/src/test/results/clientpositive/select_same_col.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/tez/cbo_correctness.q.out e467773 

Diff: https://reviews.apache.org/r/26721/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8556:
---
Status: Patch Available  (was: Open)

[~gopalv] [~mmokhtar] do you guys want to review?

> introduce overflow control and sanity check to BytesBytesMapJoin
> 
>
> Key: HIVE-8556
> URL: https://issues.apache.org/jira/browse/HIVE-8556
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-8556.patch
>
>
> When stats are incorrect, a negative or very large number can be passed to the 
> map
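
For illustration, the kind of sanity check the summary describes, as a minimal 
standalone sketch (an assumed shape, not the attached patch):

{code}
// Clamp a stats-derived size estimate before it is used to size a hash
// table, so negative or absurdly large estimates never reach allocation.
public final class MapJoinSizingSketch {
  private static final int MIN_CAPACITY = 16;
  private static final int MAX_CAPACITY = 1 << 30; // largest power-of-two int

  static int validCapacity(long estimatedRows) {
    if (estimatedRows < MIN_CAPACITY) {
      return MIN_CAPACITY;  // covers negative and tiny estimates
    }
    if (estimatedRows >= MAX_CAPACITY) {
      return MAX_CAPACITY;  // covers overflowing estimates
    }
    // Round up to the next power of two for an open-addressing table.
    return Integer.highestOneBit((int) estimatedRows - 1) << 1;
  }
}
{code}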



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8556:
---
Attachment: HIVE-8556.patch

first attempt

> introduce overflow control and sanity check to BytesBytesMapJoin
> 
>
> Key: HIVE-8556
> URL: https://issues.apache.org/jira/browse/HIVE-8556
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-8556.patch
>
>
> When stats are incorrect, a negative or very large number can be passed to the 
> map



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8433) CBO loses a column during AST conversion

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8433:
---
Attachment: HIVE-8433.04.patch

Updated ordering in the CBO test. I am not adding SORT_BEFORE_DIFF or some such, 
since the CBO test is so large that some queries might be testing ordering. Now 
the outputs for Tez and non-Tez are the same.

> CBO loses a column during AST conversion
> 
>
> Key: HIVE-8433
> URL: https://issues.apache.org/jira/browse/HIVE-8433
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-8433.01.patch, HIVE-8433.02.patch, 
> HIVE-8433.03.patch, HIVE-8433.04.patch, HIVE-8433.patch
>
>
> {noformat}
> SELECT
>   CAST(value AS BINARY),
>   value
> FROM src
> ORDER BY value
> LIMIT 100
> {noformat}
> returns only one column.
> Final CBO plan is
> {noformat}
>   HiveSortRel(sort0=[$1], dir0=[ASC]): rowcount = 500.0, cumulative cost = 
> {24858.432393688767 rows, 500.0 cpu, 0.0 io}, id = 44
> HiveProjectRel(value=[CAST($0):BINARY(2147483647) NOT NULL], 
> value1=[$0]): rowcount = 500.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 
> io}, id = 42
>   HiveProjectRel(value=[$1]): rowcount = 500.0, cumulative cost = {0.0 
> rows, 0.0 cpu, 0.0 io}, id = 40
> HiveTableScanRel(table=[[default.src]]): rowcount = 500.0, cumulative 
> cost = {0}, id = 0
> {noformat}
> but the resulting AST has only one column. This must be some bug in the 
> conversion, probably related to the name collision in the schema, judging by 
> the alias of the column for the binary-cast value in the AST:
> {noformat} 
> TOK_QUERY
>TOK_FROM
>   TOK_SUBQUERY
>  TOK_QUERY
> TOK_FROM
>TOK_TABREF
>   TOK_TABNAME
>  default
>  src
>   src
> TOK_INSERT
>TOK_DESTINATION
>   TOK_DIR
>  TOK_TMP_FILE
>TOK_SELECT
>   TOK_SELEXPR
>  .
> TOK_TABLE_OR_COL
>src
> value
>  value
>  $hdt$_0
>TOK_INSERT
>   TOK_DESTINATION
>  TOK_DIR
> TOK_TMP_FILE
>   TOK_SELECT
>  TOK_SELEXPR
> TOK_FUNCTION
>TOK_BINARY
>.
>   TOK_TABLE_OR_COL
>  $hdt$_0
>   value
> value
>   TOK_ORDERBY
>  TOK_TABSORTCOLNAMEASC
> TOK_TABLE_OR_COL
>value
>   TOK_LIMIT
>  100
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8566) Vectorized queries output wrong timestamps

2014-10-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8566:
---
Status: Patch Available  (was: Open)

> Vectorized queries output wrong timestamps
> --
>
> Key: HIVE-8566
> URL: https://issues.apache.org/jira/browse/HIVE-8566
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8566.01.patch
>
>
> Huge differences between non-vectorized and vectorized query outputs for 
> vector_data_types.q. The differences look similar to the wrong results seen 
> when seconds instead of milliseconds are used for the DATE type. But this is 
> for the TIMESTAMP type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8566) Vectorized queries output wrong timestamps

2014-10-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8566:
---
Attachment: HIVE-8566.01.patch

> Vectorized queries output wrong timestamps
> --
>
> Key: HIVE-8566
> URL: https://issues.apache.org/jira/browse/HIVE-8566
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8566.01.patch
>
>
> Huge differences between non-vectorized and vectorized query outputs for 
> vector_data_types.q. The differences look similar to the wrong results seen 
> when seconds instead of milliseconds are used for the DATE type. But this is 
> for the TIMESTAMP type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8556:
---
Attachment: (was: HIVE-8556.patch)

> introduce overflow control and sanity check to BytesBytesMapJoin
> 
>
> Key: HIVE-8556
> URL: https://issues.apache.org/jira/browse/HIVE-8556
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-8556.patch
>
>
> When stats are incorrect, a negative or very large number can be passed to the 
> map



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8566) Vectorized queries output wrong timestamps

2014-10-22 Thread Matt McCline (JIRA)
Matt McCline created HIVE-8566:
--

 Summary: Vectorized queries output wrong timestamps
 Key: HIVE-8566
 URL: https://issues.apache.org/jira/browse/HIVE-8566
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.0


Huge differences between non-vectorized and vectorized query outputs for 
vector_data_types.q. The differences look similar to the wrong results seen 
when seconds instead of milliseconds are used for the DATE type. But this is 
for the TIMESTAMP type.
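
For illustration, the seconds-vs-milliseconds confusion the description points to, 
in a runnable sketch (an assumption drawn from the description, not from the 
attached patch; java.sql.Timestamp takes epoch milliseconds):

{code}
import java.sql.Timestamp;

public final class TimestampUnitsSketch {
  public static void main(String[] args) {
    long epochSeconds = 1413936000L; // 2014-10-22 00:00:00 UTC
    Timestamp wrong = new Timestamp(epochSeconds);          // seconds fed as ms
    Timestamp right = new Timestamp(epochSeconds * 1000L);  // properly scaled
    System.out.println("seconds as ms: " + wrong); // lands in January 1970
    System.out.println("scaled to ms:  " + right);
  }
}
{code}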



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8235) Insert into partitioned bucketed sorted tables fails with "this file is already being created by"

2014-10-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-8235.
--
Resolution: Cannot Reproduce

Closing as cannot reproduce, as I cannot reproduce this.  Please re-open if you 
see it again.

> Insert into partitioned bucketed sorted tables fails with "this file is 
> already being created by"
> -
>
> Key: HIVE-8235
> URL: https://issues.apache.org/jira/browse/HIVE-8235
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
> Environment: cn105
>Reporter: Mostafa Mokhtar
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: insert_into_partitioned_bucketed_table.txt.tar.gz.zip
>
>
> When loading into a partitioned bucketed sorted table, the query fails with:
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to create file 
> [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0]
>  for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for 
> client [172.21.128.111], because this file is already being created by 
> [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on 
> [172.21.128.122]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy15.create(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy15.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.

[jira] [Updated] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8556:
---
Attachment: HIVE-8556.patch

First attempt

> introduce overflow control and sanity check to BytesBytesMapJoin
> 
>
> Key: HIVE-8556
> URL: https://issues.apache.org/jira/browse/HIVE-8556
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-8556.patch
>
>
> When stats are incorrect, a negative or very large number can be passed to the 
> map



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8256) Add SORT_QUERY_RESULTS for test that doesn't guarantee order #2

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180710#comment-14180710
 ] 

Hive QA commented on HIVE-8256:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676394/HIVE-8256.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6576 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1398/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1398/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1398/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676394 - PreCommit-HIVE-TRUNK-Build

> Add SORT_QUERY_RESULTS for test that doesn't guarantee order #2
> ---
>
> Key: HIVE-8256
> URL: https://issues.apache.org/jira/browse/HIVE-8256
> Project: Hive
>  Issue Type: Test
>Reporter: Chao
>Assignee: Chao
>Priority: Minor
> Attachments: HIVE-8256.1-spark.patch, HIVE-8256.2.patch, 
> HIVE-8256.patch
>
>
> Following HIVE-8533, we need to further add {{SORT_QUERY_RESULTS}} to a few 
> more tests that don't guarantee output order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.

2014-10-22 Thread Ben (JIRA)
Ben created HIVE-8564:
-

 Summary: DROP TABLE IF EXISTS throws exception if the table does 
not exist.  
 Key: HIVE-8564
 URL: https://issues.apache.org/jira/browse/HIVE-8564
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.1
Reporter: Ben
Priority: Minor


DROP TABLE IF EXISTS throws an exception if the table does not exist.  

I tried setting hive.exec.drop.ignorenonexistent=true, and it made no difference.


hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: 
NoSuchObjectException(message:testdb.mytable table not found)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy7.getTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
at 
org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

OK
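
For illustration, the semantics IF EXISTS implies for the drop path, as a 
hypothetical sketch (the interface and exception here stand in for Hive's real 
types; this is not the metastore's actual code):

{code}
public final class DropIfExistsSketch {
  // Stand-ins for Hive's metastore client and its missing-table exception.
  static final class NoSuchObjectException extends Exception { }

  interface MetaStoreClient {
    void dropTable(String db, String table) throws NoSuchObjectException;
  }

  static void dropTableIfExists(MetaStoreClient client, String db, String table) {
    try {
      client.dropTable(db, table);
    } catch (NoSuchObjectException ignored) {
      // A missing table is exactly the case IF EXISTS is meant to tolerate,
      // so it should be swallowed quietly, not logged as an ERROR.
    }
  }
}
{code}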




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Chao (JIRA)
Chao created HIVE-8565:
--

 Summary: beeline may go into an infinite loop when using EOF
 Key: HIVE-8565
 URL: https://issues.apache.org/jira/browse/HIVE-8565
 Project: Hive
  Issue Type: Bug
Reporter: Chao


The problem can be reproduced by a simple query:
{noformat}
$HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> << EOF
> show databases;
> EOF
{noformat}

Then, it will go into an infinite loop and keep printing the command prompt.
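
For illustration, the classic read-loop pattern that produces this symptom, in a 
generic runnable sketch (not BeeLine's actual code):

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// With redirected stdin, readLine() returns null at end of input; a loop
// that does not treat null as EOF spins forever reprinting the prompt.
public final class ReadLoopSketch {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    while (true) {
      System.out.print("sketch> ");
      String line = in.readLine();
      if (line == null) {
        break; // EOF: stop instead of looping
      }
      System.out.println("got: " + line);
    }
  }
}
{code}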



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8565) beeline may go into an infinite loop when using EOF

2014-10-22 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao reassigned HIVE-8565:
--

Assignee: Chao

> beeline may go into an infinite loop when using EOF
> ---
>
> Key: HIVE-8565
> URL: https://issues.apache.org/jira/browse/HIVE-8565
> Project: Hive
>  Issue Type: Bug
>Reporter: Chao
>Assignee: Chao
>
> The problem can be reproduced by a simple query:
> {noformat}
> $HIVE_HOME/bin/beeline -u <url> -n <user> -p <password> << EOF
> > show databases;
> > EOF
> {noformat}
> Then, it will go into an infinite loop and keep printing the command prompt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8550) Hive cannot load data into partitioned table with Unicode key

2014-10-22 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-8550:

Affects Version/s: 0.14.0

> Hive cannot load data into partitioned table with Unicode key
> -
>
> Key: HIVE-8550
> URL: https://issues.apache.org/jira/browse/HIVE-8550
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment: Windows
>Reporter: Xiaobing Zhou
>Priority: Critical
> Attachments: CreatePartitionedTable.hql, 
> LoadIntoPartitionedTable.hql, partitioned.txt
>
>
> Steps to reproduce:
> 1) Copy the file partitioned.txt to your HDFS root directory. Copy the two 
> hql files to your local directory.
> 2) Open Hive CLI.
> 3) Run:
> hive> source <path to CreatePartitionedTable.hql>;
> 4) Run:
> hive> source <path to LoadIntoPartitionedTable.hql>;
> The following error will be shown:
> hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql;
> Loading data to table default.mypartitioned partition (tag=䶵)
> Failed with exception null
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8433) CBO loses a column during AST conversion

2014-10-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180673#comment-14180673
 ] 

Sergey Shelukhin commented on HIVE-8433:


Results are actually different between Tez and non-Tez before this patch.

> CBO loses a column during AST conversion
> 
>
> Key: HIVE-8433
> URL: https://issues.apache.org/jira/browse/HIVE-8433
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-8433.01.patch, HIVE-8433.02.patch, 
> HIVE-8433.03.patch, HIVE-8433.patch
>
>
> {noformat}
> SELECT
>   CAST(value AS BINARY),
>   value
> FROM src
> ORDER BY value
> LIMIT 100
> {noformat}
> returns only one column.
> Final CBO plan is
> {noformat}
>   HiveSortRel(sort0=[$1], dir0=[ASC]): rowcount = 500.0, cumulative cost = 
> {24858.432393688767 rows, 500.0 cpu, 0.0 io}, id = 44
> HiveProjectRel(value=[CAST($0):BINARY(2147483647) NOT NULL], 
> value1=[$0]): rowcount = 500.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 
> io}, id = 42
>   HiveProjectRel(value=[$1]): rowcount = 500.0, cumulative cost = {0.0 
> rows, 0.0 cpu, 0.0 io}, id = 40
> HiveTableScanRel(table=[[default.src]]): rowcount = 500.0, cumulative 
> cost = {0}, id = 0
> {noformat}
> but the resulting AST has only one column. This must be some bug in the 
> conversion, probably related to the name collision in the schema, judging by 
> the alias of the column for the binary-cast value in the AST:
> {noformat} 
> TOK_QUERY
>TOK_FROM
>   TOK_SUBQUERY
>  TOK_QUERY
> TOK_FROM
>TOK_TABREF
>   TOK_TABNAME
>  default
>  src
>   src
> TOK_INSERT
>TOK_DESTINATION
>   TOK_DIR
>  TOK_TMP_FILE
>TOK_SELECT
>   TOK_SELEXPR
>  .
> TOK_TABLE_OR_COL
>src
> value
>  value
>  $hdt$_0
>TOK_INSERT
>   TOK_DESTINATION
>  TOK_DIR
> TOK_TMP_FILE
>   TOK_SELECT
>  TOK_SELEXPR
> TOK_FUNCTION
>TOK_BINARY
>.
>   TOK_TABLE_OR_COL
>  $hdt$_0
>   value
> value
>   TOK_ORDERBY
>  TOK_TABSORTCOLNAMEASC
> TOK_TABLE_OR_COL
>value
>   TOK_LIMIT
>  100
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8555) Too many casts results in loss of original string representation for constant

2014-10-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8555:
---
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. 
[~vikram.dixit]  ok for 0.14 ?

> Too many casts results in loss of original string representation for constant 
> --
>
> Key: HIVE-8555
> URL: https://issues.apache.org/jira/browse/HIVE-8555
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.15.0
>
> Attachments: HIVE-8555.1.patch, HIVE-8555.patch
>
>
> {code}
> SELECT key, value FROM src WHERE key = cast(86 as double);
> 86.0  val_86
> {code}
> With constant propagation off, we get a different and correct result.
> {code}
> set hive.optimize.constant.propagation=false;
> SELECT key, value FROM src WHERE key =  cast(86 as double);
> 86  val_86
> {code}
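
For illustration, why folding the key through a double changes its printed form, 
in a tiny runnable sketch (illustrative of the symptom, not of the committed fix):

{code}
public final class ConstantFoldSketch {
  public static void main(String[] args) {
    String original = "86";
    double folded = Double.parseDouble(original);  // cast(86 as double)
    // Once the constant only exists as a double, its string representation
    // is "86.0"; the original text "86" is lost.
    System.out.println(folded); // prints 86.0
  }
}
{code}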



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8563) Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE

2014-10-22 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-8563:
-
Attachment: HIVE-8563.2.patch

Rebased patch.

> Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE
> -
>
> Key: HIVE-8563
> URL: https://issues.apache.org/jira/browse/HIVE-8563
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-8563.2.patch, HIVE-8563.WIP.patch
>
>
> I added a test case as part of HIVE-8549 to annotate_stats_join_pkfk.q. This 
> test case fails with a NullPointerException when run using 
> TestMiniTezCliDriver. Here is the stack trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.PlanUtils.getFieldSchemasFromRowSchema(PlanUtils.java:548)
> at 
> org.apache.hadoop.hive.ql.optimizer.ReduceSinkMapJoinProc.process(ReduceSinkMapJoinProc.java:239)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:367)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10057)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1070)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1132)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1007)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
> {code}
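
For illustration, the generic defensive pattern such an NPE usually calls for, as 
an assumption about the shape of a fix rather than the actual HIVE-8563 patch:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The trace points at getFieldSchemasFromRowSchema, suggesting a null row
// schema reached plan generation; guarding the access turns a crash into
// an empty (or explicitly rejected) schema.
public final class RowSchemaGuardSketch {
  static <T> List<T> fieldSchemasOrEmpty(List<T> rowSchema) {
    if (rowSchema == null) {
      return Collections.emptyList();
    }
    return new ArrayList<>(rowSchema);
  }
}
{code}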



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8563) Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE

2014-10-22 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-8563:
-
Attachment: HIVE-8563.WIP.patch

WIP patch. [~prasanth_j], can you try this out in case this issue is blocking 
you?

> Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE
> -
>
> Key: HIVE-8563
> URL: https://issues.apache.org/jira/browse/HIVE-8563
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-8563.WIP.patch
>
>
> I added a test case as part of HIVE-8549 to annotate_stats_join_pkfk.q. This 
> test case fails with a NullPointerException when run using 
> TestMiniTezCliDriver. Here is the stack trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.PlanUtils.getFieldSchemasFromRowSchema(PlanUtils.java:548)
> at 
> org.apache.hadoop.hive.ql.optimizer.ReduceSinkMapJoinProc.process(ReduceSinkMapJoinProc.java:239)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:367)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10057)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1070)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1132)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1007)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8563) Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE

2014-10-22 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-8563:
-
Status: Patch Available  (was: Open)

> Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE
> -
>
> Key: HIVE-8563
> URL: https://issues.apache.org/jira/browse/HIVE-8563
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-8563.WIP.patch
>
>
> I added a test case as part of HIVE-8549 to annotate_stats_join_pkfk.q. This 
> test case fails with a NullPointerException when run using 
> TestMiniTezCliDriver. Here is the stack trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.PlanUtils.getFieldSchemasFromRowSchema(PlanUtils.java:548)
> at 
> org.apache.hadoop.hive.ql.optimizer.ReduceSinkMapJoinProc.process(ReduceSinkMapJoinProc.java:239)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:367)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10057)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1070)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1132)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1007)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8563) Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE

2014-10-22 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-8563:
-
Assignee: Vikram Dixit K

> Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE
> -
>
> Key: HIVE-8563
> URL: https://issues.apache.org/jira/browse/HIVE-8563
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Vikram Dixit K
>Priority: Critical
>
> I added a test case as part of HIVE-8549 to annotate_stats_join_pkfk.q. This 
> test case fails with a NullPointerException when run using 
> TestMiniTezCliDriver. Here is the stack trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.PlanUtils.getFieldSchemasFromRowSchema(PlanUtils.java:548)
> at 
> org.apache.hadoop.hive.ql.optimizer.ReduceSinkMapJoinProc.process(ReduceSinkMapJoinProc.java:239)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:367)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10057)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1070)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1132)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1007)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8559) DECIMAL field can not store 0 if precision = scale

2014-10-22 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180606#comment-14180606
 ] 

Alexander Pivovarov commented on HIVE-8559:
---

mysql> select cast(0 as DECIMAL(6,6)) f1;
+----------+
| f1       |
+----------+
| 0.000000 |
+----------+


> DECIMAL field can not store 0 if precision = scale
> --
>
> Key: HIVE-8559
> URL: https://issues.apache.org/jira/browse/HIVE-8559
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Alexander Pivovarov
>
> For example, the following query returns NULL instead of 0 in Hive (trunk):
> select cast(0 as decimal(6,6));
> OK
> NULL
> I tried DECIMAL(6,6) in Oracle. It can store 0.
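
For illustration, a small runnable check that zero fits precision = scale, 
consistent with the MySQL and Oracle behavior above (this uses plain java.math, 
not Hive's decimal code):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public final class DecimalScaleSketch {
  public static void main(String[] args) {
    // DECIMAL(6,6) allows 0 integer digits and 6 fraction digits; zero
    // needs no integer digits at all, so it fits trivially.
    BigDecimal zero = BigDecimal.ZERO.setScale(6, RoundingMode.UNNECESSARY);
    System.out.println(zero);             // 0.000000
    System.out.println(zero.precision()); // 1 (the single zero digit)
  }
}
{code}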



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present

2014-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180608#comment-14180608
 ] 

Hive QA commented on HIVE-7985:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12675995/HIVE-7985.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6575 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1397/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1397/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1397/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12675995 - PreCommit-HIVE-TRUNK-Build

> With CBO enabled cross product is generated when a subquery is present
> --
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Laljo John Pullokkaran
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7985.1.patch, HIVE-7985.2.patch, HIVE-7985.patch
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products.
> Query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, 
> date_dim, item where item.i_item_id in (select i.i_item_id from item i where 
> i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and 
> ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by 
> i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a 
> cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: (true and i_item_id is not null) (type: boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic 
> stats: COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
>   keys:
> 0
> 1
>   outputColumnNames: _col0, _col1, _col25
>   input vertices:
> 1 Map 3
>   Statistics: Num rows: 254100 Data size: 365124192 Basic 
> stats: COMPLETE Column stats: NONE
>   Select Operator
> expressions: _col0 (type: int), _col1 (type: string), 
> _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 254100 Data size: 365124192 
> Basic stats: COMPLETE Column stats: NONE
>   value expressions: _col0 (type: int), _col1 (type: 
> string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> 

[jira] [Created] (HIVE-8563) Running annotate_stats_join_pkfk.q in TestMiniTezCliDriver is causing NPE

2014-10-22 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8563:


 Summary: Running annotate_stats_join_pkfk.q in 
TestMiniTezCliDriver is causing NPE
 Key: HIVE-8563
 URL: https://issues.apache.org/jira/browse/HIVE-8563
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth J
Priority: Critical


I added a test case as part of HIVE-8549 to annotate_stats_join_pkfk.q. This 
test case fails with a NullPointerException when run using 
TestMiniTezCliDriver. Here is the stack trace:
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.plan.PlanUtils.getFieldSchemasFromRowSchema(PlanUtils.java:548)
at 
org.apache.hadoop.hive.ql.optimizer.ReduceSinkMapJoinProc.process(ReduceSinkMapJoinProc.java:239)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:367)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10057)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1070)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1132)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1007)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)

{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8435) Add identity project remover optimization

2014-10-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8435:
---
Attachment: HIVE-8435.02.patch

Submitting the rebased patch without test file changes, to see what tests fail in 
HiveQA now.

> Add identity project remover optimization
> -
>
> Key: HIVE-8435
> URL: https://issues.apache.org/jira/browse/HIVE-8435
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8435.02.patch, HIVE-8435.1.patch, HIVE-8435.patch
>
>
> In some cases there is an identity project in the plan, which is useless. It is 
> better to optimize it away, to avoid evaluating it at runtime without any benefit.
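
For illustration, the core test such an optimization performs, in a minimal sketch 
(illustrative only; Hive's actual pass works on operator trees, not string lists):

{code}
import java.util.List;

public final class IdentityProjectSketch {
  // A projection is an identity when it forwards every input column
  // unchanged and in the same order; such an operator can be spliced out.
  static boolean isIdentityProjection(List<String> inputCols, List<String> projectedCols) {
    return inputCols.equals(projectedCols);
  }
}
{code}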



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

