Re: Review Request 40867: HIVE-11527 - bypass HiveServer2 thrift interface for query results

2015-12-09 Thread Takanobu Asanuma


> On Dec. 3, 2015, 9:48 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 481
> > 
> >
> > this seems similar to the code in SQLOperation; perhaps they can be 
> > refactored into a utility method used by both

Yes, I referred to the code in SQLOperation. I will create a utility method 
later.


> On Dec. 3, 2015, 9:48 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 509
> > 
> >
> > please log instead

Like this? 
```java
LOG.error(ex.toString());
```
printStackTrace is used at other lines (312, 438). Should we log there too?
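For what it's worth, a self-contained sketch (using java.util.logging as a stand-in for Hive's LOG) of why passing the throwable itself to the logger matters: ex.toString() keeps only the class name and message, while logging the throwable preserves the stack trace that printStackTrace would have shown.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogThrowableDemo {
    private static final Logger LOG = Logger.getLogger(LogThrowableDemo.class.getName());

    // Renders the full stack trace the way printStackTrace would, but as a String.
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    public static void main(String[] args) {
        Exception ex = new IllegalStateException("boom");
        // ex.toString() carries only the class name and message...
        String summary = ex.toString();
        // ...while the stack trace also records where the exception was thrown.
        String full = stackTraceOf(ex);
        System.out.println(summary.contains("boom"));
        System.out.println(full.contains("at "));
        // Passing the throwable as the last argument keeps the trace in the log:
        LOG.log(Level.SEVERE, "query result fetch failed", ex);
    }
}
```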


> On Dec. 3, 2015, 9:48 p.m., Sergey Shelukhin wrote:
> > service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java, 
> > line 511
> > 
> >
> > nit: IIRC setting it to null is also harmless

I agree. I will fix it.


- Takanobu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40867/#review108891
---


On Dec. 2, 2015, 12:52 p.m., Takanobu Asanuma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40867/
> ---
> 
> (Updated Dec. 2, 2015, 12:52 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This is a WIP patch for HIVE-11527
> 
> * I added a new configuration whose name is 
> hive.server2.webhdfs.bypass.enabled. The default is false. When this value is 
> true, clients use the bypass.
> 
> * I have not yet considered security such as Kerberos and SSL.
> 
> * I have not implemented Statement#setFetchSize for the bypass yet.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java db942b0 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 245c6a3 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 180f99e8 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 8fafd61 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1634143 
>   service/if/TCLIService.thrift baf583f 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h b078c99 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp b852379 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TExecuteStatementResp.java
>  0b9aa0f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TProtocolVersion.java
>  c936ada 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py ef5f5f5 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive-remote e167d5b 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb f004ec4 
>   service/src/java/org/apache/hive/service/cli/CLIService.java adc9809 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 25cefc2 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> b0bd351 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 1331a99 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 4f4e92d 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> a14908b 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 8434965 
> 
> Diff: https://reviews.apache.org/r/40867/diff/
> 
> 
> Testing
> ---
> 
> I have tested a few simple queries and they worked well. But I think there are 
> still problems with some queries. I'm going to test more queries and fix bugs. 
> I'm also going to add unit tests.
> 
> 
> Thanks,
> 
> Takanobu Asanuma
> 
>



Re: Review Request 40867: HIVE-11527 - bypass HiveServer2 thrift interface for query results

2015-12-09 Thread Takanobu Asanuma


> On Dec. 3, 2015, 9:40 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 390
> > 
> >
> > some existing configuration is probably needed (and better)

IIUC, Hive configurations live on the server side, and HiveQueryResultSet is on 
the client side, so I don't understand how to load Hive configurations in 
HiveQueryResultSet. Do you have any ideas?


> On Dec. 3, 2015, 9:40 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 402
> > 
> >
> > also, I wonder if it makes sense to add rows to rowset immediately 
> > after reading. Storing all rows and then adding them all to rowset stores 
> > all rows twice in memory.

Thank you for the advice. I'll think it over.
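A minimal sketch of the two approaches under discussion (the RowSet class below is a hypothetical stand-in, not Hive's actual implementation): buffering materializes every row in an intermediate list before copying into the row set, while streaming hands each row over as soon as it is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingRowsDemo {
    // Hypothetical stand-in for Hive's RowSet; only addRow matters here.
    static class RowSet {
        final List<Object[]> rows = new ArrayList<>();
        void addRow(Object[] row) { rows.add(row); }
    }

    // Buffering variant: the intermediate list and the row set both hold
    // every row until the method returns.
    static RowSet buffered(Iterator<Object[]> reader) {
        List<Object[]> all = new ArrayList<>();
        while (reader.hasNext()) all.add(reader.next());
        RowSet rs = new RowSet();
        for (Object[] r : all) rs.addRow(r);
        return rs;
    }

    // Streaming variant: each row goes straight into the row set, so only
    // one collection ever holds the rows.
    static RowSet streamed(Iterator<Object[]> reader) {
        RowSet rs = new RowSet();
        while (reader.hasNext()) rs.addRow(reader.next());
        return rs;
    }

    public static void main(String[] args) {
        List<Object[]> data = Arrays.asList(new Object[]{1}, new Object[]{2});
        System.out.println(streamed(data.iterator()).rows.size());
        System.out.println(buffered(data.iterator()).rows.size());
    }
}
```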


> On Dec. 3, 2015, 9:40 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 456
> > 
> >
> > nit: why object?

I will change it to a String.


> On Dec. 3, 2015, 9:40 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 466
> > 
> >
> > I wonder why we need to have thrift serialization? One of the goals is 
> > to avoid it. Perhaps it can be done in a follow-up JIRA

Sorry for the confusion. What I want to do here is convert an object to a 
standard Java object. I think we can just use 
ObjectInspectorUtils#copyToStandardObject here.


> On Dec. 3, 2015, 9:40 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 486
> > 
> >
> > nit: useless check, schema is already used above to get descriptors

I will fix it.


> On Dec. 3, 2015, 9:40 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 492
> > 
> >
> > nit: get(pos) can be done once

I will fix it.


- Takanobu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40867/#review108889
---


On Dec. 2, 2015, 12:52 p.m., Takanobu Asanuma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40867/
> ---
> 
> (Updated Dec. 2, 2015, 12:52 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This is a WIP patch for HIVE-11527
> 
> * I added a new configuration whose name is 
> hive.server2.webhdfs.bypass.enabled. The default is false. When this value is 
> true, clients use the bypass.
> 
> * I have not yet considered security such as Kerberos and SSL.
> 
> * I have not implemented Statement#setFetchSize for the bypass yet.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java db942b0 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 245c6a3 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 180f99e8 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 8fafd61 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1634143 
>   service/if/TCLIService.thrift baf583f 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h b078c99 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp b852379 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TExecuteStatementResp.java
>  0b9aa0f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TProtocolVersion.java
>  c936ada 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py ef5f5f5 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive-remote e167d5b 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb f004ec4 
>   service/src/java/org/apache/hive/service/cli/CLIService.java adc9809 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 25cefc2 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> b0bd351 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 1331a99 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 4f4e92d 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> a14908b 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 8434965 
> 
> Diff: https://reviews.apache.org/r/40867/diff/
> 
> 
> Testing
> ---
> 
> I have tested a few simple queries and they worked well. But I think there are 
> still problems with some queries. I'm going to test more queries and fix bugs. 
> I'm also going to add unit tests.

Re: Review Request 38663: HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-12-09 Thread Ratandeep Ratti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38663/
---

(Updated Dec. 9, 2015, 8:33 a.m.)


Review request for hive.


Changes
---

Addressed failing tests


Bugs: HIVE-11878
https://issues.apache.org/jira/browse/HIVE-11878


Repository: hive-git


Description
---

HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are 
registered one at a time in Hive


Diffs (updated)
-

  conf/ivysettings.xml bda842a 
  itests/custom-udfs/pom.xml PRE-CREATION 
  itests/custom-udfs/udf-classloader-udf1/pom.xml PRE-CREATION 
  
itests/custom-udfs/udf-classloader-udf1/src/main/java/hive/it/custom/udfs/UDF1.java
 PRE-CREATION 
  itests/custom-udfs/udf-classloader-udf2/pom.xml PRE-CREATION 
  
itests/custom-udfs/udf-classloader-udf2/src/main/java/hive/it/custom/udfs/UDF2.java
 PRE-CREATION 
  itests/custom-udfs/udf-classloader-util/pom.xml PRE-CREATION 
  
itests/custom-udfs/udf-classloader-util/src/main/java/hive/it/custom/udfs/Util.java
 PRE-CREATION 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerShowFilters.java
 0c03a00 
  itests/pom.xml 5d8249f 
  itests/qtest/pom.xml 8f6807a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java c01994f 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 5c69fb6 
  ql/src/test/queries/clientpositive/udf_classloader.q PRE-CREATION 
  
ql/src/test/queries/clientpositive/udf_classloader_dynamic_dependency_resolution.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/udf_classloader.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/udf_classloader_dynamic_dependency_resolution.q.out
 PRE-CREATION 

Diff: https://reviews.apache.org/r/38663/diff/


Testing
---


Thanks,

Ratandeep Ratti



[GitHub] hive pull request: Merge pull request #1 from apache/master

2015-12-09 Thread Cazen
GitHub user Cazen opened a pull request:

https://github.com/apache/hive/pull/57

Merge pull request #1 from apache/master

Update

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Cazen/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/57.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #57


commit e50342e94751a8a50ae49f73121385151be3cb9e
Author: Cazen Lee 
Date:   2015-12-09T11:12:54Z

Merge pull request #1 from apache/master

Update




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-12629) hive.auto.convert.join=true makes join SQL fail on the Spark engine on YARN

2015-12-09 Thread JIRA
吴子美 created HIVE-12629:
--

 Summary: hive.auto.convert.join=true makes join SQL fail on the 
Spark engine on YARN
 Key: HIVE-12629
 URL: https://issues.apache.org/jira/browse/HIVE-12629
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.2.1
Reporter: 吴子美
Assignee: Xuefu Zhang


I use Hive 1.2 on Spark on YARN. 

I found that 
select count(*) from 
(select  user_id from xxx group by user_id ) a join
(select  user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
on a.user_id=b.user_id ;
failed in Hive on Spark on YARN, but was OK in Hive on MR.

I tried the following SQL on Spark, and it was OK:
select count(*) from 
(select  user_id from xxx group by user_id ) a left join
(select  user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
on a.user_id=b.user_id ;

When I turned hive.auto.convert.join from true to false, everything worked.

The error message in hive.log was:
2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: 

2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO exec.Utilities: 
Serializing ReduceWork via kryo
2015-12-09 21:10:17,214 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: 

2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO client.RemoteDriver: 
Failed to run job 8fed1ca8-834f-497f-b189-eab343440a9f
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) - java.lang.IllegalStateException: Connection 
already exists
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect(SparkPlan.java:142)
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:106)
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:252)
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
java.util.concurrent.FutureTask.run(FutureTask.java:262)
2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(569)) -at 
java.lang.Thread.run(Thread.java:745)
2015-12-09 21:10:17,266 INFO  [RPC-Handler-3]: client.SparkClientImpl 
(SparkClientImpl.java:handle(522)) - Received result for 
8fed1ca8-834f-497f-b189-eab343440a9f
2015-12-09 21:10:18,054 ERROR [HiveServer2-Background-Pool: Thread-43]: 
status.SparkJobMonitor (SessionState.java:printError(960)) - Status: Failed
2015-12-09 21:10:18,055 INFO  [HiveServer2-Background-Pool: Thread-43]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - 
2015-12-09 21:10:18,076 ERROR [HiveServer2-Background-Pool: Thread-43]: 
ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return 
code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

Is this a bug in Hive on Spark?







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 41062: HIVE-12485 Secure HS2 web UI with kerberos

2015-12-09 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41062/
---

(Updated Dec. 9, 2015, 4:35 p.m.)


Review request for hive, Szehon Ho and Xuefu Zhang.


Changes
---

Changed use.SSL/SPNEGO to lower case.


Bugs: HIVE-12485
https://issues.apache.org/jira/browse/HIVE-12485


Repository: hive-git


Description
---

Added an AuthenticationFilter to secure the HS2 web ui with kerberos


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d52f994 
  common/src/java/org/apache/hive/http/HttpServer.java 4b0ed68 
  service/src/java/org/apache/hive/service/server/HiveServer2.java cad541a 

Diff: https://reviews.apache.org/r/41062/diff/


Testing
---

Manually tested it locally.


Thanks,

Jimmy Xiang



[jira] [Created] (HIVE-12630) Import should create a new WriteEntity for the new table it's creating to mimic CREATETABLE behaviour

2015-12-09 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-12630:
---

 Summary: Import should create a new WriteEntity for the new table 
it's creating to mimic CREATETABLE behaviour
 Key: HIVE-12630
 URL: https://issues.apache.org/jira/browse/HIVE-12630
 Project: Hive
  Issue Type: Bug
  Components: Authorization, Import/Export
Affects Versions: 1.2.0, 1.3.0, 2.0.0, 2.1.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


CREATE TABLE creates a new WriteEntity for the new table being created, whereas 
IMPORT does not mimic that behaviour.

While SQLStandardAuth itself does not care about this difference, external 
authorizers such as Ranger can and do make a distinction here, and can have 
policies set up on patterns for objects that do not yet exist. Thus, we must 
emit a WriteEntity for the yet-to-be-created table as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 41062: HIVE-12485 Secure HS2 web UI with kerberos

2015-12-09 Thread Mohit Sabharwal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41062/#review109581
---

Ship it!


LGTM

- Mohit Sabharwal


On Dec. 9, 2015, 4:35 p.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41062/
> ---
> 
> (Updated Dec. 9, 2015, 4:35 p.m.)
> 
> 
> Review request for hive, Szehon Ho and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12485
> https://issues.apache.org/jira/browse/HIVE-12485
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Added an AuthenticationFilter to secure the HS2 web ui with kerberos
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d52f994 
>   common/src/java/org/apache/hive/http/HttpServer.java 4b0ed68 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java cad541a 
> 
> Diff: https://reviews.apache.org/r/41062/diff/
> 
> 
> Testing
> ---
> 
> Manually tested it locally.
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Review Request 41153: HIVE-12640 : Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-09 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41153/
---

Review request for hive, Ashutosh Chauhan and John Pullokkaran.


Repository: hive-git


Description
---

HIVE-12640 : Allow StatsOptimizer to optimize the query for Constant GroupBy 
keys


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java ffe706e 
  ql/src/test/queries/clientpositive/metadata_only_queries.q bce121d 
  ql/src/test/results/clientpositive/metadata_only_queries.q.out 65a4dfa 

Diff: https://reviews.apache.org/r/41153/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



[jira] [Created] (HIVE-12637) make retryable SQLExceptions in TxnHandler configurable

2015-12-09 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-12637:
-

 Summary: make retryable SQLExceptions in TxnHandler configurable
 Key: HIVE-12637
 URL: https://issues.apache.org/jira/browse/HIVE-12637
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Same for CompactionTxnHandler.
It would be convenient if the user could specify a regex (perhaps per DB type) 
which tells TxnHandler.checkRetryable() that the error should be retried.

The regex should probably apply to the String produced by 
{noformat}
  private static String getMessage(SQLException ex) {
return ex.getMessage() + "(SQLState=" + ex.getSQLState() + ",ErrorCode=" + 
ex.getErrorCode() + ")";
  }
{noformat}
This makes it flexible.

See if we need to add the DB type (and possibly version) of the DB being used.

With 5 different DBs supported, this gives control to end users.
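A self-contained sketch of how such a configurable regex check could look (the property name in the comment is hypothetical, and the getMessage helper mirrors the snippet quoted above; this is not Hive's actual code):

```java
import java.sql.SQLException;
import java.util.regex.Pattern;

public class RetryRegexDemo {
    // Mirrors the message format quoted above.
    static String getMessage(SQLException ex) {
        return ex.getMessage() + "(SQLState=" + ex.getSQLState()
                + ",ErrorCode=" + ex.getErrorCode() + ")";
    }

    // The retry decision becomes a single user-configurable regex match.
    static boolean isRetryable(SQLException ex, String configuredRegex) {
        return Pattern.compile(configuredRegex).matcher(getMessage(ex)).find();
    }

    public static void main(String[] args) {
        SQLException deadlock =
                new SQLException("Deadlock found when trying to get lock", "40001", 1213);
        // e.g. a property like hive.txn.retryable.sqlex.regex (hypothetical name)
        // could hold the pattern below:
        System.out.println(isRetryable(deadlock, "SQLState=40001"));
        System.out.println(isRetryable(deadlock, "SQLState=23000"));
    }
}
```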



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-09 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-12640:


 Summary: Allow StatsOptimizer to optimize the query for Constant 
GroupBy keys 
 Key: HIVE-12640
 URL: https://issues.apache.org/jira/browse/HIVE-12640
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


{code}
hive> select count('1') from src group by '1';
{code}

In the above query, while performing StatsOptimizer optimization we can safely 
ignore the group by on the constant key '1' since the above query will return 
the same result as "select count('1') from src".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12639) Handle exceptions during SARG creation

2015-12-09 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-12639:


 Summary: Handle exceptions during SARG creation
 Key: HIVE-12639
 URL: https://issues.apache.org/jira/browse/HIVE-12639
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0, 2.1.0
Reporter: Prasanth Jayachandran


Bad predicates can cause SearchArgument creation to throw an exception. For 
example, a filter like where ts = '2014-15-16 17:18:19.20' can throw 
IllegalArgumentException during SARG creation because the timestamp is in the 
wrong format (the month is invalid). If SARG creation fails, it should return 
the YES_NO_NULL TruthValue instead of throwing an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12638) Hive should not create empty files in partitions

2015-12-09 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12638:


 Summary: Hive should not create empty files in partitions
 Key: HIVE-12638
 URL: https://issues.apache.org/jira/browse/HIVE-12638
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley


Currently Hive creates empty files for buckets with no rows in a directory. I 
believe this was originally because the SMB and bucket joins require files to be 
present to get InputSplits. There are customers where this behavior leads to the 
creation of more than 200,000 empty ORC files per hour on a cluster (with peaks 
of more than 725,000 per hour). We've also seen instances where a single 
DataNode is involved in 5,600 of these empty ORC files within a 2-minute period. 
This causes significant stress on HDFS at both the NameNode and DataNode and is 
completely unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value

2015-12-09 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-12635:
---

 Summary: Hive should return the latest hbase cell timestamp as the 
row timestamp value
 Key: HIVE-12635
 URL: https://issues.apache.org/jira/browse/HIVE-12635
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 2.1.0
Reporter: Aihua Xu
Assignee: Aihua Xu


When Hive talks to HBase and maps the HBase timestamp field to a Hive column, 
it seems Hive returns the first cell's timestamp instead of the latest one as 
the timestamp value. 

It makes sense to return the latest timestamp, since adding the latest cell can 
be considered an update to the row. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2015-12-09 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-12636:
-

 Summary: Ensure that all queries (with DbTxnManager) run in a 
transaction
 Key: HIVE-12636
 URL: https://issues.apache.org/jira/browse/HIVE-12636
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Priority: Critical


Assuming Hive is using DbTxnManager:
Currently (as of this writing, only auto-commit mode is supported), only queries 
that write to an ACID table start a transaction.
Read-only queries don't open a txn but still acquire locks.
This makes the internal structures confusing/odd.
There are constantly two code paths to deal with, which is inconvenient and 
error prone.

Also, a txn id is a convenient "handle" for all locks/resources within a txn.
Doing this would mean the client no longer needs to track the locks it 
acquired.  This enables further improvements to the metastore side of ACID.

# Add a metastore call that does openTxn() and acquireLocks() in a single call; 
this is to make sure perf doesn't degrade for read-only queries. (It would also 
be useful for auto-commit write queries.)
# Should RO queries generate txn ids from the same sequence? (They could, for 
example, use negative values of a different sequence.) The txn id is part of the 
delta/base file name. Currently it's 7 digits. If we use the same sequence, 
we'll exceed 7 digits faster (a possible upgrade issue). On the other hand, 
there is value in being able to pick the txn id and commit timestamp out of the 
same logical sequence.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12641) LLAP: make management protocol handler count and retry policy configurable

2015-12-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-12641:
---

 Summary: LLAP: make management protocol handler count and retry 
policy configurable
 Key: HIVE-12641
 URL: https://issues.apache.org/jira/browse/HIVE-12641
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


See HIVE-12341



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Feedback of my Phd work in Hive Project

2015-12-09 Thread Igor Wiese
Hi, Hive Community.

My name is Igor Wiese, and I am a PhD student from Brazil. In my research I am
investigating two important questions: what makes two files change
together, and can we predict when they are going to co-change again?

I've tried to investigate these questions on the Hive project. I collected
data from issue reports, discussions and commits, and used some machine
learning techniques to build a prediction model.

I collected a total of 721 commits in which a pair of files changed
together and could correctly predict 53% of the commits. The following were the
most useful pieces of information for predicting co-changes of files:

- sum of number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues,

- number of comments in each issue,

- median value of closeness, a social network measure obtained from issue
comments, and

- median value of effective size, a social network measure obtained from
issue comments.

To illustrate, consider the following example from our analysis. For
release 0.14, the files "metastore/MetaStoreDirectSql.java" and
"metastore/ObjectStore.java" changed together in 4 commits. In another 2
commits, only the first file changed, but not the second. Collecting
contextual information for each commit made to the first file in the previous
release, we were able to predict 4 commits in which both files changed
together in release 0.14, and we issued 0 false positives and two wrong
predictions. For this pair of files, the most important contextual
information was the number of lines of code added; the sum of lines of
code added, removed and modified; and two social network metrics
(constraint, ties) obtained from issue comments.

- Do these results surprise you? Can you think of any explanation for them?

- Do you think that our prediction rate is good enough to be used to
build tool support for the software community?

- Do you have any suggestion on what can be done to improve the change
recommendation?

You can visit this webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/72-hive

All the best,
Igor Wiese
PhD Candidate


Review Request 41164: HIVE-12628 : Eliminate flakiness in TestMetrics

2015-12-09 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41164/
---

Review request for hive.


Bugs: HIVE-12628
https://issues.apache.org/jira/browse/HIVE-12628


Repository: hive-git


Description
---

Rewrite all the TestMetrics tests to not rely on metrics-json file dumps, which 
were proving to be flaky.  Now they get the JSON live from the metrics and 
compare.

While at it, fix TestHiveMetaStorePartitionSpecs failures (non-related) by 
increasing the timeout.

Finally, add a 'safety' flag to turn off the blocking metadata metrics count at 
HMS startup, in case some user doesn't want it.  It's not related to test 
failures, but might come in handy.


Diffs
-

  
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java
 cba1c5a 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d52f994 
  common/src/test/org/apache/hadoop/hive/common/metrics/MetricsTestUtils.java 
f21b431 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java
 bbfee1d 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/hbase/TestHBaseMetastoreMetrics.java
 b528376 
  
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHs2Metrics.java 
873e126 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
fec8ea0 
  
metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java
 ed1a453 
  ql/pom.xml ee1d46c 
  
ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
 7fcaa22 
  service/pom.xml 735891c 
  
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java
 aaeecbe 

Diff: https://reviews.apache.org/r/41164/diff/


Testing
---


Thanks,

Szehon Ho



[jira] [Created] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions

2015-12-09 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-12643:
---

 Summary: For self describing InputFormat don't replicate schema 
information in partitions
 Key: HIVE-12643
 URL: https://issues.apache.org/jira/browse/HIVE-12643
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.0.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Since self-describing input formats don't use individual partition schemas for 
schema resolution, there is no need to send that info to the tasks.
Doing this should cut down the plan size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12642) TxnHandler.TIMED_OUT_TXN_ABORT_BATCH_SIZE should be configurable

2015-12-09 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-12642:
-

 Summary: TxnHandler.TIMED_OUT_TXN_ABORT_BATCH_SIZE should be 
configurable
 Key: HIVE-12642
 URL: https://issues.apache.org/jira/browse/HIVE-12642
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Priority: Minor
 Fix For: 1.3.0


This is used to safeguard against generating SQL with very large IN clauses, but 
at the expense of running more queries.  This should be configurable, since 
different DBs will be able to handle different sizes.

The current value is 1000, which is conservative.
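A self-contained sketch of the batching pattern whose batch size is being made configurable (the table and column names in the generated SQL are illustrative, not necessarily the exact metastore schema):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class InClauseBatchDemo {
    // Splits txn ids into batches of at most batchSize and renders one
    // "... IN (...)" statement per batch; batchSize plays the role of the
    // TIMED_OUT_TXN_ABORT_BATCH_SIZE constant being made configurable.
    static List<String> abortStatements(List<Long> txnIds, int batchSize) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < txnIds.size(); i += batchSize) {
            StringJoiner in = new StringJoiner(",", "(", ")");
            for (Long id : txnIds.subList(i, Math.min(i + batchSize, txnIds.size()))) {
                in.add(id.toString());
            }
            stmts.add("update TXNS set txn_state = 'a' where txn_id in " + in);
        }
        return stmts;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        List<String> stmts = abortStatements(ids, 2);
        // 5 ids with batch size 2 -> 3 statements: (1,2), (3,4), (5)
        System.out.println(stmts.size());
        System.out.println(stmts.get(2));
    }
}
```

A smaller batch size keeps each IN clause within what the backing DB tolerates, at the cost of more round trips, which is exactly the trade-off the description mentions.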




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12631) LLAP: support ORC ACID tables

2015-12-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-12631:
---

 Summary: LLAP: support ORC ACID tables
 Key: HIVE-12631
 URL: https://issues.apache.org/jira/browse/HIVE-12631
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


LLAP uses a completely separate read path in ORC to allow for caching and 
parallelization of reads and processing. This path does not support ACID. As 
far as I remember, ACID logic is embedded inside the ORC format; we need to 
refactor it to be on top of some interface, if practical, or just port it to 
the LLAP read path.
Another consideration is how the logic will work with the cache. The cache is 
currently low-level (CB-level in ORC), so we could just use it to read bases 
and deltas (deltas should be cached with higher priority) and merge as usual. 
We could also cache the merged representation in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12633) LLAP: package included serde jars

2015-12-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-12633:
---

 Summary: LLAP: package included serde jars
 Key: HIVE-12633
 URL: https://issues.apache.org/jira/browse/HIVE-12633
 Project: Hive
  Issue Type: Bug
Reporter: Takahiko Saito
Assignee: Sergey Shelukhin


Some SerDes, like JSONSerde, are not packaged with LLAP. One cannot localize 
jars on the daemon (due to security considerations if nothing else), so we 
should package them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12632) LLAP: don't use IO elevator for ACID tables

2015-12-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-12632:
---

 Summary: LLAP: don't use IO elevator for ACID tables 
 Key: HIVE-12632
 URL: https://issues.apache.org/jira/browse/HIVE-12632
 Project: Hive
  Issue Type: Bug
Reporter: Takahiko Saito


Until HIVE-12631 is fixed, we need to avoid ACID tables in the IO elevator. 
Right now, a FileNotFound error is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12634) Add command to kill an ACID transaction

2015-12-09 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-12634:
-

 Summary: Add command to kill an ACID transaction
 Key: HIVE-12634
 URL: https://issues.apache.org/jira/browse/HIVE-12634
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


We should add a CLI command to abort a (runaway) transaction.
This should clean up all state related to the txn.
The initiator (if still alive) will get an error when trying to 
heartbeat/commit, i.e. it will become aware that the txn is dead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)