Review Request 45062: HIVE-13241 LLAP: Incremental Caching marks some small chunks as "incomplete CB"

2016-03-18 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45062/
---

Review request for hive, Gopal V and Prasanth_J.


Repository: hive-git


Description
---

See JIRA (HIVE-13241).


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 98c6372
  llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionDispatcher.java bae571e
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapOptionsProcessor.java c292b37
  llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java dbee823
  llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java eb251a8
  llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcMetadataCache.java e970137
  llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java 901e58a
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 29b51ec
  storage-api/src/java/org/apache/hadoop/hive/common/io/encoded/EncodedColumnBatch.java ddba889

Diff: https://reviews.apache.org/r/45062/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-13313) Row Limit Per Split feature broken for Vectorization

2016-03-18 Thread Matt McCline (JIRA)
Matt McCline created HIVE-13313:
---

 Summary: Row Limit Per Split feature broken for Vectorization
 Key: HIVE-13313
 URL: https://issues.apache.org/jira/browse/HIVE-13313
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical


For vectorization, the ROWS clause is ignored, causing many rows to be inserted.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13314) Hive on spark mapjoin errors if spark.master is not set

2016-03-18 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-13314:


 Summary: Hive on spark mapjoin errors if spark.master is not set
 Key: HIVE-13314
 URL: https://issues.apache.org/jira/browse/HIVE-13314
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Szehon Ho
Assignee: Szehon Ho
Priority: Minor


There are some errors that happen if spark.master is not set.

This is despite the code defaulting to yarn-cluster if spark.master is not set 
by the user or in the config files: 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51]

The funny thing is that while it works the first time due to this default, 
subsequent tries will fail as the hiveConf is refreshed without that default 
being set.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L180]
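
A minimal, self-contained sketch of the failure mode described above. It is not the Hive code: plain Maps stand in for HiveConf, and the class and method names below are hypothetical. It only illustrates how a default applied to a derived copy is missing when the original settings are read again, and one possible guard at the point of use.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the behaviour described in this issue; not Hive code.
final class SparkMasterDefaultSketch {
    static final String SPARK_MASTER = "spark.master";
    static final String DEFAULT_MASTER = "yarn-cluster";

    // The default is applied to the derived Spark settings only,
    // so the Hive-side settings still have no spark.master entry.
    static Map<String, String> buildSparkConf(Map<String, String> hiveConf) {
        Map<String, String> sparkConf = new HashMap<>(hiveConf);
        sparkConf.putIfAbsent(SPARK_MASTER, DEFAULT_MASTER);
        return sparkConf;
    }

    // Throws NullPointerException when spark.master is absent.
    static boolean isDedicatedClusterUnsafe(Map<String, String> conf) {
        String master = conf.get(SPARK_MASTER);
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    // One possible guard (an assumption, not the committed fix):
    // fall back to the same default where the value is read.
    static boolean isDedicatedClusterSafe(Map<String, String> conf) {
        String master = conf.getOrDefault(SPARK_MASTER, DEFAULT_MASTER);
        return master.startsWith("yarn-") || master.startsWith("local");
    }
}
```

With an empty settings map, buildSparkConf yields yarn-cluster for the derived copy, while a later call to isDedicatedClusterUnsafe on the original map fails with a NullPointerException.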

The exception is as follows:
{noformat}
Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most recent failure: Lost task 40.3 in stage 1.0 (TID 22, d2409.halxg.cloudera.com): 
java.lang.RuntimeException: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
    at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
    at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
    ... 24 more

Driver stacktrace:
{noformat}

The issue is 





Re: [Discuss] MariaDB support

2016-03-18 Thread Szehon Ho
Yea, +1 to point 2.

For point 1, I also agree that it is compatible with mysql and would not be a
ton of work unless you want to optimize. In our observations, existing mysql
scripts work fine against mariadb.
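
The alias idea from point 1 of the proposal quoted below could look roughly like this. This is only a sketch under assumptions: the class and the method scriptDbType are hypothetical names, not part of schematool, and the only behaviour shown is mapping a "mariadb" database type onto the existing mysql scripts.

```java
import java.util.Locale;

// Hypothetical helper; schematool has no such class.
final class DbTypeAliasSketch {
    // Returns the database type whose schema scripts should be used.
    // "mariadb" is treated as an alias for "mysql" until MariaDB-specific
    // scripts exist; every other type is passed through normalized.
    static String scriptDbType(String dbType) {
        String normalized = dbType.trim().toLowerCase(Locale.ROOT);
        return normalized.equals("mariadb") ? "mysql" : normalized;
    }
}
```

If MariaDB-specific scripts were added later, only this mapping would need to change, and users who already pass mariadb as the database type would pick them up without changing their commands.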

On Wed, Mar 16, 2016 at 12:04 PM, Dmitry Tolpeko 
wrote:

> +1 great idea
>
> On Wed, Mar 16, 2016 at 10:00 PM, Thejas Nair 
> wrote:
>
>> + Sergio, Szehon, Ashutosh, Sushanth, Sergey,
>>
>> Any thoughts on this ?
>>
>>
>> On Tue, Mar 15, 2016 at 7:08 PM, Thejas Nair 
>> wrote:
>> > There seems to be increasing interest in supporting MariaDB as an
>> > option for storing metastore metadata. Supporting it as a database
>> > option is also easy as it is compatible with mysql. I thought it would
>> > be useful to discuss supporting it in the dev list before creating any
>> > jiras.
>> >
>> > There are two aspects I would like to discuss -
>> >
>> > 1. Changes in hive to support MariaDB
>> >
>> > The existing mysql schema creation/upgrade scripts in hive should just
>> > work for mariadb as well.
>> > However, MariaDB has some additional optimizations that we might want
>> > to use in future to optimize queries for it. That would mean creating
>> > specific scripts for mariadb.
>> >
>> > However, until we introduce such MariaDB specific tuning, I think it
>> > is better to avoid duplicating the mysql scripts.
>> >
>> > To make the transition to possibly using MariaDB optimized scripts
>> > easier, one option is to have schematool consider it as an alias for
>> > mysql until that happens.
>> >
>> >
>> > 2. Testing with MariaDB
>> > It would be useful to have tests for mariadb as well on the lines of
>> > what is available for mysql in
>> > https://issues.apache.org/jira/browse/HIVE-9800, to ensure that
>> > mariadb support is not broken.
>> >
>> > Thanks,
>> > Thejas
>>
>
>


[jira] [Created] (HIVE-13315) Option to reuse existing restored HBase snapshots

2016-03-18 Thread Liyin Tang (JIRA)
Liyin Tang created HIVE-13315:
-

 Summary: Option to reuse existing restored HBase snapshots
 Key: HIVE-13315
 URL: https://issues.apache.org/jira/browse/HIVE-13315
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Liyin Tang
Assignee: Sushanth Sowmyan


HiveHBaseTableSnapshotInputFormat needs to restore the HBase snapshot for each 
query. It would be great to have an option in the table properties to specify 
an existing restored snapshot. If such a property is set, the job can skip 
the restoring stage to reduce query time.
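
A rough sketch of the requested behaviour, under stated assumptions: the property name below is a placeholder invented for illustration, and Restorer is a hypothetical stand-in for whatever performs the snapshot restore today. Neither is an existing Hive or HBase API.

```java
import java.util.Map;

// Illustration only; names are hypothetical.
final class SnapshotReuseSketch {
    // Placeholder table property naming an already restored snapshot directory.
    static final String RESTORED_DIR_PROP = "hypothetical.hbase.snapshot.restored.dir";

    // Stand-in for the existing (expensive) per-query restore step.
    interface Restorer {
        String restore(String snapshotName);
    }

    // Returns the directory to read from: the pre-restored one when the
    // table property is set, otherwise the result of a fresh restore.
    static String resolveRestoreDir(Map<String, String> tableProps,
                                    String snapshotName,
                                    Restorer restorer) {
        String existing = tableProps.get(RESTORED_DIR_PROP);
        if (existing != null && !existing.isEmpty()) {
            return existing; // skip the restoring stage
        }
        return restorer.restore(snapshotName);
    }
}
```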





Re: Review Request 40867: HIVE-11527 - bypass HiveServer2 thrift interface for query results

2016-03-18 Thread Takanobu Asanuma


> On March 18, 2016, 5:57 p.m., Sergey Shelukhin wrote:
> >

Thanks for reviewing.


> On March 18, 2016, 5:57 p.m., Sergey Shelukhin wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java, line 210
> > 
> >
> > nit: whitespace

Sorry, I'll fix it.


> On March 18, 2016, 5:57 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 1919
> > 
> >
> > why cannot the conf itself be passed in? why is the map necessary?
> > Or, if that doesn't work for some reason, what about creating a small 
> > struct class to hold the fields with logical names and types, instead of 
> > the map.

> why cannot the conf itself be passed in?

In this patch, HiveServer2 gives JDBC clients only the configurations that are 
necessary and sufficient for resolving the HA namespace.
In the present implementation of Hive, the configurations on the JDBC client side 
are different from the ones on the cluster side. So if all the configurations were 
passed in here, it would break compatibility with existing JDBC applications. 
Even if that is better, I think it should be done in another jira.

> what about creating a small struct class to hold the fields with logical 
> names and types, instead of the map.

We need a simple key-value structure here. A map is a simple way to do this, 
and we don't need to define a new structure in the thrift API. So I used the 
map. What are the advantages of creating a new structure class?
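
To make the trade-off concrete, here is a small sketch of the struct-class alternative. It is not from the patch: HaClientConfig is a hypothetical class, and it assumes the map carries the HDFS keys dfs.nameservices (with a single nameservice) and dfs.ha.namenodes.<nameservice>. The map can still be what travels over thrift; the class only gives the receiving side named, typed fields and a single place where missing keys are detected.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical typed holder built from the key-value map.
final class HaClientConfig {
    final String nameservice;
    final List<String> namenodeIds;

    private HaClientConfig(String nameservice, List<String> namenodeIds) {
        this.nameservice = nameservice;
        this.namenodeIds = namenodeIds;
    }

    // Validates the map once, so callers never deal with missing keys.
    // Assumes exactly one nameservice is listed in dfs.nameservices.
    static HaClientConfig fromMap(Map<String, String> conf) {
        String ns = conf.get("dfs.nameservices");
        if (ns == null || ns.isEmpty()) {
            throw new IllegalArgumentException("missing dfs.nameservices");
        }
        String ids = conf.get("dfs.ha.namenodes." + ns);
        if (ids == null || ids.isEmpty()) {
            throw new IllegalArgumentException("missing dfs.ha.namenodes." + ns);
        }
        return new HaClientConfig(ns,
            Collections.unmodifiableList(Arrays.asList(ids.split(","))));
    }
}
```

With a plain map, a misspelled or missing key only shows up where the value is finally used. With the class, it fails at construction with a message naming the key.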


- Takanobu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40867/#review124234
---


On March 16, 2016, 8:51 a.m., Takanobu Asanuma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40867/
> ---
> 
> (Updated March 16, 2016, 8:51 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This is a WIP patch for HIVE-11527
> 
> * I added a new configuration whose name is 
> hive.server2.webhdfs.bypass.enabled. The default is false. When this value is 
> true, clients use the bypass.
> 
> * I still have not considered security such as Kerberos and SSL at present.
> 
> * I have not implemented Statement#setFetchSize for bypass yet.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 98c6372
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 8f67209
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java b4dba44
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 7327a42
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 0b0c336
>   service-rpc/if/TCLIService.thrift aa28b6e
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 7f1d9dd
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 3a27a60
>   service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TColumnDesc.java 31472c8
>   service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TExecuteStatementResp.java 7101fa5
>   service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetTablesReq.java 1aa3f94
>   service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java 14d50ed
>   service-rpc/src/gen/thrift/gen-php/Types.php b7df50a
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py c691781
>   service-rpc/src/gen/thrift/gen-py/__init__.py PRE-CREATION
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 07ed97c
>   service/src/java/org/apache/hive/service/cli/CLIService.java ab30ae2
>   service/src/java/org/apache/hive/service/cli/ColumnDescriptor.java 7bd9f06
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java d9a273b
>   service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 56a9c18
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 04d816a
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 4f4e92d
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 8baecdf
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 62fcde5
> 
> Diff: https://reviews.apache.org/r/40867/diff/
> 
> 
> Testing
> ---
> 
> I have tested a few simple queries and they worked well. But I think there are 
> problems with some queries. I'm going to test more queries and fix bugs. 
> I'm also going to add unit tests.
> 
> 
> Thanks,
> 
> Takanobu Asanuma
> 
>



[jira] [Created] (HIVE-13296) Add vectorized Q test with complex types showing count(*) etc work correctly

2016-03-18 Thread Matt McCline (JIRA)
Matt McCline created HIVE-13296:
---

 Summary: Add vectorized Q test with complex types showing count(*) etc work correctly
 Key: HIVE-13296
 URL: https://issues.apache.org/jira/browse/HIVE-13296
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical








[jira] [Created] (HIVE-13310) Vectorized Projection Comparison Number Column to Scalar broken for !noNuls and selectedInUse

2016-03-18 Thread Matt McCline (JIRA)
Matt McCline created HIVE-13310:
---

 Summary: Vectorized Projection Comparison Number Column to Scalar broken for !noNuls and selectedInUse
 Key: HIVE-13310
 URL: https://issues.apache.org/jira/browse/HIVE-13310
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical


LongColEqualLongScalar.java
LongColGreaterEqualLongScalar.java
LongColGreaterLongScalar.java
LongColLessEqualLongScalar.java
LongColLessLongScalar.java
LongColNotEqualLongScalar.java





[jira] [Created] (HIVE-13304) Merge master into llap branch

2016-03-18 Thread Jason Dere (JIRA)
Jason Dere created HIVE-13304:
-

 Summary: Merge master into llap branch
 Key: HIVE-13304
 URL: https://issues.apache.org/jira/browse/HIVE-13304
 Project: Hive
  Issue Type: Sub-task
Reporter: Jason Dere
Assignee: Jason Dere





