Re: Review Request 49619: sorting of tuple array using multiple fields

2016-07-06 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/#review141130
---




ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java (line 427)


To me "sort_array_field" makes it sound like this function sorts the 
elements in an array field, as opposed to sorting an array on a particular 
field, which is what it actually does. I think the purpose of this function 
would be clearer if the name were changed to 'sort_array_on_field' or 
'sort_array_by' (I prefer the latter).



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 1)


Is this really necessary?



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 9)


No need for this. Please remove.



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 16)


The rows should have different struct values.



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 25)


Consider using named_struct() instead of struct(). This will allow you to 
provide names for the struct fields.
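For example, named_struct() lets the query spell out the field names that the sort would refer to, whereas struct() assigns positional names (col1, col2, ...). A hypothetical illustration (field names chosen here, not taken from the patch):

```sql
-- named_struct() makes 'salary' a real field name the UDF can look up;
-- with struct() the fields would only be addressable as col1..col4.
SELECT sort_array_field(
  array(
    named_struct('empId', 100, 'empName', 'Foo', 'age', 20, 'salary', 20990.0),
    named_struct('empId', 500, 'empName', 'Boo', 'age', 30, 'salary', 50990.0)
  ),
  'salary');
```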



ql/src/test/results/beelinepositive/show_functions.q.out (line 183)


The number of rows is off by 8. This looks like a bug, though not one 
caused by this patch.



ql/src/test/results/beelinepositive/show_functions.q.out (line 184)


It looks like you're stripping whitespace out of the patch. I suspect this 
is the cause of the failure in show_functions.q


- Carl Steinbach


On July 7, 2016, 5:07 a.m., Simanchal Das wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49619/
> ---
> 
> (Updated July 7, 2016, 5:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Carl Steinbach.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Problem Statement:
> 
> When we are working with complex data structures like Avro, we often 
> encounter arrays that contain multiple tuples, where each tuple has a struct 
> schema.
> 
> Suppose here struct schema is like below:
> 
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> 
> 
> Then, while running our Hive query, the complex array looks like an array of 
> employee objects.
> Example: 
>   //(array<Employee>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> 
> 
> When implementing day-to-day business use cases, we often run into problems 
> like sorting a tuple array by specific field(s) such as empId, salary, etc.
> 
> 
> Proposal:
> 
> I have developed a UDF 'sort_array_field' which sorts a tuple array by 
> one or more fields in natural order.
> 
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 1ab914d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf

[jira] [Created] (HIVE-14182) Revert "HIVE-13084: Vectorization add support for PROJECTION Multi-AND/OR

2016-07-06 Thread Matt McCline (JIRA)
Matt McCline created HIVE-14182:
---

 Summary: Revert "HIVE-13084: Vectorization add support for 
PROJECTION Multi-AND/OR
 Key: HIVE-14182
 URL: https://issues.apache.org/jira/browse/HIVE-14182
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Priority: Critical
 Attachments: HIVE-13084.revert.patch

Too many issues with scratch column allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14181) DROP TABLE in hive doesn't Throw Error

2016-07-06 Thread Pranjal Singh (JIRA)
Pranjal Singh created HIVE-14181:


 Summary: DROP TABLE in hive doesn't Throw Error
 Key: HIVE-14181
 URL: https://issues.apache.org/jira/browse/HIVE-14181
 Project: Hive
  Issue Type: Bug
 Environment: Hive 1.1.0
CDH 5.5.1-1
Reporter: Pranjal Singh


drop table table_name doesn't throw an error if the drop fails.
I was dropping a table and my trash didn't have enough space to hold it, but 
the drop table command reported success and the table wasn't deleted. However, 
hadoop fs -rm -r /hive/xyz.db/table_name/ gave an error "Failed to move to 
trash" because I did not have enough space quota in my trash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 49619: sorting of tuple array using multiple fields

2016-07-06 Thread Simanchal Das

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
---

(Updated July 7, 2016, 5:03 a.m.)


Review request for hive and Carl Steinbach.


Changes
---

Added the UDF name to the show_functions.q.out file.


Repository: hive-git


Description
---

Problem Statement:

When we are working with complex data structures like Avro, we often encounter 
arrays that contain multiple tuples, where each tuple has a struct schema.

Suppose here struct schema is like below:

{
  "name": "employee",
  "type": [{
    "type": "record",
    "name": "Employee",
    "namespace": "com.company.Employee",
    "fields": [{
      "name": "empId",
      "type": "int"
    }, {
      "name": "empName",
      "type": "string"
    }, {
      "name": "age",
      "type": "int"
    }, {
      "name": "salary",
      "type": "double"
    }]
  }]
}


Then, while running our Hive query, the complex array looks like an array of 
employee objects.
Example: 
//(array<Employee>)

Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]


When implementing day-to-day business use cases, we often run into problems 
like sorting a tuple array by specific field(s) such as empId, salary, etc.


Proposal:

I have developed a UDF 'sort_array_field' which sorts a tuple array by one 
or more fields in natural order.

Example:
1.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
output: 
array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]

2.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
output: 
array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

3.Select 
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
output: 
array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
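For context, the multi-field natural-order comparison described above can be sketched in plain Java with a chained Comparator. The Employee POJO, field names, and method name below are illustrative only; the actual UDF operates on Hive ObjectInspectors rather than a concrete class:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByFieldsSketch {
    // Illustrative stand-in for one struct element of the array.
    static final class Employee {
        final int empId; final String empName; final int age; final double salary;
        Employee(int empId, String empName, int age, double salary) {
            this.empId = empId; this.empName = empName;
            this.age = age; this.salary = salary;
        }
    }

    // Sort by empName first, then by salary, both ascending -- mirroring
    // sort_array_field(arr, "Name", "Salary").
    static void sortByNameThenSalary(List<Employee> rows) {
        rows.sort(Comparator.comparing((Employee e) -> e.empName)
                            .thenComparingDouble(e -> e.salary));
    }

    public static void main(String[] args) {
        List<Employee> rows = new ArrayList<>();
        rows.add(new Employee(100, "Foo", 20, 20990));
        rows.add(new Employee(500, "Boo", 30, 80990));
        rows.add(new Employee(500, "Boo", 30, 50990));
        sortByNameThenSalary(rows);
        // Ties on empName ("Boo") are broken by salary: 50990 sorts before 80990.
        for (Employee e : rows) {
            System.out.println(e.empName + " " + e.salary);
        }
    }
}
```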


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 1ab914d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayField.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayField.java
 PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_field_wrong1.q PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_field_wrong2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_sort_array_field.q PRE-CREATION 
  ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
  ql/src/test/results/clientnegative/udf_sort_array_field_wrong1.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/udf_sort_array_field_wrong2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/udf_sort_array_field.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/49619/diff/


Testing
---

JUnit test cases and query .q files are attached.


Thanks,

Simanchal Das



Re: Need help with hive's explode function.

2016-07-06 Thread Venkata Penikalapati
Hi Karan,
Your problem looks interesting, let me know more details will try to look into.

Thanks,
Venkata Karthik P

_
From: Karan Verma (Tech - BLR) 
Sent: Wednesday, July 6, 2016 9:45 PM
Subject: Need help with hive's explode function.
To:  


Hi,

I'm working on writing a custom Map Reduce job, which mimics a hive query.
This query works on JSON type data and there are multiple nested lists,
which are exploded. I have been struggling with writing an Exploder class
which is very specific to the data. I just can't seem to get the number of
outputs correct. I'll mail you the details, just let me know,  if you'd be
willing to help me on this.

Thanks,
Karan.





Need help with hive's explode function.

2016-07-06 Thread Karan Verma (Tech - BLR)
Hi,

I'm working on writing a custom Map Reduce job, which mimics a hive query.
This query works on JSON type data and there are multiple nested lists,
which are exploded. I have been struggling with writing an Exploder class
which is very specific to the data. I just can't seem to get the number of
outputs correct. I'll mail you the details, just let me know,  if you'd be
willing to help me on this.

Thanks,
Karan.


[jira] [Created] (HIVE-14180) Disable LlapZookeeperRegistry ZK auth setup for external clients

2016-07-06 Thread Jason Dere (JIRA)
Jason Dere created HIVE-14180:
-

 Summary: Disable LlapZookeeperRegistry ZK auth setup for external 
clients
 Key: HIVE-14180
 URL: https://issues.apache.org/jira/browse/HIVE-14180
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Jason Dere
Assignee: Jason Dere


{noformat}
Caused by: org.apache.hadoop.service.ServiceStateException: 
java.io.IOException: Llap Kerberos keytab is empty
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
at 
org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getClient(LlapRegistryService.java:67)
at 
org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:238)
at 
org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:142)
at 
org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:51)
{noformat}

When using the LLAP ZK registry in environments other than the LLAP daemon (such 
as external LLAP clients), there should be a way to skip this ZK auth setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

2016-07-06 Thread cheng xu


> On July 6, 2016, 10:48 p.m., Aihua Xu wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java, 
> > line 122
> > 
> >
> > Just trying to understand the logic (not too familiar with Parquet). Does 
> > the underlying Parquet already support "hive.io.file.readgroup.paths", or 
> > is this totally within Hive? How is the struct data stored in Parquet and 
> > pruned with the group path in general?

Parquet doesn't support this configuration. We reconstruct the requested schema 
on the Hive side by pruning unneeded columns, as other projections do.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
---


On June 15, 2016, 11:34 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48716/
> ---
> 
> (Updated June 15, 2016, 11:34 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-13873
> https://issues.apache.org/jira/browse/HIVE-13873
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add group projection support for Parquet and this is the initial patch 
> sharing my thoughts.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 23abec3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 24bf506 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cfedf35 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
> db923fa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java
>  a89aa4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
>  3e38cc7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
>  74a1a82 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 
> 611a6b7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
> a2a7f00 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/test/queries/clientpositive/parquet_struct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_struct.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 
> 0c7ac30 
> 
> Diff: https://reviews.apache.org/r/48716/diff/
> 
> 
> Testing
> ---
> 
> Newly added qtest passed.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



[jira] [Created] (HIVE-14179) Too many delta files causes select queries on the table to fail with OOM

2016-07-06 Thread Deepesh Khandelwal (JIRA)
Deepesh Khandelwal created HIVE-14179:
-

 Summary: Too many delta files causes select queries on the table 
to fail with OOM
 Key: HIVE-14179
 URL: https://issues.apache.org/jira/browse/HIVE-14179
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Deepesh Khandelwal


When a large number of delta files get generated during ACID operations, a 
select query on the ACID table fails with OOM.
{noformat}
ERROR [main]: SessionState (SessionState.java:printError(942)) - Vertex failed, 
vertexName=Map 1, vertexId=vertex_1465431842106_0014_1_00, diagnostics=[Task 
failed, taskId=task_1465431842106_0014_1_00_00, diagnostics=[TaskAttempt 0 
failed, info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.OutOfMemoryError: Direct buffer memory
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at 
org.apache.hadoop.util.DirectBufferPool.getBuffer(DirectBufferPool.java:72)
at 
org.apache.hadoop.hdfs.BlockReaderLocal.createDataBufIfNeeded(BlockReaderLocal.java:260)
at 
org.apache.hadoop.hdfs.BlockReaderLocal.readWithBounceBuffer(BlockReaderLocal.java:601)
at 
org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:569)
at 
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:789)
at 
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:845)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:905)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:953)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:377)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:323)
at 
org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:462)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1372)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1264)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at 
org.apache.hadoop.hive.ql.exec.tez.Ma

[jira] [Created] (HIVE-14178) Hive::needsToCopy should reuse FileUtils::equalsFileSystem

2016-07-06 Thread Gopal V (JIRA)
Gopal V created HIVE-14178:
--

 Summary: Hive::needsToCopy should reuse FileUtils::equalsFileSystem
 Key: HIVE-14178
 URL: https://issues.apache.org/jira/browse/HIVE-14178
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.0, 1.2.1, 2.2.0
Reporter: Gopal V


Clear bug triggered from missing FS checks in Hive.java

{code}
// Check if different FileSystems
if (!srcFs.getClass().equals(destFs.getClass())) {
  return true;
}
{code}
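Comparing only the runtime classes treats two different clusters with the same FileSystem implementation as one filesystem. An equalsFileSystem-style check instead compares the filesystems' URIs. A rough sketch of that idea, not the actual Hive code:

```java
import java.net.URI;
import java.util.Objects;

public class FsEqualitySketch {
    // Two filesystems are "the same" only if both the URI scheme and the
    // authority (host:port) match; class equality alone cannot tell
    // hdfs://nn1 and hdfs://nn2 apart.
    static boolean equalsFileSystem(URI fs1, URI fs2) {
        return Objects.equals(fs1.getScheme(), fs2.getScheme())
            && Objects.equals(fs1.getAuthority(), fs2.getAuthority());
    }

    public static void main(String[] args) {
        URI a = URI.create("hdfs://nn1:8020/");
        URI b = URI.create("hdfs://nn2:8020/");
        // Same FileSystem class, different clusters -> a copy IS needed.
        System.out.println(equalsFileSystem(a, b));                          // false
        System.out.println(equalsFileSystem(a, URI.create("hdfs://nn1:8020/tmp"))); // true
    }
}
```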



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14177) AddPartitionEvent contains the table location, but not the partition location

2016-07-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HIVE-14177:
---

 Summary: AddPartitionEvent contains the table location, but not 
the partition location
 Key: HIVE-14177
 URL: https://issues.apache.org/jira/browse/HIVE-14177
 Project: Hive
  Issue Type: Bug
Reporter: Colin Patrick McCabe


AddPartitionEvent contains the table location, but not the partition location



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14176) CBO nesting windowing function within each other when merging Project operators

2016-07-06 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14176:
--

 Summary: CBO nesting windowing function within each other when 
merging Project operators
 Key: HIVE-14176
 URL: https://issues.apache.org/jira/browse/HIVE-14176
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.0, 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The translation into a physical plan does not support this way of expressing 
windowing functions. Instead, we will not merge the Project operators when we 
find this pattern.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14174) Fix creating buckets without scheme information

2016-07-06 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-14174:
--

 Summary: Fix creating buckets without scheme information
 Key: HIVE-14174
 URL: https://issues.apache.org/jira/browse/HIVE-14174
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.1.0, 1.2.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping


If a table is created on a non-default filesystem (i.e. non-hdfs), the empty 
files will be created with incorrect scheme information. This patch extracts 
the scheme and authority information for the new paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14175) Fix creating buckets without scheme information

2016-07-06 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-14175:
--

 Summary: Fix creating buckets without scheme information
 Key: HIVE-14175
 URL: https://issues.apache.org/jira/browse/HIVE-14175
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.1.0, 1.2.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping


If a table is created on a non-default filesystem (i.e. non-hdfs), the empty 
files will be created with incorrect scheme information. This patch extracts 
the scheme and authority information for the new paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14173) NPE was thrown after enabling directsql in the middle of session

2016-07-06 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-14173:
--

 Summary: NPE was thrown after enabling directsql in the middle of 
session
 Key: HIVE-14173
 URL: https://issues.apache.org/jira/browse/HIVE-14173
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang


hive.metastore.try.direct.sql is initially set to false in the HMS hive-site.xml 
and then changed to true using the set metaconf command in the middle of a 
session; running a query after that throws an NPE with the following error 
message:
{code}
2016-07-06T17:44:41,489 ERROR [pool-5-thread-2]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invokeInternal(192)) - 
MetaException(message:java.lang.NullPointerException)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5741)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4771)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4754)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy18.get_partitions_by_expr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12048)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12032)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.<init>(ObjectStore.java:2667)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetListHelper.<init>(ObjectStore.java:2825)
at 
org.apache.hadoop.hive.metastore.ObjectStore$4.<init>(ObjectStore.java:2410)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:2410)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:2400)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy17.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4749)
... 20 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 49728: HIVE-14172 LLAP: force evict blocks by size to handle memory fragmentation

2016-07-06 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49728/#review141080
---




llap-server/src/java/org/apache/hadoop/hive/llap/cache/LlapCacheableBuffer.java 
(line 52)


bogus



llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCacheMemoryManager.java
 (line 119)


also bogus :)


- Sergey Shelukhin


On July 6, 2016, 9:31 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49728/
> ---
> 
> (Updated July 6, 2016, 9:31 p.m.)
> 
> 
> Review request for hive and Gopal V.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   llap-server/src/java/org/apache/hadoop/hive/llap/cache/BuddyAllocator.java 
> 47325ad 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cache/LlapCacheableBuffer.java
>  5c0b6f3 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCacheMemoryManager.java
>  4def4a1 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCachePolicy.java
>  acbaf85 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelFifoCachePolicy.java
>  0838682 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
>  5a0b27f 
>   llap-server/src/java/org/apache/hadoop/hive/llap/cache/MemoryManager.java 
> 6cc262e 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestBuddyAllocator.java
>  345f5b1 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestLowLevelCacheImpl.java
>  0846db9 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestLowLevelLrfuCachePolicy.java
>  616c040 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java
>  40edb28 
> 
> Diff: https://reviews.apache.org/r/49728/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Review Request 49728: HIVE-14172 LLAP: force evict blocks by size to handle memory fragmentation

2016-07-06 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49728/
---

Review request for hive and Gopal V.


Repository: hive-git


Description
---

see jira


Diffs
-

  llap-server/src/java/org/apache/hadoop/hive/llap/cache/BuddyAllocator.java 
47325ad 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LlapCacheableBuffer.java 
5c0b6f3 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCacheMemoryManager.java
 4def4a1 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCachePolicy.java 
acbaf85 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelFifoCachePolicy.java
 0838682 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
 5a0b27f 
  llap-server/src/java/org/apache/hadoop/hive/llap/cache/MemoryManager.java 
6cc262e 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestBuddyAllocator.java 
345f5b1 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestLowLevelCacheImpl.java
 0846db9 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestLowLevelLrfuCachePolicy.java
 616c040 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java
 40edb28 

Diff: https://reviews.apache.org/r/49728/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-14172) LLAP: force evict blocks by size to handle memory fragmentation

2016-07-06 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-14172:
---

 Summary: LLAP: force evict blocks by size to handle memory 
fragmentation
 Key: HIVE-14172
 URL: https://issues.apache.org/jira/browse/HIVE-14172
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


In the long run, we should replace the buddy allocator with a better scheme. For 
now, this is a workaround for fragmentation that cannot be easily resolved. It's 
still not perfect, but it works for practical ORC cases, where we have the 
default size and smaller blocks, rather than for large allocations, which can 
still have trouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 49644: Support masking and filtering of rows/columns: deal with derived column names

2016-07-06 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49644/
---

(Updated July 6, 2016, 9:27 p.m.)


Review request for hive and Gunther Hagleitner.


Repository: hive-git


Description
---

HIVE-14158


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 20d9649 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TableMask.java 1686f36 
  ql/src/test/queries/clientpositive/masking_6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/view_alias.q PRE-CREATION 
  ql/src/test/results/clientpositive/create_view.q.out d9c1e11 
  ql/src/test/results/clientpositive/create_view_translate.q.out 2789f8f 
  ql/src/test/results/clientpositive/masking_6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/subquery_views.q.out 046f0fe 
  ql/src/test/results/clientpositive/view_alias.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/49644/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-14171) Parquet: Simple vectorization throws NPEs

2016-07-06 Thread Gopal V (JIRA)
Gopal V created HIVE-14171:
--

 Summary: Parquet: Simple vectorization throws NPEs
 Key: HIVE-14171
 URL: https://issues.apache.org/jira/browse/HIVE-14171
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Gopal V


{code}
 create temporary table cd_parquet stored as parquet as select * from 
customer_demographics;

select count(1) from cd_parquet where cd_gender = 'F';
{code}

{code}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:206)
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:118)
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:51)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 17 more
{code}





PreCommit test is broken

2016-07-06 Thread Wei Zheng
Can someone take a look please?


+ java -cp 'target/hive-ptest-1.0-classes.jar:target/lib/*' 
org.apache.hive.ptest.api.client.PTestClient --endpoint 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com:8181/hive-ptest-1.0 
--logsEndpoint http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/ 
--command testStart --profile trunk-mr2 --password '' --outputDir target/ 
--testHandle PreCommit-HIVE-MASTER-Build-387 --patch 
https://issues.apache.org/jira/secure/attachment/12816483/HIVE-13934.4.patch 
--jira HIVE-13934
log4j:WARN No appenders could be found for logger 
(org.apache.http.impl.conn.BasicClientConnectionManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
Exception in thread "main" org.apache.http.conn.HttpHostConnectException: 
Connection to 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com:8181
 refused
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.hive.ptest.api.client.PTestClient.post(PTestClient.java:213)
at 
org.apache.hive.ptest.api.client.PTestClient.testStart(PTestClient.java:124)
at 
org.apache.hive.ptest.api.client.PTestClient.main(PTestClient.java:312)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
... 9 more

Thanks,
Wei


[jira] [Created] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-14170:
---

 Summary: Beeline IncrementalRows should buffer rows and 
incrementally re-calculate width if TableOutputFormat is used
 Key: HIVE-14170
 URL: https://issues.apache.org/jira/browse/HIVE-14170
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline
Reporter: Sahil Takiar
Assignee: Sahil Takiar


If {{--incremental}} is specified in Beeline, rows are meant to be printed out 
immediately. However, if {{TableOutputFormat}} is used with this option the 
formatting can look really off.

The reason is that {{IncrementalRows}} does not do a global calculation of the 
optimal width size for {{TableOutputFormat}} (it can't because it only sees one 
row at a time). The output of {{BufferedRows}} looks much better because it can 
do this global calculation.

If {{--incremental}} is used together with {{TableOutputFormat}}, the width 
should be re-calculated every "x" rows, where "x" is configurable with a 
default of 1000.
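A minimal sketch of the proposed behavior (class and method names here are illustrative, not Beeline's actual API): buffer up to a batch of rows, compute column widths over that batch, print it, then repeat, so the width adapts incrementally without ever holding the full result set.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hedged sketch of batched width re-calculation; names are hypothetical.
public class IncrementalWidthSketch {
    static final int BATCH = 1000; // "x": configurable, default 1000

    static void printIncrementally(Iterator<String[]> rows, int numCols) {
        List<String[]> buffer = new ArrayList<>(BATCH);
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() == BATCH || !rows.hasNext()) {
                int[] widths = computeWidths(buffer, numCols);
                for (String[] row : buffer) {
                    StringBuilder line = new StringBuilder();
                    for (int c = 0; c < numCols; c++) {
                        line.append(String.format("| %-" + widths[c] + "s ", row[c]));
                    }
                    System.out.println(line.append('|'));
                }
                buffer.clear(); // widths are re-derived for the next batch
            }
        }
    }

    // Widest value per column within one batch.
    static int[] computeWidths(List<String[]> batch, int numCols) {
        int[] widths = new int[numCols];
        for (String[] row : batch) {
            for (int c = 0; c < numCols; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }
}
```

Columns can still jump in width between batches; the trade-off is bounded memory versus the globally optimal layout that {{BufferedRows}} produces.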





[jira] [Created] (HIVE-14169) Beeline Row printing should only calculate the width if TableOutputFormat is used

2016-07-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-14169:
---

 Summary: Beeline Row printing should only calculate the width if 
TableOutputFormat is used
 Key: HIVE-14169
 URL: https://issues.apache.org/jira/browse/HIVE-14169
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline
Reporter: Sahil Takiar
Assignee: Sahil Takiar


* When Beeline prints out a {{ResultSet}} to stdout it uses the 
{{BeeLine.print}} method
* This method takes the {{ResultSet}} from the completed query and uses a 
specified {{OutputFormat}} to print the rows (by default it uses 
{{TableOutputFormat}})
* The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
(either an {{IncrementalRows}} or a {{BufferedRows}})
* The {{Rows}} class will calculate the optimal width that each row in the 
{{ResultSet}} should be displayed with
* However, this width is only relevant / used by {{TableOutputFormat}}

We should modify the logic so that the width is only calculated if 
{{TableOutputFormat}} is used. This will save CPU cycles when printing records 
out to the user.
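The proposed check can be sketched as follows (a minimal illustration with hypothetical names, not Beeline's real {{OutputFormat}} API): the width scan over the rows is skipped entirely unless the active format actually consumes widths.

```java
import java.util.List;

// Hedged sketch: compute column widths only for formats that use them.
public class WidthOnDemandSketch {
    interface OutputFormat { boolean usesColumnWidths(); }
    static class TableFormat implements OutputFormat {
        public boolean usesColumnWidths() { return true; }
    }
    static class CsvFormat implements OutputFormat {
        public boolean usesColumnWidths() { return false; }
    }

    // Returns null (no work done) when the format ignores widths.
    static int[] maybeComputeWidths(OutputFormat fmt, List<String[]> rows, int cols) {
        if (!fmt.usesColumnWidths()) {
            return null; // save the per-row scan for csv/tsv-style formats
        }
        int[] widths = new int[cols];
        for (String[] row : rows) {
            for (int c = 0; c < cols; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }
}
```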







Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

2016-07-06 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java (line 282)


Is this only used for debugging purposes?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java (line 305)


Is this to handle the a.col case? What about multiple levels of nested structure?



ql/src/test/queries/clientpositive/parquet_struct.q (line 4)


It would be better to add an "explain select ..." to the test case.

Also add a case with multiple levels of nested structure.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java (line 122)


Just trying to understand the logic (I'm not too familiar with Parquet). Does 
the underlying Parquet reader already support "hive.io.file.readgroup.paths", 
or is this handled entirely within Hive? In general, how is struct data stored 
in Parquet and pruned with the group path?
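As background for the question above, a hedged illustration (field names are hypothetical, and this is the general Parquet projection idea, not necessarily this patch's mechanism): Parquet stores nested structs as groups of separate column chunks, so a reader can request a pruned read schema that names only the subfields it needs.

```
# Illustrative only: full file schema vs. pruned read schema when a
# query references only s.a (hypothetical field names).
file schema:
  message t {
    required group s {
      required int32 a;
      required binary b;
    }
    required binary other;
  }

read schema (only s.a projected; columns s.b and other are never read):
  message t {
    required group s {
      required int32 a;
    }
  }
```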


- Aihua Xu


On June 15, 2016, 3:34 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48716/
> ---
> 
> (Updated June 15, 2016, 3:34 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-13873
> https://issues.apache.org/jira/browse/HIVE-13873
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add group projection support for Parquet; this is an initial patch 
> sharing my thoughts.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 23abec3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 24bf506 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cfedf35 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
> db923fa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java
>  a89aa4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
>  3e38cc7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
>  74a1a82 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 
> 611a6b7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
> a2a7f00 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/test/queries/clientpositive/parquet_struct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_struct.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 
> 0c7ac30 
> 
> Diff: https://reviews.apache.org/r/48716/diff/
> 
> 
> Testing
> ---
> 
> Newly added qtest passed.
> 
> 
> Thanks,
> 
> cheng xu
> 
>