Review Request 38206: HIVE-11662 Dynamic partitioning cannot be applied to external table which contains part-spec like directory name

2015-09-08 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38206/
---

Review request for hive.


Bugs: HIVE-11662
https://issues.apache.org/jira/browse/HIVE-11662


Repository: hive-git


Description
---

Some users want to use a part-spec-like directory name in their partitioned table 
locations, for example:
{noformat}
/something/warehouse/some_key=some_value
{noformat}

Dynamic partitioning (DP) calculates additional partitions from the full path, 
and fails with an exception like:
{noformat}
Failed with exception Partition spec {some_key=some_value, part_key=part_value} 
contains non-partition columns
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask
{noformat}
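
For illustration, here is a minimal, self-contained Java sketch (assumed logic, not Hive's actual Utilities code) of why this happens: the partition spec is derived by scanning every "name=value" segment of the full file path, so a part-spec-like directory inside the table location is picked up as if it were a partition column.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Assumed logic, not Hive's Utilities code: derive a partition spec by
// treating every "name=value" path segment as a partition column.
public class PartSpecFromPath {
  static Map<String, String> specFromPath(String path) {
    Map<String, String> spec = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) {
        spec.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return spec;
  }

  public static void main(String[] args) {
    // The table location itself contains a part-spec-like directory name:
    String file = "/something/warehouse/some_key=some_value/part_key=part_value";
    // Prints {some_key=some_value, part_key=part_value}; the spurious
    // "some_key" entry triggers the "contains non-partition columns" error.
    System.out.println(specFromPath(file));
  }
}
{code}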


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ca86301 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 396c070 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 3f07ea7 
  ql/src/test/queries/clientpositive/dynamic_partition_insert_external.q PRE-CREATION 
  ql/src/test/results/clientpositive/dynamic_partition_insert_external.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38206/diff/


Testing
---


Thanks,

Navis Ryu



Re: [DISCUSS] github integration

2015-09-08 Thread Owen O'Malley
Thanks for the link to the previous discussion. Much of it was about git
versus subversion, and obviously we decided to go forward with that. We
already have pull requests on github; see the list at
https://github.com/apache/hive/pulls

Without the Apache integration, we don't have any way to close those pull
requests and none of the discussion flows back into the linked jiras or
email lists.

.. Owen

On Tue, Sep 8, 2015 at 7:38 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> I personally am a big fan of pull requests, which is primarily the reason
> I made a similar proposal almost a year and a half ago[1] :). I think the
> consensus we reached at the time was to move the primary source code from
> svn to git (which we did) but still use patches submitted to JIRAs, both to
> maintain a permalink to the changes and because it's a little harder to
> treat a pull request as a patch.
>
> [1]
> http://qnalist.com/questions/4754349/proposal-to-switch-to-pull-requests
>
> On Tue, Sep 8, 2015 at 5:53 PM, Owen O'Malley  wrote:
>
> > All,
> >I think we should use the github integrations that Apache infra has
> > introduced. You can read about it here:
> >
> >
> >
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
> >
> > The big win from my point of view is that you can use github pull
> requests
> > for doing reviews. All of the traffic from the pull request is sent to
> > Apache email lists and vice versa.
> >
> > Thoughts?
> >
> >Owen
> >
>
>
>
> --
> Swarnim
>


Review Request 38204: HIVE-11590 Making avro deserializer less chatty

2015-09-08 Thread Swarnim Kulkarni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38204/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-11590 Making avro deserializer less chatty


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java cade8c33c4973c23be48786ef62206e65fbda23b 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java efff663afd0a8ddb94d27f246122dca762eb65d4 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 4edf6544de5824ee681579a334e7dfd5f059afec 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java 866c33d82f229f4e14f4d9ea2d4b930b7c1de37e 

Diff: https://reviews.apache.org/r/38204/diff/


Testing
---


Thanks,

Swarnim Kulkarni



Review Request 38203: HIVE-10708 Add avro schema compatibility check

2015-09-08 Thread Swarnim Kulkarni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38203/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-10708 Add avro schema compatibility check


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java cade8c33c4973c23be48786ef62206e65fbda23b 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java efff663afd0a8ddb94d27f246122dca762eb65d4 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 4edf6544de5824ee681579a334e7dfd5f059afec 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 986b803beae35faabfc13f2bdc89ac8661fdb11c 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java 3736a1f8fcc089469efb79a2c4b22db032b7dc58 

Diff: https://reviews.apache.org/r/38203/diff/


Testing
---


Thanks,

Swarnim Kulkarni



[jira] [Created] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances

2015-09-08 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-11768:


 Summary: java.io.DeleteOnExitHook leaks memory on long running 
Hive Server2 Instances
 Key: HIVE-11768
 URL: https://issues.apache.org/jira/browse/HIVE-11768
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.2.1
Reporter: Nemon Lou


  More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our 
long-running HiveServer2 instances, taking up more than 100MB of heap.
  Most of the paths contain the suffix ".pipeout".
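
As a minimal illustration of the leak (an assumed repro, not HiveServer2 code): java.io.File.deleteOnExit() registers the path in the static java.io.DeleteOnExitHook set, which is only drained at JVM shutdown, so even deleting the file does not remove its entry.

{code}
import java.io.File;
import java.io.IOException;

// Assumed repro, not HiveServer2 code: each deleteOnExit() call adds the
// path string to the static DeleteOnExitHook set, which is only cleared at
// JVM shutdown, so a long-running process leaks one entry per file.
public class DeleteOnExitLeak {
  public static void main(String[] args) throws IOException {
    for (int i = 0; i < 490_000; i++) {
      File f = File.createTempFile("session", ".pipeout");
      f.deleteOnExit(); // entry stays in DeleteOnExitHook until shutdown
      f.delete();       // deleting the file does NOT remove the hook entry
    }
    // The heap now holds ~490,000 path strings in DeleteOnExitHook.
  }
}
{code}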




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Lowercase conversion at many places.

2015-09-08 Thread Chetna C
Hi All,
   I was going through the Hive codebase and noticed that there are many
classes in which we convert strings, especially table and column
aliases or expressions, to lowercase. Is there any particular reason behind this?
   I have raised a bug, HIVE-11735, due to these conversions. I agree that in
Hive, table and column names are case-insensitive. Do we need to follow the
same convention for intermediate tables too?
   Would appreciate any help around this.

Thanks,
Chetna Chaudhari


Review Request 34877: HIVE-11201: HCatalog is ignoring user specified avro schema in the table definition

2015-09-08 Thread Bing Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34877/
---

Review request for hive.


Summary (updated)
-

HIVE-11201: HCatalog is ignoring user specified avro schema in the table 
definition


Bugs: HIVE-11201
https://issues.apache.org/jira/browse/HIVE-11201


Repository: hive-git


Description (updated)
---

This patch is created based on the latest code in the master branch.
With this patch, Hive gets the correct Avro schema.


Diffs
-


Diff: https://reviews.apache.org/r/34877/diff/


Testing (updated)
---

I have tested it in a real cluster, and also ran the Hive UTs locally successfully.


Thanks,

Bing Li



Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
Sanjeev,

I am going off this exception in the stacktrace that you posted:

"at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)"

which definitely indicates that it's not very happy memory-wise. I would definitely
recommend bumping up the memory and seeing if it helps. If not, we can debug
further from there.

On Tue, Sep 8, 2015 at 12:17 PM, Sanjeev Verma 
wrote:

> What does this exception imply here? How do we identify the problem?
> Thanks
>
> On Tue, Sep 8, 2015 at 10:44 PM, Sanjeev Verma 
> wrote:
>
>> We have an 8GB HS2 Java heap; we have not tried bumping it.
>>
>> On Tue, Sep 8, 2015 at 8:14 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> How much memory have you currently provided to HS2? Have you tried
>>> bumping that up?
>>>
>>> On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma wrote:
>>>
 *I am getting the following exception when the HS2 is crashing, any
 idea why it is happening*

 "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
 at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
 at java.util.Arrays.copyOf(Arrays.java:2271)
 Local Variable: byte[]#1
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
 Stream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 Local Variable: org.apache.thrift.TByteArrayOutputStream#42
 Local Variable: byte[]#5378
 at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
 ort.java:446)
 at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
 ServerTransport.java:41)
 at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
 rotocol.java:163)
 at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
 ryProtocol.java:186)
 Local Variable: byte[]#2
 at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
 mnStandardScheme.write(TStringColumn.java:490)
 Local Variable: java.util.ArrayList$Itr#1
 at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
 mnStandardScheme.write(TStringColumn.java:433)
 Local Variable: org.apache.hive.service.cli.th
 rift.TStringColumn$TStringColumnStandardScheme#1
 at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
 ngColumn.java:371)
 at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
 teValue(TColumn.java:381)
 Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
 Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
 at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
 at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
 at org.apache.thrift.TUnion.write(TUnion.java:152)



 On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
 kulkarni.swar...@gmail.com> wrote:

> Sanjeev,
>
> One possibility is that you are running into [1], which affects Hive
> 0.13. Is it possible for you to apply the patch on [1] and see if it fixes
> your problem?
>
> [1] https://issues.apache.org/jira/browse/HIVE-10410
>
> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <
> sanjeev.verm...@gmail.com> wrote:
>
>> We are using hive-0.13 with hadoop1.
>>
>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Sanjeev,
>>>
>>> Can you tell me more details about your hive version/hadoop version
>>> etc.
>>>
>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>> sanjeev.verm...@gmail.com> wrote:
>>>
 Can somebody give me some pointers to look at?

 On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
 sanjeev.verm...@gmail.com> wrote:

> Hi
> We are experiencing a strange problem with the hiveserver2: in one
> of the jobs it gets a GC limit exceeded error from the mapred task and
> hangs even though enough heap is available. We are not able to identify
> what is causing this issue.
> Could anybody help me identify the issue and let me know what
> pointers I need to look at?
>
> Thanks
>


>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>
>
> --
> Swarnim
>


>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>


-- 
Swarnim


Re: [DISCUSS] github integration

2015-09-08 Thread kulkarni.swar...@gmail.com
I personally am a big fan of pull requests, which is primarily the reason
I made a similar proposal almost a year and a half ago[1] :). I think the
consensus we reached at the time was to move the primary source code from
svn to git (which we did) but still use patches submitted to JIRAs, both to
maintain a permalink to the changes and because it's a little harder to
treat a pull request as a patch.

[1] http://qnalist.com/questions/4754349/proposal-to-switch-to-pull-requests

On Tue, Sep 8, 2015 at 5:53 PM, Owen O'Malley  wrote:

> All,
>I think we should use the github integrations that Apache infra has
> introduced. You can read about it here:
>
>
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
>
> The big win from my point of view is that you can use github pull requests
> for doing reviews. All of the traffic from the pull request is sent to
> Apache email lists and vice versa.
>
> Thoughts?
>
>Owen
>



-- 
Swarnim


Review Request 38199: HIVE-4577: hive CLI can't handle hadoop dfs command with space and quotes.

2015-09-08 Thread Bing Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38199/
---

Review request for hive, EdwardWJ EdwardWJ and Thejas Nair.


Bugs: HIVE-4577
https://issues.apache.org/jira/browse/HIVE-4577


Repository: hive-git


Description
---

The patch is created based on the latest code in the master branch.
It fixes the issues when handling hadoop commands with spaces and quotes in Hive.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java cc0414d 
  ql/src/test/queries/clientpositive/dfscmd.q PRE-CREATION 
  ql/src/test/results/clientpositive/dfscmd.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38199/diff/


Testing
---

I have run the Hive UTs; this patch won't introduce new failures.
I also verified it in a real cluster.


Thanks,

Bing Li



[jira] [Created] (HIVE-11767) LLAP: merge master into branch

2015-09-08 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11767:
---

 Summary: LLAP: merge master into branch
 Key: HIVE-11767
 URL: https://issues.apache.org/jira/browse/HIVE-11767
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11766) LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal

2015-09-08 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11766:


 Summary: LLAP: Remove MiniLlapCluster from shim layer after 
hadoop-1 removal
 Key: HIVE-11766
 URL: https://issues.apache.org/jira/browse/HIVE-11766
 Project: Hive
  Issue Type: Sub-task
Reporter: Prasanth Jayachandran


Remove HIVE-11732 changes after HIVE-11378 goes in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 37985: HIVE-11705 refactor SARG stripe filtering for ORC into a method

2015-09-08 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37985/
---

(Updated Sept. 8, 2015, 11:09 p.m.)


Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java cf8694e 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java 8beff4b 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java fcb3746 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 4480600 
  ql/src/java/org/apache/hadoop/hive/ql/io/sarg/ConvertAstToSearchArg.java e034650 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 2dc15f9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f6052e3 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 10086c5 
  storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java 0778935 
  storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java d27ac16 

Diff: https://reviews.apache.org/r/37985/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-11765) SMB Join fails in Hive 1.2

2015-09-08 Thread Na Yang (JIRA)
Na Yang created HIVE-11765:
--

 Summary: SMB Join fails in Hive 1.2
 Key: HIVE-11765
 URL: https://issues.apache.org/jira/browse/HIVE-11765
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1, 1.2.0
Reporter: Na Yang
Assignee: Prasanth Jayachandran


SMB join on Hive 1.2 fails with the following stack trace:
{code}
java.io.IOException: java.lang.reflect.InvocationTargetException
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:173)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
... 11 more
Caused by: java.lang.IndexOutOfBoundsException: toIndex = 5
at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
at java.util.ArrayList.subList(ArrayList.java:996)
at
org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.getSchemaOnRead(RecordReaderFactory.java:161)
at
org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.createTreeReader(RecordReaderFactory.java:66)
at
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:202)
at
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:230)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:163)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1104)
at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)

{code}

This error happens after adding the patch of HIVE-10591. Reverting HIVE-10591 
fixes this exception. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11764) Verify the correctness of groupby_cube1.q with MR, Tez and Spark Mode with HIVE-1110

2015-09-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-11764:


 Summary: Verify the correctness of groupby_cube1.q with MR, Tez 
and Spark Mode with HIVE-1110
 Key: HIVE-11764
 URL: https://issues.apache.org/jira/browse/HIVE-11764
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


While working on HIVE-1110, I ran into the following wrong results:
https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_cube1.q.out#L478
This happens in spark mode. The following is the diff.

@@ -475,7 +525,6 @@ POSTHOOK: Input: default@t1
 3  1
 7  1
 8  2
-NULL   6

The purpose of this jira is to see why the above query is failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 37985: HIVE-11705 refactor SARG stripe filtering for ORC into a method

2015-09-08 Thread Sergey Shelukhin


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java, line 272
> > 
> >
> > Who calls this method? If this is required later, can you put it in 
> > that patch? Focus only on refactoring in this patch. Easy to review too :)

It's called from the ORC split and metastore logic in subsequent patches. If the 
patches were all merged, the final patch would be too epic.


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java, line 363
> > 
> >
> > I think this is messed up from the beginning. The second argument is a 
> > list of all projected column names and not just sarg column names. So a 
> > rename of this method would be helpful.

this is an existing method. One of the reasons for this patch is that it's 
messed up ;)


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java, line 1439
> > 
> >
> > Where is this used? I don't see it being used anywhere. Can this be 
> > moved to the relevant jira?

future metastore usage


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 
> > 1228
> > 
> >
> > indexInSourceTable will never be null. Right?

it will be, for partition columns and stuff. At least some code that I was 
reading handles this case


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 
> > 1231
> > 
> >
> > What's this translation doing exactly?
> > Is this trying to map between sarg column -> table column index -> orc 
> > internal column index?

It's making the SARG halfway self-contained by replacing columns with their indexes 
in the table. Then ORC can translate from those to ORC-internal indexes.
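
For readers of the archive, a toy sketch of the translation being described, using assumed names rather than the HIVE-11705 code: the SARG leaf's column name is replaced by its table-level column index up front, and the reader later maps that index to its own file-internal one.

{code}
import java.util.Arrays;
import java.util.List;

// Toy sketch with assumed names, not the HIVE-11705 code: resolve a SARG
// leaf's column name to a table-level column index once, so the reader can
// later map that index to whatever internal index the file actually uses.
public class SargColumnTranslation {
  static int toTableIndex(String leafColumn, List<String> tableColumns) {
    int idx = tableColumns.indexOf(leafColumn); // table-level position
    if (idx < 0) {
      throw new IllegalArgumentException("Unknown SARG column: " + leafColumn);
    }
    return idx;
  }

  public static void main(String[] args) {
    List<String> tableColumns = Arrays.asList("i0", "i1", "part_col");
    // leaf-0 = (EQUALS i1 100)  becomes  leaf-0 = (EQUALS <index 1> 100)
    System.out.println(toTableIndex("i1", tableColumns));
  }
}
{code}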


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java,
> >  line 517
> > 
> >
> > Create a bug?

removed comment instead :)


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java, 
> > line 146
> > 
> >
> > I also don't understand why it would contain duplicates. IIRC, this was 
> > probably caused by multiple concatenations to the READ_COLUMN_IDS_CONF_STR. 
> > I am not sure if this happens anymore; in any case we should create a bug 
> > and remove this code, or maybe remove it in the next patch. Also, use guava 
> > Splitter.splitToList?

existing code :) I added logging as per some other comment


> On Sept. 5, 2015, 1:59 a.m., Prasanth_J wrote:
> > storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java,
> >  line 28
> > 
> >
> > I would prefer setting the new name based on internal column names. 
> > For example:
> > Schema from filedump: struct<_col0:tinyint,_col1:smallint>
> > 
> > Current SARG: leaf-0 = (EQUALS i1 100)
> > 
> > After this patch: leaf-0 = (EQUALS _col1 100)
> > 
> > If we can map to internal names, then it will be easy to map sarg column 
> > names to internal column index. i1 -> 1 (after ripping off _col)

How can we map to internal column names? We are working against a file we have 
not yet read (in split generation); these names could be anything.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37985/#review97837
---


On Sept. 2, 2015, 1:06 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37985/
> ---
> 
> (Updated Sept. 2, 2015, 1:06 a.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 05efc5f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java 8beff4b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java fcb3746 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPush

[DISCUSS] github integration

2015-09-08 Thread Owen O'Malley
All,
   I think we should use the github integrations that Apache infra has
introduced. You can read about it here:

https://blogs.apache.org/infra/entry/improved_integration_between_apache_and

The big win from my point of view is that you can use github pull requests
for doing reviews. All of the traffic from the pull request is sent to
Apache email lists and vice versa.

Thoughts?

   Owen


Re: Review Request 37985: HIVE-11705 refactor SARG stripe filtering for ORC into a method

2015-09-08 Thread Sergey Shelukhin


> On Sept. 2, 2015, 2:35 a.m., Swarnim Kulkarni wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java, line 1460
> > 
> >
> > Nit: Could use parameter substitution instead of concatenating it 
> > ourselves.

existing log line


> On Sept. 2, 2015, 2:35 a.m., Swarnim Kulkarni wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java, line 1473
> > 
> >
> > Should we check if filterColumns is not empty to avoid an 
> > ArrayIndexOutOfBounds?

moved existing code


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37985/#review97385
---


On Sept. 2, 2015, 1:06 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37985/
> ---
> 
> (Updated Sept. 2, 2015, 1:06 a.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 05efc5f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java 8beff4b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java fcb3746 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
> 4480600 
>   ql/src/java/org/apache/hadoop/hive/ql/io/sarg/ConvertAstToSearchArg.java 
> e034650 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
> 2dc15f9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b809a23 
>   serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 
> 10086c5 
>   
> storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java
>  0778935 
>   
> storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
>  d27ac16 
> 
> Diff: https://reviews.apache.org/r/37985/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: [ANNOUNCE] New Hive Committer - Lars Francke

2015-09-08 Thread Sergey Shelukhin
Congrats!

From: Daniel Lopes <dan...@bankfacil.com.br>
Reply-To: "u...@hive.apache.org" <u...@hive.apache.org>
Date: Tuesday, September 8, 2015 at 15:02
To: "u...@hive.apache.org" <u...@hive.apache.org>
Cc: "kulkarni.swar...@gmail.com" <kulkarni.swar...@gmail.com>, "dev@hive.apache.org" <dev@hive.apache.org>
Subject: Re: [ANNOUNCE] New Hive Committer - Lars Francke

Congrats!

Daniel Lopes, B.Eng
Data Scientist - BankFacil
CREA/SP 
5069410560
Mob +55 (18) 99764-2733
Ph +55 (11) 3522-8009
http://about.me/dannyeuu

Av. Nova Independência, 956, São Paulo, SP
Bairro Brooklin Paulista
CEP 04570-001
https://www.bankfacil.com.br


On Tue, Sep 8, 2015 at 6:34 PM, Lars Francke <lars.fran...@gmail.com> wrote:
Thank you so much everyone!

Looking forward to continuing to work with all of you.

On Tue, Sep 8, 2015 at 3:26 AM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:
Congrats!

On Mon, Sep 7, 2015 at 3:54 AM, Carl Steinbach <c...@apache.org> wrote:
The Apache Hive PMC has voted to make Lars Francke a committer on the Apache 
Hive Project.

Please join me in congratulating Lars!

Thanks.

- Carl




--
Swarnim




Re: [ANNOUNCE] New Hive Committer - Lars Francke

2015-09-08 Thread Lars Francke
Thank you so much everyone!

Looking forward to continuing to work with all of you.

On Tue, Sep 8, 2015 at 3:26 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Congrats!
>
> On Mon, Sep 7, 2015 at 3:54 AM, Carl Steinbach  wrote:
>
>> The Apache Hive PMC has voted to make Lars Francke a committer on the
>> Apache Hive Project.
>>
>> Please join me in congratulating Lars!
>>
>> Thanks.
>>
>> - Carl
>>
>>
>
>
> --
> Swarnim
>


Re: Review Request 38136: CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-08 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38136/
---

(Updated Sept. 8, 2015, 8:42 p.m.)


Review request for hive and John Pullokkaran.


Repository: hive-git


Description
---

With the return path on, "create table b as select * from src order by key" will 
fail. Attached are two test cases: one (cbo_rp_cross_product_check_2.q) tests 
that "create table b as select * from src order by key" does not fail; the 
other (cbo_rp_auto_join17.q) corrects the currently wrong table alias on master.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForReturnPath.java 81cc474 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 73ae7c4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f6052e3 
  ql/src/test/queries/clientpositive/cbo_rp_auto_join17.q PRE-CREATION 
  ql/src/test/queries/clientpositive/cbo_rp_cross_product_check_2.q PRE-CREATION 
  ql/src/test/results/clientpositive/cbo_rp_auto_join17.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/cbo_rp_cross_product_check_2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38136/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-11763) Use * instead of sum(hash(*)) on Parquet predicate (PPD) integration tests

2015-09-08 Thread JIRA
Sergio Peña created HIVE-11763:
--

 Summary: Use * instead of sum(hash(*)) on Parquet predicate (PPD) 
integration tests
 Key: HIVE-11763
 URL: https://issues.apache.org/jira/browse/HIVE-11763
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña


The integration tests for Parquet predicate push down (PPD) use the following 
query to validate the values filtered:
{noformat}
select sum(hash(*)) from ...
{noformat}

It would be better if we used {{select * from ...}} instead, to see that those 
values are correct. It is difficult to tell whether a value was filtered by 
looking at the hash.

Also, we can try to limit the number of rows of the INSERT ... SELECT statement 
to avoid displaying many rows when validating the data. I think a LIMIT 2 on 
each of the SELECTs would be enough.

For example, parquet_ppd_boolean.q has this:
{noformat}
insert overwrite table newtypestbl select * from (select cast("apple" as 
char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 union all 
select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false 
from src src2) uniontbl;
{noformat}

If we use LIMIT 2, then we will reduce the # of rows:
{noformat}
insert overwrite table newtypestbl select * from (select cast("apple" as 
char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 LIMIT 2 union 
all select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, 
false from src src2 LIMIT 2) uniontbl;
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11762) TestHCatLoaderEncryption failures when using Hadoop 2.7

2015-09-08 Thread Jason Dere (JIRA)
Jason Dere created HIVE-11762:
-

 Summary: TestHCatLoaderEncryption failures when using Hadoop 2.7
 Key: HIVE-11762
 URL: https://issues.apache.org/jira/browse/HIVE-11762
 Project: Hive
  Issue Type: Bug
  Components: Shims, Tests
Reporter: Jason Dere


When running TestHCatLoaderEncryption with -Dhadoop23.version=2.7.0, we get the 
following error during setup():

{noformat}
testReadDataFromEncryptedHiveTableByPig[5](org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption)
  Time elapsed: 3.648 sec  <<< ERROR!
java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.DFSClient.setKeyProvider(Lorg/apache/hadoop/crypto/key/KeyProviderCryptoExtension;)V
at 
org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniDfs(Hadoop23Shims.java:534)
at 
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.initEncryptionShim(TestHCatLoaderEncryption.java:252)
at 
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:200)
{noformat}

It looks like between Hadoop 2.6 and Hadoop 2.7, the argument to 
DFSClient.setKeyProvider() changed:
{noformat}
   @VisibleForTesting
-  public void setKeyProvider(KeyProviderCryptoExtension provider) {
-    this.provider = provider;
+  public void setKeyProvider(KeyProvider provider) {
{noformat}
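
One hedged way to tolerate both signatures in a shim (an illustrative sketch, not the actual Hadoop23Shims fix) is to resolve the method reflectively by name instead of linking against a fixed parameter type:

{code}
import java.lang.reflect.Method;

// Illustrative sketch, not the actual Hadoop23Shims fix: look up
// DFSClient.setKeyProvider by name so the call works whether the declared
// parameter type is KeyProviderCryptoExtension (2.6) or KeyProvider (2.7).
public class KeyProviderShim {
  public static void setKeyProvider(Object dfsClient, Object provider) throws Exception {
    for (Method m : dfsClient.getClass().getMethods()) {
      if (m.getName().equals("setKeyProvider")
          && m.getParameterTypes().length == 1
          && m.getParameterTypes()[0].isInstance(provider)) {
        m.invoke(dfsClient, provider);
        return;
      }
    }
    throw new NoSuchMethodException("setKeyProvider not found on " + dfsClient.getClass());
  }
}
{code}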



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11761) DoubleWritable hash code for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-11761:
---

 Summary: DoubleWritable hash code for GroupBy is not properly 
generated
 Key: HIVE-11761
 URL: https://issues.apache.org/jira/browse/HIVE-11761
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 38009: HIVE-11696 Exception when table-level serde is Parquet while partition-level serde is JSON

2015-09-08 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38009/#review98028
---

Ship it!


Ship It!

- Chao Sun


On Sept. 4, 2015, 1:51 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38009/
> ---
> 
> (Updated Sept. 4, 2015, 1:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-11696 Exception when table-level serde is Parquet while partition-level 
> serde is JSON
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java
>  bde0dcb 
>   ql/src/test/queries/clientpositive/parquet_mixed_partition_formats.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_mixed_partition_formats.q.out 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/38009/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: hiveserver2 hangs

2015-09-08 Thread Sanjeev Verma
What does this exception imply here? How do we identify the problem?
Thanks

On Tue, Sep 8, 2015 at 10:44 PM, Sanjeev Verma 
wrote:

> We have an 8GB HS2 Java heap; we have not tried bumping it.
>
> On Tue, Sep 8, 2015 at 8:14 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> How much memory have you currently provided to HS2? Have you tried
>> bumping that up?
>>
>> On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma 
>> wrote:
>>
>>> *I am getting the following exception when the HS2 is crashing, any idea
>>> why it is happening*
>>>
>>> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
>>> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
>>> at java.util.Arrays.copyOf(Arrays.java:2271)
>>> Local Variable: byte[]#1
>>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>>> Stream.java:93)
>>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
>>> Local Variable: byte[]#5378
>>> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
>>> ort.java:446)
>>> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
>>> ServerTransport.java:41)
>>> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
>>> rotocol.java:163)
>>> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
>>> ryProtocol.java:186)
>>> Local Variable: byte[]#2
>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>> mnStandardScheme.write(TStringColumn.java:490)
>>> Local Variable: java.util.ArrayList$Itr#1
>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>> mnStandardScheme.write(TStringColumn.java:433)
>>> Local Variable: org.apache.hive.service.cli.th
>>> rift.TStringColumn$TStringColumnStandardScheme#1
>>> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
>>> ngColumn.java:371)
>>> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
>>> teValue(TColumn.java:381)
>>> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
>>> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
>>> at org.apache.thrift.TUnion.write(TUnion.java:152)
>>>
>>>
>>>
>>> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
 Sanjeev,

 One possibility is that you are running into [1], which affects Hive
 0.13. Is it possible for you to apply the patch on [1] and see if it fixes
 your problem?

 [1] https://issues.apache.org/jira/browse/HIVE-10410

 On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <
 sanjeev.verm...@gmail.com> wrote:

> We are using hive-0.13 with hadoop1.
>
> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> Can you tell me more details about your hive version/hadoop version
>> etc.
>>
>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>> sanjeev.verm...@gmail.com> wrote:
>>
>>> Can somebody give me some pointers to look at?
>>>
>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>> sanjeev.verm...@gmail.com> wrote:
>>>
 Hi
 We are experiencing a strange problem with the hiveserver2: in one
 of the jobs it gets a GC limit exceeded error from the mapred task and
 hangs even though enough heap is available. We are not able to identify
 what is causing this issue.
 Could anybody help me identify the issue and let me know what
 pointers I need to look at?

 Thanks

>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


 --
 Swarnim

>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


Re: hiveserver2 hangs

2015-09-08 Thread Sanjeev Verma
We have an 8GB HS2 Java heap; we have not tried bumping it.

On Tue, Sep 8, 2015 at 8:14 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> How much memory have you currently provided to HS2? Have you tried bumping
> that up?
>
> On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma 
> wrote:
>
>> *I am getting the following exception when the HS2 is crashing, any idea
>> why it is happening*
>>
>> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
>> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
>> at java.util.Arrays.copyOf(Arrays.java:2271)
>> Local Variable: byte[]#1
>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>> Stream.java:93)
>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
>> Local Variable: byte[]#5378
>> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
>> ort.java:446)
>> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
>> ServerTransport.java:41)
>> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
>> rotocol.java:163)
>> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
>> ryProtocol.java:186)
>> Local Variable: byte[]#2
>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>> mnStandardScheme.write(TStringColumn.java:490)
>> Local Variable: java.util.ArrayList$Itr#1
>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>> mnStandardScheme.write(TStringColumn.java:433)
>> Local Variable: org.apache.hive.service.cli.th
>> rift.TStringColumn$TStringColumnStandardScheme#1
>> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
>> ngColumn.java:371)
>> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
>> teValue(TColumn.java:381)
>> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
>> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
>> at org.apache.thrift.TUnion.write(TUnion.java:152)
>>
>>
>>
>> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Sanjeev,
>>>
>>> One possibility is that you are running into [1], which affects Hive 0.13.
>>> Is it possible for you to apply the patch on [1] and see if it fixes your
>>> problem?
>>>
>>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>>
>>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <
>>> sanjeev.verm...@gmail.com> wrote:
>>>
 We are using hive-0.13 with hadoop1.

 On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
 kulkarni.swar...@gmail.com> wrote:

> Sanjeev,
>
> Can you tell me more details about your hive version/hadoop version
> etc.
>
> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
> sanjeev.verm...@gmail.com> wrote:
>
>> Can somebody give me some pointers to look at?
>>
>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>> sanjeev.verm...@gmail.com> wrote:
>>
>>> Hi
>>> We are experiencing a strange problem with the hiveserver2: in one
>>> of the jobs it gets a GC limit exceeded error from the mapred task and
>>> hangs even though enough heap is available. We are not able to identify
>>> what is causing this issue.
>>> Could anybody help me identify the issue and let me know what
>>> pointers I need to look at?
>>>
>>> Thanks
>>>
>>
>>
>
>
> --
> Swarnim
>


>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>
>
> --
> Swarnim
>


[jira] [Created] (HIVE-11760) Make fs.azure.account.key style of credential compatible with StorageBasedAuthorizationProvider

2015-09-08 Thread Wouter De Borger (JIRA)
Wouter De Borger created HIVE-11760:
---

 Summary: Make fs.azure.account.key style of credential compatible 
with StorageBasedAuthorizationProvider
 Key: HIVE-11760
 URL: https://issues.apache.org/jira/browse/HIVE-11760
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Wouter De Borger
Priority: Minor


When using Hive with Azure storage, credentials can be passed as a property to 
hiveserver2

{code}
set fs.azure.account.key.xxx.blob.core.windows.net=x
create external table inputs(name string) STORED AS TEXTFILE LOCATION  
'wasb://x...@xxx.blob.core.windows.net/'
{code}

By setting the key as a property, there is no need to put keys in the global 
hive config. 

This works as expected, except when the hive server is restarted. Then, all 
select queries return an error (see below).

It would be nice if the properties were propagated to the metastore, so that it 
works as expected. 
{code}

"*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: 
FAILED: SemanticException Unable to fetch table inputs. 
org.apache.hadoop.fs.azure.AzureException: 
org.apache.hadoop.fs.azure.AzureException: Container __container__ in account 
__account__.blob.core.windows.net not found, and we can't create  it using 
anoynomous credentials.:17:16",
'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:315',
'org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:112',
'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:181',
'org.apache.hive.service.cli.operation.Operation:run:Operation.java:257',
'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:388',
'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:375',
'org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:274',
'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:486',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745',
"*org.apache.hadoop.hive.ql.parse.SemanticException:Unable to fetch table 
inputs. org.apache.hadoop.fs.azure.AzureException: 
org.apache.hadoop.fs.azure.AzureException: Container __container__ in account 
__account__.blob.core.windows.net not found, and we can't create  it using 
anoynomous credentials.:26:10",
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:getMetaData:SemanticAnalyzer.java:1868',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:getMetaData:SemanticAnalyzer.java:1545',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genResolvedParseTree:SemanticAnalyzer.java:10077',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:analyzeInternal:SemanticAnalyzer.java:10128',
'org.apache.hadoop.hive.ql.parse.CalcitePlanner:analyzeInternal:CalcitePlanner.java:209',
'org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer:analyze:BaseSemanticAnalyzer.java:227',
'org.apache.hadoop.hive.ql.Driver:compile:Driver.java:424',
'org.apache.hadoop.hive.ql.Driver:compile:Driver.java:308',
'org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1122',
'org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1116',
'org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:110',
"*org.apache.hadoop.hive.ql.metadata.HiveException:Unable to fetch table 
inputs. org.apache.hadoop.fs.azure.AzureException: 
org.apache.hadoop.fs.azure.AzureException: Container __container__ in account 
__account__.blob.core.windows.net not found, and we can't create  it using 
anoynomous credentials.:28:2",
'org.apache.hadoop.hive.ql.metadata.Hive:getTable:Hive.java:1123',
'org.apache.hadoop.hive.ql.metadata.Hive:getTable:Hive.java:1070',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:getMetaData:SemanticAnalyzer.java:1574',
  
"*org.apache.hadoop.hive.metastore.api.MetaException:org.apache.hadoop.fs.azure.AzureException:
 org.apache.hadoop.fs.azure.AzureException: Container __container__ in account 
__account__.blob.core.windows.net not found, and we can't create  it using 
anoynomous credentials.:47:19",
'org.apache.hadoop.hive.ql.security.authorization.Authori

[jira] [Created] (HIVE-11759) Extend new cost model to correctly reflect limit cost

2015-09-08 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-11759:
--

 Summary: Extend new cost model to correctly reflect limit cost
 Key: HIVE-11759
 URL: https://issues.apache.org/jira/browse/HIVE-11759
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
How much memory have you currently provided to HS2? Have you tried bumping
that up?

On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma 
wrote:

> *I am getting the following exception when the HS2 is crashing, any idea
> why it is happening*
>
> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
> at java.util.Arrays.copyOf(Arrays.java:2271)
> Local Variable: byte[]#1
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
> Stream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
> Local Variable: byte[]#5378
> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
> ort.java:446)
> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
> ServerTransport.java:41)
> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
> rotocol.java:163)
> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
> ryProtocol.java:186)
> Local Variable: byte[]#2
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:490)
> Local Variable: java.util.ArrayList$Itr#1
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:433)
> Local Variable: org.apache.hive.service.cli.th
> rift.TStringColumn$TStringColumnStandardScheme#1
> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
> ngColumn.java:371)
> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
> teValue(TColumn.java:381)
> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
> at org.apache.thrift.TUnion.write(TUnion.java:152)
>
>
>
> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> One possibility is that you are running into [1], which affects Hive 0.13.
>> Is it possible for you to apply the patch on [1] and see if it fixes your
>> problem?
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>
>>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma wrote:
>>
>>> We are using hive-0.13 with hadoop1.
>>>
>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
 Sanjeev,

 Can you tell me more details about your hive version/hadoop version etc.

 On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
 sanjeev.verm...@gmail.com> wrote:

> Can somebody give me some pointers to look at?
>
> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
> sanjeev.verm...@gmail.com> wrote:
>
>> Hi
>> We are experiencing a strange problem with the hiveserver2: in one of
>> the jobs it gets a GC limit exceeded error from the mapred task and
>> hangs even though enough heap is available. We are not able to identify
>> what is causing this issue.
>> Could anybody help me identify the issue and let me know what
>> pointers I need to look at?
>>
>> Thanks
>>
>
>


 --
 Swarnim

>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


[jira] [Created] (HIVE-11758) Querying nested parquet columns is case sensitive

2015-09-08 Thread Jakub Kukul (JIRA)
Jakub Kukul created HIVE-11758:
--

 Summary: Querying nested parquet columns is case sensitive
 Key: HIVE-11758
 URL: https://issues.apache.org/jira/browse/HIVE-11758
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.1.1, 1.1.0
Reporter: Jakub Kukul
Priority: Minor


Querying nested parquet columns (columns within a {{STRUCT}}) is case 
sensitive. It should be case insensitive, to be compatible with querying 
non-nested columns and querying nested columns with other file formats.
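
A toy sketch of the usual fix pattern (assumed, not the actual Parquet SerDe code): index the file schema's field names case-insensitively so a query-side name still resolves the nested column regardless of case.

{code}
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy sketch of the usual fix pattern, not the actual Parquet SerDe code:
// build a lowercase index of the file schema's field names so lookups from
// the (case-insensitive) query side resolve regardless of case.
public class CaseInsensitiveFieldLookup {
  private final Map<String, Integer> indexByLowerName = new HashMap<>();

  public CaseInsensitiveFieldLookup(String... fileFieldNames) {
    for (int i = 0; i < fileFieldNames.length; i++) {
      indexByLowerName.put(fileFieldNames[i].toLowerCase(Locale.ROOT), i);
    }
  }

  public Integer indexOf(String queryName) {
    return indexByLowerName.get(queryName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    CaseInsensitiveFieldLookup lookup = new CaseInsensitiveFieldLookup("Address", "City");
    System.out.println(lookup.indexOf("address")); // 0, despite the case mismatch
  }
}
{code}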



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11757) I want to get the value of HiveKey in my custom partitioner

2015-09-08 Thread apachehadoop (JIRA)
apachehadoop created HIVE-11757:
---

 Summary: I want to get the value of HiveKey in my custom 
partitioner
 Key: HIVE-11757
 URL: https://issues.apache.org/jira/browse/HIVE-11757
 Project: Hive
  Issue Type: Wish
Reporter: apachehadoop






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11756) Avoid redundant key serialization in RS for distinct query

2015-09-08 Thread Navis (JIRA)
Navis created HIVE-11756:


 Summary: Avoid redundant key serialization in RS for distinct query
 Key: HIVE-11756
 URL: https://issues.apache.org/jira/browse/HIVE-11756
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial


Currently Hive serializes the key twice in order to know the length of the 
distribution key for distinct queries. This patch introduces an IndexedSerializer 
to avoid that.
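
A minimal sketch of the idea (assumed semantics; IndexedSerializer itself is not shown here): write the key once and record the byte offset where the distribution-key prefix ends, so its length is known without a second serialization pass.

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Assumed semantics, not the actual IndexedSerializer: serialize the whole
// key once and remember where the distribution-key prefix ends, instead of
// serializing the distribution key a second time just to learn its length.
public class IndexedKeyWriter {
  private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(buf);
  private int distKeyLength = -1;

  public void writeKeyColumn(String value) throws IOException {
    out.writeUTF(value);
  }

  // Call once, right after the last distribution-key column is written.
  public void markDistKeyEnd() {
    distKeyLength = buf.size();
  }

  public int getDistKeyLength() { return distKeyLength; }
  public byte[] serializedKey() { return buf.toByteArray(); }
}
{code}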



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)