Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-17 Thread kulkarni.swar...@gmail.com
Congratulations! Well deserved!

On Thu, Sep 17, 2015 at 12:03 AM, Vikram Dixit K 
wrote:

> Congrats Ashutosh!
>
> On Wed, Sep 16, 2015 at 9:01 PM, Chetna C  wrote:
>
>> Congrats Ashutosh !
>>
>> Thanks,
>> Chetna Chaudhari
>>
>> On 17 September 2015 at 06:53, Navis Ryu  wrote:
>>
>> > Congratulations!
>> >
>> > 2015-09-17 9:35 GMT+09:00 Xu, Cheng A :
>> > > Congratulations, Ashutosh!
>> > >
>> > > -Original Message-
>> > > From: Mohammad Islam [mailto:misla...@yahoo.com.INVALID]
>> > > Sent: Thursday, September 17, 2015 8:23 AM
>> > > To: user@hive.apache.org; Hive
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > >
>> > > Congratulations Asutosh!
>> > >
>> > >
>> > >  On Wednesday, September 16, 2015 4:51 PM, Bright Ling <
>> > brig...@hostworks.com.au> wrote:
>> > >
>> > >
>> > > Congratulations Asutosh!
>> > >
>> > > From: Sathi Chowdhury [mailto:sathi.chowdh...@lithium.com]
>> > > Sent: Thursday, 17 September 2015 8:04 AM
>> > > To: user@hive.apache.org
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats Asutosh!
>> > >
>> > > From: Sergey Shelukhin
>> > > Reply-To: "user@hive.apache.org"
>> > > Date: Wednesday, September 16, 2015 at 2:31 PM
>> > > To: "user@hive.apache.org"
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats!
>> > >
>> > > From: Alpesh Patel 
>> > > Reply-To: "user@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 13:24
>> > > To: "user@hive.apache.org" 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congratulations Ashutosh
>> >
>> > On Wed, Sep 16, 2015 at 1:23 PM, Pengcheng Xiong  wrote: Congratulations Ashutosh!
>> >
>> > On Wed, Sep 16, 2015 at 1:17 PM, John Pullokkaran  wrote: Congrats Ashutosh!
>> >
>> > > From: Vaibhav Gumashta <
>> > vgumas...@hortonworks.com>
>> > > Reply-To: "user@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 1:01 PM
>> > > To: "user@hive.apache.org" , "
>> d...@hive.apache.org"
>> > 
>> > > Cc: Ashutosh Chauhan 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats Ashutosh! —Vaibhav
>> >
>> > > From: Prasanth Jayachandran <
>> > pjayachand...@hortonworks.com>
>> > > Reply-To: "user@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 12:50 PM
>> > > To: "d...@hive.apache.org" , "
>> user@hive.apache.org"
>> > 
>> > > Cc: "d...@hive.apache.org" , Ashutosh Chauhan <
>> > hashut...@apache.org>
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congratulations Ashutosh!
>> > >
>> > >  On Wed, Sep 16, 2015 at 12:48 PM -0700, "Xuefu Zhang" <
>> > xzh...@cloudera.com> wrote: Congratulations, Ashutosh!. Well-deserved.
>> > >
>> > > Thanks to Carl also for the hard work in the past few years!
>> > >
>> > > --Xuefu
>> > >
>> > > On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach 
>> wrote:
>> > >
>> > >> I am very happy to announce that Ashutosh Chauhan is taking over as
>> > >> the new VP of the Apache Hive project. Ashutosh has been a longtime
>> > >> contributor to Hive and has played a pivotal role in many of the
>> major
>> > >> advances that have been made over the past couple of years. Please
>> > >> join me in congratulating Ashutosh on his new role!
>> > >>
>> > >
>> > >
>> >
>>
>
>
>
> --
> Nothing better than when appreciated for hard work.
> -Mark
>



-- 
Swarnim


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
Sanjeev,

I am going off this exception in the stack trace that you posted:

"at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)"

which definitely indicates that it's not very happy memory-wise. I would
recommend bumping the memory up and seeing if it helps. If not, we can debug
further from there.
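
For reference, the heap is usually bumped through hive-env.sh. A rough sketch
follows; the exact variable and the value are assumptions on my part and can
vary by distribution and by how HS2 is launched:

# hive-env.sh -- hypothetical value, size it to your workload
export HADOOP_HEAPSIZE=16384   # heap in MB used by the Hive service JVMs, including HiveServer2

followed by a restart of HiveServer2.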

On Tue, Sep 8, 2015 at 12:17 PM, Sanjeev Verma 
wrote:

> What this exception implies here? how to identify the problem here.
> Thanks
>
> On Tue, Sep 8, 2015 at 10:44 PM, Sanjeev Verma 
> wrote:
>
>> We have 8GB HS2 java heap, we have not tried any bumping.
>>
>> On Tue, Sep 8, 2015 at 8:14 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> How much memory have you currently provided to HS2? Have you tried
>>> bumping that up?
>>>
>>> On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma >> > wrote:
>>>
>>>> *I am getting the following exception when the HS2 is crashing, any
>>>> idea why it has happening*
>>>>
>>>> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
>>>> at java.lang.OutOfMemoryError.(OutOfMemoryError.java:48)
>>>> at java.util.Arrays.copyOf(Arrays.java:2271)
>>>> Local Variable: byte[]#1
>>>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>>>> Stream.java:93)
>>>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
>>>> Local Variable: byte[]#5378
>>>> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
>>>> ort.java:446)
>>>> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
>>>> ServerTransport.java:41)
>>>> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
>>>> rotocol.java:163)
>>>> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
>>>> ryProtocol.java:186)
>>>> Local Variable: byte[]#2
>>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>>> mnStandardScheme.write(TStringColumn.java:490)
>>>> Local Variable: java.util.ArrayList$Itr#1
>>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>>> mnStandardScheme.write(TStringColumn.java:433)
>>>> Local Variable: org.apache.hive.service.cli.th
>>>> rift.TStringColumn$TStringColumnStandardScheme#1
>>>> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
>>>> ngColumn.java:371)
>>>> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
>>>> teValue(TColumn.java:381)
>>>> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
>>>> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
>>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
>>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
>>>> at org.apache.thrift.TUnion.write(TUnion.java:152)
>>>>
>>>>
>>>>
>>>> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
>>>> kulkarni.swar...@gmail.com> wrote:
>>>>
>>>>> Sanjeev,
>>>>>
>>>>> One possibility is that you are running into[1] which affects hive
>>>>> 0.13. Is it possible for you to apply the patch on [1] and see if it fixes
>>>>> your problem?
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>>>>
>>>>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <
>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>
>>>>>> We are using hive-0.13 with hadoop1.
>>>>>>
>>>>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>>>>> kulkarni.swar...@gmail.com> wrote:
>>>>>>
>>>>>>> Sanjeev,
>>>>>>>
>>>>>>> Can you tell me more details about your hive version/hadoop version
>>>>>>> etc.
>>>>>>>
>>>>>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can somebody gives me some pointer to looked upon?
>>>>>>>>
>>>>>>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>>>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> We are experiencing a strange problem with the hiveserver2, in one
>>>>>>>>> of the job it gets the GC limit exceed from mapred task and hangs even
>>>>>>>>> having enough heap available.we are not able to identify what causing 
>>>>>>>>> this
>>>>>>>>> issue.
>>>>>>>>> Could anybody help me identify the issue and let me know what
>>>>>>>>> pointers I need to looked up.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Swarnim
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Swarnim
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>


-- 
Swarnim


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
How much memory have you currently provided to HS2? Have you tried bumping
that up?

On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma 
wrote:

> *I am getting the following exception when the HS2 is crashing, any idea
> why it has happening*
>
> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
> at java.lang.OutOfMemoryError.(OutOfMemoryError.java:48)
> at java.util.Arrays.copyOf(Arrays.java:2271)
> Local Variable: byte[]#1
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
> Stream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
> Local Variable: byte[]#5378
> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
> ort.java:446)
> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
> ServerTransport.java:41)
> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
> rotocol.java:163)
> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
> ryProtocol.java:186)
> Local Variable: byte[]#2
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:490)
> Local Variable: java.util.ArrayList$Itr#1
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:433)
> Local Variable: org.apache.hive.service.cli.th
> rift.TStringColumn$TStringColumnStandardScheme#1
> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
> ngColumn.java:371)
> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
> teValue(TColumn.java:381)
> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
> at org.apache.thrift.TUnion.write(TUnion.java:152)
>
>
>
> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> One possibility is that you are running into[1] which affects hive 0.13.
>> Is it possible for you to apply the patch on [1] and see if it fixes your
>> problem?
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>
>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma > > wrote:
>>
>>> We are using hive-0.13 with hadoop1.
>>>
>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> Sanjeev,
>>>>
>>>> Can you tell me more details about your hive version/hadoop version etc.
>>>>
>>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>>> sanjeev.verm...@gmail.com> wrote:
>>>>
>>>>> Can somebody gives me some pointer to looked upon?
>>>>>
>>>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>> We are experiencing a strange problem with the hiveserver2, in one of
>>>>>> the job it gets the GC limit exceed from mapred task and hangs even 
>>>>>> having
>>>>>> enough heap available.we are not able to identify what causing this 
>>>>>> issue.
>>>>>> Could anybody help me identify the issue and let me know what
>>>>>> pointers I need to looked up.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Lars Francke

2015-09-07 Thread kulkarni.swar...@gmail.com
Congrats!

On Mon, Sep 7, 2015 at 3:54 AM, Carl Steinbach  wrote:

> The Apache Hive PMC has voted to make Lars Francke a committer on the
> Apache Hive Project.
>
> Please join me in congratulating Lars!
>
> Thanks.
>
> - Carl
>
>


-- 
Swarnim


Re: HiveServer2 & Kerberos

2015-08-27 Thread kulkarni.swar...@gmail.com
> 1) Hive CLI does not talk to HiveServer2

Oh yes, absolutely. Sorry, typo on my end.

> 2) Beeline talks to HiveServer2 and needs some way to authenticate itself
depending on the configuration of HS2.

HS2 can be configured to authenticate in one of these ways if I'm up to
date:

* NOSASL: no password needed
* KERBEROS (SASL): no password needed

This is what confused me, so I started digging deeper to understand what's
going on and how exactly it uses the Kerberos credentials to get the
connection. It turns out that it boils down to using HiveConnection[1] which,
depending on the auth type, opens up a transport to HiveServer2. So
technically it's not attempting to read credentials off the Kerberos ticket
for the connection, but completely ignoring the creds that are passed by the
user, unless of course it's SASL.

That said, +1 to adding a check that we are using Kerberos and skipping the
prompt if we are. I think we probably don't even need to parse the URL to
detect that. Just checking whether the auth type property
(hive.server2.authentication) is KERBEROS or not should do the trick.

[1]
https://github.com/apache/hive/blob/3991dba30c5068cac296f32e24e97cf87efa266c/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L450-L455
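
As a concrete illustration (hypothetical host and realm): with
hive.server2.authentication set to KERBEROS, a kinit'ed user can already get
in from beeline without typing any username or password, which is exactly the
prompt the proposed check would skip:

kinit someuser@EXAMPLE.COM
beeline
!connect jdbc:hive2://hs2-host:10000/default;principal=hive/hs2-host@EXAMPLE.COM

(today you just hit enter at the user/password prompts, as reported further
down in this thread).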

On Wed, Aug 26, 2015 at 5:40 PM, Lars Francke 
wrote:

>
> On Wed, Aug 26, 2015 at 4:53 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> > my understanding is that after using kerberos authentication, you
>> probably don’t need the password.
>>
>> That is not an accurate statement. Beeline is a JDBC client as compared
>> to Hive CLI which is a thrift client to talk to HIveServer2. So it would
>> need the password to establish that JDBC connection. If you look at the
>> beeline console code[1], it actually first tries to read the
>> "javax.jdo.option.ConnectionUserName" and
>> "javax.jdo.option.ConnectionPassword" property which is the same username
>> and password that you have setup your backing metastore DB with. If it is
>> MySWL, it would be the password you set MySQL with or empty if you
>> haven't(or are using derby). Kerberos is merely a tool for you to
>> authenticate yourself so that you cannot impersonate yourself as someone
>> else.
>>
>
> I don't think what you're saying is accurate.
>
> 1) Hive CLI does not talk to HiveServer2
>
> 2) Beeline talks to HiveServer2 and needs some way to authenticate itself
> depending on the configuration of HS2.
>
> HS2 can be configured to authenticate in one of these ways if I'm up to
> date:
>
> * NOSASL: no password needed
> * KERBEROS (SASL): no password needed
> * NONE (SASL) using the AnonymousAuthenticationProviderImpl: no password
> needed
> * LDAP (SASL) using the LdapAuthenticationProviderImpl: username and
> password required
> * PAM (SASL) using the PamAuthenticationProviderImpl: username and
> password required
> * CUSTOM (SASL) using the CustomAuthenticationProviderImpl: username and
> password required
>
> By far the most common configurations are NONE (default I think) and
> KERBEROS. Both don't need a username and password provided so it does not
> make sense to ask for one every time.
>
> The only good reason I can think of to ask for a password is so that it
> doesn't appear in a shell/beeline history and/or on screen. I'm sure there
> are others?
> The username can be safely provided in the URL if needed so I don't think
> asking for that every time is reasonable either.
>
> What would be a good way to deal with this? I'm tempted to just rip out
> those prompts. The other option would be to parse the connection URL and
> check whether it's the Kerberos mode.
>
>>
>> [1]
>> https://github.com/apache/hive/blob/3991dba30c5068cac296f32e24e97cf87efa266c/beeline/src/java/org/apache/hive/beeline/Commands.java#L1117-L1125
>>
>> On Wed, Aug 26, 2015 at 10:13 AM, Loïc Chanel <
>> loic.cha...@telecomnancy.net> wrote:
>>
>>> Here it is : https://issues.apache.org/jira/browse/HIVE-11653
>>>
>>> Loïc CHANEL
>>> Engineering student at TELECOM Nancy
>>> Trainee at Worldline - Villeurbanne
>>>
>>> 2015-08-25 23:10 GMT+02:00 Sergey Shelukhin :
>>>
>>>> Sure!
>>>>
>>>> From: Loïc Chanel 
>>>> Reply-To: "user@hive.apache.org" 
>>>> Date: Tuesday, August 25, 2015 at 00:23
>>>>
>>>> To: "user@hive.apache.org" 
>>>> Subject: Re: HiveServer2 & Kerberos
>>>>
>>>> It is the case.
>>>> Would you like me to fill a JIRA a

Re: HiveServer2 & Kerberos

2015-08-26 Thread kulkarni.swar...@gmail.com
Nope, because the credentials are different. You might have multiple users
using their own credentials to authenticate themselves, but there is only a
single set of defined credentials to be used by the metastore server.

On Wed, Aug 26, 2015 at 10:58 AM, Loïc Chanel 
wrote:

> I understand the behavior, but when Kerberos is enabled, isn't that a bit
> redundant ?
>
> Loïc CHANEL
> Engineering student at TELECOM Nancy
> Trainee at Worldline - Villeurbanne
>
> 2015-08-26 17:53 GMT+02:00 kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com>:
>
>> > my understanding is that after using kerberos authentication, you
>> probably don’t need the password.
>>
>> That is not an accurate statement. Beeline is a JDBC client as compared
>> to Hive CLI which is a thrift client to talk to HIveServer2. So it would
>> need the password to establish that JDBC connection. If you look at the
>> beeline console code[1], it actually first tries to read the
>> "javax.jdo.option.ConnectionUserName" and
>> "javax.jdo.option.ConnectionPassword" property which is the same username
>> and password that you have setup your backing metastore DB with. If it is
>> MySWL, it would be the password you set MySQL with or empty if you
>> haven't(or are using derby). Kerberos is merely a tool for you to
>> authenticate yourself so that you cannot impersonate yourself as someone
>> else.
>>
>> [1]
>> https://github.com/apache/hive/blob/3991dba30c5068cac296f32e24e97cf87efa266c/beeline/src/java/org/apache/hive/beeline/Commands.java#L1117-L1125
>>
>> On Wed, Aug 26, 2015 at 10:13 AM, Loïc Chanel <
>> loic.cha...@telecomnancy.net> wrote:
>>
>>> Here it is : https://issues.apache.org/jira/browse/HIVE-11653
>>>
>>> Loïc CHANEL
>>> Engineering student at TELECOM Nancy
>>> Trainee at Worldline - Villeurbanne
>>>
>>> 2015-08-25 23:10 GMT+02:00 Sergey Shelukhin :
>>>
>>>> Sure!
>>>>
>>>> From: Loïc Chanel 
>>>> Reply-To: "user@hive.apache.org" 
>>>> Date: Tuesday, August 25, 2015 at 00:23
>>>>
>>>> To: "user@hive.apache.org" 
>>>> Subject: Re: HiveServer2 & Kerberos
>>>>
>>>> It is the case.
>>>> Would you like me to fill a JIRA about it ?
>>>>
>>>> Loïc CHANEL
>>>> Engineering student at TELECOM Nancy
>>>> Trainee at Worldline - Villeurbanne
>>>>
>>>> 2015-08-24 19:24 GMT+02:00 Sergey Shelukhin :
>>>>
>>>>> If that is the case it sounds like a bug…
>>>>>
>>>>> From: Jary Du 
>>>>> Reply-To: "user@hive.apache.org" 
>>>>> Date: Thursday, August 20, 2015 at 08:56
>>>>> To: "user@hive.apache.org" 
>>>>> Subject: Re: HiveServer2 & Kerberos
>>>>>
>>>>> My understanding is that it will always ask you user/password even
>>>>> though you don’t need them. It is just the way how hive is setup.
>>>>>
>>>>> On Aug 20, 2015, at 8:28 AM, Loïc Chanel 
>>>>> wrote:
>>>>>
>>>>> !connect jdbc:hive2://
>>>>> 192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
>>>>> org.apache.hive.jdbc.HiveDriver
>>>>> scan complete in 13ms
>>>>> Connecting to jdbc:hive2://
>>>>> 192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
>>>>> Enter password for jdbc:hive2://
>>>>> 192.168.6.210:1/chaneldb;principal=hive/hiveh...@westeros.wl:
>>>>>
>>>>> And if I press enter everything works perfectly, because I am using
>>>>> Kerberos authentication, that's actually why I was asking what is Hive
>>>>> asking for, because in my case, it seems that I shouldn't be asked for a
>>>>> password when connecting.
>>>>>
>>>>> Loïc CHANEL
>>>>> Engineering student at TELECOM Nancy
>>>>> Trainee at Worldline - Villeurbanne
>>>>>
>>>>> 2015-08-20 17:06 GMT+02:00 Jary Du :
>>>>>
>>>>>> How does Beeline ask you? What happens if you just press enter?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Aug 20, 2015, at 12:15 AM, Loïc Chanel <
>>>>>> loic.cha...@telecomnancy.net> wrote:
>>>>>>
>>>>>> Indeed, I don'

Re: HiveServer2 & Kerberos

2015-08-26 Thread kulkarni.swar...@gmail.com
> my understanding is that after using kerberos authentication, you
probably don’t need the password.

That is not an accurate statement. Beeline is a JDBC client as compared to
Hive CLI which is a thrift client to talk to HiveServer2. So it would need
the password to establish that JDBC connection. If you look at the beeline
console code[1], it actually first tries to read the
"javax.jdo.option.ConnectionUserName" and
"javax.jdo.option.ConnectionPassword" properties, which are the same username
and password that you have set up your backing metastore DB with. If it is
MySQL, it would be the password you set MySQL up with, or empty if you
haven't (or are using Derby). Kerberos is merely a tool for you to
authenticate yourself so that you cannot pass yourself off as someone
else.

[1]
https://github.com/apache/hive/blob/3991dba30c5068cac296f32e24e97cf87efa266c/beeline/src/java/org/apache/hive/beeline/Commands.java#L1117-L1125

On Wed, Aug 26, 2015 at 10:13 AM, Loïc Chanel 
wrote:

> Here it is : https://issues.apache.org/jira/browse/HIVE-11653
>
> Loïc CHANEL
> Engineering student at TELECOM Nancy
> Trainee at Worldline - Villeurbanne
>
> 2015-08-25 23:10 GMT+02:00 Sergey Shelukhin :
>
>> Sure!
>>
>> From: Loïc Chanel 
>> Reply-To: "user@hive.apache.org" 
>> Date: Tuesday, August 25, 2015 at 00:23
>>
>> To: "user@hive.apache.org" 
>> Subject: Re: HiveServer2 & Kerberos
>>
>> It is the case.
>> Would you like me to fill a JIRA about it ?
>>
>> Loïc CHANEL
>> Engineering student at TELECOM Nancy
>> Trainee at Worldline - Villeurbanne
>>
>> 2015-08-24 19:24 GMT+02:00 Sergey Shelukhin :
>>
>>> If that is the case it sounds like a bug…
>>>
>>> From: Jary Du 
>>> Reply-To: "user@hive.apache.org" 
>>> Date: Thursday, August 20, 2015 at 08:56
>>> To: "user@hive.apache.org" 
>>> Subject: Re: HiveServer2 & Kerberos
>>>
>>> My understanding is that it will always ask you user/password even
>>> though you don’t need them. It is just the way how hive is setup.
>>>
>>> On Aug 20, 2015, at 8:28 AM, Loïc Chanel 
>>> wrote:
>>>
>>> !connect jdbc:hive2://
>>> 192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
>>> org.apache.hive.jdbc.HiveDriver
>>> scan complete in 13ms
>>> Connecting to jdbc:hive2://
>>> 192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
>>> Enter password for jdbc:hive2://
>>> 192.168.6.210:1/chaneldb;principal=hive/hiveh...@westeros.wl:
>>>
>>> And if I press enter everything works perfectly, because I am using
>>> Kerberos authentication, that's actually why I was asking what is Hive
>>> asking for, because in my case, it seems that I shouldn't be asked for a
>>> password when connecting.
>>>
>>> Loïc CHANEL
>>> Engineering student at TELECOM Nancy
>>> Trainee at Worldline - Villeurbanne
>>>
>>> 2015-08-20 17:06 GMT+02:00 Jary Du :
>>>
 How does Beeline ask you? What happens if you just press enter?



 On Aug 20, 2015, at 12:15 AM, Loïc Chanel 
 wrote:

 Indeed, I don't need the password, but why is Beeline asking me for one
 ? To what does it correspond ?

 Thanks again,


 Loïc

 Loïc CHANEL
 Engineering student at TELECOM Nancy
 Trainee at Worldline - Villeurbanne

 2015-08-19 18:22 GMT+02:00 Jary Du :

> Correct me if I am wrong, my understanding is that after using
> kerberos authentication, you probably don’t need the password.
>
> Hope it helps
>
> Thanks,
> Jary
>
>
> On Aug 19, 2015, at 9:09 AM, Loïc Chanel 
> wrote:
>
> By the way, thanks a lot for your help, because your solution works,
> but I'm still interested in knowing what is the password I did not enter.
>
> Thanks again,
>
>
> Loïc
>
> Loïc CHANEL
> Engineering student at TELECOM Nancy
> Trainee at Worldline - Villeurbanne
>
> 2015-08-19 18:07 GMT+02:00 Loïc Chanel :
>
>> All right, but then, what is the password hive asks for ? Hive's one
>> ? How do I know its value ?
>>
>> Loïc CHANEL
>> Engineering student at TELECOM Nancy
>> Trainee at Worldline - Villeurbanne
>>
>> 2015-08-19 17:51 GMT+02:00 Jary Du :
>>
>>> For Beeline connection string, it should be "!connect
>>> jdbc:hive2://:/;principal=”.
>>>  Please
>>> make sure it is the hive’s principal, not the user’s. And when you 
>>> kinit,
>>> it should be kinit user’s keytab, not the hive’s keytab.
>>>
>>>
>>>
>>>
>>>
>>> On Aug 19, 2015, at 8:46 AM, Loïc Chanel <
>>> loic.cha...@telecomnancy.net> wrote:
>>>
>>> Yeah, I forgot to mention it, but each time I did a kinit user/hive
>>> before launching beeline, as I read somewhere that Beeline does not 
>>> handle
>>> Kerberos connection.
>>>
>>> So, as I can make klist before launching beeline and having a good
>>> result, the problem does not come from this. Thanks a lot for your 
>>> response
>>> though.
>>>

Re: hiveserver2 hangs

2015-08-20 Thread kulkarni.swar...@gmail.com
Sanjeev,

One possibility is that you are running into [1], which affects hive 0.13. Is
it possible for you to apply the patch on [1] and see if it fixes your
problem?

[1] https://issues.apache.org/jira/browse/HIVE-10410

On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma 
wrote:

> We are using hive-0.13 with hadoop1.
>
> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> Can you tell me more details about your hive version/hadoop version etc.
>>
>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma > > wrote:
>>
>>> Can somebody gives me some pointer to looked upon?
>>>
>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>> sanjeev.verm...@gmail.com> wrote:
>>>
>>>> Hi
>>>> We are experiencing a strange problem with the hiveserver2, in one of
>>>> the job it gets the GC limit exceed from mapred task and hangs even having
>>>> enough heap available.we are not able to identify what causing this issue.
>>>> Could anybody help me identify the issue and let me know what pointers
>>>> I need to looked up.
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: hiveserver2 hangs

2015-08-20 Thread kulkarni.swar...@gmail.com
Sanjeev,

Can you tell me more details about your hive version/hadoop version etc.

On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma 
wrote:

> Can somebody gives me some pointer to looked upon?
>
> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma 
> wrote:
>
>> Hi
>> We are experiencing a strange problem with the hiveserver2, in one of the
>> job it gets the GC limit exceed from mapred task and hangs even having
>> enough heap available.we are not able to identify what causing this issue.
>> Could anybody help me identify the issue and let me know what pointers I
>> need to looked up.
>>
>> Thanks
>>
>
>


-- 
Swarnim


Re: Request write access to the Hive wiki

2015-08-10 Thread kulkarni.swar...@gmail.com
@Xuefu While you are already at it, would you mind giving me this access
too? :)

Thanks,

On Mon, Aug 10, 2015 at 2:37 PM, Xuefu Zhang  wrote:

> Done!
>
> On Mon, Aug 10, 2015 at 1:05 AM, Xu, Cheng A  wrote:
>
>> Hi,
>>
>> I’d like to have write access to the Hive wiki. My Confluence username is
>> cheng.a...@intel.com with Full Name “Ferdinand Xu”. Please help me deal
>> with it. Thank you!
>>
>>
>>
>> Regards,
>>
>> Ferdinand Xu
>>
>>
>>
>
>


-- 
Swarnim


Re: Error communicating with metastore

2015-08-07 Thread kulkarni.swar...@gmail.com
Sarath,

I assume that the failure you are seeing doesn't happen immediately? The
current timeout on the client is set to 5 minutes. A socket timeout usually
means that the client timed out before it could even get a response from the
server. So the server could be very busy doing something if you are pulling
in tons of data, and/or you might be running into this bug[1] where the HMS
connections are leaked.

To start with, can you try bumping the timeout up to 20 minutes and see if
your queries succeed? This can be done directly from the CLI via
"set hive.metastore.client.socket.timeout=1200".

[1] https://issues.apache.org/jira/browse/HIVE-10956
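
If the bigger timeout helps and you want it to stick, the same property can
also go into hive-site.xml (a sketch; on the 0.13/0.14-era releases the value
is interpreted as seconds):

<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1200</value>
</property>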

On Fri, Aug 7, 2015 at 8:13 AM, Sarath Chandra <
sarathchandra.jos...@algofusiontech.com> wrote:

> Thanks Eugene, Alan.
>
> @Alan,
> As suggested checked the logs, here is what I found -
>
>- On starting metastore server, I'm seeing following messages in the
>log file -
>
> *2015-08-07 18:32:56,678 ERROR [Thread-7]: compactor.Initiator
> (Initiator.java:run(134)) - Caught an exception in the main loop of
> compactor initiator, exiting MetaException(message:Unable to get jdbc
> connection from pool, READ_COMMITTED and SERIALIZABLE are the only valid
> transaction levels)*
> *at
> org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:811)*
> *at
> org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.revokeFromLocalWorkers(CompactionTxnHandler.java:443)*
> *at
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.recoverFailedCompactions(Initiator.java:147)*
> *at
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:64)*
>
>- On bringing up the hive shell, I get the following messages -
>
> tion - enable connectionWatch for additional debugging assistance or set
> disableConnectionTracking to true to disable this feature entirely.
> 2015-08-07 18:38:51,614 WARN
>  [org.spark-project.guava.common.base.internal.Finalizer]:
> bonecp.ConnectionPartition (ConnectionPartition.java:finalizeReferent(162))
> - BoneCP detected an unclosed connection and will now attempt to close it
> for you. You should be closing this connection in your application - enable
> connectionWatch for additional debugging assistance or set
> disableConnectionTracking to true to disable this feature entirely.
> 2015-08-07 18:38:51,768 DEBUG [pool-3-thread-1]: metastore.ObjectStore
> (ObjectStore.java:debugLog(6435)) - Commit transaction: count = 0, isactive
> true at:
>
> org.apache.hadoop.hive.metastore.ObjectStore.getFunctions(ObjectStore.java:6657)
>
>- On firing "show tables" command, I get the following messages in the
>log file -
>
> 2015-08-07 18:41:02,511 INFO  [main]: hive.metastore
> (HiveMetaStoreClient.java:open(297)) - Trying to connect to metastore with
> URI thrift://sarath:9083
> 2015-08-07 18:41:02,511 INFO  [main]: hive.metastore
> (HiveMetaStoreClient.java:open(385)) - Connected to metastore.
> 2015-08-07 18:41:22,549 ERROR [main]: ql.Driver
> (SessionState.java:printError(545)) - FAILED: Error in determing valid
> transactions: Error communicating with the metastore
> org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with
> the metastore
> at
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.getValidTxns(DbTxnManager.java:281)
> at
> org.apache.hadoop.hive.ql.Driver.recordValidTxns(Driver.java:842)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1036)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>  

Re: Unable to create table in Hive

2015-05-14 Thread kulkarni.swar...@gmail.com
Yeah, 0.13 isn't compatible with HBase 1.0. We haven't made the jump to
HBase 1.0 yet, but Hive 1.1 is on HBase 0.98. And from what I know, there
aren't many breaking changes from 0.98 to 1.0, so you might give that a shot
and see if it works.

On Thu, May 14, 2015 at 3:30 PM, Ibrar Ahmed  wrote:

> I have also tried
>
> ADD FILE /usr/local/hbase/conf/hbase-site.xml;
> ADD JAR /usr/local/hive/lib/zookeeper-3.4.5.jar;
> ADD JAR /usr/local/hive/lib/hive-hbase-handler-0.13.0.jar;
> ADD JAR /usr/local/hive/lib/guava-11.0.2.jar;
> ADD JAR /usr/local/hbase/lib/hbase-client-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-common-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-protocol-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-server-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-shell-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-thrift-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-server-1.0.1.jar;
>
> CREATE TABLE abcd(key int, value string)  STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "xyz");
>
>
> But "list jars" also shows nothing.
>
>
>
> On Fri, May 15, 2015 at 1:29 AM, Ibrar Ahmed 
> wrote:
>
>> Hive : 0.13
>> Hbase: 1.0.1
>>
>>
>>
>> On Fri, May 15, 2015 at 1:26 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Hi Ibrar,
>>>
>>> It seems like your hive and hbase versions are incompatible. What
>>> version of hive and hbase are you on?
>>>
>>> On Thu, May 14, 2015 at 3:21 PM, Ibrar Ahmed 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> While creating a table in Hive I am getting this error message.
>>>>
>>>> CREATE TABLE abcd(key int, value string)  STORED BY
>>>> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
>>>> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
>>>> hbase.table.name" = "xyz");
>>>>
>>>>
>>>> [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
>>>> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>>>> org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
>>>>
>>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>>
>>


-- 
Swarnim


Re: Unable to create table in Hive

2015-05-14 Thread kulkarni.swar...@gmail.com
Hi Ibrar,

It seems like your hive and hbase versions are incompatible. What version
of hive and hbase are you on?

On Thu, May 14, 2015 at 3:21 PM, Ibrar Ahmed  wrote:

> Hi,
>
> While creating a table in Hive I am getting this error message.
>
> CREATE TABLE abcd(key int, value string)  STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "xyz");
>
>
> [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
> org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
>
>


-- 
Swarnim


Re: Hive/Hbase Integration issue

2015-05-13 Thread kulkarni.swar...@gmail.com
Ibrar,

This seems to be an issue with the cluster rather than the integration
itself. Can you make sure that HBase is happy and healthy and all region
servers are up and running?
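
If it helps, a quick (and by no means exhaustive) way to eyeball that is the
status command in the HBase shell, which lists live vs. dead region servers:

hbase shell
status 'simple'

and "hbase hbck" can give a deeper consistency check.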

On Wed, May 13, 2015 at 1:58 PM, Ibrar Ahmed  wrote:

> Hi,
>
> I am creating a table using hive and getting this error.
>
> [127.0.0.1:1] hive> CREATE TABLE hbase_table_1(key int, value string)
>   > STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>   > WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,cf1:val")
>   > TBLPROPERTIES ("hbase.table.name" = "xyz");
>
>
>
> [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Can't get the locations
> at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:147)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
> at
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:267)
> at
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:139)
> at
> org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:134)
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:823)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:365)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:281)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:291)
> at
> org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:162)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:554)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
> at com.sun.proxy.$Proxy7.createTable(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613)
> at
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1472)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1239)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1057)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
> at
> org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
> at
> org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
> at
> org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> )
>
>
> Any help/clue can help.
>
>


-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-15 Thread kulkarni.swar...@gmail.com
Congratulations!!

On Wed, Apr 15, 2015 at 10:57 AM, Viraj Bhat 
wrote:

> Mithun Congrats!!
> Viraj
>
>   From: Carl Steinbach 
>  To: d...@hive.apache.org; user@hive.apache.org; mit...@apache.org
>  Sent: Tuesday, April 14, 2015 2:54 PM
>  Subject: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan
>
> The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on
> the Apache Hive Project.
> Please join me in congratulating Mithun.
> Thanks.
> - Carl
>
>
>
>



-- 
Swarnim


Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-27 Thread kulkarni.swar...@gmail.com
Congratulations Sergey! Well deserved!

On Fri, Feb 27, 2015 at 1:51 AM, Vinod Kumar Vavilapalli <
vino...@hortonworks.com> wrote:

> Congratulations and keep up the great work!
>
> +Vinod
>
> On Feb 25, 2015, at 8:43 AM, Carl Steinbach  wrote:
>
> > I am pleased to announce that Sergey Shelukhin has been elected to the
> Hive Project Management Committee. Please join me in congratulating Sergey!
> >
> > Thanks.
> >
> > - Carl
> >
>
>


-- 
Swarnim


Re: Hive-HBase Integration

2015-01-01 Thread kulkarni.swar...@gmail.com
Hey Mohan,

Could you detail your question a little bit more? Hopefully the wiki
here[1] solves your queries.

[1] https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

On Thu, Jan 1, 2015 at 2:38 PM, Mohan Krishna 
wrote:

> Any body know Hive-HBase Integration process?
>
>
>
> Thanks
> Mohan
>



-- 
Swarnim


Re: Need urgent help on hive query performance

2014-05-30 Thread kulkarni.swar...@gmail.com
> It has innumerable no of joins. Since its client specific query, u
understand I cannot share. Sorry about that

Like I said, joins are slow and, if not done correctly, can have terrible
performance. Which technique helps depends on how exactly you are trying to
perform the join. For instance, if you are joining a smaller table to a
larger one, a map join could work well for you: the smaller table is kept
in-memory while the join is performed. Also, if you are able to break your
tables down into smaller buckets, you may be able to use a bucketed map join.
The following links should be helpful [1][2].

Hope this helps.

[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
[2]
http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables
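
As a quick illustration (the table and column names below are made up), a map
join can either be left to Hive's auto conversion or forced with a hint:

-- let Hive convert the join to a map join when the small side is under the threshold
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;  -- ~25 MB, tune to your data

-- or hint it explicitly, keeping small_dim in memory on the mappers
SELECT /*+ MAPJOIN(small_dim) */ f.id, d.name
FROM large_fact f
JOIN small_dim d ON (f.dim_id = d.id);

Bucketed map joins additionally need both tables bucketed on the join key and
hive.optimize.bucketmapjoin=true.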


On Fri, May 30, 2014 at 5:38 PM,  wrote:

>  Pls find the answers
>
>
>
>
>
>
>
> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
> *Sent:* Friday, May 30, 2014 3:34 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: Need urgent help on hive query performance
>
>
>
> I feel it's pretty hard to answer this without understanding the following:
>
>
>
> 1.  What exactly are you trying to query? CSV? Avro? 
>
> HIVE table
>
> 2.  Where is your data? HDFS? HBase? Local filesystem?
>
> Data is in s3
>
> 3.  What version of hive are you using?
>
> Hive 0.12
>
> 4.  What is an example of a query that is slow? Some queries like
> joins and stuff would be inherently slower than other simpler ones(though
> can be optimized).
>
> It has innumerable no of joins. Since its client specific query, u
> understand I cannot share. Sorry about that
>
>
>
> Thanks,
>
>
>
> --
> Swarnim
>
>
>
> On Fri, May 30, 2014 at 5:32 PM,  wrote:
>
> Can you please give a specific example or blog to refer to. I did not
> understand
>
>
>
> *From:* Ashish Garg [mailto:gargcreation1...@gmail.com]
> *Sent:* Friday, May 30, 2014 3:31 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Need urgent help on hive query performance
>
>
>
> try partitioning the table and run the queries which are partition
> specific. Hope this helps.
>
> Thanks and Regards,
>
> Ashish Garg.
>
>
>
> On Fri, May 30, 2014 at 6:05 PM,  wrote:
>
> Hi,
>
>
>
> Does anybody  help urgently on optimizing hive query performance? I am
> looking more Hadoop tuning point of view. Currently, small amount of table
> takes much time to query?
>
>
>
> We are running EMR cluster with 1 MASTER node, 2 Core Nodes and  Task
> Nodes.
>
>
>
> Quick help is much appreciated.
>
>
>
> Thanks,
>
> Shouvanik
>
>
>  --
>
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
>
> __
>
> www.accenture.com
>
>
>
>
>
>
>
> --
> Swarnim
>



-- 
Swarnim


Re: Need urgent help on hive query performance

2014-05-30 Thread kulkarni.swar...@gmail.com
I feel it's pretty hard to answer this without understanding the following:

1. What exactly are you trying to query? CSV? Avro? 
2. Where is your data? HDFS? HBase? Local filesystem?
3. What version of hive are you using?
4. What is an example of a query that is slow? Some queries like joins and
stuff would be inherently slower than other simpler ones(though can be
optimized).

Thanks,

-- 
Swarnim


On Fri, May 30, 2014 at 5:32 PM,  wrote:

>  Can you please give a specific example or blog to refer to. I did not
> understand
>
>
>
> *From:* Ashish Garg [mailto:gargcreation1...@gmail.com]
> *Sent:* Friday, May 30, 2014 3:31 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Need urgent help on hive query performance
>
>
>
> try partitioning the table and run the queries which are partition
> specific. Hope this helps.
>
> Thanks and Regards,
>
> Ashish Garg.
>
>
>
> On Fri, May 30, 2014 at 6:05 PM,  wrote:
>
> Hi,
>
>
>
> Does anybody  help urgently on optimizing hive query performance? I am
> looking more Hadoop tuning point of view. Currently, small amount of table
> takes much time to query?
>
>
>
> We are running EMR cluster with 1 MASTER node, 2 Core Nodes and  Task
> Nodes.
>
>
>
> Quick help is much appreciated.
>
>
>
> Thanks,
>
> Shouvanik
>
>
>  --
>
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
>
> __
>
> www.accenture.com
>
>
>



-- 
Swarnim


Re: Querying Hbase table from Hive without mounting

2014-03-31 Thread kulkarni.swar...@gmail.com
Hi Manju,

If I am understanding correctly what you are trying to do, there is
currently no great way to achieve that with the existing Hive-HBase
integration. Of course you can read and write data to HBase like you
mentioned, but that is pretty much it. If you need more fine-grained access,
like accessing a particular version or timestamp, I would recommend looking
at Apache Phoenix[1].

Hope that helps.

[1] http://phoenix.incubator.apache.org/


On Mon, Mar 31, 2014 at 3:31 PM, Manju M wrote:

> Usually to access Hbase from Hive,  you will map Hbase table using
> .HBaseStorageHandler and specifying Hbase table in TBLPROPERTIES.
>
> But my question is ..I have to Access Hbase records directly .
>
>
> INSERT OVERWRITE TABLE top_cool_hbase SELECT name, map(`date`, cast(coolness 
> as int)) FROM* top_cool*
>
>
>  top_cool_hbase is hive table ( mapped to hbase table )
>
> top_cool is hbase table ( not a mapped Hive table)
>
>
>
> On Mon, Mar 31, 2014 at 12:42 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Can you elaborate a little on what exactly you mean by "mounting"? The
>> least you will need to have hbase data query able in hive is to create an
>> external table on top of it.
>>
>>
>> On Mon, Mar 31, 2014 at 2:11 PM, Manju M wrote:
>>
>>> Without mapping /mounting the hbase table , how can I access and query
>>> hbase table ?
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: Querying Hbase table from Hive without mounting

2014-03-31 Thread kulkarni.swar...@gmail.com
Can you elaborate a little on what exactly you mean by "mounting"? The least
you will need to have HBase data queryable in Hive is to create an external
table on top of it.
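
For reference, the minimal mapping looks something like this (table, column
family and qualifier names below are hypothetical):

CREATE EXTERNAL TABLE hbase_backed_t (key string, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "existing_hbase_table");

after which the existing HBase table can be queried from Hive like any other
table.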


On Mon, Mar 31, 2014 at 2:11 PM, Manju M wrote:

> Without mapping /mounting the hbase table , how can I access and query
> hbase table ?
>
>
>
>


-- 
Swarnim


Re: Mapping HBase column qualifier with a ':' in it

2014-02-19 Thread kulkarni.swar...@gmail.com
The syntax for this is very similar to the original one.

CREATE EXTERNAL TABLE t (
  id string,
  cf map<string, string>
) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:my_prefix.*")

and then it can be queried similar to how you were querying before.
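
For example (hypothetical qualifier name), something like

select cf['my_prefix.q2'] from t;

pulls back the value of that one prefixed qualifier.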


On Wed, Feb 19, 2014 at 8:09 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Hi Den,
>
> I think that is a valid solution. If you are using a version of hive >
> 0.12, you can also select columns from hbase using prefixes (introduced in
> [1]). Marginally more efficient than the "select all columns" approach and
> little more flexible as now just all columns sharing the given prefix need
> to have the same type.
>
> [1] https://issues.apache.org/jira/browse/HIVE-3725
>
>
> On Wed, Feb 19, 2014 at 5:22 PM, Den  wrote:
>
>> I've arrived at a workaround that I'm using for this for now. Basically I
>> have a map in the Hive table corresponding to the column family then I'm
>> able to select from it. The downside here is that every qualifier has to
>> have the same data type and storage type, which I suppose you could work
>> around by having two map columns pointing to the same HBase column family
>> with different value types and storage types. Here's the basic idea of it:
>>
>> CREATE EXTERNAL TABLE t (
>>   id string,
>>   cf map
>> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
>>
>> then it can be queried as:
>>
>> select cf['q1:q2'] from t
>>
>>
>> On Mon, Feb 17, 2014 at 6:16 PM, Den  wrote:
>>
>>> I assume by 'escape' you mean something like:
>>>
>>> "hbase.columns.mapping" = ":key,cf:q1\:q2"
>>>
>>> That gave the same error as before. If that's not what you mean could
>>> you expand?
>>>
>>> I also tried to see if I could trick it by doing some binary encoding of
>>> the character
>>>
>>> "hbase.columns.mapping" = ":key,cf:q1\x3Aq2"
>>>
>>> where \x3A is the ascii hex code for ':'. That also didn't work either,
>>> the 'create external table' went through but all the values were NULL.
>>>
>>> I dug around a bit in the source and found that in HBaseSerDe.java in
>>> the parseColumnsMapping(...) function there is the following:
>>>
>>>   int idxFirst = colInfo.indexOf(":");
>>>   int idxLast = colInfo.lastIndexOf(":");
>>>
>>>   if (idxFirst < 0 || !(idxFirst == idxLast)) {
>>> throw new SerDeException("Error: the HBase columns mapping contains 
>>> a badly formed " +
>>> "column family, column qualifier specification.");
>>>   }
>>>
>>> It seems that this will throw this error if there is not exactly 1 colon in 
>>> the HBase column to map. So short of tricking it into thinking something 
>>> else is a colon there might not be any way to map my columns without 
>>> renaming them first. Thoughts?
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 14, 2014 at 10:50 AM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> Hi Den,
>>>>
>>>> Have you tried escaping the additional colon in the qualifier name?
>>>>
>>>>
>>>> On Fri, Feb 14, 2014 at 9:47 AM, Den  wrote:
>>>>
>>>>> I'm working with an HBase database with a column of the form
>>>>> 'cf:q1:q2' where 'cf' is the column family 'q1:q2' is the column 
>>>>> qualifier.
>>>>> When trying to map this in Hive I'm using a statement like the following:
>>>>>
>>>>> CREATE EXTERNAL TABLE t (
>>>>>   id string
>>>>>   q1_q2 string
>>>>> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:q1:q2")
>>>>>
>>>>> I get an error saying
>>>>>
>>>>> Error: the HBase columns mapping contains a badly formed column
>>>>> family, column qualifier specification.
>>>>>
>>>>> This seems to be due to the colon in the column qualifier. It seems to
>>>>> demand that there be exactly on colon in the field name and it has to be
>>>>> the one separating the column family from the column qualifier.
>>>>>
>>>>> Is there a reason that is the case? Is there any way around it so I
>>>>> can map the columns from the HBase DB to Hive?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>
>
>
> --
> Swarnim
>



-- 
Swarnim


Re: Mapping HBase column qualifier with a ':' in it

2014-02-19 Thread kulkarni.swar...@gmail.com
Hi Den,

I think that is a valid solution. If you are using a version of hive >
0.12, you can also select columns from hbase using prefixes (introduced in
[1]). That is marginally more efficient than the "select all columns"
approach and a little more flexible, as now only the columns sharing the
given prefix need to have the same type.

[1] https://issues.apache.org/jira/browse/HIVE-3725


On Wed, Feb 19, 2014 at 5:22 PM, Den  wrote:

> I've arrived at a workaround that I'm using for this for now. Basically I
> have a map in the Hive table corresponding to the column family then I'm
> able to select from it. The downside here is that every qualifier has to
> have the same data type and storage type, which I suppose you could work
> around by having two map columns pointing to the same HBase column family
> with different value types and storage types. Here's the basic idea of it:
>
> CREATE EXTERNAL TABLE t (
>   id string,
>   cf map
> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
>
> then it can be queried as:
>
> select cf['q1:q2'] from t
>
>
> On Mon, Feb 17, 2014 at 6:16 PM, Den  wrote:
>
>> I assume by 'escape' you mean something like:
>>
>> "hbase.columns.mapping" = ":key,cf:q1\:q2"
>>
>> That gave the same error as before. If that's not what you mean could you
>> expand?
>>
>> I also tried to see if I could trick it by doing some binary encoding of
>> the character
>>
>> "hbase.columns.mapping" = ":key,cf:q1\x3Aq2"
>>
>> where \x3A is the ascii hex code for ':'. That also didn't work either,
>> the 'create external table' went through but all the values were NULL.
>>
>> I dug around a bit in the source and found that in HBaseSerDe.java in
>> the parseColumnsMapping(...) function there is the following:
>>
>>   int idxFirst = colInfo.indexOf(":");
>>   int idxLast = colInfo.lastIndexOf(":");
>>
>>   if (idxFirst < 0 || !(idxFirst == idxLast)) {
>> throw new SerDeException("Error: the HBase columns mapping contains 
>> a badly formed " +
>> "column family, column qualifier specification.");
>>   }
>>
>> It seems that this will throw this error if there is not exactly 1 colon in 
>> the HBase column to map. So short of tricking it into thinking something 
>> else is a colon there might not be any way to map my columns without 
>> renaming them first. Thoughts?
>>
>>
>>
>>
>> On Fri, Feb 14, 2014 at 10:50 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Hi Den,
>>>
>>> Have you tried escaping the additional colon in the qualifier name?
>>>
>>>
>>> On Fri, Feb 14, 2014 at 9:47 AM, Den  wrote:
>>>
>>>> I'm working with an HBase database with a column of the form 'cf:q1:q2'
>>>> where 'cf' is the column family and 'q1:q2' is the column qualifier. When
>>>> trying to map this in Hive I'm using a statement like the following:
>>>>
>>>> CREATE EXTERNAL TABLE t (
>>>>   id string,
>>>>   q1_q2 string
>>>> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:q1:q2")
>>>>
>>>> I get an error saying
>>>>
>>>> Error: the HBase columns mapping contains a badly formed column family,
>>>> column qualifier specification.
>>>>
>>>> This seems to be due to the colon in the column qualifier. It seems to
>>>> demand that there be exactly one colon in the field name and it has to be
>>>> the one separating the column family from the column qualifier.
>>>>
>>>> Is there a reason that is the case? Is there any way around it so I can
>>>> map the columns from the HBase DB to Hive?
>>>>
>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>


-- 
Swarnim


Re: Mapping HBase column qualifier with a ':' in it

2014-02-14 Thread kulkarni.swar...@gmail.com
Hi Den,

Have you tried escaping the additional colon in the qualifier name?


On Fri, Feb 14, 2014 at 9:47 AM, Den  wrote:

> I'm working with an HBase database with a column of the form 'cf:q1:q2'
> where 'cf' is the column family and 'q1:q2' is the column qualifier. When
> trying to map this in Hive I'm using a statement like the following:
>
> CREATE EXTERNAL TABLE t (
>   id string,
>   q1_q2 string
> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:q1:q2")
>
> I get an error saying
>
> Error: the HBase columns mapping contains a badly formed column family,
> column qualifier specification.
>
> This seems to be due to the colon in the column qualifier. It seems to
> demand that there be exactly one colon in the field name and it has to be
> the one separating the column family from the column qualifier.
>
> Is there a reason that is the case? Is there any way around it so I can
> map the columns from the HBase DB to Hive?
>



-- 
Swarnim


Re: hive hbase integration

2013-12-26 Thread kulkarni.swar...@gmail.com
Seems like you are running hive on yarn instead of mr1. I have had some
issues in the past doing so. The post here[1] has some solutions on how to
configure hive to work with yarn. Hope that helps.

[1]
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/gHVq9C5H6RE


On Thu, Dec 26, 2013 at 10:35 AM, Vikas Parashar wrote:

> Hi,
>
> I am integrating hive(0.12) with hbase(0.96). Everything is working fine
> there, but I am stuck on two queries.
>
> When I create a table or run select * from table, it works fine,
> but in the case of select count(*) from table it gives me the error below.
>
>
> 2013-12-26 13:25:01,864 ERROR ql.Driver
> (SessionState.java:printError(419)) - FAILED: Execution Error, return code
> 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> 2013-12-26 13:25:01,869 WARN  mapreduce.Counters
> (AbstractCounters.java:getGroup(234)) - Group FileSystemCounters is
> deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
> 2013-12-26 14:25:44,119 WARN  mapreduce.JobSubmitter
> (JobSubmitter.java:copyAndConfigureFiles(149)) - Hadoop command-line option
> parsing not performed. Implement the Tool interface and execute your
> application with ToolRunner to remedy this.
> 2013-12-26 14:26:14,677 WARN  mapreduce.Counters
> (AbstractCounters.java:getGroup(234)) - Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> 2013-12-26 14:26:33,613 WARN  mapreduce.Counters
> (AbstractCounters.java:getGroup(234)) - Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> 2013-12-26 14:27:30,355 WARN  mapreduce.Counters
> (AbstractCounters.java:getGroup(234)) - Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> 2013-12-26 14:27:32,479 WARN  mapreduce.Counters
> (AbstractCounters.java:getGroup(234)) - Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> 2013-12-26 14:27:32,528 ERROR exec.Task
> (SessionState.java:printError(419)) - Ended Job = job_1388037394132_0013
> with errors
> 2013-12-26 14:27:32,530 ERROR exec.Task
> (SessionState.java:printError(419)) - Error during job, obtaining debugging
> information...
> 2013-12-26 14:27:32,538 ERROR exec.Task
> (SessionState.java:printError(419)) - Examining task ID:
> task_1388037394132_0013_m_00 (and more) from job job_1388037394132_0013
> 2013-12-26 14:27:32,539 WARN  shims.HadoopShimsSecure
> (Hadoop23Shims.java:getTaskAttemptLogUrl(72)) - Can't fetch tasklog:
> TaskLogServlet is not supported in MR2 mode.
> 2013-12-26 14:27:32,593 WARN  shims.HadoopShimsSecure
> (Hadoop23Shims.java:getTaskAttemptLogUrl(72)) - Can't fetch tasklog:
> TaskLogServlet is not supported in MR2 mode.
> 2013-12-26 14:27:32,596 WARN  shims.HadoopShimsSecure
> (Hadoop23Shims.java:getTaskAttemptLogUrl(72)) - Can't fetch tasklog:
> TaskLogServlet is not supported in MR2 mode.
> 2013-12-26 14:27:32,599 WARN  shims.HadoopShimsSecure
> (Hadoop23Shims.java:getTaskAttemptLogUrl(72)) - Can't fetch tasklog:
> TaskLogServlet is not supported in MR2 mode.
> 2013-12-26 14:27:32,615 ERROR exec.Task
> (SessionState.java:printError(419)) -
> Task with the most failures(4):
> -
> Task ID:
>   task_1388037394132_0013_m_00
>
> URL:
>
> http://ambari1.hadoop.com:8088/taskdetails.jsp?jobid=job_1388037394132_0013&tipid=task_1388037394132_0013_m_00
> -
> Diagnostic Messages for this Task:
> Error: java.io.IOException: java.io.IOException:
> java.lang.reflect.InvocationTargetException
> at
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>  at
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244)
>  at
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:538)
> at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167)
>  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
> at
> org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:383)
>  at
> org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:360)
> at
> org.apache.hadoop.hbase.client.HConnection

Re: can hive do paging query result?

2013-08-18 Thread kulkarni.swar...@gmail.com
You can use beeswax from hue. It will neatly page your results.


On Sun, Aug 18, 2013 at 11:39 PM, Nitin Pawar wrote:

> it can not page, it displays all the results on the console
>
> to avoid this,
>
> we either put the output in another table or put it inside a file
>
>
> On Mon, Aug 19, 2013 at 8:16 AM, ch huang  wrote:
>
>> the limit clause has no offset option, so how can I page query results in hive?
>
>
>
>
> --
> Nitin Pawar
>



-- 
Swarnim


Re: Composite blob key mapping in hive

2013-07-29 Thread kulkarni.swar...@gmail.com
Yes. It is possible to do that. The attached patch on the bug adds a new
HBaseCompositeKey class that consumers can extend to provide their own
implementations. This will help hive understand their custom arrangement of
the composite keys.

If you can try the patch and let me know if it worked for you, that would be
awesome!
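
For reference, once the key is understood as a composite, the Hive side
typically declares it as a struct. A delimiter-separated flavour looks
roughly like the sketch below (table, field names and types are made up for
illustration; the fixed-length / binary-blob arrangement you describe is
exactly what a custom HBaseCompositeKey implementation would have to take
care of):

CREATE EXTERNAL TABLE events (
  -- the HBase row key, exposed to Hive as a struct
  key struct<part1:string, part2:string>,
  val string
)
ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val");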


On Mon, Jul 29, 2013 at 11:47 PM, G.S.Vijay Raajaa
wrote:

> Hi,
>
> Thanks for the reply. The workaround can help me if it is a composite
> string literal with separators in it. I would like to know if it
> works with the following constraints:
>
> 1) Is it possible to map a composite key based on the length instead of
> separators.
> *eg:* Map the first 10 bytes, then the last 4 bytes, as a struct in Hive.
>
> 2) Is it possible to convert the composite key stored as serialized blob
> in the form of byte array and map the same to Hive??
>
> Thanks,
> Vijay Raajaa G S
>
>
> On Mon, Jul 29, 2013 at 6:10 PM,  wrote:
>
>> Hi,
>>
>> Please refer to the workaround posted on HIVE-2599 and let me know if
>> that works for you.
>>
>> On Jul 29, 2013, at 6:22 AM, "G.S.Vijay Raajaa" 
>> wrote:
>>
>> > Hi,
>> >
>> >  I would like to know if it is possible to map a composite key
>> stored as blob in HBase to Hive??
>> >
>> > Regards,
>> > Vijay Raajaa G S
>>
>
>


-- 
Swarnim


Re: error in running hive query

2013-07-19 Thread kulkarni.swar...@gmail.com
> Error: Java heap space

Guess this should give a hint.
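
If the mappers really are running out of heap, one knob worth trying (just a
suggestion, not a guaranteed fix; the value below is only an example and
assumes an MR1-style cluster) is to raise the per-task heap before rerunning
the query:

-- example value only; size it to what your nodes can afford
set mapred.child.java.opts=-Xmx1024m;
select cookieid, count(url) as visit_num from alex_test_big_seq
group by cookieid order by visit_num desc limit 10;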


On Fri, Jul 19, 2013 at 4:22 AM, ch huang  wrote:

> why the task failed? anyone can help?
>
>
> hive> select cookieid,count(url) as visit_num from alex_test_big_seq group
> by cookieid order by visit_num desc limit 10;
>
> MapReduce Total cumulative CPU time: 49 minutes 20 seconds 870 msec
> Ended Job = job_1374214993631_0037 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: 
> http://CH22:8088/proxy/application_1374214993631_0037/
> Examining task ID: task_1374214993631_0037_m_07 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_02 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_28 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_35 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_46 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_54 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_20 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_40 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_40 (and more) from job
> job_1374214993631_0037
> Examining task ID: task_1374214993631_0037_m_79 (and more) from job
> job_1374214993631_0037
>
> Task with the most failures(4):
> -
> Task ID:
>   task_1374214993631_0037_m_11
>
> URL:
>
> http://CH22:8088/taskdetails.jsp?jobid=job_1374214993631_0037&tipid=task_1374214993631_0037_m_11
> -
> Diagnostic Messages for this Task:
> Error: Java heap space
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 114  Reduce: 33   Cumulative CPU: 2960.87 sec   HDFS Read:
> 22037646479 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 49 minutes 20 seconds 870 msec
>



-- 
Swarnim


Re: which approach is better

2013-07-17 Thread kulkarni.swar...@gmail.com
First of all, that might not be the right way to choose the underlying
storage. You should choose HDFS or HBase depending on whether the data is
going to be used for batch processing or you need random access on top of
it. HBase is just another layer on top of HDFS. So obviously the queries
running on top of HBase are going to be less efficient. So if you can get
away with using HDFS, I would say that is the best and simplest approach.


On Wed, Jul 17, 2013 at 12:40 PM, Hamza Asad  wrote:

> Please let me know which approach is better. Either I save my data directly
> to HDFS and run hive (shark) queries over it OR store my data in HBASE, and
> then query it, as I want to ensure efficient data retrieval and that data
> remains safe and can easily be recovered if hadoop crashes.
>
> --
> *Muhammad Hamza Asad*
>



-- 
Swarnim


Re: Hive 0.11 with Cloudera CHD4.3 MR v1

2013-07-16 Thread kulkarni.swar...@gmail.com
This error is not the actual reason why your job failed. Please look into
your jobtracker logs to know the real reason. This error simply means that
hive attempted to connect to JT to gather debugging info for your failed
job but could not due to a classpath error.


On Tue, Jul 16, 2013 at 4:50 PM, Sunita Arvind wrote:

> Hi Jim,
>
> I am new to hive too so cannot suggest much on that front. However, I'm
> pretty sure that this error indicates that a particular class is missing in
> your classpath. In the sense, your hive runtime is not able to locate the
> class org.apache.hadoop.mapreduce.util.HostUtil. Double check your
> HADOOP_HOME and any other variable that will configure paths and classpaths
> for hive.
>
> Hope this helps.
>
> All the best!
> Sunita
>
>
> On Mon, Jul 15, 2013 at 9:32 PM, Jim Colestock 
> wrote:
>
>> Hello All,
>>
>> Has anyone been successful at running hive 0.11 with Cloudera CDH 4.3?
>>
>> I've been able to get hive to connect to my metadb (which is in
>> Postgres).  Verified by doing a show tables..  I can run explain and
>> describes on tables, but when I try to run anything that fires off an M/R
>> job, I get the following error:
>>
>> hive>select count(*) from tableA;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=
>> Starting Job = job_201307112247_13816, Tracking URL =
>> http://master:50030/jobdetails.jsp?jobid=job_201307112247_13816
>> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill
>> job_201307112247_13816
>> Hadoop job information for Stage-1: number of mappers: 1; number of
>> reducers: 1
>> 2013-07-12 02:11:42,829 Stage-1 map = 0%,  reduce = 0%
>> 2013-07-12 02:12:08,173 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_201307112247_13816 with errors
>> Error during job, obtaining debugging information...
>> Job Tracking URL:
>> http://master:50030/jobdetails.jsp?jobid=job_201307112247_13816
>> Examining task ID: task_201307112247_13816_m_02 (and more) from job
>> job_201307112247_13816
>> Exception in thread "Thread-19" java.lang.NoClassDefFoundError:
>> org/apache/hadoop/mapreduce/util/HostUtil
>>  at
>> org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:61)
>> at
>> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
>>  at
>> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
>> at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.mapreduce.util.HostUtil
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>  ... 4 more
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.MapRedTask
>> MapReduce Jobs Launched:
>> Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
>> Total MapReduce CPU Time Spent: 0 msec
>>
>>
>> I'm using my configs from hive 0.10, which works with no issues and this
>> was pretty much a drop in replacement on the machine that hadoop 0.10 was
>> running on..
>>
>> I've done a bunch of googling around and have found a bunch of other
>> folks that have had the same issue, but no solid answers..
>>
>> Thanks in advance for any help..
>>
>> JC
>>
>>
>>
>


-- 
Swarnim


Re: "show table" throwing strange error

2013-06-21 Thread kulkarni.swar...@gmail.com
More often than not, in my experience this is caused by a malformed
hive-site.xml (or hive-default.xml). When this happened to me, it was
because I somehow had tab characters in my hive-site.xml. Try dropping the
file(s) and recreating them with appropriate formatting.


On Fri, Jun 21, 2013 at 2:17 PM, Sanjay Subramanian <
sanjay.subraman...@wizecommerce.com> wrote:

>  Can u stop following services
> hive-server
> hive-meta-store
> Hive-server2 (if u r running that)
>
>  Move current hive.log some place else
>
>  Start following services
>  hive-server
> hive-meta-store
> Hive-server2 (if u r running that)
>
>
>  And check hive.log ?
>
>  Also can u paste the CREATE TABLE script verbatim here…I feel if u are
> using some custom INPUTFORMAT / OUTPUTFORMAT classes that have to be
> specified in quotes…u may have to *escape* those
>
>  Plus try and add a semicolon to the end of the create table script ...
>
>  sanjay
>
>   From: Mohammad Tariq 
> Reply-To: "user@hive.apache.org" 
> Date: Thursday, June 20, 2013 12:52 PM
>
> To: user 
> Subject: Re: "show table" throwing strange error
>
>   Thank you for looking into it Sanjay. "show tables" is working fine
> from both Ubuntu and Hive shell. But i'm getting the same error as
> yesterday when i'm running "create table", which is :
>
>  line 1:30 character '' not supported here
> line 1:31 character '' not supported here
> line 1:32 character '' not supported here
> line 1:33 character '' not supported here
> line 1:34 character '' not supported here
> line 1:35 character '' not supported here
> line 1:36 character '' not supported here
> line 1:37 character '' not supported here
> line 1:38 character '' not supported here
> line 1:39 character '' not supported here
> line 1:40 character '' not supported here
> line 1:41 character '' not supported here
> line 1:42 character '' not supported here
> .
> .
> .
> .
>
>  Also, I have noticed 1 strange thing. "hive.log" is totally messed up.
> Looks like logs are getting written in some binary encoding. I have
> attached a snapshot of the same. Any idea?
>
>  Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Fri, Jun 21, 2013 at 1:03 AM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>>  Can u try from your ubuntu command prompt
>> $> hive -e "show tables"
>>
>>   From: Mohammad Tariq 
>> Reply-To: "user@hive.apache.org" 
>> Date: Thursday, June 20, 2013 4:28 AM
>> To: user 
>>
>> Subject: Re: "show table" throwing strange error
>>
>>   Thank you for the response ma'am. It didn't help either.
>>
>>  Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind wrote:
>>
>>>  Your issue seems familiar. Try logging out of hive session and
>>> re-login.
>>>
>>>  Sunita
>>>
>>>
>>> On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq wrote:
>>>
 Hello list,

   I have a hive(0.9.0) setup on my Ubuntu box running
 hadoop-1.0.4. Everything was going smooth till now. But today when I issued
 *show tables* I got some strange error on the CLI. Here is the error :

  hive> show tables;
 FAILED: Parse Error: line 1:0 character '' not supported here
 line 1:1 character '' not supported here
 line 1:2 character '' not supported here
 line 1:3 character '' not supported here
 line 1:4 character '' not supported here
 line 1:5 character '' not supported here
 line 1:6 character '' not supported here
 line 1:7 character '' not supported here
 line 1:8 character '' not supported here
 line 1:9 character '' not supported here
 line 1:10 character '' not supported here
 line 1:11 character '' not supported here
 line 1:12 character '' not supported here
 line 1:13 character '' not supported here
 line 1:14 character '' not supported here
 line 1:15 character '' not supported here
 line 1:16 character '' not supported here
 line 1:17 character '' not supported here
 line 1:18 character '' not supported here
 line 1:19 character '' not supported here
 line 1:20 character '' not supported here
 line 1:21 character '' not supported here
 line 1:22 character '' not supported here
 line 1:23 character '' not supported here
 line 1:24 character '' not supported here
 line 1:25 character '' not supported here
 line 1:26 character '' not supported here
 line 1:27 character '' not supported here
 line 1:28 character '' not supported here
 line 1:29 character '' not supported here
 line 1:30 character '' not supported here
 line 1:31 character '' not supported here
 line 1:32 character '' not supported here
 line 1:33 character '' not supported here
 line 1:34 character '' not supported here
 line 1:35 character '' not supported here
 line 1:36 character '' not supported here
 line 1:37 character '' not supported here
 line 1:38 character '' not supported here
 line 1:39 character '' not supported h

Re: Hive Web Interface

2013-05-16 Thread kulkarni.swar...@gmail.com
AFAIK Hive HWI has been deprecated and you should be using hue/beeswax for
all your web interface needs.


On Thu, May 16, 2013 at 11:18 AM, Aniket Mokashi wrote:

> In your hive-site.xml, change value to "lib/hive-hwi-0.9.0.war" from
> "/lib/hive-hwi-0.9.0.war". I guess its a known issue with hwi.
>
> ~Aniket
>
>
> On Thu, May 16, 2013 at 8:58 AM, Stephen Sprague wrote:
>
>> ok. i'll bite.  you've cut 'n pasted the stderr to us -- but have you any
>> further comment on what you did after reading it?  Take that second line
>> for instance.  What action would you take after reading that?
>>
>>
>>
>>
>> On Wed, May 15, 2013 at 10:24 PM, Something Something <
>> mailinglist...@gmail.com> wrote:
>>
>>> I have installed Hive locally & I am able to run Hive queries etc.  Now
>>> I would like to try out Hive Web Interface, but when I try to start the
>>> webserver I run into this:
>>>
>>> ./hive --service hwi
>>> 13/05/15 22:18:33 INFO hwi.HWIServer: HWI is starting up
>>> 13/05/15 22:18:33 WARN conf.HiveConf: hive-site.xml not found on
>>> CLASSPATH
>>> 13/05/15 22:18:34 FATAL hwi.HWIServer: HWI WAR file not found at
>>> /lib/hive-hwi-0.9.0.war
>>>
>>
>>
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>



-- 
Swarnim


Re: Partitioning an external hbase table

2013-05-16 Thread kulkarni.swar...@gmail.com
Unfortunately I don't think there is a clean way to achieve that (at least
not one that I know of). Your option at this point is to run your queries
with a WHERE clause so that, behind the scenes, the predicate gets converted
to a range scan and restricts the amount of data that gets
scanned.
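
For example, if the row key is a plain string, a bounded predicate such as
the one below should be pushed down as a restricted HBase scan rather than a
full pass over the table (table name and key values are made up for
illustration):

SELECT *
FROM my_hbase_backed_table
WHERE key >= 'row_00001' AND key < 'row_10000';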


On Wed, May 15, 2013 at 8:22 PM, MailingList
wrote:

>  Is it possible to define partitions for an external table backed by Hbase?
> If so, what is the proper syntax?
>
> I already have an external table backed by hbase and I'm finding that for
> even simple SELECT queries the load isn't getting evenly distributed
> across the map tasks.  Some tasks see as few as a few hundred map input
> records while others receive more than a million.
>
>


-- 
Swarnim


Re: Getting Started

2013-05-02 Thread kulkarni.swar...@gmail.com
But that would still use the HADOOP_CLASSPATH right?


On Thu, May 2, 2013 at 12:52 PM, Cyril Bogus  wrote:

> But right now I am just trying to run it as standalone (no need to check
> for the packages I assume) with hadoop's hdfs in order to do some indexing
> on data already present in the hdfs
>
>
> On Thu, May 2, 2013 at 1:50 PM, Cyril Bogus  wrote:
>
>> Actually two: the one from hadoop (which is the same as the one in
>> the hive package) and the one from mahout 0.7, which is the newer antlr 3.2
>>
>>
>> On Thu, May 2, 2013 at 1:47 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Do you have a different version of antlr jar in your classpath other
>>> than the one packaged with hive?
>>>
>>>
>>> On Thu, May 2, 2013 at 12:38 PM, Cyril Bogus wrote:
>>>
>>>> I am using the default setup for the hive-site.xml so the meta store is
>>>> in /user/hive/warehouse in the hdfs (which I have set up as specified under
>>>> Getting Started on the hive website).
>>>>
>>>> Here is the output from the command.
>>>>
>>>>
>>>>
>>>> hive -hiveconf hive.root.logger=INFO,console -e  "show databases"
>>>> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
>>>> Please use org.apache.hadoop.log.metrics.EventCounter in all the
>>>> log4j.properties files.
>>>> Logging initialized using configuration in
>>>> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
>>>> 13/05/02 13:37:09 INFO SessionState: Logging initialized using
>>>> configuration in
>>>> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
>>>> Hive history
>>>> file=/tmp/cyrille/hive_job_log_cyrille_201305021337_431171577.txt
>>>> 13/05/02 13:37:09 INFO exec.HiveHistory: Hive history
>>>> file=/tmp/cyrille/hive_job_log_cyrille_201305021337_431171577.txt
>>>> 13/05/02 13:37:09 INFO ql.Driver: 
>>>> 13/05/02 13:37:09 INFO ql.Driver: 
>>>> 13/05/02 13:37:09 INFO ql.Driver: 
>>>> 13/05/02 13:37:09 INFO parse.ParseDriver: Parsing command: show
>>>> databases
>>>> 13/05/02 13:37:09 INFO ql.Driver: >>> start=1367516229487 end=1367516229681 duration=194>
>>>>
>>>> Exception in thread "main" java.lang.NoSuchFieldError: type
>>>> at
>>>> org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1305)
>>>> at
>>>> org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:6439)
>>>> at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
>>>> at
>>>> org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
>>>> at org.antlr.runtime.CommonTokenStream.LT
>>>> (CommonTokenStream.java:238)
>>>> at
>>>> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:573)
>>>> at
>>>> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:439)
>>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:416)
>>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>>>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
>>>> at
>>>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>>>> at
>>>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>>>> at
>>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>>>> at
>>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>>>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>>>>
>>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:616)
>>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>
>>>>
>>>>
>>>> On Thu, May 2, 2013 at 1:25 PM, Sanjay Subramanian <
>>>> sanjay.subraman...@wizecommerce.co

Re: Getting Started

2013-05-02 Thread kulkarni.swar...@gmail.com
Ok. Try replacing the jar in HIVE_HOME/lib with antlr 3.2.


On Thu, May 2, 2013 at 12:50 PM, Cyril Bogus  wrote:

> Actually two: the one from hadoop (which is the same as the one in the
> hive package) and the one from mahout 0.7, which is the newer antlr 3.2
>
>
> On Thu, May 2, 2013 at 1:47 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Do you have a different version of antlr jar in your classpath other than
>> the one packaged with hive?
>>
>>
>> On Thu, May 2, 2013 at 12:38 PM, Cyril Bogus wrote:
>>
>>> I am using the default setup for the hive-site.xml so the meta store is
>>> in /user/hive/warehouse in the hdfs (which I have set up as specified under
>>> Getting Started on the hive website).
>>>
>>> Here is the output from the command.
>>>
>>>
>>>
>>> hive -hiveconf hive.root.logger=INFO,console -e  "show databases"
>>> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
>>> Please use org.apache.hadoop.log.metrics.EventCounter in all the
>>> log4j.properties files.
>>> Logging initialized using configuration in
>>> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
>>> 13/05/02 13:37:09 INFO SessionState: Logging initialized using
>>> configuration in
>>> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
>>> Hive history
>>> file=/tmp/cyrille/hive_job_log_cyrille_201305021337_431171577.txt
>>> 13/05/02 13:37:09 INFO exec.HiveHistory: Hive history
>>> file=/tmp/cyrille/hive_job_log_cyrille_201305021337_431171577.txt
>>> 13/05/02 13:37:09 INFO ql.Driver: 
>>> 13/05/02 13:37:09 INFO ql.Driver: 
>>> 13/05/02 13:37:09 INFO ql.Driver: 
>>> 13/05/02 13:37:09 INFO parse.ParseDriver: Parsing command: show databases
>>> 13/05/02 13:37:09 INFO ql.Driver: >> start=1367516229487 end=1367516229681 duration=194>
>>>
>>> Exception in thread "main" java.lang.NoSuchFieldError: type
>>> at
>>> org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1305)
>>> at
>>> org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:6439)
>>> at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
>>> at
>>> org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
>>> at org.antlr.runtime.CommonTokenStream.LT
>>> (CommonTokenStream.java:238)
>>> at
>>> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:573)
>>> at
>>> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:439)
>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:416)
>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
>>> at
>>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>>> at
>>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>>> at
>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>>> at
>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>>>
>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:616)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>>
>>>
>>> On Thu, May 2, 2013 at 1:25 PM, Sanjay Subramanian <
>>> sanjay.subraman...@wizecommerce.com> wrote:
>>>
>>>>  Can u share  your hive-site.xml ? What meta store r u using ?
>>>> Also try this to get additional debug messages that u can use to
>>>> analyze the problem
>>>>
>>>>  From your linux command prompt run the following and tell us what u
>>>> see. Also hive-site.xml please
>>>>
>>>> /path/to/hive -hiveconf hive.root.logger=INFO,console -e  "show
>>>> databases"
>>>>
>>>>   From: Cyril Bogus 
>>>> Reply-To: "user@hive.apache.org" 
>>>

Re: Getting Started

2013-05-02 Thread kulkarni.swar...@gmail.com
Do you have a different version of antlr jar in your classpath other than
the one packaged with hive?


On Thu, May 2, 2013 at 12:38 PM, Cyril Bogus  wrote:

> I am using the default setup for the hive-site.xml so the meta store is in
> /user/hive/warehouse in the hdfs (which I have set up as specified under
> Getting Started on the hive website).
>
> Here is the output from the command.
>
>
>
> hive -hiveconf hive.root.logger=INFO,console -e  "show databases"
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
> Logging initialized using configuration in
> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
> 13/05/02 13:37:09 INFO SessionState: Logging initialized using
> configuration in
> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
> Hive history
> file=/tmp/cyrille/hive_job_log_cyrille_201305021337_431171577.txt
> 13/05/02 13:37:09 INFO exec.HiveHistory: Hive history
> file=/tmp/cyrille/hive_job_log_cyrille_201305021337_431171577.txt
> 13/05/02 13:37:09 INFO ql.Driver: 
> 13/05/02 13:37:09 INFO ql.Driver: 
> 13/05/02 13:37:09 INFO ql.Driver: 
> 13/05/02 13:37:09 INFO parse.ParseDriver: Parsing command: show databases
> 13/05/02 13:37:09 INFO ql.Driver:  start=1367516229487 end=1367516229681 duration=194>
>
> Exception in thread "main" java.lang.NoSuchFieldError: type
> at
> org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1305)
> at
> org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:6439)
> at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
> at
> org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
> at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:573)
> at
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:439)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:416)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
>
> On Thu, May 2, 2013 at 1:25 PM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>>  Can u share  your hive-site.xml ? What meta store r u using ?
>> Also try this to get additional debug messages that u can use to analyze
>> the problem
>>
>>  From your linux command prompt run the following and tell us what u
>> see. Also hive-site.xml please
>>
>> /path/to/hive -hiveconf hive.root.logger=INFO,console -e  "show
>> databases"
>>
>>   From: Cyril Bogus 
>> Reply-To: "user@hive.apache.org" 
>> Date: Thursday, May 2, 2013 10:19 AM
>> To: "user@hive.apache.org" 
>> Subject: Getting Started
>>
>>Hi,
>>
>>  I am currently running hadoop 1.0.4 and hive 0.10.0
>>  also I have HADOOP_HOME set to /home/hadoop and HIVE_HOME to /home/hive
>> along with JAVA_HOME also to the right location.
>>  and I would like to run the hive command line but I keep getting the
>> following error when I try to run a simple query like show databases;
>>
>>
>> hive
>> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
>> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
>> files.
>> Logging initialized using configuration in
>> jar:file:/home/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
>> Hive history
>> file=/tmp/cyrille/hive_job_log_cyrille_201305021317_1253522258.txt
>> hive> show databases;
>> Exception in thread "main" java.lang.NoSuchFieldError: type
>> at
>> org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1305)
>> at
>> org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:6439)
>> at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
>> at
>> org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
>> at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
>> at
>> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:573)
>> at
>> org.apache.hadoop.hive.ql.parse.ParseDr

Re: Very poor read performance with composite keys in hbase

2013-04-30 Thread kulkarni.swar...@gmail.com
That depends on how dynamic your data is. If it is pretty static, you can
also consider using something like Create Table As Select (CTAS) to create
a snapshot of your data to HDFS and then run queries on top of that data.

So your query might become something like:

create table my_table as select * from event where key.name=’Signup’ and
key.dateCreated=’2013-03-06 16:39:55.353’ and key.uid=’7af4c330-5988-4255-
9250-924ce5864e3bf’;

Since your data is now in HDFS, this should give you a considerable
performance boost.
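
A variant of the same idea is to snapshot the whole table once and then
serve many different lookups from the HDFS copy, roughly like below (the
snapshot table name and the STORED AS clause are just examples):

create table event_snapshot stored as rcfile as select * from event;

select * from event_snapshot
where key.name = 'Signup' and key.uid = '7af4c330-5988-4255-9250-924ce5864e3bf';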


On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh  wrote:

>  Swarnim,
>
>
> Thanks. So this means custom map reduce is the viable option when working
> with hbase tables having composite keys, since it allows to set the start
> and stop keys. Hive+Hbase combination is out.
>
>
> Regards
>
> Rupinder
>
>
> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
> *Sent:* Wednesday, May 01, 2013 12:17 AM
>
> *To:* user@hive.apache.org
> *Cc:* u...@hbase.apache.org
> *Subject:* Re: Very poor read performance with composite keys in hbase
>
>
> Rupinder,
>
>
> Hive supports a filter pushdown[1] which means that the predicates in the
> where clause are pushed down to the storage handler level where either they
> get handled by the storage handler or delegated to hive if they cannot
> handle them. As of now, the HBaseStorageHandler only supports primitive
> types. So when you use strings as keys, behind the scenes they get
> converted to start and stop keys and restrict the hbase scan. This does not
> happen for structs. Hence you see a full table scan causing bad performance.
> 
>
>
> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>
>
> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
> My experience with hive + hbase has been about 8x slower on an average. So
> I went ahead with hive only option.
>
> Sent from my iPhone
>
>
> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh"  wrote:
>
>  Hi,
>
>  
>
> I have an hbase cluster where I have a table with a composite key. I map
> this table to a Hive external table using which I insert/select data
> into/from this table:
>
> CREATE EXTERNAL TABLE event(key
> struct, {more columns here})
>
> ROW FORMAT DELIMITED
>
> COLLECTION ITEMS TERMINATED BY '~'
>
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>
> TBLPROPERTIES ("hbase.table.name" = "event");
>
>  
>
> The table has about 10 million rows. When I do a select * using all 3
> components of the key, essentially selecting just 1 row, the response time
> is almost 700 sec, which seems pretty bad.
>
>  
>
> For comparison purpose, I created another table with a simple string key,
> and the rest of the columns etc same. The key is a string UUID. Table has
> same number of column families and same number of rows.
>
> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>
> TBLPROPERTIES ("hbase.table.name" = "test_event");
>
>  
>
> When I select a single row from this table by doing select * where
> key=’something’, the response time is 35 sec.
>
>  
>
> This seems to indicate that in case of composite keys, there is a full
> table scan happening.  This seems weird.
>
>  
>
> What am I missing here? Is there something special I need to do to get
> good read performance if I am using composite keys ?
>
> Insert performance in both cases is comparable and is as per expectation.
>
>  
>
> Any help is appreciated.
>
> Here is the env spec:
>
>  
>
> Amazon EMR
>
> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
>
> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master
> 3.75 GB RAM, 1 CPU of 1.8 GHz
>
>  
>
> Thanks
>
> Rupinder
>
>
>
> This email is intended for the person(s) to whom it is addressed and may
> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized
> use, distribution, copying, or disclosure by any person other than the
> addressee(s) is strictly prohibited. If you have received this email in
> error, please notify the sender immediately by 

Re: Very poor read performance with composite keys in hbase

2013-04-30 Thread kulkarni.swar...@gmail.com
Rupinder,

Hive supports a filter pushdown[1] which means that the predicates in the
where clause are pushed down to the storage handler level where either they
get handled by the storage handler or delegated to hive if they cannot
handle them. As of now, the HBaseStorageHandler only supports primitive
types. So when you use strings as keys, behind the scenes they get
converted to start and stop keys and restrict the hbase scan. This does not
happen for structs. Hence you see a full table scan causing bad performance.

[1] https://cwiki.apache.org/Hive/filterpushdowndev.html
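
To make the contrast concrete with the two tables from your mail (the key
values are only placeholders):

-- string row key: the equality predicate can be pushed down to HBase as a
-- narrow scan
select * from test_event where key = 'something';

-- struct (composite) key: the predicate is not pushed down today, so this
-- ends up as a full table scan
select * from event where key.name = 'some-name';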


On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian <
sanjay.subraman...@wizecommerce.com> wrote:

>  My experience with hive + hbase has been about 8x slower on an average.
> So I went ahead with hive only option.
>
> Sent from my iPhone
>
> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh"  wrote:
>
>   Hi,
>
>
>
> I have an hbase cluster where I have a table with a composite key. I map
> this table to a Hive external table using which I insert/select data
> into/from this table:
>
> CREATE EXTERNAL TABLE event(key
> struct, {more columns here})
>
> ROW FORMAT DELIMITED
>
> COLLECTION ITEMS TERMINATED BY '~'
>
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>
> TBLPROPERTIES ("hbase.table.name" = "event");
>
>
>
> The table has about 10 million rows. When I do a select * using all 3
> components of the key, essentially selecting just 1 row, the response time
> is almost 700 sec, which seems pretty bad.
>
>
>
> For comparison purpose, I created another table with a simple string key,
> and the rest of the columns etc same. The key is a string UUID. Table has
> same number of column families and same number of rows.
>
> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>
> TBLPROPERTIES ("hbase.table.name" = "test_event");
>
>
>
> When I select a single row from this table by doing select * where
> key=’something’, the response time is 35 sec.
>
>
>
> This seems to indicate that in case of composite keys, there is a full
> table scan happening.  This seems weird.
>
>
>
> What am I missing here? Is there something special I need to do to get
> good read performance if I am using composite keys ?
>
> Insert performance in both cases is comparable and is as per expectation.
>
>
>
> Any help is appreciated.
>
> Here is the env spec:
>
>
>
> Amazon EMR
>
> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
>
> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master
> 3.75 GB RAM, 1 CPU of 1.8 GHz
>
>
>
> Thanks
>
> Rupinder
>
>
> This email is intended for the person(s) to whom it is addressed and may
> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized
> use, distribution, copying, or disclosure by any person other than the
> addressee(s) is strictly prohibited. If you have received this email in
> error, please notify the sender immediately by return email and delete the
> message and any attachments from your system.
>
>
> CONFIDENTIALITY NOTICE
> ==
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>



-- 
Swarnim


Re: Very poor read performance with composite keys in hbase

2013-04-30 Thread kulkarni.swar...@gmail.com
Can you show your query that is taking 700 seconds?


On Tue, Apr 30, 2013 at 12:48 PM, Rupinder Singh  wrote:

>  Hi,
>
>
> I have an hbase cluster where I have a table with a composite key. I map
> this table to a Hive external table using which I insert/select data
> into/from this table:
>
> CREATE EXTERNAL TABLE event(key
> struct, {more columns here})
>
> ROW FORMAT DELIMITED
>
> COLLECTION ITEMS TERMINATED BY '~'
>
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>
> TBLPROPERTIES ("hbase.table.name" = "event");
>
>
> The table has about 10 million rows. When I do a select * using all 3
> components of the key, essentially selecting just 1 row, the response time
> is almost 700 sec, which seems pretty bad.
>
>
> For comparison purpose, I created another table with a simple string key,
> and the rest of the columns etc same. The key is a string UUID. Table has
> same number of column families and same number of rows.
>
> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>
> TBLPROPERTIES ("hbase.table.name" = "test_event");
>
>
> When I select a single row from this table by doing select * where
> key=’something’, the response time is 35 sec.
>
>
> This seems to indicate that in case of composite keys, there is a full
> table scan happening.  This seems weird.
>
>
> What am I missing here? Is there something special I need to do to get
> good read performance if I am using composite keys ?
>
> Insert performance in both cases is comparable and is as per expectation.
>
>
> Any help is appreciated.
>
> Here is the env spec:
>
>
> Amazon EMR
>
> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
>
> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master
> 3.75 GB RAM, 1 CPU of 1.8 GHz
>
>
> Thanks
>
> Rupinder
>
>
> This email is intended for the person(s) to whom it is addressed and may
> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized
> use, distribution, copying, or disclosure by any person other than the
> addressee(s) is strictly prohibited. If you have received this email in
> error, please notify the sender immediately by return email and delete the
> message and any attachments from your system.
>
>


-- 
Swarnim


Re: java.io.FileNotFoundException(File does not exist) when running a hive query

2013-03-04 Thread kulkarni.swar...@gmail.com
Are you using hive over yarn? If yes, see this related thread here[1].

[1]
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/gHVq9C5H6RE


On Mon, Mar 4, 2013 at 4:49 AM, Bhaskar, Snehalata <
snehalata_bhas...@syntelinc.com> wrote:

>  Does anyone know how to solve this issue??
>
>
> Thanks and regards,
>
> Snehalata Deorukhkar
>
> Nortel No : 0229 -5814
>
>
> *From:* Bhaskar, Snehalata [mailto:snehalata_bhas...@syntelinc.com]
> *Sent:* Sunday, March 03, 2013 11:23 PM
> *To:* user@hive.apache.org
> *Subject:* java.io.FileNotFoundException(File does not exist) when
> running a hive query
>
>
> Hi,
>
> I am getting "java.io.FileNotFoundException(File does not exist:
> /tmp/sb25634/hive_2013-03-01_23-21-43_428_5325193042224363842/-mr-1/1/emptyFile)'
> " exception when running any join query :
>
> Following is the query that I am using and exception thrown.
>
>
> hive> select * from retail_1 l join retail_2 t on
> l.product_name=t.product_name;
>
> Total MapReduce jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks determined at compile time: 1
>
> In order to change the average load for a reducer (in bytes):
>
>   set hive.exec.reducers.bytes.per.reducer=
>
> In order to limit the maximum number of reducers:
>
>   set hive.exec.reducers.max=
>
> In order to set a constant number of reducers:
>
>   set mapred.reduce.tasks=
>
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
>
> Execution log at:
> /tmp/sb25634/sb25634_20130301232121_0c9f19d1-7846-4f4e-9469-401641fdd137.log
>
> java.io.FileNotFoundException: File does not exist:
> /tmp/sb25634/hive_2013-03-01_23-21-43_428_5325193042224363842/-mr-1/1/emptyFile
>
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
>
> at
> org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.(CombineFileInputFormat.java:462)
>
> at
> org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>
> at
> org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
>
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
>
> at
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
>
> at
> org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1041)
>
> at
> org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1033)
>
> at
> org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
>
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
>
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
>
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
>
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
>
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> Job Submission failed with exception 'java.io.FileNotFoundException(File
> does not exist:
> /tmp/sb25634/hive_2013-03-01_23-21-43_428_5325193042224363842/-mr-1/1/emptyFile)'
>
> Execution failed with exit status: 1
>
> Obtaining error information
>
>
>
> Task failed!
>
> Task ID:
>
>   Stage-1
>
>
>
> Logs:
>
>
>
> /tmp/sb25634/hive.log
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
>
> What may be the cause of this error?
>
> Please help me to resolve this issue.Thanks in advance.
>
> Regards,
> Snehalata Deorukhkar.
>
> 
>



-- 
Swarnim


Re: NoClassDefFoundError: org/apache/hadoop/mapreduce/util/HostUtil

2013-02-07 Thread kulkarni.swar...@gmail.com
One reason I know of for this error is not setting HADOOP_HOME. It is right
to not set this variable since it was deprecated and replaced with
HADOOP_PREFIX and HADOOP_MAPRED_HOME. However, it seems like hive still has
some haunting references to HADOOP_HOME causing this error, especially after
the query has failed and it attempts to grab debugging info from the
tasktracker URL.

As far as your job is concerned, the failure has nothing to do with this
error. Check your tasktrackers logs to know what blew up.


On Thu, Feb 7, 2013 at 5:33 AM, Viral Bajaria wrote:

> Are you seeing this after a few of the jobs have finished or on the first
> stage itself ? Also is this error on all boxes or just a few ? You can
> check MR logs and see which box or boxes are the culprits and debug from
> there.
>
> Viral
> --
> From: Krishna Rao
> Sent: 2/7/2013 2:46 AM
> To: user@hive.apache.org
> Subject: NoClassDefFoundError: org/apache/hadoop/mapreduce/util/HostUtil
>
> Hi all,
>
> I'm occasionally getting the following error, usually after running an
> expensive Hive query (creating 20 or so MR jobs):
>
> ***
> Error during job, obtaining debugging information...
> Examining task ID: task_201301291405_1640_r_01 (and more) from job
> job_201301291405_1640
> Exception in thread "Thread-29" java.lang.NoClassDefFoundError:
> org/apache/hadoop/mapreduce/util/HostUtil
> at
> org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:51)
> at
> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
> at
> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.mapreduce.util.HostUtil
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ... 4 more
> CmdRunner::runCmd: Error running cmd in script, error: FAILED: Execution
> Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> ***
>
> Any ideas on what's causing it?
> How can I find out more info on this error?
>
> Cheers,
>
> Krishna
>



-- 
Swarnim


Re: Drop an HBase backed table

2012-12-09 Thread kulkarni.swar...@gmail.com
Hi David,

DROP TABLE  is the right command to drop a table. You can look at
the hive logs under "/tmp//hive.log" to see why your shell is
hanging. When dropping an EXTERNAL TABLE, you are guaranteed that the
underlying hbase table won't be touched.


On Sun, Dec 9, 2012 at 6:06 PM, David Koch  wrote:

> Hello,
>
> How can I drop a Hive table which was created using "CREATE EXTERNAL
> TABLE..."? I tried "DROP TABLE ;" but the shell hangs. The
> underlying HBase table should not be deleted. I am using Hive 0.9
>
> Thank you,
>
> /David
>



-- 
Swarnim


Re: Mapping existing HBase table with many columns to Hive.

2012-12-06 Thread kulkarni.swar...@gmail.com
Hi David,

First of all, your columns are not "long". They are binary as well.
Currently as hive stands, there is no support for binary qualifiers.
However, I recently submitted a patch for that[1]. Feel free to give it a
shot and let me know if you see any issues. With that patch, you can
directly give your qualifiers to hive as they look here (
\x00\x00\x01;2\xE6Q\x06).

Until then, the only option you have is to use a map to map all your
columns under the column family "t". An example to do that would be:


CREATE EXTERNAL TABLE hbase_table_1(key int, value map<string,string>)


STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,t:")
TBLPROPERTIES("hbase.table.name" = "some_existing_table");


Also as far as your key goes, it is a composite key. There is also an
existing patch for the support of that here[2].


Hope that helps.


[1] https://issues.apache.org/jira/browse/HIVE-3553
[2] https://issues.apache.org/jira/browse/HIVE-2599


On Thu, Dec 6, 2012 at 12:56 PM, David Koch  wrote:

> Hello,
>
> How can I map an HBase table with the following layout to Hive using the
> "CREATE EXTERNAL TABLE" command from shell (or another programmatic way):
>
> The HBase table's layout is as follows:
> Rowkey=16 bytes, a UUID that had the "-" removed, and the 32hex chars
> converted into two 8byte longs.
> Columns (qualifiers): timestamps, i.e the bytes of a long which were
> converted using Hadoop's Bytes.toBytes(long). There can be many of those in
> a single row.
> Values: The bytes of a Java string.
>
> I am unsure of which datatypes to use. I am pretty sure there is no way I
> can sensibly map the row key to anything other than "binary" but maybe the
> columns - which are longs and the values which are strings can be mapped to
> their corresponding Hive datatypes.
>
> I include an extract of what a row looks like in HBase shell below:
>
> Thank you,
>
> /David
>
> hbase(main):009:0> scan "hits"
> ROW
> COLUMN+CELL
>
> \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC
>  column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha
> \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC
>  column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967,
> value=testtest
>



-- 
Swarnim


Re: no data in external table

2012-10-04 Thread kulkarni.swar...@gmail.com
Can you try creating a table like this:

CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");


Now do a select * from hbase_table_2;

Do you see any data now?

On Thu, Oct 4, 2012 at 5:10 PM,  wrote:

> Hi,
>
> In the hbase table I do not see column qualifier, only family.
> For testing connection to hbase I also created a table using
>
> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "xyz");
>
> I see xyz table in hbase. then I added a row in hbase using put 'xyz', 
> 'row1', 'cf1', 'abc'
>
>
> Then in hive I did: select * from hbase_table_1;
> No results are returned, but scan xys in hbase returns 1 row.
>
> Thanks.
>
> Alex.
>
>  -Original Message-
> From: kulkarni.swarnim 
> To: user 
> Sent: Thu, Oct 4, 2012 3:00 pm
> Subject: Re: no data in external table
> > "hbase.columns.mapping" = ":key,mtdt:string,il:string,ol:string"
>
>  This doesn't look right. The mapping should be of form
> COLUMN_FAMILY:COLUMN_QUALIFIER. In this case it seems to be
> COLUMN_FAMILY:TYPE which is not right.
>
>  On Thu, Oct 4, 2012 at 3:25 PM,  wrote:
>
>> Hi,
>>
>> In hive shell I did
>>
>> create external table myextrenaltable (key string, metadata string,
>> inlinks string, outlinks string) stored by
>> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>  with serdeproperties ("hbase.columns.mapping" =
>> ":key,mtdt:string,il:string,ol:string")
>>  tblproperties ("hbase.table.name" = "myextrenaltable");
>>
>> In tasktracker log I do not see anything relevant to hbase. In jobdetails
>> page I see a few successful jobs. in hive shell I see
>>
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201210031146_0016, Tracking URL =
>> http://localhost:50030/jobdetails.jsp?jobid=job_201210031146_0016
>> Kill Command = /home/dev/hadoop-0.20.2/bin/../bin/hadoop job
>> -Dmapred.job.tracker=localhost:9001 -kill job_201210031146_0016
>> Hadoop job information for Stage-1: number of mappers: 1; number of
>> reducers: 0
>> 2012-10-04 13:19:06,581 Stage-1 map = 0%,  reduce = 0%
>> 2012-10-04 13:19:12,629 Stage-1 map = 100%,  reduce = 0%
>> 2012-10-04 13:19:15,657 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_201210031146_0016
>> MapReduce Jobs Launched:
>> Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
>> Total MapReduce CPU Time Spent: 0 msec
>> OK
>> Time taken: 17.47 seconds
>>
>>
>>
>> Thanks in advance.
>> Alex.
>>
>>  -Original Message-
>> From: Ted Yu 
>> To: user 
>> Sent: Thu, Oct 4, 2012 11:33 am
>> Subject: Re: no data in external table
>>
>>  Can you tell us how you created mapping for the existing table ?
>>
>> In task log, do you see any connection attempt to HBase ?
>>
>> Cheers
>>
>> On Thu, Oct 4, 2012 at 11:30 AM,  wrote:
>>
>>> Hello,
>>>
>>> I use hive-0.9.0 with hadoop-0.20.2 and hbase -0.92.1. I have created
>>> external table, mapping it to an existing table in hbase. When I do "select
>>> * from myextrenaltable" it returns no results, although scan in hbase shows
>>> data, and I do not see any errors in jobtracker log.
>>>
>>> Any ideas how to debug this issue.
>>>
>>> Thanks.
>>> Alex.
>>>
>>
>>
>
>
>  --
> Swarnim
>



-- 
Swarnim


Re: no data in external table

2012-10-04 Thread kulkarni.swar...@gmail.com
> "hbase.columns.mapping" = ":key,mtdt:string,il:string,ol:string"

This doesn't look right. The mapping should be of the form
COLUMN_FAMILY:COLUMN_QUALIFIER. In this case it is
COLUMN_FAMILY:TYPE, which would make Hive look for qualifiers literally named "string".
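
A corrected definition pairs each Hive column with a family:qualifier. A sketch
(the qualifier names below are made up; replace them with the qualifiers that
actually exist in the HBase table):

create external table myextrenaltable (key string, metadata string,
inlinks string, outlinks string) stored by
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,mtdt:val,il:val,ol:val")
tblproperties ("hbase.table.name" = "myextrenaltable");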

On Thu, Oct 4, 2012 at 3:25 PM,  wrote:

> Hi,
>
> In hive shell I did
>
> create external table myextrenaltable (key string, metadata string,
> inlinks string, outlinks string) stored by
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  with serdeproperties ("hbase.columns.mapping" =
> ":key,mtdt:string,il:string,ol:string")
>  tblproperties ("hbase.table.name" = "myextrenaltable");
>
> In tasktracker log I do not see anything relevant to hbase. In jobdetails
> page I see a few successful jobs. in hive shell I see
>
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201210031146_0016, Tracking URL =
> http://localhost:50030/jobdetails.jsp?jobid=job_201210031146_0016
> Kill Command = /home/dev/hadoop-0.20.2/bin/../bin/hadoop job
> -Dmapred.job.tracker=localhost:9001 -kill job_201210031146_0016
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 0
> 2012-10-04 13:19:06,581 Stage-1 map = 0%,  reduce = 0%
> 2012-10-04 13:19:12,629 Stage-1 map = 100%,  reduce = 0%
> 2012-10-04 13:19:15,657 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201210031146_0016
> MapReduce Jobs Launched:
> Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 17.47 seconds
>
>
>
> Thanks in advance.
> Alex.
>
>  -Original Message-
> From: Ted Yu 
> To: user 
> Sent: Thu, Oct 4, 2012 11:33 am
> Subject: Re: no data in external table
>
>  Can you tell us how you created mapping for the existing table ?
>
> In task log, do you see any connection attempt to HBase ?
>
> Cheers
>
> On Thu, Oct 4, 2012 at 11:30 AM,  wrote:
>
>> Hello,
>>
>> I use hive-0.9.0 with hadoop-0.20.2 and hbase -0.92.1. I have created
>> external table, mapping it to an existing table in hbase. When I do "select
>> * from myextrenaltable" it returns no results, although scan in hbase shows
>> data, and I do not see any errors in jobtracker log.
>>
>> Any ideas how to debug this issue.
>>
>> Thanks.
>> Alex.
>>
>
>


-- 
Swarnim


Re: ERROR :regarding Hive WI, hwi service is not running

2012-09-19 Thread kulkarni.swar...@gmail.com
No. I meant create "/opt/hive-0.8.1/lib/" in HDFS and place the
"hive-hwi-0.8.1.war" there.

On Wed, Sep 19, 2012 at 4:55 PM, yogesh dhari  wrote:

>  Hello Swarnim,
>
> Are you saying to put *hive-hwi-0.8.1.war *into hadoop/lib ?
>
> I have put it over there and still the same issue..
>
> Thanks & Regards
> Yogesh Kumar
>
> ------
> From: kulkarni.swar...@gmail.com
> Date: Wed, 19 Sep 2012 16:48:37 -0500
> Subject: Re: ERROR :regarding Hive WI, hwi service is not running
> To: user@hive.apache.org
>
>
> It's probably looking for that file on HDFS. Try placing it there under
> the given location and see if you get the same error.
>
> On Wed, Sep 19, 2012 at 4:45 PM, yogesh dhari wrote:
>
>  Hi all,
>
> I am trying to run hive wi but its showing FATAL,
>
> I have used this command
> *
> hive --service hwi *
>
> but it shows..
>
> yogesh@yogesh-Aspire-5738:/opt/hive-0.8.1/lib$ hive --service hwi
>
> *12/09/20 03:12:03 INFO hwi.HWIServer: HWI is starting up
> 12/09/20 03:12:04 FATAL hwi.HWIServer: HWI WAR file not found at
> /opt/hive-0.8.1/lib/hive-hwi-0.8.1.war
> *
> although lies there.
>
>
> yogesh@yogesh-Aspire-5738:/opt/hive-0.8.1/lib$  pwd
>
> */opt/hive-0.8.1/lib*
>
> ls -l
>
> -rw-rw-r-- 1 root root   55876 Jan 26  2012 hive-common-0.8.1.jar
> -rw-rw-r-- 1 root root  112440 Jan 26  2012 hive-contrib-0.8.1.jar
> -rw-rw-r-- 1 root root  112440 Jan 26  2012 hive_contrib.jar
> -rw-rw-r-- 1 root root 3461228 Jan 26  2012 hive-exec-0.8.1.jar
> -rw-rw-r-- 1 root root   48829 Jan 26  2012 hive-hbase-handler-0.8.1.jar
> -rw-rw-r-- 1 root root   23529 Jan 26  2012 hive-hwi-0.8.1.jar
> *-rwxrwxrwx 1 root root   28413 Jan 26  2012 hive-hwi-0.8.1.war*
> -rw-rw-r-- 1 root root   58914 Jan 26  2012 hive-jdbc-0.8.1.jar
> -rw-rw-r-- 1 root root 1765743 Jan 26  2012 hive-metastore-0.8.1.jar
> -rw-rw-r-- 1 root root   14081 Jan 26  2012 hive-pdk-0.8.1.jar
> -rw-rw-r-- 1 root root  509488 Jan 26  2012 hive-serde-0.8.1.jar
> -rw-rw-r-- 1 root root  174445 Jan 26  2012 hive-service-0.8.1.jar
> -rw-rw-r-- 1 root root  110154 Jan 26  2012 hive-shims-0.8.1.jar
> -rw-rw-r-- 1 root root   15260 Jan 24  2012 javaewah-0.3.jar
> -rw-rw-r-- 1 root root  198552 Dec 24  2009 jdo2-api-2.3-ec.jar
>
>
> Please suggest regarding
>
> Thanks & regards
> Yogesh Kumar
>
>
>
>
>
>
>
>
>
>
>
>
> --
> Swarnim
>



-- 
Swarnim


Re: ERROR :regarding Hive WI, hwi service is not running

2012-09-19 Thread kulkarni.swar...@gmail.com
It's probably looking for that file on HDFS. Try placing it there under the
given location and see if you get the same error.

On Wed, Sep 19, 2012 at 4:45 PM, yogesh dhari  wrote:

>  Hi all,
>
> I am trying to run hive wi but its showing FATAL,
>
> I have used this command
> *
> hive --service hwi *
>
> but it shows..
>
> yogesh@yogesh-Aspire-5738:/opt/hive-0.8.1/lib$ hive --service hwi
>
> *12/09/20 03:12:03 INFO hwi.HWIServer: HWI is starting up
> 12/09/20 03:12:04 FATAL hwi.HWIServer: HWI WAR file not found at
> /opt/hive-0.8.1/lib/hive-hwi-0.8.1.war
> *
> although lies there.
>
>
> yogesh@yogesh-Aspire-5738:/opt/hive-0.8.1/lib$  pwd
>
> */opt/hive-0.8.1/lib*
>
> ls -l
>
> -rw-rw-r-- 1 root root   55876 Jan 26  2012 hive-common-0.8.1.jar
> -rw-rw-r-- 1 root root  112440 Jan 26  2012 hive-contrib-0.8.1.jar
> -rw-rw-r-- 1 root root  112440 Jan 26  2012 hive_contrib.jar
> -rw-rw-r-- 1 root root 3461228 Jan 26  2012 hive-exec-0.8.1.jar
> -rw-rw-r-- 1 root root   48829 Jan 26  2012 hive-hbase-handler-0.8.1.jar
> -rw-rw-r-- 1 root root   23529 Jan 26  2012 hive-hwi-0.8.1.jar
> *-rwxrwxrwx 1 root root   28413 Jan 26  2012 hive-hwi-0.8.1.war*
> -rw-rw-r-- 1 root root   58914 Jan 26  2012 hive-jdbc-0.8.1.jar
> -rw-rw-r-- 1 root root 1765743 Jan 26  2012 hive-metastore-0.8.1.jar
> -rw-rw-r-- 1 root root   14081 Jan 26  2012 hive-pdk-0.8.1.jar
> -rw-rw-r-- 1 root root  509488 Jan 26  2012 hive-serde-0.8.1.jar
> -rw-rw-r-- 1 root root  174445 Jan 26  2012 hive-service-0.8.1.jar
> -rw-rw-r-- 1 root root  110154 Jan 26  2012 hive-shims-0.8.1.jar
> -rw-rw-r-- 1 root root   15260 Jan 24  2012 javaewah-0.3.jar
> -rw-rw-r-- 1 root root  198552 Dec 24  2009 jdo2-api-2.3-ec.jar
>
>
> Please suggest regarding
>
> Thanks & regards
> Yogesh Kumar
>
>
>
>
>
>
>
>
>
>


-- 
Swarnim


Re: Upper case column names

2012-08-14 Thread kulkarni.swar...@gmail.com
Mayank,

Just out of curiosity, any reason other than conventions to
preserve the case for column names in hive?

On Tue, Aug 14, 2012 at 6:38 PM, Travis Crawford
wrote:

> On Tue, Aug 14, 2012 at 4:20 PM, Edward Capriolo wrote:
>
>>
>> Just changing the code is not as easy as it sounds. It sounds like this
>> will break many things in production for a lot of people.
>
>
> Absolutely - case sensitivity would be a big change. In the patch we're
> playing around with we centralized the toLowerCase business in a single
> method, and can turn it on/off per-query.
>
> --travis
>
>
>
>> On Tuesday, August 14, 2012, Travis Crawford 
>> wrote:
>> > Hey Mayank -
>> > I've looked briefly at case-sensitivity in Hive, and there's a lot of
>> places where fields are lowercased to normalize. For HCatalog, I'm playing
>> around with a small patch that makes case-sensitivity optional and it works
>> if you run queries with Pig/HCat against the metastore. It would be a
>> pretty large patch to make hive optionally case sensitive though.
>> > Case sensitive field names as an option certainly would use helpful
>> though.
>> > --travis
>> >
>> > On Tue, Aug 14, 2012 at 8:24 AM, Mayank Bansal <
>> mayank.ban...@mu-sigma.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>
>> >>
>> >> The column names in hive are by default case insensitive.
>> >>
>> >> I was wondering if there is any way, I could make the column names
>> case sensitive?
>> >>
>> >> I am running a model on a data, the data is now stored in hive, the
>> model has columns referred in camel case.
>> >>
>> >> It would require a lot of effort to change the code of the model, so I
>> was wondering if I could change my hive schema or anything.
>> >>
>> >> Would changing the metastore_db help in someway ?
>> >>
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Mayank
>> >>
>> >>
>> >>
>> >> 
>> >> This email message may contain proprietary, private and confidential
>> information. The information transmitted is intended only for the person(s)
>> or entities to which it is addressed. Any review, retransmission,
>> dissemination or other use of, or taking of any action in reliance upon,
>> this information by persons or entities other than the intended recipient
>> is prohibited and may be illegal. If you received this in error, please
>> contact the sender and delete the message from your system.
>> >>
>> >> Mu Sigma takes all reasonable steps to ensure that its electronic
>> communications are free from viruses. However, given Internet
>> accessibility, the Company cannot accept liability for any virus introduced
>> by this e-mail or any attachment and you are advised to use up-to-date
>> virus checking software.
>> >
>> >
>>
>
>


-- 
Swarnim


Re: Issue with creating table in hbase

2012-08-14 Thread kulkarni.swar...@gmail.com
Is that the complete stacktrace?

On Tue, Aug 14, 2012 at 12:01 PM, Omer, Farah wrote:

>  Unfortunately the job’s log also doesn’t tell me anything very
> meaningful. Have you or anyone might have seen this before?
>
> 
>
> java.lang.RuntimeException: Error in configuring object
>
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> 
>
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)***
> *
>
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> 
>
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
> 
>
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> 
>
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
> Caused by: java.lang.reflect.InvocationTargetException
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
> 
>
> ** **
>
> ** **
>
> Thanks.
>
> ** **
>
> ** **
>
> Farah Omer
>
> Sr. DB Engineer | MicroStrategy, Inc.****
>
> Tel 703.270.2230 | fo...@microstrategy.com
>
> 1850 Towers Crescent Plaza | Tysons Corner, VA 22182
>
> www.microstrategy.com
>
> ** **
>
> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
> *Sent:* Tuesday, August 14, 2012 12:49 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: Issue with creating table in hbase
>
>  ** **
>
> It seems like your Map reduce job is failing. Refer to the logs in the
> tracking URL "
> http://hadoop001:50030/jobdetails.jsp?jobid=job_201207251201_0678"; to see
> why exactly it is failing.
>
> ** **
>
> On Tue, Aug 14, 2012 at 11:35 AM, Omer, Farah 
> wrote:
>
> Thanks. That helped.
>
>  
>
> Another related question:
>
>  
>
> I created this table on HIVE:
>
> hive> CREATE TABLE hbase_mstr_1(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "hbase_mstr_1_xyz");
>
>  
>
> My intention is to load it and then access it via HIVE. But I am not able
> to insert values into this table:
>
>  
>
> hive> insert overwrite table hbase_mstr_1 select * from eatwh1_ft1;
>
> Total MapReduce jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks is set to 0 since there's no reduce operator
>
> Starting Job = job_201207251201_0678, Tracking URL =
> http://hadoop001:50030/jobdetails.jsp?jobid=job_201207251201_0678
>
> Kill Command = /usr/lib/hadoop/bin/hadoop job
> -Dmapred.job.tracker=hadoop001:6932 -kill job_201207251201_0678
>
> 2012-08-14 12:30:20,977 Stage-0 map = 0%,  reduce = 0%
>
> 2012-08-14 12:30:47,110 Stage-0 map = 100%,  reduce = 100%
>
> Ended Job = job_201207251201_0678 with errors
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
>  
>
> The hive.log doesn’t tell a lot of useful information.
>
>  
>
> 2012-08-14 11:55:51,893 WARN  hbase.HBaseConfiguration
> (HBaseConfiguration.java:(45)) - instantiating HBaseConfiguration()
> is deprecated. Please use HBaseConfiguration#create() to construct a plain
> Configuration
>
> 2012-08-14 11:55:52,302 WARN  mapred.JobClient
> (JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.***
> *
>
> 2012-08-14 11:56:21,132 ERROR exec.MapRedTask
> (SessionState.java:printError(365)) - Ended Job = job_201207251201_0677
> with errors
>
> 2012-08-14 11:56:21,151 ERROR ql.Driver
> (SessionState.java:printError(365)) - FAILED: Execution Error, return code
> 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
>  
>
> Any ideas?
>
>  
>
> Thank you.
>
>  
>

Re: Issue with creating table in hbase

2012-08-14 Thread kulkarni.swar...@gmail.com
It seems like your Map reduce job is failing. Refer to the logs in the
tracking URL "
http://hadoop001:50030/jobdetails.jsp?jobid=job_201207251201_0678"; to see
why exactly it is failing.

On Tue, Aug 14, 2012 at 11:35 AM, Omer, Farah wrote:

>  Thanks. That helped.
>
> ** **
>
> Another related question:
>
> ** **
>
> I created this table on HIVE:
>
> hive> CREATE TABLE hbase_mstr_1(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "hbase_mstr_1_xyz");
>
> ** **
>
> My intention is to load it and then access it via HIVE. But I am not able
> to insert values into this table:
>
> ** **
>
> hive> insert overwrite table hbase_mstr_1 select * from eatwh1_ft1;
>
> Total MapReduce jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks is set to 0 since there's no reduce operator
>
> Starting Job = job_201207251201_0678, Tracking URL =
> http://hadoop001:50030/jobdetails.jsp?jobid=job_201207251201_0678
>
> Kill Command = /usr/lib/hadoop/bin/hadoop job
> -Dmapred.job.tracker=hadoop001:6932 -kill job_201207251201_0678
>
> 2012-08-14 12:30:20,977 Stage-0 map = 0%,  reduce = 0%
>
> 2012-08-14 12:30:47,110 Stage-0 map = 100%,  reduce = 100%
>
> Ended Job = job_201207251201_0678 with errors
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> ** **
>
> The hive.log doesn’t tell a lot of useful information.
>
> ** **
>
> 2012-08-14 11:55:51,893 WARN  hbase.HBaseConfiguration
> (HBaseConfiguration.java:(45)) - instantiating HBaseConfiguration()
> is deprecated. Please use HBaseConfiguration#create() to construct a plain
> Configuration
>
> 2012-08-14 11:55:52,302 WARN  mapred.JobClient
> (JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.***
> *
>
> 2012-08-14 11:56:21,132 ERROR exec.MapRedTask
> (SessionState.java:printError(365)) - Ended Job = job_201207251201_0677
> with errors
>
> 2012-08-14 11:56:21,151 ERROR ql.Driver
> (SessionState.java:printError(365)) - FAILED: Execution Error, return code
> 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> ** **
>
> Any ideas?****
>
> ** **
>
> Thank you.
>
> ** **
>
> Farah Omer
>
> Sr. DB Engineer | MicroStrategy, Inc.
>
> Tel 703.270.2230 | fo...@microstrategy.com
>
> 1850 Towers Crescent Plaza | Tysons Corner, VA 22182
>
> www.microstrategy.com
>
> ** **
>
> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
> *Sent:* Tuesday, August 14, 2012 11:11 AM
> *To:* user@hive.apache.org
> *Subject:* Re: Issue with creating table in hbase
>
> ** **
>
> > *hbase(main):001:0*> CREATE TABLE hbase_mstr_1.
>
> ** **
>
> Are you running the CREATE TABLE from the hbase shell? You should run it
> from the hive shell. You can start it from "$HIVE_HOME/bin/hive"
>
> On Tue, Aug 14, 2012 at 10:05 AM, Omer, Farah 
> wrote:
>
> Hi all,
>
> I was testing hbase integrated with hive, and running into an issue. Would
> anyone has an idea what it means?
>
>  
>
> hbase(main):001:0> CREATE TABLE hbase_mstr_1(key int, value string) STORED
> BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "xyz")
>
> SyntaxError: (hbase):1: syntax error, unexpected tIDENTIFIER
>
>  
>
> CREATE TABLE hbase_mstr_1(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "xyz")
>
>   ^
>
>  
>
> hbase(main):002:0>
>
>  
>
> Hive version is 0.7, hbase version is 0.90
>
>  
>
> Thanks.
>
>  
>
> Farah Omer
>
> Sr. DB Engineer | MicroStrategy, Inc.
>
> Tel 703.270.2230 | fo...@microstrategy.com
>
> 1850 Towers Crescent Plaza | Tysons Corner, VA 22182
>
> www.microstrategy.com
>
>  
>
>  
>
>
>
> 
>
> ** **
>
> --
> Swarnim
>



-- 
Swarnim


Re: Some Weird Behavior

2012-08-07 Thread kulkarni.swar...@gmail.com
In that case you might want to try "count(1)" instead of "count(*)" and see
if that makes any difference. [1]

[1] https://issues.apache.org/jira/browse/HIVE-287
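
For example, with the same filter as in your query:

SELECT count(1) FROM data_realtime WHERE dt='20120730' AND uid IS NULL;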

On Tue, Aug 7, 2012 at 1:07 PM, Techy Teck  wrote:

> I am running Hive 0.6.
>
>
>
>
>
> On Tue, Aug 7, 2012 at 11:04 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> What is the hive version that you are using?
>>
>>
>> On Tue, Aug 7, 2012 at 12:57 PM, Techy Teck wrote:
>>
>>> I am not sure about the data, but when we do
>>>
>>> SELECT count(*) from data_realtime where dt='20120730' and uid is null
>>>
>>> I get the count
>>>
>>> but If I do-
>>>
>>> SELECT * from data_realtime where dt='20120730' and uid is null
>>>
>>> I get zero record back. But if all the record is NULL then I should be
>>> getting NULL record back right?
>>>
>>>
>>> But I am not getting anything back and that is the reason it is making me
>>> more confuse.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Aug 7, 2012 at 10:31 AM, Yue Guan  wrote:
>>>
>>> > Just in case, all Record is null when uid is null?
>>> >
>>> > On Tue, Aug 7, 2012 at 1:14 PM, Techy Teck 
>>> > wrote:
>>> > > SELECT count(*) from data_realtime where dt='20120730' and uid is
>>> null
>>> > >
>>> > >
>>> > >
>>> > > I get the count as 1509
>>> > >
>>> > >
>>> > >
>>> > > So that means If I will be doing
>>> > >
>>> > >
>>> > >
>>> > > SELECT * from data_realtime where dt='20120730' and uid is null
>>> > >
>>> > >
>>> > >
>>> > > I should be seeing those records in which uid is null? right?
>>> > >
>>> > > But I get zero record back with the above query. Why is it so? Its
>>> very
>>> > > strange and why is it happening like this. Something wrong with the
>>> Hive?
>>> > >
>>> > >
>>> > >
>>> > > Can anyone suggest me what is happening?
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
>>>
>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: Some Weird Behavior

2012-08-07 Thread kulkarni.swar...@gmail.com
What is the hive version that you are using?

On Tue, Aug 7, 2012 at 12:57 PM, Techy Teck  wrote:

> I am not sure about the data, but when we do
>
> SELECT count(*) from data_realtime where dt='20120730' and uid is null
>
> I get the count
>
> but If I do-
>
> SELECT * from data_realtime where dt='20120730' and uid is null
>
> I get zero record back. But if all the record is NULL then I should be
> getting NULL record back right?
>
>
> But I am not getting anything back and that is the reason it is making me
> more confuse.
>
>
>
>
>
>
> On Tue, Aug 7, 2012 at 10:31 AM, Yue Guan  wrote:
>
> > Just in case, all Record is null when uid is null?
> >
> > On Tue, Aug 7, 2012 at 1:14 PM, Techy Teck 
> > wrote:
> > > SELECT count(*) from data_realtime where dt='20120730' and uid is null
> > >
> > >
> > >
> > > I get the count as 1509
> > >
> > >
> > >
> > > So that means If I will be doing
> > >
> > >
> > >
> > > SELECT * from data_realtime where dt='20120730' and uid is null
> > >
> > >
> > >
> > > I should be seeing those records in which uid is null? right?
> > >
> > > But I get zero record back with the above query. Why is it so? Its very
> > > strange and why is it happening like this. Something wrong with the
> Hive?
> > >
> > >
> > >
> > > Can anyone suggest me what is happening?
> > >
> > >
> > >
> > >
> >
>



-- 
Swarnim


Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread kulkarni.swar...@gmail.com
Have you tried using EXPLAIN[1] on your query? I usually use it
to get a better understanding of what a query is actually doing, and
for debugging at other times.

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
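
For example, a sketch with the query from this thread (assuming the format
string is 'yyyyMMdd'); the plan shows whether the dt predicate is applied as a
partition filter or evaluated row by row:

EXPLAIN
SELECT * FROM REALTIME WHERE dt = yesterdaydate('yyyyMMdd') LIMIT 10;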

On Tue, Aug 7, 2012 at 12:20 PM, Raihan Jamal  wrote:

> Hi Jan,
>
>
> I figured that out, it is working fine for me now. The only question I
> have is, if I am doing like this-
>
>
>
> SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10;
>
>
>
> Then the above query will be evaluated as below right?
>
>
>
> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>
>
>
> So that means it will look for data in the corresponding dt partition 
> *(20120806)
> *only right as above table is partitioned on dt column ? And it will not
> scan the whole table right?**
>
>
>
> *Raihan Jamal*
>
>
>
> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár  wrote:
>
>> Hi Jamal,
>>
>> Check if the function really returns what it should and that your data
>> are really in MMdd format. You can do this by simple query like this:
>>
>> SELECT dt, yesterdaydate('MMdd') FROM REALTIME LIMIT 1;
>>
>> I don't see anything wrong with the function itself, it works well for me
>> (although I tested it in hive 0.7.1). The only thing I would change about
>> it would be to optimize it by calling 'new' only at the time of
>> construction and reusing the object when the function is called, but that
>> should not affect the functionality at all.
>>
>> Best regards,
>> Jan
>>
>>
>>
>>
>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal wrote:
>>
>>> *Problem*
>>>
>>> I created the below UserDefinedFunction to get the yesterday's day in
>>> the format I wanted as I will be passing the format into this below method
>>> from the query.
>>>
>>>
>>>
>>> *public final class YesterdayDate extends UDF {*
>>>
>>> * *
>>>
>>> *public String evaluate(final String format) { *
>>>
>>> *DateFormat dateFormat = new
>>> SimpleDateFormat(format); *
>>>
>>> *Calendar cal = Calendar.getInstance();*
>>>
>>> *cal.add(Calendar.DATE, -1); *
>>>
>>> *return
>>> dateFormat.format(cal.getTime()).toString(); *
>>>
>>> *} *
>>>
>>> *}*
>>>
>>>
>>>
>>>
>>>
>>> So whenever I try to run the query like below by adding the jar to
>>> classpath and creating the temporary function yesterdaydate, I always get
>>> zero result back-
>>>
>>>
>>>
>>> hive> create temporary function *yesterdaydate* as
>>> 'com.example.hive.udf.YesterdayDate';
>>>
>>> OK
>>>
>>> Time taken: 0.512 seconds
>>>
>>>
>>>
>>> Below is the query I am running-
>>>
>>>
>>>
>>> *hive> SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT
>>> 10;*
>>>
>>> *OK*
>>>
>>> * *
>>>
>>> And I always get zero result back but the data is there in that table
>>> for Aug 5th.**
>>>
>>>
>>>
>>> What wrong I am doing? Any suggestions will be appreciated.
>>>
>>>
>>>
>>>
>>>
>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support variable
>>> substitution thing, so I cannot use hiveconf here and the above table has
>>> been partitioned on dt(date) column.**
>>>
>>
>>
>


-- 
Swarnim


Re: HIVE AND HBASE

2012-07-27 Thread kulkarni.swar...@gmail.com
If you are using the latest release (0.9), you would need at least
hbase-0.92 installed.
recompiling hive with CDH dependencies to avoid any surprises. You can find
more information about it here[1].

[1] https://cwiki.apache.org/Hive/hbaseintegration.html

On Fri, Jul 27, 2012 at 11:30 AM, abhiTowson cal
wrote:

> hi all,
>
> I am trying to install HIVE and HBASE
>
> Is there any dependency needs to be installed ?
>
> what versions of HIVE and HBASE are compatible?
>
> Is there any good document on configuring HIVE with HBASE , can any
> please share?
>
> Regards
> Abhishek
>



-- 
Swarnim


Transitive dependencies with hive

2012-07-26 Thread kulkarni.swar...@gmail.com
Hello,

I know that a custom jar can be added to the hive classpath via the "--auxpath"
option. But should any transitive dependencies that my jar depends on
be added explicitly to the classpath too? I tried doing that as well, but
still get a "ClassNotFoundException" for classes in my transitive
dependencies. Any suggestions?
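
For reference, a sketch of what I mean by adding them explicitly at the session
level (the paths are placeholders):

ADD JAR /path/to/my-custom.jar;
ADD JAR /path/to/its-transitive-dependency.jar;
LIST JARS;   -- shows what actually ended up on the session classpath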

Thanks,

-- 
Swarnim


HBaseSerDe

2012-07-25 Thread kulkarni.swar...@gmail.com
While going through some code for HBase/Hive Integration, I came across
this constructor:

public HBaseSerDe() throws SerDeException {
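    // intentionally empty; SerDeException is declared but never actually thrown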

}

Basically, the constructor does nothing, yet it declares a checked SerDeException
that it never throws. The problem is that fixing this now would be a non-passive change.

I couldn't really find an obvious reason for this to be there. Are there
any objections if I file a JIRA to remove this constructor?
-- 
Swarnim


Re: HBASE and HIVE Integration

2012-07-25 Thread kulkarni.swar...@gmail.com
Can you also post the logs from "/tmp/<username>/hive.log"? They might contain some
info on your job failure.

On Wed, Jul 25, 2012 at 8:28 AM, vijay shinde wrote:

> Hi Bejoy,
>
> Thanks for quick reply. Here are some additional details
>
> Cloudera Version - CDH3U4
>
> *hive-site.xml*
> **
> *
> hive.aux.jars.path
>
> file:///usr/lib/hive/lib/hive-hbase-handler-0.7.1-cdh3u2.jar,file:///usr/lib/hive/lib/hbase-0.90.4-cdh3u2.jar,file:///usr/lib/hive/lib/zookeeper-3.3.1.jar,file:///usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u2.jar
> 
> *
> *Execution Log*
>
>
> 1. *start zookeeper*
>
> [root@localhost zookeeper]# ./bin/zkServer.sh start
>
>
>
> 2. *start hbase*
>
>
>
> 3. *start hive. I am setting hive jars in hive-site.xml*
>
>
>
> ./bin/hive -hiveconf hbase.master=127.0.1.1:60010
>
>
> 4. *Create new HBase table which is to be managed by Hive*
> **
>
> CREATE TABLE hive_hbasetable_k(key int, value string)
>
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
>
> TBLPROPERTIES ("hbase.table.name" = "hivehbasek");
>
>
>
> 5. *Create a logical table pokes in Hive*
>
> CREATE TABLE pokes (foo INT, bar STRING);
>
> 6. *HIve error while inserting the data from Hive Poke table to HBASE
> table*
>
> *hive> INSERT OVERWRITE TABLE hive_hbasetable_k SELECT * FROM pokes WHERE
> foo=98;*
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201207250246_0005, Tracking URL =
> http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201207250246_0005
> Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job  -Dmapred.job.tracker=
> 0.0.0.0:8021 -kill job_201207250246_0005
> 2012-07-25 04:26:00,198 Stage-0 map = 0%,  reduce = 0%
> 2012-07-25 04:27:00,767 Stage-0 map = 0%,  reduce = 0%
> 2012-07-25 04:27:08,844 Stage-0 map = 100%,  reduce = 100%
> Ended Job = job_201207250246_0005 with errors
>
> *FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
> *
> Let me know if you need any additonal information.
>
> Thanks,
> Vijay
>
> On Wed, Jul 25, 2012 at 5:30 AM, Bejoy KS  wrote:
>
>> **
>> Hi Vijay
>>
>> Can you share more details like
>>
>> The CDH Version/Hive version you are using
>>
>> Steps you followed for hive hbase integration with the values you set
>>
>> The DDL used for hive hbase integration
>>
>> The actual error from failed map reduce task
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>> --
>> *From: *vijay shinde 
>> *Date: *Wed, 25 Jul 2012 04:45:41 -0400
>> *To: *
>> *ReplyTo: *user@hive.apache.org
>> *Subject: *HBASE and HIVE Integration
>>
>> I am facing issue while executing HIVE queries on HBASE-HIVE integration.
>> I followed the wiki hbase-hive integration
>> https://cwiki.apache.org/Hive/hbaseintegration.html
>>
>> I have already passed all the required jars for auxpath in hive-site.xml
>> file.
>> I am using Cloudera CDH demo VM.. Any help would be highly appreciated
>>
>> hive> INSERT OVERWRITE TABLE hive_hbasetable_k SELECT * FROM pokes WHERE
>> foo=98;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201207250246_0005, Tracking URL =
>> http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201207250246_0005
>> Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job  -Dmapred.job.tracker=
>> 0.0.0.0:8021 -kill job_201207250246_0005
>> 2012-07-25 04:26:00,198 Stage-0 map = 0%,  reduce = 0%
>> 2012-07-25 04:27:00,767 Stage-0 map = 0%,  reduce = 0%
>> 2012-07-25 04:27:08,844 Stage-0 map = 100%,  reduce = 100%
>> Ended Job = job_201207250246_0005 with errors
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.MapRedTask
>> hive>
>>
>>
>


-- 
Swarnim


Re: Composite Key Handling in Hbase + Hive Integration

2012-07-24 Thread kulkarni.swar...@gmail.com
Try something like this:

CREATE EXTERNAL TABLE hbase_table_1(key struct<a:string, b:string, c:string>,
value string)

ROW FORMAT DELIMITED

COLLECTION ITEMS TERMINATED BY '~'

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,test-family:test-qual")

TBLPROPERTIES ("hbase.table.name" = "SIMPLE_TABLE");

Basically, you are modeling the composite key as a struct and telling Hive
that the parts of the key are separated by a "~". After doing this, to GROUP BY
any part of the composite key, you simply run a query like:

select key.a, count(1) from hbase_table_1 GROUP BY key.a;

This should give you your desired result.
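
Individual parts of the key are addressable in the same way, e.g. (a sketch):

SELECT key.a, key.b, key.c, value FROM hbase_table_1 LIMIT 10;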

Let me know if this works for you. We can then add this as a workaround on
that bug.

On Tue, Jul 24, 2012 at 2:14 AM, ankit kinra  wrote:

> Hi,
>
> I have a use case in HBase + Hive Integration where HBase primary key is a
> composite key and the keys is separated by us with a custom delimiter. So
> basically it is Key = A~B~C.
>  Now, I wanted to run a query on this HBase table using Hive and group by
> "A" (and not the complete primary key). I went through the following
> presentation :
>
> https://docs.google.com/viewer?a=v&q=cache:GHg9GMFOZVwJ:assets.en.oreilly.com/1/event/61/HBase%2520and%2520Hive%2520at%2520StumbleUpon%2520Presentation.ppt+hbase+composite+key+hive&hl=en&gl=us&pid=bl&srcid=ADGEEShTyoUXyvXptTu4pMjje_FkaN_j1OK9wG0lclWWsKNjGreLTkk3IDqT16xO8ClqIfzhM69aeU7Gph4kZPxTS-PXvLiWPSRvgS2WEjnvViPJhpM0ItsLaTWq1DRuUgOzKhjSzIlx&sig=AHIEtbT4scO3IdtvLYG3RtLoKN5gG1udPg
>
> It says that this was implemented at StumbleUpon, anybody having any idea
> if that can be used by others.
>
> Also, there is this issue in JIRA :
> https://issues.apache.org/jira/browse/HIVE-2599 which talks about similar
> feature.
>
> So it would be very helpful if anyone can give me some idea regarding this.
>
> Regards,
> Ankit Kinra
>
>


-- 
Swarnim


Structs in Hive

2012-07-23 Thread kulkarni.swar...@gmail.com
Hello,

I have a pretty basic question here. I am trying to store structs
in HBase so that they can be read by Hive. In what format should these structs be
written so that Hive can read them back?

For instance, if my query has the following struct:

s struct<a:string, b:string>

How should I be writing my data in HBase so that when read, it fits into
this struct? In other words, can I create my own class 'MyStruct' which is
something like:

class MyStruct{
   string a;
   string b;
}

to create the struct bytes and read them using hive with the struct defined
above? I hope I made my question clear. I will be glad to provide any
clarifications.

Thanks,

-- 
Swarnim


Re: Converting timestamp to date format

2012-07-20 Thread kulkarni.swar...@gmail.com
BIGINT is 8 bytes whereas INT is 4 bytes. Timestamps are usually of "long"
type. To avoid loss of precision, I would recommend BIGINT.
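
For instance (a sketch; "prod_and_ts" is assumed to be the table name):

SELECT from_unixtime(cast(prod_and_ts.timestamps as BIGINT)) FROM prod_and_ts LIMIT 10;
-- an INT tops out at 2147483647, so any larger value (e.g. a millisecond-precision
-- timestamp) no longer fits and the cast loses information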

On Fri, Jul 20, 2012 at 4:52 PM, Tech RJ  wrote:

> What is the difference between these two? Trying to convert timestamps to
> full date format. The only difference is BIGINT and INT
> *
> *
> from_unixtime(cast(prod_and_ts.timestamps as *BIGINT*))
> *
> *
> *OR*
>
> from_unixtime(cast(prod_and_ts.timestamps as *INT*))
>
>
>
> Which one should I use to get accurate result?
>
>
>
>
>


-- 
Swarnim


Re: Hive 0.10 release date

2012-07-20 Thread kulkarni.swar...@gmail.com
Thanks for the advice Edward. That makes sense to me.

As a side note, while doing some searching, I stumbled upon your blog[1]
regarding the release which made me even more curious. :)

[1]
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/when_hive_0_10_release

On Fri, Jul 20, 2012 at 4:32 PM, Edward Capriolo wrote:

> Usually it is ok to build a trunk, we do not do anything extra special
> for releases other then some basic sanity testing and cut an svn tag.
> (we run the full unit tests every commit). The only time this advice
> is not true is if there are some metastore changes, however the
> scripts to handle the upgrades are usually added with the changes. (So
> do not just blindly take upgrade advice without trying it in staging
> first and backing up your metastore)
>
> Edward
>
> On Fri, Jul 20, 2012 at 5:20 PM, kulkarni.swar...@gmail.com
>  wrote:
> > Hello,
> >
> > I totally understand that usually open source projects to not have a
> fixed
> > date for release but I was just curious if something was chalked out for
> > releasing hive 0.10 out in the wild. There are some really interesting
> > additions that I am looking forward to.
> >
> > Thanks,
> >
> > --
> > Swarnim
>



-- 
Swarnim


Hive 0.10 release date

2012-07-20 Thread kulkarni.swar...@gmail.com
Hello,

I totally understand that open source projects usually do not have a fixed
release date, but I was just curious whether something was chalked out for
releasing hive 0.10 out in the wild. There are some really interesting
additions that I am looking forward to.

Thanks,

-- 
Swarnim


Re: Disc quota exceeded

2012-07-20 Thread kulkarni.swar...@gmail.com
rpool/tmp   10G   10G   0K   100%   /tmp

This might be the source of your problem as I mentioned earlier. Try
freeing some space here and then try again.

On Fri, Jul 20, 2012 at 11:34 AM, comptech geeky wrote:

> After trying "df -kh". I got below result.
>
> *bash-3.00$ df -kh*
> *Filesystem size   used  avail capacity  Mounted on*
> *rpool/ROOT/sol10   916G30G   668G 5%/*
> */devices 0K 0K 0K 0%/devices*
> *ctfs 0K 0K 0K 0%/system/contract*
> *proc 0K 0K 0K 0%/proc*
> *mnttab   0K 0K 0K 0%/etc/mnttab*
> *swap31G   656K31G 1%/etc/svc/volatile*
> *objfs0K 0K 0K 0%/system/object*
> *sharefs  0K 0K 0K 0%/etc/dfs/sharetab*
> */usr/lib/libc/libc_hwcap2.so.1*
> *   698G30G   668G 5%/lib/libc.so.1*
> *fd   0K 0K 0K 0%/dev/fd*
> *rpool/ROOT/sol10/var20G10G   9.7G52%/var*
> *rpool/tmp   10G10G 0K   100%/tmp*
>  *swap31G20K31G 1%/var/run*
> *lvsaishdc3in0001data/data*
> *32T27T   2.4T92%/data*
> *lvsaishdc3in0001data/data/b_apdpds*
> *   1.0T   8.5G  1016G 1%/data/b_apdpds*
> *lvsaishdc3in0001data/data/b_bids*
> *   100G75G25G76%/data/b_bids*
> *lvsaishdc3in0001data/data/b_sbe*
> *   100G51K   100G 1%/data/b_sbe*
> *lvsaishdc3in0001data/data/b_selling*
> *   500G   298G   202G60%/data/b_selling*
> *lvsaishdc3in0001data/data/imk*
> *   3.0T   2.7T   293G91%/data/inbound/sq/imk*
> *rpool/export   916G23K   668G 1%/export*
> *rpool/export/home  175G   118G57G68%/export/home*
> *rpool  916G    34K   668G 1%/rpool*
> *
> *
>
>
> On Fri, Jul 20, 2012 at 7:42 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Seems to me like you might be just running out of disk space on one of
>> the partitions. What does the output of "df -kh" say?
>>
>> Also, I just speculate that it might be your "/tmp" directory out of
>> space because that is where hive tries to dump a bunch of log entries
>> before it starts up. (/tmp//hive.log).
>>
>>
>> On Fri, Jul 20, 2012 at 3:12 AM, comptech geeky 
>> wrote:
>>
>>> Whenever I am typing Hive at the command prompt, I am getting the below
>>> exception. What does that mean?
>>> *
>>> *
>>> *$ bash*
>>> *bash-3.00$ hive*
>>> *Exception in thread "main" java.io.IOException: Disc quota exceeded*
>>> *at java.io.UnixFileSystem.createFileExclusively(Native Method)*
>>> *at java.io.File.checkAndCreate(File.java:1704)*
>>> *at java.io.File.createTempFile(File.java:1792)*
>>> *at org.apache.hadoop.util.RunJar.main(RunJar.java:115)*
>>> *bash-3.00$*
>>>
>>> Any suggestions why is it happening?
>>>
>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: Performance tuning a hive query

2012-07-19 Thread kulkarni.swar...@gmail.com
A couple to add to the list:

Indexing[1]
Columnar Storage/RCFile[2]

[1] https://cwiki.apache.org/confluence/display/Hive/IndexDev
[2]
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf
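
A minimal sketch of both (table, column, and index names are placeholders):

CREATE TABLE sales_rc (id INT, amount DOUBLE)
STORED AS RCFILE;

CREATE INDEX idx_sales_id ON TABLE sales_rc (id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX idx_sales_id ON sales_rc REBUILD;   -- actually builds the index data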

On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár  wrote:

> There are many ways, but beware that some of them may result in worse
> performance when used inappropriately.
>
> Some of the settings we use to achieve faster queries:
> hive.map.aggr=true
> hive.exec.parallel=true
> hive.exec.compress.intermediate=true
> mapred.job.reuse.jvm.num.tasks=-1
>
> Structuring the queries properly can help a lot. For example if you
> eliminate unneeded data early in the query before further processing. E.g.
> if you use subquery in FROM, you should put all WHERE clauses where
> possible into the subquery, to eliminate the amount of data passed to the
> next stage.
>
> Using multi-group-by queries helps a lot when computing multiple queries
> on same set of data.
>
> As Nitin Pawar mentioned, the JOINs can be often optimized as well.
>
> Also, fine tuning the hadoop server itself for your specific needs might
> help.
>
> I am very interested in optimization of queries as well, so if anyone
> knows some more tricks, please share...
>
> J. Dolinar
>
>
>
> On Thu, Jul 19, 2012 at 3:24 PM, Abhishek wrote:
>
>>
>> Apart from partitions and buckets how to improve of hive queries
>> *
>> *
>> *Regards
>> *
>> Abhi
>> Sent from my iPhone
>>
>
>


-- 
Swarnim


Re: HADOOP_HOME requirement

2012-07-18 Thread kulkarni.swar...@gmail.com
My main concern here was that HADOOP_HOME is deprecated since hadoop 0.23.
So I was hoping it could actually function as documented.

FWIW, I found this bug[1] that addresses exactly this issue. The attached
patch makes HADOOP_HOME optional and auto-detects hadoop from the path.
This seems to have been committed for 0.10.0.

[1] https://issues.apache.org/jira/browse/HIVE-2757

On Wed, Jul 18, 2012 at 12:50 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Hm. Yeah I tried out with a few version 0.7 -> 0.9 and seems like they all
> do. May be we should just update the documentation then?
>
>
> On Wed, Jul 18, 2012 at 12:34 PM, Vinod Singh wrote:
>
>> We are using Hive 0.7.1 and there  HADOOP_HOME must be exported so that
>> it is available as environment variable.
>>
>> Thanks,
>> Vinod
>>
>>
>> On Wed, Jul 18, 2012 at 10:48 PM, Nitin Pawar wrote:
>>
>>> from hive trunk i can only see this
>>> I am not sure I am 100% sure but I remember setting up HADOOP_HOME always
>>>
>>>
>>> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
>>>
>>>
>>>   String hadoopExec = conf.getVar(HiveConf.ConfVars.HADOOPBIN);
>>>
>>> this change was introduced in 0.8
>>>
>>> from 
>>> http://svn.apache.org/repos/asf/hive/branches/branch-0.9/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
>>>  
>>> <http://svn.apache.org/repos/asf/hive/branches/branch-0.8/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java>
>>>
>>> HADOOPBIN("hadoop.bin.path", System.getenv("HADOOP_HOME") + "/bin/hadoop"),
>>>
>>> On Wed, Jul 18, 2012 at 10:38 PM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> 0.9
>>>>
>>>>
>>>> On Wed, Jul 18, 2012 at 12:04 PM, Nitin Pawar 
>>>> wrote:
>>>>
>>>>> this also depends on what version of hive you are using
>>>>>
>>>>>
>>>>> On Wed, Jul 18, 2012 at 10:33 PM, kulkarni.swar...@gmail.com <
>>>>> kulkarni.swar...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for your reply nitin.
>>>>>>
>>>>>> Ok. So you mean we always need to set HADOOP_HOME irrespective of
>>>>>> "hadoop" is on the path or not. Correct?
>>>>>>
>>>>>> Little confused because that contradicts what's mentioned here[1].
>>>>>>
>>>>>> [1]
>>>>>> https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On Wed, Jul 18, 2012 at 11:59 AM, Nitin Pawar <
>>>>>> nitinpawar...@gmail.com> wrote:
>>>>>>
>>>>>>> This is not a bug.
>>>>>>>
>>>>>>> even if hadoop was path, hive does not use it.
>>>>>>> Hive internally uses HADOOP_HOME in the code base. So you will
>>>>>>> always need to set that for hive.
>>>>>>> Where as for HADOOP clusters, HADOOP_HOME is deprecated but hive
>>>>>>> still needs it.
>>>>>>>
>>>>>>> Don't know if that answers your question
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nitin
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 18, 2012 at 10:01 PM, kulkarni.swar...@gmail.com <
>>>>>>> kulkarni.swar...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> The hive documentation states that either HADOOP_HOME should be set
>>>>>>>> or hadoop should be on the path. However for some cases, where 
>>>>>>>> HADOOP_HOME
>>>>>>>> was not set but hadoop was on path, I have seen this error pop up:
>>>>>>>>
>>>>>>>> java.io.IOException: *Cannot run program "null/bin/hadoop" *(in
>>>>>>>> directory "/root/swarnim/hive-0.9.0-cern1-SNAPSHOT"): 
>>>>>>>> java.io.IOException:
>>>>>>>> error=2, No such file or directory
>>>>>>>>  at ja

Re: HADOOP_HOME requirement

2012-07-18 Thread kulkarni.swar...@gmail.com
Hm. Yeah, I tried it out with a few versions (0.7 through 0.9) and it seems like they all
do. Maybe we should just update the documentation then?

On Wed, Jul 18, 2012 at 12:34 PM, Vinod Singh  wrote:

> We are using Hive 0.7.1 and there  HADOOP_HOME must be exported so that
> it is available as environment variable.
>
> Thanks,
> Vinod
>
>
> On Wed, Jul 18, 2012 at 10:48 PM, Nitin Pawar wrote:
>
>> from hive trunk i can only see this
>> I am not sure I am 100% sure but I remember setting up HADOOP_HOME always
>>
>>
>> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
>>
>>
>>   String hadoopExec = conf.getVar(HiveConf.ConfVars.HADOOPBIN);
>>
>> this change was introduced in 0.8
>>
>> from 
>> http://svn.apache.org/repos/asf/hive/branches/branch-0.9/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
>>  
>> <http://svn.apache.org/repos/asf/hive/branches/branch-0.8/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java>
>>
>> HADOOPBIN("hadoop.bin.path", System.getenv("HADOOP_HOME") + "/bin/hadoop"),
>>
>> On Wed, Jul 18, 2012 at 10:38 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> 0.9
>>>
>>>
>>> On Wed, Jul 18, 2012 at 12:04 PM, Nitin Pawar 
>>> wrote:
>>>
>>>> this also depends on what version of hive you are using
>>>>
>>>>
>>>> On Wed, Jul 18, 2012 at 10:33 PM, kulkarni.swar...@gmail.com <
>>>> kulkarni.swar...@gmail.com> wrote:
>>>>
>>>>> Thanks for your reply nitin.
>>>>>
>>>>> Ok. So you mean we always need to set HADOOP_HOME irrespective of
>>>>> "hadoop" is on the path or not. Correct?
>>>>>
>>>>> Little confused because that contradicts what's mentioned here[1].
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Wed, Jul 18, 2012 at 11:59 AM, Nitin Pawar >>>> > wrote:
>>>>>
>>>>>> This is not a bug.
>>>>>>
>>>>>> even if hadoop was path, hive does not use it.
>>>>>> Hive internally uses HADOOP_HOME in the code base. So you will always
>>>>>> need to set that for hive.
>>>>>> Where as for HADOOP clusters, HADOOP_HOME is deprecated but hive
>>>>>> still needs it.
>>>>>>
>>>>>> Don't know if that answers your question
>>>>>>
>>>>>> Thanks,
>>>>>> Nitin
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 18, 2012 at 10:01 PM, kulkarni.swar...@gmail.com <
>>>>>> kulkarni.swar...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> The hive documentation states that either HADOOP_HOME should be set
>>>>>>> or hadoop should be on the path. However for some cases, where 
>>>>>>> HADOOP_HOME
>>>>>>> was not set but hadoop was on path, I have seen this error pop up:
>>>>>>>
>>>>>>> java.io.IOException: *Cannot run program "null/bin/hadoop" *(in
>>>>>>> directory "/root/swarnim/hive-0.9.0-cern1-SNAPSHOT"): 
>>>>>>> java.io.IOException:
>>>>>>> error=2, No such file or directory
>>>>>>>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>>>> at java.lang.Runtime.exec(Runtime.java:593)
>>>>>>>  at java.lang.Runtime.exec(Runtime.java:431)
>>>>>>> at
>>>>>>> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:268)
>>>>>>>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
>>>>>>> at
>>>>>>> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>>>>>>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
>>>>>>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
>>>>>>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>>>>>>

Re: HADOOP_HOME requirement

2012-07-18 Thread kulkarni.swar...@gmail.com
0.9

On Wed, Jul 18, 2012 at 12:04 PM, Nitin Pawar wrote:

> this also depends on what version of hive you are using
>
>
> On Wed, Jul 18, 2012 at 10:33 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Thanks for your reply nitin.
>>
>> Ok. So you mean we always need to set HADOOP_HOME irrespective of
>> "hadoop" is on the path or not. Correct?
>>
>> Little confused because that contradicts what's mentioned here[1].
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive
>>
>>
>> Thanks,
>>
>> On Wed, Jul 18, 2012 at 11:59 AM, Nitin Pawar wrote:
>>
>>> This is not a bug.
>>>
>>> even if hadoop was path, hive does not use it.
>>> Hive internally uses HADOOP_HOME in the code base. So you will always
>>> need to set that for hive.
>>> Where as for HADOOP clusters, HADOOP_HOME is deprecated but hive still
>>> needs it.
>>>
>>> Don't know if that answers your question
>>>
>>> Thanks,
>>> Nitin
>>>
>>>
>>> On Wed, Jul 18, 2012 at 10:01 PM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> The hive documentation states that either HADOOP_HOME should be set or
>>>> hadoop should be on the path. However for some cases, where HADOOP_HOME was
>>>> not set but hadoop was on path, I have seen this error pop up:
>>>>
>>>> java.io.IOException: *Cannot run program "null/bin/hadoop" *(in
>>>> directory "/root/swarnim/hive-0.9.0-cern1-SNAPSHOT"): java.io.IOException:
>>>> error=2, No such file or directory
>>>>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>> at java.lang.Runtime.exec(Runtime.java:593)
>>>>  at java.lang.Runtime.exec(Runtime.java:431)
>>>> at
>>>> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:268)
>>>>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
>>>> at
>>>> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>>>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
>>>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
>>>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>>>> at
>>>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>>>>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
>>>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
>>>>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
>>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>  at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>>>
>>>> Digging into the code in MapRedTask.java, I found the following
>>>> (simplified):
>>>>
>>>> String *hadoopExec* = conf.getVar(System.getenv("HADOOP_HOME") +
>>>> "/bin/hadoop");
>>>> ...
>>>>
>>>> Runtime.getRuntime().exec(*hadoopExec*, env, new File(workDir));
>>>>
>>>> Clearly, if HADOOP_HOME is not set, the command that it would try to
>>>> execute is "null/bin/hadoop" which is exactly the exception I am getting.
>>>>
>>>> Has anyone else run into this before? Is this a bug?
>>>>
>>>> Thanks,
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>
>
> --
> Nitin Pawar
>
>


-- 
Swarnim


Re: HADOOP_HOME requirement

2012-07-18 Thread kulkarni.swar...@gmail.com
Thanks for your reply nitin.

Ok. So you mean we always need to set HADOOP_HOME irrespective of whether "hadoop"
is on the path or not. Correct?

Little confused because that contradicts what's mentioned here[1].

[1]
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive


Thanks,

On Wed, Jul 18, 2012 at 11:59 AM, Nitin Pawar wrote:

> This is not a bug.
>
> even if hadoop was path, hive does not use it.
> Hive internally uses HADOOP_HOME in the code base. So you will always need
> to set that for hive.
> Where as for HADOOP clusters, HADOOP_HOME is deprecated but hive still
> needs it.
>
> Don't know if that answers your question
>
> Thanks,
> Nitin
>
>
> On Wed, Jul 18, 2012 at 10:01 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Hello,
>>
>> The hive documentation states that either HADOOP_HOME should be set or
>> hadoop should be on the path. However for some cases, where HADOOP_HOME was
>> not set but hadoop was on path, I have seen this error pop up:
>>
>> java.io.IOException: *Cannot run program "null/bin/hadoop" *(in
>> directory "/root/swarnim/hive-0.9.0-cern1-SNAPSHOT"): java.io.IOException:
>> error=2, No such file or directory
>>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>> at java.lang.Runtime.exec(Runtime.java:593)
>>  at java.lang.Runtime.exec(Runtime.java:431)
>> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:268)
>>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
>> at
>> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>> at
>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
>>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>  at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>
>> Digging into the code in MapRedTask.java, I found the following
>> (simplified):
>>
>> String *hadoopExec* = conf.getVar(System.getenv("HADOOP_HOME") +
>> "/bin/hadoop");
>> ...
>>
>> Runtime.getRuntime().exec(*hadoopExec*, env, new File(workDir));
>>
>> Clearly, if HADOOP_HOME is not set, the command that it would try to
>> execute is "null/bin/hadoop" which is exactly the exception I am getting.
>>
>> Has anyone else run into this before? Is this a bug?
>>
>> Thanks,
>> --
>> Swarnim
>>
>
>
>
> --
> Nitin Pawar
>
>


-- 
Swarnim


Re: Hive and CDH4 GA

2012-07-18 Thread kulkarni.swar...@gmail.com
To follow up on this one, I figured out that the bug resolved here [1] was
the main source of my problems. It basically changed the signature of the
NetUtils#getInputStream method, causing a "NoSuchMethodError" to be thrown
downstream. I upgraded Hive 0.9 to use the HBase jar packaged with the CDH4
GA release (hbase-0.92.1-cdh4.0.0.jar) and everything worked after that.

[1] https://issues.apache.org/jira/browse/HADOOP-8350

On Mon, Jul 16, 2012 at 5:08 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Yeah. I did override hadoop.security.version to 2.0.0-alpha. That gives me
> a whole bunch of compilation errors in HadoopShimsSecure.java
>
> [javac]
> /Users/sk018283/git-repo/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:37:
> package org.apache.hadoop.mapred does not exist
> [javac] import org.apache.hadoop.mapred.ClusterStatus;
> [javac]^
> [javac]
> /Users/sk018283/git-repo/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:38:
> package org.apache.hadoop.mapred does not exist
> [javac] import org.apache.hadoop.mapred.FileInputFormat;
> [javac]^
> [javac]
> /Users/sk018283/git-repo/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:39:
> package org.apache.hadoop.mapred does not exist
> [javac] import org.apache.hadoop.mapred.InputFormat;
> many more..
>
> On Mon, Jul 16, 2012 at 4:48 PM, Ted Yu  wrote:
>
>> I see the following in build.properties :
>>
>> hadoop.version=${hadoop-0.20.version}
>> hadoop.security.version=${hadoop-0.20S.version}
>>
>> Have you tried to override the above property values when building ?
>>
>> If it still fails, please comment on 
>> HIVE-3029<https://issues.apache.org/jira/browse/HIVE-3029>
>> .
>>
>> Thanks
>>
>>
>> On Mon, Jul 16, 2012 at 2:42 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> I found this issue [1] and applied the patch but still the issue
>>> persists.
>>>
>>> Any different way that I should be creating my assembly (currently just
>>> doing "ant clean tar") so that it works with hadoop 2.0.0 on its classpath?
>>>
>>> Any help is appreciated.
>>>
>>> Thanks,
>>>
>>> [1] https://issues.apache.org/jira/browse/HIVE-3029
>>>
>>>
>>> On Fri, Jul 13, 2012 at 10:27 PM, Ted Yu  wrote:
>>>
>>>> See
>>>> https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276
>>>>
>>>> Cheers
>>>>
>>>>
>>>> On Fri, Jul 13, 2012 at 12:38 PM, kulkarni.swar...@gmail.com <
>>>> kulkarni.swar...@gmail.com> wrote:
>>>>
>>>>> Has anyone been using the hive 0.9.0 release with the CDH4 GA release? I
>>>>> keep hitting this exception on its interaction with HBase.
>>>>>
>>>>> java.lang.NoSuchMethodError:
>>>>> org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
>>>>>  at
>>>>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:363)
>>>>> at
>>>>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1026)
>>>>>  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:878)
>>>>> at
>>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>>>>>  at $Proxy9.getProtocolVersion(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>>>>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
>>>>> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>>>>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
>>>>> at
>>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642)
>>>>>
>>>>> --
>>>>> Swarnim
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>
>
> --
> Swarnim
>



-- 
Swarnim


Re: ipc.RemoteException in Hive

2012-07-17 Thread kulkarni.swar...@gmail.com
"select *" queries don't really run a M/R job. Rather directly hit HDFS to
grab the results. While "select count(*)" run mappers/reducers to perform
the count on the data. The former running and the latter not suspects
something might be wrong with your hadoop installation. Looking at the
stacktrace, it even seems like the user executing this query might not have
proper access.

Are you able to run simple M/R jobs with the installation? You might also
want to check on permissions.

On Tue, Jul 17, 2012 at 9:24 AM, Павел Мезенцев  wrote:

> Hello all!
>
> We have a trouble with hive.
> My colleague created table "as_test" in hive
> create external table as_test (line STRING) location '/logs/2012-07-16'
>
> Query
> select * from as_test limit 10;
> completing successfully
>
> but query
> select count (1) from as_test limit 10;
> raises org.apache.hadoop.ipc.RemoteException.
> How we can fix it?
>
> Best regards,
> Pavel
>
> P.S. full stack trace:
>
> org.apache.hadoop.ipc.RemoteException: IPC server unable to read call
> parameters : readObject can't find class
> org.apache.hadoop.fs.permission.FsPermission$2
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1107)
>
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>
> at $Proxy4.setPermission(Unknown Source)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
> java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>
> at $Proxy4.setPermission(Unknown Source)
>
> at
> org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:855)
>
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.setPermission(DistributedFileSystem.java:560)
>
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:123)
>
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
>
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
>
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:657)
>
> at
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
>
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
> Job Submission failed with exception
> 'org.apache.hadoop.ipc.RemoteException(IPC server unable to read call
> parameters: readObject can't find class
> org.apache.hadoop.fs.permission.FsPermission$2)'
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
>


-- 
Swarnim


Re: not able to access Hive web Interface

2012-07-17 Thread kulkarni.swar...@gmail.com
The problem is that it is probably looking for these files in HDFS instead
of your local file system. As a workaround, try creating that path on HDFS
and uploading these files there and see if it works. Also, try setting the
fs.default.name property in core-site.xml to point to your local filesystem
instead of HDFS (a file:// URI).

PS: You can also do this directly from the hive command line itself:

hive> SET fs.default.name=file:///<your local dir>;

But this would be temporary and limited to your current session.

Hope that helps.

On Tue, Jul 17, 2012 at 7:36 AM,  wrote:

>  Hi all :-),
>
> Iam trying to access Hive Web Interface but it fails.
>
> I have this changes in hive-site.xml
>
>
> 
> 
> 
> hive.hwi.listen.host
> 0.0.0.0
> This is the host address the Hive Web Interface will
> listen on
> 
>
> 
> hive.hwi.listen.port
> 
> This is the port the Hive Web Interface will listen
> on
> 
>
> 
> hive.hwi.war.file
> /HADOOP/hive/lib/hive-hwi-0.8.1.war /*  (Here is
> the hive directory) */
> This is the WAR file with the jsp content for Hive
> Web Interface
> 
>
> 
>
>
>  
> ***
>
> And also export the ANT lib like.
>
> export ANT_LIB=/Yogesh/ant-1.8.4/lib
> export PATH=$PATH:$ANT_LIB
>
>
> now when i do run command
>
> hive --service hwi  it results
>
> 12/07/17 18:03:02 INFO hwi.HWIServer: HWI is starting up
> 12/07/17 18:03:02 WARN conf.HiveConf: DEPRECATED: Ignoring
> hive-default.xml found on the CLASSPATH at
> /HADOOP/hive/conf/hive-default.xml
> 12/07/17 18:03:02 FATAL hwi.HWIServer: HWI WAR file not found at
> /HADOOP/hive/lib/hive-hwi-0.8.1.war
>
>
> and if I go for
>
> hive --service hwi --help it results
>
> Usage ANT_LIB= hive --service hwi
>
>
> Althought if I go to /HADOOP/hive/lib directory I found
>
> 1) hive-hwi-0.8.1.war
> 2) hive-hwi-0.8.1.jar
>
> these files are present there.
>
> what is Iam doing wrong :-( ?
>
> Please help and Suggest
>
> Greetings
> Yogesh Kumar
>
>
>



-- 
Swarnim


Re: Hive and CDH4 GA

2012-07-16 Thread kulkarni.swar...@gmail.com
Yeah. I did override hadoop.security.version to 2.0.0-alpha. That gives me
a whole bunch of compilation errors in HadoopShimsSecure.java

[javac]
/Users/sk018283/git-repo/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:37:
package org.apache.hadoop.mapred does not exist
[javac] import org.apache.hadoop.mapred.ClusterStatus;
[javac]^
[javac]
/Users/sk018283/git-repo/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:38:
package org.apache.hadoop.mapred does not exist
[javac] import org.apache.hadoop.mapred.FileInputFormat;
[javac]^
[javac]
/Users/sk018283/git-repo/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:39:
package org.apache.hadoop.mapred does not exist
[javac] import org.apache.hadoop.mapred.InputFormat;
many more..

On Mon, Jul 16, 2012 at 4:48 PM, Ted Yu  wrote:

> I see the following in build.properties :
>
> hadoop.version=${hadoop-0.20.version}
> hadoop.security.version=${hadoop-0.20S.version}
>
> Have you tried to override the above property values when building ?
>
> If it still fails, please comment on 
> HIVE-3029<https://issues.apache.org/jira/browse/HIVE-3029>
> .
>
> Thanks
>
>
> On Mon, Jul 16, 2012 at 2:42 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> I found this issue [1] and applied the patch but still the issue persists.
>>
>> Any different way that I should be creating my assembly (currently just
>> doing "ant clean tar") so that it works with hadoop 2.0.0 on its classpath?
>>
>> Any help is appreciated.
>>
>> Thanks,
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-3029
>>
>>
>> On Fri, Jul 13, 2012 at 10:27 PM, Ted Yu  wrote:
>>
>>> See
>>> https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276
>>>
>>> Cheers
>>>
>>>
>>> On Fri, Jul 13, 2012 at 12:38 PM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> Has anyone been using the hive 0.9.0 release with the CDH4 GA release? I
>>>> keep hitting this exception on its interaction with HBase.
>>>>
>>>> java.lang.NoSuchMethodError:
>>>> org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
>>>>  at
>>>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:363)
>>>> at
>>>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1026)
>>>>  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:878)
>>>> at
>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>>>>  at $Proxy9.getProtocolVersion(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>>>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
>>>> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>>>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
>>>> at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642)
>>>>
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: Hive and CDH4 GA

2012-07-16 Thread kulkarni.swar...@gmail.com
I found this issue [1] and applied the patch, but the issue still persists.

Is there a different way that I should be creating my assembly (currently just
doing "ant clean tar") so that it works with hadoop 2.0.0 on its classpath?

Any help is appreciated.

Thanks,

[1] https://issues.apache.org/jira/browse/HIVE-3029

On Fri, Jul 13, 2012 at 10:27 PM, Ted Yu  wrote:

> See
> https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276
>
> Cheers
>
>
> On Fri, Jul 13, 2012 at 12:38 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Has anyone been using the hive 0.9.0 release with the CDH4 GA release? I
>> keep hitting this exception on its interaction with HBase.
>>
>> java.lang.NoSuchMethodError:
>> org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
>>  at
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:363)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1026)
>>  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:878)
>> at
>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>>  at $Proxy9.getProtocolVersion(Unknown Source)
>> at
>> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
>> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642)
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: connection error

2012-07-16 Thread kulkarni.swar...@gmail.com
This error is more related to hadoop than hive. Looking at the exception,
it looks like your namenode is not running/configured properly. Check your
namenode log to see why it failed to start.

Swarnim


On Mon, Jul 16, 2012 at 2:53 AM, shaik ahamed  wrote:

> Hi All,
>
>How to rectify the below error
>
> FAILED: Hive Internal Error:
> java.lang.RuntimeException(java.net.ConnectException: Call to md-trngpoc1/
> 10.5.114.110:54310 failed on connection exception:
> java.net.ConnectException: Connection refused)
> java.lang.RuntimeException: java.net.ConnectException: Call to md-trngpoc1/
> 10.5.114.110:54310 failed on connection exception:
> java.net.ConnectException: Connection refused
> at
> org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:170)
> at
> org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:210)
> at
> org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:267)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1112)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.net.ConnectException: Call to md-trngpoc1/
> 10.5.114.110:54310 failed on connection exception:
> java.net.ConnectException: Connection refused
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
> at org.apache.hadoop.ipc.Client.call(Client.java:1075)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> at $Proxy6.getProtocolVersion(Unknown Source)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
> at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:238)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:203)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
> at
> org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:163)
> ... 18 more
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
> at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
> at
> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
> at org.apache.hadoop.ipc.Client.call(Client.java:1050)
> ... 32 more
> Please help me in this ...
>
> Thanks in advance
>
> Shaik
>



-- 
Swarnim


Re: [ANN] Hive-protobuf support

2012-07-14 Thread kulkarni.swar...@gmail.com
Hi Edward,

This project looks really good.

Internally, we also have been working on similar changes. Specifically,
enhancing the existing Hive/HBase integration to support protobufs/thrifts
stored in HBase. Because of the need to specify an explicit column mapping,
and a number of issues faced [1] with getting the existing
ProtocolBuffersObjectInspector working with the latest protobuf 2.4.1, I
decided to write totally new ObjectInspectors that use the provided
reflections API to cleanly deserialize protobufs and thrifts and to perform
field extraction.

In short, some of the enhancements are:

1. Support thrift/protobuf stored in HBase using the new ObjectInspectors.
2. Auto generate the columns and column types using the provided
deserializer class by translating them into nested structs. (HIVE-3211)

Some of this stuff is still in development/testing phase. Once that is
done, I can have a patch for this enhancement up for review.
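
To give a rough idea of what this buys you on the query side: once the
columns come back as nested structs, their fields can be addressed directly
in HiveQL. A sketch (the table and field names below are made up for
illustration, not our actual schema):

SELECT msg.id, msg.payload.amount, msg.payload.tags[0]
FROM hbase_protos
WHERE msg.id > 100;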

[1]
http://mail-archives.apache.org/mod_mbox/hive-user/201205.mbox/%3CCAENxBwxaSOq1=0u+keaj6NG_s8Zh6=rzvlz4p2ywge-uq+j...@mail.gmail.com%3E

Thanks,


On Sat, Jul 14, 2012 at 10:18 AM, Edward Capriolo wrote:

> Hello all,
>
> My employer, m6d.com, has given the thumbs up to open source our
> latest hive tool, hive-protobuf. We created this because we work with
> protobuf formats often and wanted to be able to directly log and query
> these types without writing one-off User Defined Functions or Input
> Formats.
>
> https://github.com/edwardcapriolo/hive-protobuf
>
> Hive-protobuf is much like the new avro support and the already
> existing thrift support. Here is how it works:
>
> if you have a sequence file with a serialized protobuf in the key and
> a serialized protobuf in the value, a table can be created that
> describes the data to hive. The table needs only be configured with
> the protobuf generated class name for the key and value and it turns
> the nested classes into nested structs.
>
> We eventually will migrate the project into core hive but we want to
> let it incubate in github for a time. (For example there is no support
> for union types at the moment, maybe other kinks or tunes). Please
> checkout the project and send pull requests if you have patches.
>
> Thank you,
> Edward
>



-- 
Swarnim


Re: Output from HiveQL query

2012-07-12 Thread kulkarni.swar...@gmail.com
Yes.

INSERT OVERWRITE DIRECTORY '<hdfs path>'


would mean path on HDFS



INSERT OVERWRITE LOCAL DIRECTORY '<local path>'


would mean path on the local FS.

On Thu, Jul 12, 2012 at 2:01 PM, Raihan Jamal  wrote:

> Basically, I was assuming that whenever you do any HiveQL query, all the
> outputs gets stored somewhere in HDFS, in some path. But from this thread
> discussion, it means they are not getting stored anywhere in HDFS, but you
> can specify the path where do you want to store them. Is that right?
>
>
>
> *Raihan Jamal*
>
>
>
> On Thu, Jul 12, 2012 at 11:56 AM, Roberto Sanabria <
> robe...@stumbleupon.com> wrote:
>
>> Or you can output to a table and store it there.
>>
>>
>> On Thu, Jul 12, 2012 at 2:53 PM, VanHuy Pham wrote:
>>
>>> The output can be printed out on terminal when you run it, or can be
>>> stored if you specify a location for it through the operator >.
>>> For example:
>>>
>>> /home/smith/bin/hive -f "file containing hive sql" >
>>> /home/smith/documents/output.txt
>>>
>>> On Thu, Jul 12, 2012 at 11:31 AM, Raihan Jamal wrote:
>>>
 I have one question related to the output from the HiveQL query.
 Suppose I did some HiveQL query and I will be getting some output back, so
 those result set are getting stored somewhere in HDFS? If yes?

 1)  Then where they are getting stored and how can I access it?**

 2)  And is there any time limit on that meaning after this much
 particular time it will be deleted?**


 *Raihan Jamal*


>>>
>>
>


-- 
Swarnim


Re: Output from HiveQL query

2012-07-12 Thread kulkarni.swar...@gmail.com
By default, no. They will be displayed on the console.

Try this to store them in HDFS:


INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM invites a
WHERE a.ds='2008-08-15';


The results of the query will be stored in '/tmp/hdfs_out' directory on HDFS.


I am not sure I understood your question about the time limit. Once in
HDFS, it should be up to you to decide how long you want them to be
there, right?


On Thu, Jul 12, 2012 at 1:31 PM, Raihan Jamal  wrote:

> I have one question related to the output from the HiveQL query. Suppose I
> did some HiveQL query and I will be getting some output back, so those
> result set are getting stored somewhere in HDFS? If yes?
>
> 1)  Then where they are getting stored and how can I access it?**
>
> 2)  And is there any time limit on that meaning after this much
> particular time it will be deleted?**
>
>
> *Raihan Jamal*
>
>


-- 
Swarnim


Re: Separators in struct

2012-07-12 Thread kulkarni.swar...@gmail.com
Issue logged. https://issues.apache.org/jira/browse/HIVE-3253

On Wed, Jul 11, 2012 at 4:13 PM, Edward Capriolo wrote:

> We surely can make it bigger. However there is a sublte problem. I
> have ran into maximum column length limitations in the metastore with
> heavily nested columns. MySQL varchar maxes etc. You should open a
> jirra issues on issues.apache.org/jira/hive
>
> Edward
>
> On Wed, Jul 11, 2012 at 5:10 PM, kulkarni.swar...@gmail.com
>  wrote:
> > Hello,
> >
> > I am not sure I understand the significance of separators very well in
> case
> > of structs. For instance, for deeply nested structs I usually hit this
> > exception:
> >
> > java.lang.ArrayIndexOutOfBoundsException: 9
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:281)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> > at
> >
> org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyStructInspector(LazyFactory.java:354)
> >
> > Digging deeper into the code I found that the size of separators have
> been
> > hard-coded to be an array of size "8" with a comment on it.
> >
> > // Read the separators: We use 8 levels of separators by default, but we
> > should change this when we allow users to specify more than 10 levels of
> > separators through DDL.
> >
> > serdeParams.separators = new byte[8];
> >
> >
> > If someone can explain this to me, I would really appreciate that. Also
> is
> > there a way to change the number of separators so that this exception is
> not
> > thrown?
> >
> > Thanks,
> >
> >
> > --
> > Swarnim
>



-- 
Swarnim


Re: Casting exception while converting from "LazyDouble" to "LazyString"

2012-07-10 Thread kulkarni.swar...@gmail.com
Hi Kanna,

This might just mean that in your query you have a STRING type for a
field that is actually a DOUBLE.
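
As a rough sketch (the table name below is made up; the column names are
just the ones from the row in your stack trace), if gpa really carries double
data, declaring or casting it as such usually clears this up:

-- declare the column with its real type ...
ALTER TABLE students CHANGE gpa gpa double;

-- ... or cast explicitly in the query
SELECT name, age, CAST(gpa AS double) FROM students;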

On Tue, Jul 10, 2012 at 3:05 PM, Kanna Karanam wrote:

>  Has anyone seen this error before? Am I missing anything here?
>
>
> 2012-07-10 11:11:02,203 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201207091248_0107_m_00_0:
> java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing row {"name":"zach johnson","age":77,"gpa":3.27}
>
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>
> at
> org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>
> at
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>
> at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1124)
> 
>
> at org.apache.hadoop.mapred.Child.main(Child.java:265)
>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
> Error while processing row {"name":"zach johnson","age":77,"gpa":3.27}
>
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>
> ... 8 more
>
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.lazy.LazyDouble cannot be cast to
> org.apache.hadoop.hive.serde2.lazy.LazyString
>
> at
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:47)
> 
>
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:351)
> 
>
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:255)
> 
>
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:202)
> 
>
> at
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:236)
> 
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>
> at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
> 
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
>
> Thanks,
>
> Kanna
>



-- 
Swarnim


Re: protobuf 2.4.1 and ObjectInspector

2012-06-28 Thread kulkarni.swar...@gmail.com
Thanks Ed for your reply on this.

I specifically have been working on enhancing the existing HBase/Hive
integration to handle advanced proto/thrift structures, and also logged [1]
in the process. I wasn't able to use the existing
ProtocolBuffersObjectInspector for this purpose, as it extends the
ReflectionStructObjectInspector, which I have seen fail in some cases. I
decided to provide the struct schema explicitly in the CREATE EXTERNAL
TABLE command itself and then use that to lazily deserialize the structure.
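
To make that concrete, the explicit-schema approach amounts to spelling the
message out as a nested struct in the DDL, roughly along these lines (the
column and type names are made up for illustration, and the struct mapping
relies on the enhanced ObjectInspectors rather than the stock integration):

CREATE EXTERNAL TABLE proto_messages(
  key string,
  msg struct<id:int, payload:struct<amount:double, tags:array<string>>>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:msg");

The custom ObjectInspectors then walk that declared struct lazily instead of
reflecting over the generated class.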

This approach worked well for most cases until I hit more complicated cases
like circular references and nested repeated messages in protobufs. I
suspect that reflections "might" have worked better for such complicated
cases.

You mentioned writing a serde that takes the protobuf objects directly and
deserializes them. Did you mean to use reflection, or to follow a method
similar to the one I described above? Any suggestions?

Thanks,

On Tue, May 22, 2012 at 9:00 PM, Edward Capriolo wrote:

> I am trying to decipher my way through the code as well. Apparently the
> work is half done to return ProtoBuf as a native type. As there is
> support for ObjectInspector.THRIFT and ObjectInspector.ProtoBuffer. I
> currently want to write a Serde that works like the thrift serde where
> protobuf objects can be given directly to hive. Come hang out in the
> IRC room and maybe we can chat more about this.
>
> On Tue, May 22, 2012 at 6:09 PM, kulkarni.swar...@gmail.com
>  wrote:
> > I am trying to use the ReflectionStructObjectInspector to extract fields
> > from a protobuf generated from 2.4.1 compiler. I am seeing that
> reflections
> > fails to extract fields out of the generated protobuf class.
> Specifically,
> > this code snippet:
> >
> > public static Field[] getDeclaredNonStaticFields(Class c) {
> >
> > Field[] f = c.getDeclaredFields();// This returns back the
> correct
> > number of fields
> >
> > ArrayList af = new ArrayList();
> >
> > for (int i = 0; i < f.length; ++i) {
> >
> >   // The logic here falls flat as it is looking only for the
> non-static
> > fields and all generated fields
> >
> >  // seem to be static
> >
> >   if (!Modifier.isStatic(f[i].getModifiers())) {
> >
> > af.add(f[i]);
> >
> >   }
> >
> > }
> >
> > Field[] r = new Field[af.size()];
> >
> > for (int i = 0; i < af.size(); ++i) {
> >
> >   r[i] = af.get(i);
> >
> > }
> >
> > return r;
> >
> >   }
> >
> >
> > This causes the whole ObjectInspector to fail. Has anyone else seen this
> > issue too?
> >
>



-- 
Swarnim


Hive tar ball snapshot build

2012-06-15 Thread kulkarni.swar...@gmail.com
I was looking into the snapshot builds for hive[1] and noticed that there
is no snapshot tar ball available. Is there a reason why we don't build
them? If not, should we be adding that to the build so that interested
people can simply pull this bleeding edge tar ball and start playing with
it rather than checking the code out and building it themselves?

[1]
https://repository.apache.org/content/repositories/snapshots/org/apache/hive/

Thanks,
-- 
Swarnim


Re: Providing a custom serialization.class SerDe property

2012-06-13 Thread kulkarni.swar...@gmail.com
Another quick follow-up on this.

If I try to run a query that invokes an M/R job, then it throws me this
exception:

java.io.FileNotFoundException: File does not exist: /Users/my-classes.jar
 at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:722)
at
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
 at
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:77)

Looking at the stacktrace, it seems like it is searching for the jar on
HDFS rather than the local filesystem. Is that intended? All "Select *"
queries that do not spawn an M/R job work fine.

Thanks,

On Wed, Jun 13, 2012 at 9:44 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Cool. That worked!
>
> Thanks guys.
>
>
> On Wed, Jun 13, 2012 at 9:35 AM, Edward Capriolo wrote:
>
>> Right, The end result is that same. The hive shell script currently
>> bulds a list of aux_lib jars and generates the auxpath arguments. but
>> it is good to know these files do not need to be in a static folder.
>>
>> On Wed, Jun 13, 2012 at 10:32 AM, Rubin, Bradley S.
>>  wrote:
>> > Another option is to specify the path in the hive command:
>> >
>> > hive --auxpath ~/my-classes.jar
>> >
>> > -- Brad
>> >
>> > On Jun 13, 2012, at 9:23 AM, Edward Capriolo wrote:
>> >
>> >> You need to put these jars in your aux_lib folder or in your hadoop
>> >> classpath. There is a subtle difference between that classpath and the
>> >> classpath used by UDF and anything that involves  a serde or input
>> >> format needs to be in auxlib.
>> >>
>> >> Edward
>> >>
>> >> On Wed, Jun 13, 2012 at 10:20 AM, kulkarni.swar...@gmail.com
>> >>  wrote:
>> >>> Hello,
>> >>>
>> >>> In order to provide a custom "serialization.class" to a SerDe, I
>> created a
>> >>> jar containing all my custom serialization classes and added them to
>> the
>> >>> hive classpath with "ADD JAR my-classes.jar" command. Now when I try
>> to use
>> >>> these custom classes via CLI, it still throws me a
>> "ClassNotFoundException"
>> >>> for those custom classes in my jar.
>> >>>
>> >>> Is there something that I am missing? I confirmed that 'list jars' is
>> >>> showing me the custom jar that I added.
>> >>>
>> >>> Any help would be appreciated.
>> >>>
>> >>> Thanks,
>> >>> --
>> >>> Swarnim
>> >
>>
>
>
>
> --
> Swarnim
>



-- 
Swarnim


Re: Providing a custom serialization.class SerDe property

2012-06-13 Thread kulkarni.swar...@gmail.com
Cool. That worked!

Thanks guys.

On Wed, Jun 13, 2012 at 9:35 AM, Edward Capriolo wrote:

> Right, The end result is that same. The hive shell script currently
> bulds a list of aux_lib jars and generates the auxpath arguments. but
> it is good to know these files do not need to be in a static folder.
>
> On Wed, Jun 13, 2012 at 10:32 AM, Rubin, Bradley S.
>  wrote:
> > Another option is to specify the path in the hive command:
> >
> > hive --auxpath ~/my-classes.jar
> >
> > -- Brad
> >
> > On Jun 13, 2012, at 9:23 AM, Edward Capriolo wrote:
> >
> >> You need to put these jars in your aux_lib folder or in your hadoop
> >> classpath. There is a subtle difference between that classpath and the
> >> classpath used by UDF and anything that involves  a serde or input
> >> format needs to be in auxlib.
> >>
> >> Edward
> >>
> >> On Wed, Jun 13, 2012 at 10:20 AM, kulkarni.swar...@gmail.com
> >>  wrote:
> >>> Hello,
> >>>
> >>> In order to provide a custom "serialization.class" to a SerDe, I
> created a
> >>> jar containing all my custom serialization classes and added them to
> the
> >>> hive classpath with "ADD JAR my-classes.jar" command. Now when I try
> to use
> >>> these custom classes via CLI, it still throws me a
> "ClassNotFoundException"
> >>> for those custom classes in my jar.
> >>>
> >>> Is there something that I am missing? I confirmed that 'list jars' is
> >>> showing me the custom jar that I added.
> >>>
> >>> Any help would be appreciated.
> >>>
> >>> Thanks,
> >>> --
> >>> Swarnim
> >
>



-- 
Swarnim


Providing a custom serialization.class SerDe property

2012-06-13 Thread kulkarni.swar...@gmail.com
Hello,

In order to provide a custom "serialization.class" to a SerDe, I created a
jar containing all my custom serialization classes and added them to the
hive classpath with "ADD JAR my-classes.jar" command. Now when I try to use
these custom classes via CLI, it still throws me a "ClassNotFoundException"
for those custom classes in my jar.

Is there something that I am missing? I confirmed that 'list jars' is
showing me the custom jar that I added.

Any help would be appreciated.

Thanks,
-- 
Swarnim


Hive and thrift 0.8.0

2012-06-07 Thread kulkarni.swar...@gmail.com
Is the latest hive release 0.9.0 compatible with thrift 0.8 or do we need
to recompile and rebuild the package ourselves to make it compatible?
Currently it seems to depend on libthrift-0.7.

Thanks for the help.

Swarnim


Re: Developing Hive UDF in eclipse

2012-06-05 Thread kulkarni.swar...@gmail.com
Did you try this [1]? It got me most of the way through the process.

[1] https://cwiki.apache.org/Hive/gettingstarted-eclipsesetup.html
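
Once the UDF compiles, wiring it into a query from the CLI looks roughly like
this (the jar path, function name and class below are placeholders for
whatever you end up building):

hive> ADD JAR /path/to/my-udfs.jar;
hive> CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.MyLower';
hive> SELECT my_lower(name) FROM invites LIMIT 10;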

On Tue, Jun 5, 2012 at 8:49 AM, Arun Prakash wrote:

> Hi Friends,
> I tried to develop udf for hive but i am getting package import error
> in eclipse.
>
> import org.apache.hadoop.hive.ql.exec.UDF;
>
>
> How to import hive package in eclipse?
>
>
> Any inputs much appreciated.
>
>
>
> Best Regards
>  Arun Prakash C.K
>
> Keep On Sharing Your Knowledge with Others
>



-- 
Swarnim


Re: getStructFieldData method on StructObjectInspector

2012-06-05 Thread kulkarni.swar...@gmail.com
Thanks Edward for your reply on this.

Would you mind giving a very small example of how a struct corresponds to a
Map? I am having a hard time understanding what the K/V pairs in the map
would look like.

Thanks again.

On Tue, May 29, 2012 at 10:16 AM, Edward Capriolo wrote:

> Returning custom writables will not work. In most cases the methods
> return Object because the types can be many things that do not fall
> under a single superclass other than Object, like Integer, IntWritable,
> Array, or Map. In your case, a struct corresponds to a
> Map.
>
> On Tue, May 29, 2012 at 11:08 AM, kulkarni.swar...@gmail.com
>  wrote:
> > If someone can help understand this, I would really appreciate.
> >
> > On Fri, May 25, 2012 at 3:58 PM, kulkarni.swar...@gmail.com
> >  wrote:
> >>
> >> I am trying to write a custom ObjectInspector extending the
> >> StructObjectInspector and got a little confused about the use of the
> >> getStructFieldData method on the inspector. Looking at the definition
> of the
> >> method:
> >>
> >> public Object getStructFieldData(Object data, StructField fieldRef);
> >>
> >> I understand that the use of this method is to retrieve the specific
> given
> >> field from the buffer. However, what I don't understand is what is it
> >> expected to return. I looked around the tests and related code and
> mostly
> >> stuff returned was either a LazyPrimitive or a LazyNonPrimitive, but I
> >> couldn't find anything that enforces this(specially given that the
> return
> >> type is a plain "Object")! Does this mean that I am free to return even
> my
> >> custom object as a return type of this method? If so, what is the
> guarantee
> >> that it will be interpreted correctly down the pipeline?
> >>
> >> Thanks,
> >> --
> >> Swarnim
> >
> >
> >
> >
> > --
> > Swarnim
>



-- 
Swarnim


Re: getStructFieldData method on StructObjectInspector

2012-05-29 Thread kulkarni.swar...@gmail.com
If someone can help understand this, I would really appreciate.

On Fri, May 25, 2012 at 3:58 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> I am trying to write a custom ObjectInspector extending the
> StructObjectInspector and got a little confused about the use of the
> getStructFieldData method on the inspector. Looking at the definition of
> the method:
>
> public Object getStructFieldData(Object data, StructField fieldRef);
>
> I understand that the use of this method is to retrieve the specific given
> field from the buffer. However, what I don't understand is what is it
> expected to return. I looked around the tests and related code and mostly
> stuff returned was either a LazyPrimitive or a LazyNonPrimitive, but I
> couldn't find anything that enforces this(specially given that the return
> type is a plain "Object")! Does this mean that I am free to return even my
> custom object as a return type of this method? If so, what is the guarantee
> that it will be interpreted correctly down the pipeline?
>
> Thanks,
> --
> Swarnim
>



-- 
Swarnim


getStructFieldData method on StructObjectInspector

2012-05-25 Thread kulkarni.swar...@gmail.com
I am trying to write a custom ObjectInspector extending the
StructObjectInspector and got a little confused about the use of the
getStructFieldData method on the inspector. Looking at the definition of
the method:

public Object getStructFieldData(Object data, StructField fieldRef);

I understand that the use of this method is to retrieve the given
field from the buffer. However, what I don't understand is what it is
expected to return. I looked around the tests and related code, and mostly
what was returned was either a LazyPrimitive or a LazyNonPrimitive, but I
couldn't find anything that enforces this (especially given that the return
type is a plain "Object")! Does this mean that I am free to return even my
custom object as a return type of this method? If so, what is the guarantee
that it will be interpreted correctly down the pipeline?

Thanks,
-- 
Swarnim


protobuf 2.4.1 and ObjectInspector

2012-05-22 Thread kulkarni.swar...@gmail.com
I am trying to use the ReflectionStructObjectInspector to extract fields
from a protobuf generated by the 2.4.1 compiler. I am seeing that reflection
fails to extract fields out of the generated protobuf class. Specifically,
this code snippet:

public static Field[] getDeclaredNonStaticFields(Class c) {

    Field[] f = c.getDeclaredFields(); // This returns back the correct number of fields

    ArrayList af = new ArrayList();

    for (int i = 0; i < f.length; ++i) {

      // The logic here falls flat as it is looking only for the non-static
      // fields, and all generated fields seem to be static

      if (!Modifier.isStatic(f[i].getModifiers())) {

        af.add(f[i]);

      }

    }

    Field[] r = new Field[af.size()];

    for (int i = 0; i < af.size(); ++i) {

      r[i] = af.get(i);

    }

    return r;

  }

This causes the whole ObjectInspector to fail. Has anyone else seen this
issue too?


Re: Multiple SerDe per table name

2012-05-18 Thread kulkarni.swar...@gmail.com
Consider a case where we have multiple HBase columns in an HBase table,
each containing data with a different structure; that would warrant a need for
multiple SerDes to map them to a single Hive table. Correct?
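
As a made-up illustration of what I have in mind (this is not something the
current syntax supports, just what I would like to be able to express):

CREATE EXTERNAL TABLE hbase_mixed(
  key string,
  order_info struct<id:int, total:double>,
  audit_info struct<user:string, ts:bigint>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:order,d:audit");

where d:order might hold, say, a protobuf and d:audit a thrift structure, so a
single table-level SerDe can't describe both.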

On Thu, May 17, 2012 at 11:45 AM, Edward Capriolo wrote:

> This does not work. A Deserializer's role is to turn the value which
> came form the InputFormat into something hive can use as column data.
> In essence the Deserializer creates the columns so I do not see a
> logical way to have more then one.
>
> On Thu, May 17, 2012 at 11:53 AM, kulkarni.swar...@gmail.com
>  wrote:
> > I was thinking more from a perspective of specifying a SerDe per column
> > name.
> >
> > On Thu, May 17, 2012 at 10:38 AM, Mark Grover  wrote:
> >>
> >> Hi Swarnim,
> >> What's your use case?
> >> If you use multiple SerDe's, when you are writing to the table, how
> would
> >> you want Hive to decide which one to use?
> >>
> >> Mark
> >>
> >> Mark Grover, Business Intelligence Analyst
> >> OANDA Corporation
> >>
> >> www: oanda.com www: fxtrade.com
> >>
> >> - Original Message -
> >> From: "kulkarni swarnim" 
> >> To: user@hive.apache.org
> >> Sent: Thursday, May 17, 2012 11:29:26 AM
> >> Subject: Multiple SerDe per table name
> >>
> >> Does hive currently support multiple SerDe s to be defined per table
> name?
> >> Looking through the code and documentation, it seems like it doesn't as
> only
> >> one could be specified through the ROW FORMAT SERDE but just wanted to
> be
> >> sure.
> >>
> >>
> >> --
> >> Swarnim
> >
> >
> >
> >
> > --
> > Swarnim
>



-- 
Swarnim


Re: Multiple SerDe per table name

2012-05-17 Thread kulkarni.swar...@gmail.com
I was thinking more from a perspective of specifying a SerDe per column
name.

On Thu, May 17, 2012 at 10:38 AM, Mark Grover  wrote:

> Hi Swarnim,
> What's your use case?
> If you use multiple SerDe's, when you are writing to the table, how would
> you want Hive to decide which one to use?
>
> Mark
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
>
> - Original Message -
> From: "kulkarni swarnim" 
> To: user@hive.apache.org
> Sent: Thursday, May 17, 2012 11:29:26 AM
> Subject: Multiple SerDe per table name
>
> Does hive currently support multiple SerDe s to be defined per table name?
> Looking through the code and documentation, it seems like it doesn't as
> only one could be specified through the ROW FORMAT SERDE but just wanted to
> be sure.
>
>
> --
> Swarnim
>



-- 
Swarnim


Multiple SerDe per table name

2012-05-17 Thread kulkarni.swar...@gmail.com
Does hive currently support multiple SerDes being defined per table?
Looking through the code and documentation, it seems like it doesn't, as
only one can be specified through ROW FORMAT SERDE, but I just wanted to
be sure.

-- 
Swarnim


Exception with datanucleus while running hive tests in eclipse

2012-05-16 Thread kulkarni.swar...@gmail.com
I installed datanucleus eclipse plugin as I realized that it is needed to
run some of the hive tests in eclipse. While trying to run the enhancer
tool, I keep getting this exception:

"Exception occurred executing command line. Cannot run program
"/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java"
...Argument list is too long"

Has anyone else encountered this too? My machine is running OS X 10.7.

Thanks,

Swarnim


Re: Hive join does not execute

2012-05-10 Thread kulkarni.swar...@gmail.com
It looks more like a permissions problem to me. Just make sure that
whatever directories hadoop is writing to are owned by hadoop itself.

Also it looks a little weird to me that it is using the
"RawLocalFileSystem" instead of the "DistributedFileSystem". You might want
to look at "fs.default.name" property in core-site.xml and see if it is
pointing to your HDFS location.
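
A quick way to check what it is actually resolving to is from the hive prompt
(this just prints the current value, it doesn't change anything):

hive> SET fs.default.name;

If that comes back as a file:// URI instead of hdfs://<namenode>:<port>, the
job gets submitted against the local filesystem, which would match the
RawLocalFileSystem in your trace.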

Hope that helps.

On Thu, May 10, 2012 at 11:29 AM, Mahsa Mofidpoor wrote:

> Hi,
>
> When I want to join two tables, I receive the following error:
>
> 12/05/10 12:03:31 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
> Execution log at:
> /tmp/umroot/umroot_20120510120303_4d0145bb-27fa-4d4a-8cbc-95d8353fccaf.log
> ENOENT: No such file or directory
> at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
>  at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:692)
> at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:647)
>  at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
>  at
> org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
>  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
>  at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Job Submission failed with exception
> 'org.apache.hadoop.io.nativeio.NativeIOException(No such file or directory)'
> Execution failed with exit status: 2
> Obtaining error information
>
> Task failed!
> Task ID:
>   Stage-1
>
> Logs:
>
> /tmp/umroot/hive.log
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
>
> I use hadoop-0.20.2 (single-node setup) and I have build Hive through  the
> latest source code.
>
>
> Thank you in advance  for your help,
>
> Mahsa
>



-- 
Swarnim


Re: Exception while running simple hive query

2012-05-07 Thread kulkarni.swar...@gmail.com
Thanks Shashwat.

That did work. However, I do find it very weird that it is able
to find all other libs at their proper locations on the local filesystem but
searches for this particular one on HDFS. I'll try to dig deeper into the
code to see if I can find a cause for this happening.

On Mon, May 7, 2012 at 2:12 PM, shashwat shriparv  wrote:

> Do one thing create the same structure   /Users/testuser/hive-0.9.0/
> lib/hive-builtins-0.9.0.jar on the hadoop file system and then try.. will
> work
>
> Shashwat Shriparv
>
>
> On Mon, May 7, 2012 at 11:57 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Thanks for the reply.
>>
>> Assuming that you mean for permissions within the HIVE_HOME, they all
>> look ok to me. Is there anywhere else too you want me to check?
>>
>>
>> On Mon, May 7, 2012 at 11:16 AM, hadoop hive wrote:
>>
>>> check for the permission..
>>>
>>>
>>> On Mon, May 7, 2012 at 7:30 PM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> I created a very simple hive table and then ran the following query that
>>>> should run a M/R job to return the results.
>>>>
>>>> hive> SELECT COUNT(*) FROM invites;
>>>>
>>>> But I am getting the following exception:
>>>>
>>>> java.io.FileNotFoundException: File does not exist:
>>>> /Users/testuser/hive-0.9.0/lib/hive-builtins-0.9.0.jar
>>>>
>>>> at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:722)
>>>>
>>>> at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
>>>>
>>>> at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
>>>>
>>>> at
>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:246)
>>>>
>>>> at
>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:284)
>>>>
>>>> at
>>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:355)
>>>>
>>>> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1221)
>>>>
>>>> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
>>>>
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>
>>>> .
>>>>
>>>> When I go to the given location, the jar does exist. It seems
>>>> like somehow it is searching for the jar in HDFS instead of the local
>>>> file system. Any suggestions on what I could be possible missing? My
>>>> hadoop version is 0.23.
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>
>
> --
>
>
> ∞
> Shashwat Shriparv
>
>
>


-- 
Swarnim


Re: Exception while running simple hive query

2012-05-07 Thread kulkarni.swar...@gmail.com
Thanks for the reply.

Assuming that you mean the permissions within HIVE_HOME, they all look
OK to me. Is there anywhere else you want me to check?

On Mon, May 7, 2012 at 11:16 AM, hadoop hive  wrote:

> check for the permission..
>
>
> On Mon, May 7, 2012 at 7:30 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> I created a very simple hive table and then ran the following query that
>> should run a M/R job to return the results.
>>
>> hive> SELECT COUNT(*) FROM invites;
>>
>> But I am getting the following exception:
>>
>> java.io.FileNotFoundException: File does not exist:
>> /Users/testuser/hive-0.9.0/lib/hive-builtins-0.9.0.jar
>>
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:722)
>>
>> at
>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
>>
>> at
>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
>>
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:246)
>>
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:284)
>>
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:355)
>>
>> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1221)
>>
>> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
>>
>> at java.security.AccessController.doPrivileged(Native Method)
>>
>> .
>>
>> When I go to the given location, the jar does exist. It seems
>> like somehow it is searching for the jar in HDFS instead of the local
>> file system. Any suggestions on what I could be possible missing? My
>> hadoop version is 0.23.
>>
>
>


-- 
Swarnim