Re: Duplicated Hadoop Package Structure in Hive exec

2012-04-02 Thread Edward Capriolo
I 'fixed' it by downgrading my common-util to the same version as hive
so the signatures would not conflict.

Edward

On Mon, Apr 2, 2012 at 12:18 PM, buddhika chamith wrote:
> Hi Edward,
>
> Thanks for the reply. Please find the created JIRA at [1]. However, to get
> this working for now, would moving these classes to a separate package
> structure work? I am willing to patch Hive sources to get this working if
> it comes to that. How did you get around the problems you faced?
>
> Regards
> Buddhika
>
> [1] https://issues.apache.org/jira/browse/HIVE-2919
>
> On Mon, Apr 2, 2012 at 8:25 PM, Edward Capriolo wrote:
>>
>> Yes. This has bitten me before as well. Hive-exec jar builds in some
>> things from commons lang and commons util. They are needed by
>> hive-exec jar itself so getting them from distributed cache/auxjars
>> does not work well. I think the solution is we should probably
>> repackage commons util into hive-commons so there are no conflicts.
>>
>> Open up a jira for discussion.
>>
>> Edward
>>
>> On Mon, Apr 2, 2012 at 5:54 AM, buddhika chamith wrote:
>> > Hi All,
>> >
>> > I see there are some hadoop packages present inside hive-exec jar. (for
>> > example org.apache.hadoop.fs etc.). My guess is that these classes extend
>> > some hadoop interfaces/ functionality. However is it required for them to
>> > retain the same package structure? I am asking because I am trying to run
>> > Hive under an OSGi environment and duplicate packages coming from Hadoop
>> > and Hive are causing issues under OSGi.
>> >
>> > Regards
>> > Buddhika
>
>


Re: Duplicated Hadoop Package Structure in Hive exec

2012-04-02 Thread buddhika chamith
Hi Edward,

Thanks for the reply. Please find the created JIRA at [1]. However, to get
this working for now, would moving these classes to a separate package
structure work? I am willing to patch Hive sources to get this working if
it comes to that. How did you get around the problems you faced?

Regards
Buddhika

[1] https://issues.apache.org/jira/browse/HIVE-2919

On Mon, Apr 2, 2012 at 8:25 PM, Edward Capriolo wrote:

> Yes. This has bitten me before as well. Hive-exec jar builds in some
> things from commons lang and commons util. They are needed by
> hive-exec jar itself so getting them from distributed cache/auxjars
> does not work well. I think the solution is we should probably
> repackage commons util into hive-commons so there are no conflicts.
>
> Open up a jira for discussion.
>
> Edward
>
> On Mon, Apr 2, 2012 at 5:54 AM, buddhika chamith wrote:
> > Hi All,
> >
> > I see there are some hadoop packages present inside hive-exec jar. (for
> > example org.apache.hadoop.fs etc.). My guess is that these classes extend
> > some hadoop interfaces/ functionality. However is it required for them to
> > retain the same package structure? I am asking because I am trying to run
> > Hive under an OSGi environment and duplicate packages coming from Hadoop
> > and Hive are causing issues under OSGi.
> >
> > Regards
> > Buddhika
>


Table Statistics In Hive

2012-04-02 Thread Ladda, Anand
I've tried to collect statistics on an existing table in Hive using the
commands mentioned on this wiki page:
https://cwiki.apache.org/confluence/display/Hive/StatsDev

ANALYZE TABLE [TABLENAME] PARTITION(partcol1=..., partcol2=...) COMPUTE STATISTICS

But when I run DESCRIBE EXTENDED [TABLENAME] after the stats collection has
completed, I see the number of rows is still 0. Is there anything I am
missing here? I am using Hive 0.7.1.
Thanks
Anand
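
For reference, a minimal and untested sketch of the sequence described in this
message. The table name Table1 and the partition columns ds and hr are
placeholders, not taken from the post. For a partitioned table it may be worth
describing the analyzed partition itself as well as the table, since the row
count may be recorded against the partition rather than the table.

```sql
-- Placeholder names: Table1, ds and hr are illustrative only.
-- Gather statistics for one specific partition.
ANALYZE TABLE Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS;

-- Look at the parameters recorded for that partition.
DESCRIBE EXTENDED Table1 PARTITION(ds='2008-04-09', hr=11);

-- Look at the table-level entry for comparison.
DESCRIBE EXTENDED Table1;
```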


Re: Duplicated Hadoop Package Structure in Hive exec

2012-04-02 Thread Edward Capriolo
Yes. This has bitten me before as well. Hive-exec jar builds in some
things from commons lang and commons util. They are needed by
hive-exec jar itself so getting them from distributed cache/auxjars
does not work well. I think the solution is we should probably
repackage commons util into hive-commons so there are no conflicts.

Open up a jira for discussion.

Edward

On Mon, Apr 2, 2012 at 5:54 AM, buddhika chamith wrote:
> Hi All,
>
> I see there are some hadoop packages present inside hive-exec jar. (for
> example org.apache.hadoop.fs etc.). My guess is that these classes extend
> some hadoop interfaces/ functionality. However is it required for them to
> retain the same package structure? I am asking because I am trying to run
> Hive under an OSGi environment and duplicate packages coming from Hadoop and
> Hive are causing issues under OSGi.
>
> Regards
> Buddhika


Re: Hive Meta Information

2012-04-02 Thread Thiruvel Thirumoolan
> 1.   recent users of table,
> 2.   top users of table,

Hive metastore has audit support (HIVE-1948). While what Edward suggests
will be accurate, the audit log might give you a superset of that, since it
will also include "desc table" requests.

Thiruvel



On 3/31/12 8:39 PM, "Edward Capriolo"  wrote:

>hive does not capture this information.
>
>I have a tool on github that connects to the JobTracker and pulls stat
>information.
>
>https://github.com/edwardcapriolo/hadoop_cluster_profiler
>
>It would be pretty easy to add code to record some of the information you
>are looking for.
>
>Hive is close to #4 with a statistics DB used for indexing/query planning.
>
>Edward
>
>On Sat, Mar 31, 2012 at 2:50 AM, Nitin Pawar wrote:
>> Anand,
>>
>> I doubt this information is readily available in hive, as this is not
>> meta information but rather access information.
>>
>> For number of records in a table you can just run a query like select
>> count(1) from table;
>>
>>
>> For the access details on table data, you will need to process hadoop
>> logs and based on that you can figure out who accessed the data and how
>>
>> Thanks,
>> Nitin
>>
>> On Sat, Mar 31, 2012 at 3:36 AM, Ladda, Anand wrote:
>>>
>>> How do I get the following meta information about a table
>>>
>>> 1.   recent users of table,
>>>
>>> 2.   top users of table,
>>>
>>> 3.   recent queries/jobs/reports,
>>>
>>> 4.   number of rows in a table
>>>
>>>
>>>
>>> I don't see anything either in DESCRIBE FORMATTED or SHOW TABLE EXTENDED
>>> LIKE commands.
>>>
>>> Thanks
>>>
>>> Anand
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>



Re: Question - Nested JSON using Hive

2012-04-02 Thread Ashwanth Kumar
From what I see, you are using several complex data types within one ARRAY
type. That is not possible: ARRAY is like a list that supports only one
element type.

For your case, try something with STRUCT; maybe that suits your use case
better.

Read this link -
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Based on that, the *updated schema* might be something like:
CREATE TABLE USERS (
id STRING,
name STRING,
first_name STRING,
last_name STRING,
link STRING,
username STRING,
birthday STRING,
hometown MAP,
`location` MAP,
bio STRING,
quotes STRING,
work STRUCT,location:
MAP,position: MAP,start_date :
STRING,end_date : STRING>,
education STRUCT,type: STRING>,
gender STRING,
interested_in ARRAY,
relationship_status STRING,
religion STRING,
political STRING,
email STRING,
website STRING,
timezone INT,
locale STRING,
language STRUCT>,
verified STRING,
updated_time STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE;

PS - I have not tested this code, but it must be very similar to this.
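
The type parameters in angle brackets appear to have been stripped from the
schemas in this archive copy, so here is a rough, untested sketch of how the
nested columns might be declared. All types are inferred from the sample JSON
in this thread and are assumptions, and users_sketch is a hypothetical table
name. Each JSON array of objects is modeled as an ARRAY with a single STRUCT
element type. Note also that the sample data uses the key "languages" while
the schemas in the thread name the column "language"; if the SerDe matches
columns to JSON keys by name, that column would likely come back NULL, so the
sketch uses "languages".

```sql
-- Sketch only: types inferred from the sample JSON, not tested against Hive.
-- An ARRAY takes exactly one element type, so each object in a JSON array
-- becomes one STRUCT element rather than a list of separate types.
CREATE TABLE users_sketch (
  id STRING,
  hometown MAP<STRING,STRING>,
  -- `location` is quoted with backticks because it is a Hive keyword.
  `location` MAP<STRING,STRING>,
  work ARRAY<STRUCT<
    employer: MAP<STRING,STRING>,
    `location`: MAP<STRING,STRING>,
    position: MAP<STRING,STRING>,
    start_date: STRING,
    end_date: STRING>>,
  education ARRAY<STRUCT<school: MAP<STRING,STRING>, type: STRING>>,
  interested_in ARRAY<STRING>,
  languages ARRAY<STRUCT<id: STRING, name: STRING>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE;

-- Example access: name of the employer in the first work entry.
SELECT work[0].employer['name'] FROM users_sketch;
```

Only a subset of the columns is shown; the remaining scalar columns would stay
as in the schema above.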


RE: Question - Nested JSON using Hive

2012-04-02 Thread Anurag Gulati
Great, that fixed that problem. Now I'm getting the error "FAILED: Parse
Error: line 13:29 mismatched input ',' expecting > near '>' in list type"

This is the updated Schema Code:

ADD JAR JARs/json-serde-1.1-jar-with-dependencies.jar;
ADD JAR JARs/json-path-0.5.4.jar;
ADD JAR JARs/json-smart-1.0.6.3.jar;
CREATE TABLE USERS (
id STRING,
name STRING,
first_name STRING,
last_name STRING,
link STRING,
username STRING,
birthday STRING,
hometown MAP,
`location` MAP,
bio STRING,
quotes STRING,
work ARRAY,MAP,MAP,start_date 
STRING,end_date STRING>,
education ARRAY,STRING>,
gender STRING,
interested_in ARRAY,
relationship_status STRING,
religion STRING,
political STRING,
email STRING,
website STRING,
timezone INT,
locale STRING,
language ARRAY,
verified STRING,
updated_time STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE;

Test Data:
{"id":"10011666","name":"Test 
user","first_name":"Test","last_name":"user","link":"http:\/\/www.facebook.com\/test.user","username":"test.user","birthday":"09\/19\/1983","hometown":{"id":"103102203064024","name":"West
 Chester, Pennsylvania"},"location":{"id":"","name":null},"bio":"This is my 
Bio. I'm a geek that love to hack (in a good way)","quotes":"I like quotes. But 
I'm shortening this section cuz it was 
wild!","work":[{"employer":{"id":"6185812851","name":"American 
Eagle"},"location":{"id":"105540216147364","name":"Phoenix, 
Arizona"},"position":{"id":"133619273341785","name":"Counter 
Guy"},"start_date":"2012-01"},{"employer":{"id":"190876464341724","name":"Cardiac
 group"},"position":{"id":"105630109469647","name":"Executive 
Producer"},"description":"We create music for Artist Placement and 
TV\/Film.","start_date":"2002-01"},{"employer":{"id":"6185812851","name":"American
 Eagle"},"location":{"id":"105540216147364","name":"Phoenix, 
Arizona"},"position":{"id":"116439401740213","name":"Floor 
Guy"},"start_date":"2007-10","end_date":"2012-01"},{"employer":{"id":"110067355684846","name":"Saint
 Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, 
Arizona"},"position":{"id":"202489236428627","name":"Pharmacy IT 
Coordinator"},"start_date":"2005-10","end_date":"2007-10"},{"employer":{"id":"110067355684846","name":"Saint
 Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, 
Arizona"},"position":{"id":"144703015548786","name":"Pharmacy 
Tech"},"start_date":"2001-02","end_date":"2005-10"}],"sports":[{"id":"108606435830479","name":"Karate"}],"favorite_teams":[{"id":"87169796810","name":"Philadelphia
 Flyers"},{"id":"93625750491","name":"Philadelphia 
Phillies"},{"id":"45898408995","name":"Phoenix 
Suns"},{"id":"120163518021430","name":"Philadelphia 
Eagles"}],"favorite_athletes":[{"id":"77922840249","name":"Steve 
Nash"},{"id":"105590659475179","name":"Wayne 
Gretzky"},{"id":"62975399193","name":"Michael 
Jordan"}],"inspirational_people":[{"id":"106676942701904","name":"Gandhi"}],"education":[{"school":{"id":"109324275761313","name":"Corona
 del Sol High School"},"type":"High 
School"},{"school":{"id":"23680344606","name":"Arizona State 
University"},"type":"College"}],"gender":"male","interested_in":["female"],"relationship_status":"Single","religion":"Hinduism
 (One with all things)","political":"Liberal (Left of 
Center)","email":"app+22c90gj.9hh9d.f7304b58ac646e08b5f0f10a73547e34\u0040proxymail.facebook.com","website":"www.slashdot.org\r\nwww.gizmodo.com","timezone":-7,"locale":"en_US","languages":[{"id":"106059522759137","name":"English"},{"id":"112969428713061","name":"Hindi"}],"verified":true,"updated_time":"2012-03-22T17:24:25+"}



Regards,
-Anurag G.

From: ashwanth.ku...@gmail.com [mailto:ashwanth.ku...@gmail.com] On Behalf Of 
Ashwanth Kumar
Sent: Sunday, April 01, 2012 8:53 PM
To: user@hive.apache.org
Subject: Re: Question - Nested JSON using Hive

You get that error because "location" is a keyword in Hive. Try enclosing
it in backticks (`) and try again.
On Mon, Apr 2, 2012 at 7:07 AM, Anurag Gulati <anurag.gul...@aexp.com> wrote:
I've been trying to figure this out for a couple days now and I haven't gotten 
very far.
Looking for your guidance on the matter.

As a test, I'm trying to import Facebook Open Graph API data into Hive but am 
having a problem with the syntax.

Here is a line of sample data I'm trying to import (my own personal data):

{"id":"10011666","name":"Test 
user","first_name":"Test","last_name":"user","link":"http:\/\/www.facebook.com\/test.user","username":"test.user","birthday":"09\/19\/1983","hometown":{"id":"103102203064024","name":"West
 Chester, Pennsylvania"},"location":{"id":"","name":null},"bio":"This is my 
Bio. I'm a geek that love to hack (in a good way)","quotes":"I like quotes. But 
I'm shortening this section cuz it was 
wild!","work":[{"employer":{"id":"6185812851","name":"American 
Eagle"},"location":{"id":"105540216

Re: How to check for year condition in hive

2012-04-02 Thread Bhavesh Shah
Thanks Nitin for your reply.


On Mon, Apr 2, 2012 at 1:19 PM, Nitin Pawar  wrote:

> That should work, provided your columns are in 'yyyy-MM-dd' format and you
> are OK with ignoring leap years :)
>
> Also, it's a good idea to bracket the mathematical operations, so you can
> write it like: where round(datediff(date,dob)/365) >= 18
>
>
>
> On Mon, Apr 2, 2012 at 1:04 PM, Bhavesh Shah wrote:
>
>> Hello all,
>> I am trying to check age in hive.
>> select * from tbl_name where datediff(date,dob)/365 >= 18;
>>
>> Is this the right way to check a date condition in Hive, or do I need
>> to do something else?
>> Please advise as soon as possible.
>>
>>
>> --
>> Thanks and Regards,
>> Bhavesh Shah
>>
>>
>
>
> --
> Nitin Pawar
>
>


-- 
Regards,
Bhavesh Shah


Re: How to check for year condition in hive

2012-04-02 Thread Nitin Pawar
That should work, provided your columns are in 'yyyy-MM-dd' format and you
are OK with ignoring leap years :)

Also, it's a good idea to bracket the mathematical operations, so you can
write it like: where round(datediff(date,dob)/365) >= 18
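
If the leap-year drift matters, a calendar-based comparison is one possible
alternative; what follows is a rough, untested sketch rather than a tested
recipe. It assumes date and dob are 'yyyy-MM-dd' strings as in the query from
this thread. Dividing the day count by 365 can run a few days ahead: for
someone born 2000-01-10, datediff on 2018-01-06 is 6571 days, and 6571/365 is
just over 18 even though the 18th birthday is four days away. Wrapping the
division in round() widens this further, since any value from about 17.5
upward rounds to 18.

```sql
-- Sketch only, untested. Assumes `date` and dob hold 'yyyy-MM-dd' strings.
-- Age = difference in calendar years, minus 1 if this year's birthday
-- has not yet been reached on `date`.
SELECT *
FROM tbl_name
WHERE year(`date`) - year(dob)
      - IF(month(`date`) < month(dob)
           OR (month(`date`) = month(dob) AND day(`date`) < day(dob)), 1, 0)
      >= 18;
```

The backticks around date are a precaution in case the name is treated as a
keyword; the thread's original query used it unquoted.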



On Mon, Apr 2, 2012 at 1:04 PM, Bhavesh Shah wrote:

> Hello all,
> I am trying to check age in hive.
> select * from tbl_name where datediff(date,dob)/365 >= 18;
>
> Is this the right way to check a date condition in Hive, or do I need
> to do something else?
> Please advise as soon as possible.
>
>
> --
> Thanks and Regards,
> Bhavesh Shah
>
>


-- 
Nitin Pawar


How to check for year condition in hive

2012-04-02 Thread Bhavesh Shah
Hello all,
I am trying to check age in hive.
select * from tbl_name where datediff(date,dob)/365 >= 18;

Is this the right way to check a date condition in Hive, or do I need
to do something else?
Please advise as soon as possible.


-- 
Thanks and Regards,
Bhavesh Shah