Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Pradeep Gollakota
FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe drah...@googlemail.com wrote:
Hi All, I have imported a Hive table into Pig that has a complex data type (ARRAY<String>). The alias in Pig looks as below:
grunt> describe A;

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Pradeep Gollakota
Disregard last email. Sorry... didn't fully understand the question.
On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota pradeep...@gmail.com wrote:
FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Pradeep Gollakota
There was a similar question to this one on Stack Overflow a while back. The suggestion was to write a custom BagToTuple UDF.
http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota pradeep...@gmail.com wrote:

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Rahul Channe
Thank you Pradeep, it worked to a certain extent, but I am having difficulty separating the fields of cust_address as $0, $1. Example:
grunt> describe A;
A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
grunt> dump A;

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Pradeep Gollakota
If you're using the built-in BagToTuple UDF, then you probably don't need the FLATTEN operator. I suspect that your output looks as follows:
2200 benjamin avenue philadelphia ...
Can you confirm that this is what you're seeing?
On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Rahul Channe
grunt> B = foreach A generate BagToTuple(cust_address);
grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}
grunt> dump B;
((2200,benjamin franklin,philadelphia))
((44,atlanta franklin,florida))
On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota
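
For readers following the thread, the difference between the two approaches discussed so far can be sketched in Pig Latin. This is an illustrative sketch only, assuming the relation A and the schema Rahul posted above; the aliases rows, packed, and address_part are made up for the example.

```pig
-- Assumed schema, as posted in the thread:
-- A: {cust_id: int, cust_name: chararray,
--     cust_address: {innertuple: (innerfield: chararray)}, cust_email: chararray}

-- FLATTEN on a bag emits one output row per tuple in the bag,
-- so a three-element address yields three rows for the same cust_id.
rows = FOREACH A GENERATE cust_id, FLATTEN(cust_address) AS (address_part:chararray);

-- BagToTuple packs all elements of the bag into a single tuple,
-- so each customer stays on one row.
packed = FOREACH A GENERATE cust_id, BagToTuple(cust_address) AS address;
```

Note that the declared schema of the BagToTuple result still lists a single innerfield, as the describe output above shows, which is likely why addressing the elements individually proved awkward.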

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Rahul Channe
I tried changing the Hive column datatype from ARRAY to STRUCT for cust_address, then I imported the table into Pig. Now I am able to separate the fields, as below:
grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
grunt> describe Z;
Z: {cust_id: int,cust_name:
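
The STRUCT approach described in this message might look roughly like the sketch below. The struct field names (street, city, state) are hypothetical, since the schema in the snippet is cut off; only the table name cust_info and the HCatLoader class come from the thread.

```pig
-- Assumes the Hive column was declared along the lines of:
--   cust_address STRUCT<street:STRING, city:STRING, state:STRING>
-- (street, city, state are hypothetical names, not from the thread)
Z = LOAD 'cust_info' USING org.apache.hcatalog.pig.HCatLoader();

-- HCatLoader exposes a Hive STRUCT as a Pig tuple with named fields,
-- so each field can be projected by name.
Y = FOREACH Z GENERATE cust_id, cust_name,
        cust_address.street AS street,
        cust_address.city   AS city,
        cust_address.state  AS state;

-- Alternatively, FLATTEN on a tuple promotes its fields to top-level columns.
X = FOREACH Z GENERATE cust_id, cust_name, FLATTEN(cust_address);
```

Because a tuple has a fixed set of named fields, this avoids the one-row-per-element behavior that FLATTEN has on a bag.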

Re: How to FLATTEN hive column in Pig with ARRAY data type

2014-06-02 Thread Pradeep Gollakota
Awesome... that's the way I would have done it as well.
On Mon, Jun 2, 2014 at 10:14 AM, Rahul Channe drah...@googlemail.com wrote:
I tried changing the Hive column datatype from ARRAY to STRUCT for cust_address, then I imported the table into Pig. Now I am able to separate the fields, as

Building Pig 0.12.1 with Hadoop 2.2.0

2014-06-02 Thread Sandeep Jangra
Hi, I have a remote Hadoop cluster running version 2.2.0 and am running Pig version 0.12.1 (on a separate VM). It seems the Hadoop client packaged with this Pig release is not compatible with the cluster's Hadoop version, so I need to build Pig against Hadoop 2.2.0. Any pointers on what variables do I need