Awesome... that's the way I would have done it as well.
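
If you also need the other columns next to the address fields, here is a minimal sketch, assuming the same cust_info table with cust_address as a STRUCT (matching the describe output quoted below). FLATTEN applied to a tuple should promote its fields to top-level columns. The alias X and the names in the AS clause are only illustrative, and I have not run this against your table.

```pig
-- Assumption: cust_info is the Hive table from this thread, with
-- cust_address declared as a STRUCT of (house_no, street, city).
Z = LOAD 'cust_info' USING org.apache.hcatalog.pig.HCatLoader();

-- FLATTEN on a tuple (a Hive STRUCT) un-nests its fields into columns.
-- X and the AS names below are illustrative, not from the original thread.
X = FOREACH Z GENERATE
        cust_id,
        cust_name,
        FLATTEN(cust_address) AS (house_no, street, city);

DESCRIBE X;
```

Without the AS clause, the flattened fields would normally be named with the cust_address:: prefix, so the explicit names mainly keep later references short.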

On Mon, Jun 2, 2014 at 10:14 AM, Rahul Channe <drah...@googlemail.com> wrote:

> I tried changing the hive column datatype from ARRAY to STRUCT for
> cust_address, then I imported the table in pig.
>
> Now I am able to separate the fields, as below
>
> grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
> grunt> describe Z;
> Z: {cust_id: int,cust_name: chararray,cust_address: (house_no: int,street: chararray,city: chararray)}
>
> grunt> Y = foreach Z generate cust_address.house_no as house_no,cust_address.street as street,UPPER(cust_address.city) as city;
> grunt> describe Y;
> Y: {house_no: int,street: chararray,city: chararray}
>
> grunt> dump Y;
> (2200,benjamin franklin,PHILADELPHIA)
> (44,atlanta franklin,FLORIDA)
>
> On Mon, Jun 2, 2014 at 1:09 PM, Rahul Channe <drah...@googlemail.com> wrote:
>
> > grunt> B = foreach A generate BagToTuple(cust_address);
> >
> > grunt> describe B;
> > B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}
> >
> > grunt> dump B;
> > ((2200,benjamin franklin,philadelphia))
> > ((44,atlanta franklin,florida))
> >
> > On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >
> >> If you're using the built-in BagToTuple UDF, then you probably don't need
> >> the FLATTEN operator.
> >>
> >> I suspect that your output looks as follows:
> >>
> >> 2200
> >> benjamin avenue
> >> philadelphia
> >> ...
> >>
> >> Can you confirm that this is what you're seeing?
> >>
> >> On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe <drah...@googlemail.com> wrote:
> >>
> >> > Thank you Pradeep, it worked to a certain extent, but I am having the
> >> > following difficulty in separating fields as $0,$1 for the customer_address.
> >> >
> >> > Example -
> >> >
> >> > grunt> describe A;
> >> > A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
> >> >
> >> > grunt> dump A;
> >> >
> >> > (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},t...@gmail.com)
> >> > (124,diego arty,{(44),(atlanta franklin),(florida)},o...@gmail.com)
> >> >
> >> > grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
> >> > grunt> dump B;
> >> > (2200,benjamin franklin,philadelphia)
> >> > (44,atlanta franklin,florida)
> >> >
> >> > grunt> describe B;
> >> > B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield: chararray}
> >> >
> >> > I am not able to separate the fields in B as $0,$1 and $3. I tried using
> >> > STRSPLIT but it didn't work.
> >> >
> >> > On Mon, Jun 2, 2014 at 11:50 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >> >
> >> > > There was a similar question to this on StackOverflow a while back. The
> >> > > suggestion was to write a custom BagToTuple UDF.
> >> > >
> >> > > http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
> >> > >
> >> > > On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >> > >
> >> > > > Disregard last email.
> >> > > >
> >> > > > Sorry... didn't fully understand the question.
> >> > > >
> >> > > > On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >> > > >
> >> > > >> FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
> >> > > >>
> >> > > >> On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe <drah...@googlemail.com> wrote:
> >> > > >>
> >> > > >>> Hi All,
> >> > > >>>
> >> > > >>> I have imported a hive table into pig that has a complex data type
> >> > > >>> (ARRAY<String>). The alias in pig looks as below
> >> > > >>>
> >> > > >>> grunt> describe A;
> >> > > >>> A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
> >> > > >>>
> >> > > >>> grunt> dump A;
> >> > > >>>
> >> > > >>> (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},t...@gmail.com)
> >> > > >>> (124,diego arty,{(44),(atlanta franklin),(florida)},o...@gmail.com)
> >> > > >>>
> >> > > >>> The cust_address is the ARRAY field from hive. I want to FLATTEN the
> >> > > >>> cust_address into different fields.
> >> > > >>>
> >> > > >>> Expected output
> >> > > >>> (2200,benjamin avenue,philadelphia)
> >> > > >>> (44,atlanta franklin,florida)
> >> > > >>>
> >> > > >>> please help
> >> > > >>>
> >> > > >>> Regards,
> >> > > >>> Rahul