Awesome... that's the way I would have done it as well.
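
For anyone reading this thread later, here is a minimal sketch of the same
idea as a script. It assumes the STRUCT schema shown in the describe output
quoted below. Because cust_address is then a tuple, FLATTEN should promote
its fields to top-level columns, and the AS clause only names them. Keeping
cust_id and cust_name in the output is an assumption for illustration, not
part of the original statement.

```pig
-- Assumes the Hive table 'cust_info' declares cust_address as a STRUCT,
-- so HCatLoader exposes it to Pig as a tuple (house_no, street, city).
Z = LOAD 'cust_info' USING org.apache.hcatalog.pig.HCatLoader();

-- FLATTEN on a tuple un-nests its fields into the enclosing record;
-- the AS clause gives the resulting columns their names.
Y = FOREACH Z GENERATE
        cust_id,
        cust_name,
        FLATTEN(cust_address) AS (house_no, street, city);

DESCRIBE Y;
```

This differs from FLATTEN on a bag, which produces one output row per tuple
in the bag rather than one column per field.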

On Mon, Jun 2, 2014 at 10:14 AM, Rahul Channe <drah...@googlemail.com>
wrote:

> I tried changing the Hive column datatype from ARRAY to STRUCT for
> cust_address, then I imported the table into Pig.
>
> Now I am able to separate the fields, as shown below:
>
> grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
> grunt> describe Z;
> Z: {cust_id: int,cust_name: chararray,cust_address: (house_no: int,street:
> chararray,city: chararray)}
>
>
> grunt> Y = foreach Z generate cust_address.house_no as
> house_no,cust_address.street as street,UPPER(cust_address.city) as city;
> grunt> describe Y;
> Y: {house_no: int,street: chararray,city: chararray}
>
> grunt> dump Y;
> (2200,benjamin franklin,PHILADELPHIA)
> (44,atlanta franklin,FLORIDA)
>
>
> On Mon, Jun 2, 2014 at 1:09 PM, Rahul Channe <drah...@googlemail.com>
> wrote:
>
> > grunt> B = foreach A generate BagToTuple(cust_address);
> >
> > grunt> describe B;
> > B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield:
> > chararray)}
> >
> > grunt> dump B;
> > ((2200,benjamin franklin,philadelphia))
> > ((44,atlanta franklin,florida))
> >
> >
> >
> >
> > On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota <pradeep...@gmail.com>
> > wrote:
> >
> >> If you're using the built-in BagToTuple UDF, then you probably don't need
> >> the FLATTEN operator.
> >>
> >> I suspect that your output looks as follows:
> >>
> >> 2200
> >> benjamin avenue
> >> philadelphia
> >> ...
> >>
> >> Can you confirm that this is what you're seeing?
> >>
> >>
> >> On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe <drah...@googlemail.com>
> >> wrote:
> >>
> >> > Thank you Pradeep, it worked to a certain extent, but I am having
> >> > difficulty separating the fields as $0,$1 for the customer_address.
> >> >
> >> >
> >> > Example -
> >> >
> >> > grunt> describe A;
> >> > A: {cust_id: int,cust_name: chararray,cust_address: {innertuple:
> >> > (innerfield: chararray)},cust_email: chararray}
> >> >
> >> > grunt> dump A;
> >> >
> >> > (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},t...@gmail.com)
> >> > (124,diego arty,{(44),(atlanta franklin),(florida)},o...@gmail.com)
> >> >
> >> > grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
> >> > grunt> dump B;
> >> > (2200,benjamin franklin,philadelphia)
> >> > (44,atlanta franklin,florida)
> >> >
> >> > grunt> describe B;
> >> > B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield:
> >> > chararray}
> >> >
> >> >
> >> >
> >> > I am not able to separate the fields in B as $0,$1 and $3; I tried using
> >> > STRSPLIT but it didn't work.
> >> >
> >> >
> >> >
> >> > On Mon, Jun 2, 2014 at 11:50 AM, Pradeep Gollakota <pradeep...@gmail.com>
> >> > wrote:
> >> >
> >> > > There was a similar question to this on StackOverflow a while back. The
> >> > > suggestion was to write a custom BagToTuple UDF.
> >> > >
> >> > > http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
> >> > >
> >> > >
> >> > > On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota <pradeep...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Disregard last email.
> >> > > >
> >> > > > Sorry... didn't fully understand the question.
> >> > > >
> >> > > >
> >> > > > On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota <pradeep...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > >> FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe <drah...@googlemail.com>
> >> > > >> wrote:
> >> > > >>
> >> > > >>> Hi All,
> >> > > >>>
> >> > > >>> I have imported a Hive table into Pig that has a complex data type
> >> > > >>> (ARRAY<String>). The alias in Pig looks as below:
> >> > > >>>
> >> > > >>> grunt> describe A;
> >> > > >>> A: {cust_id: int,cust_name: chararray,cust_address: {innertuple:
> >> > > >>> (innerfield: chararray)},cust_email: chararray}
> >> > > >>>
> >> > > >>> grunt> dump A;
> >> > > >>>
> >> > > >>> (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},t...@gmail.com)
> >> > > >>> (124,diego arty,{(44),(atlanta franklin),(florida)},o...@gmail.com)
> >> > > >>>
> >> > > >>> The cust_address is the ARRAY field from Hive. I want to FLATTEN the
> >> > > >>> cust_address into different fields.
> >> > > >>>
> >> > > >>>
> >> > > >>> Expected output
> >> > > >>> (2200,benjamin avenue,philadelphia)
> >> > > >>> (44,atlanta franklin,florida)
> >> > > >>>
> >> > > >>> please help
> >> > > >>>
> >> > > >>> Regards,
> >> > > >>> Rahul
> >> > > >>>
> >> > > >>
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
