FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe drah...@googlemail.com wrote:
Hi All,
I have imported a Hive table into Pig that has a complex data type
(ARRAY<STRING>). The alias in Pig looks as below:
grunt> describe A;
Disregard last email.
Sorry... didn't fully understand the question.
There was a similar question as this on StackOverflow a while back. The
suggestion was to write a custom BagToTuple UDF.
http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Thank you Pradeep, it worked to a certain extent, but I am having
difficulty separating the fields of cust_address as $0, $1.
Example -
grunt> describe A;
A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
grunt> dump A;
If you're using the built-in BagToTuple UDF, then you probably don't need
the FLATTEN operator.
I suspect that your output looks as follows:
2200
benjamin avenue
philadelphia
...
Can you confirm that this is what you're seeing?
On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe
grunt> B = foreach A generate BagToTuple(cust_address);
grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}
grunt> dump B;
((2200,benjamin franklin,philadelphia))
((44,atlanta franklin,florida))
On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota
I tried changing the Hive column datatype from ARRAY to STRUCT for
cust_address, then I imported the table into Pig.
Now I am able to separate the fields, as below:
grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
grunt> describe Z;
Z: {cust_id: int,cust_name:
Awesome... that's the way I would have done it as well.
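For readers following the STRUCT approach, here is a minimal sketch of how the address fields might be pulled out once cust_address arrives in Pig as a tuple. It assumes the struct has three fields, as the earlier dump output suggests; the aliases Y, addr_0, addr_1 and addr_2 are hypothetical names chosen for illustration and do not come from the thread.

```pig
-- Sketch only: assumes cust_address is a 3-field tuple (Hive STRUCT).
-- Y and the addr_* aliases are hypothetical names.
Z = LOAD 'cust_info' USING org.apache.hcatalog.pig.HCatLoader();

-- Dereference the tuple positionally; named access (cust_address.<field>)
-- also works if the struct's field names are known.
Y = FOREACH Z GENERATE
        cust_id,
        cust_name,
        cust_address.$0 AS addr_0,
        cust_address.$1 AS addr_1,
        cust_address.$2 AS addr_2,
        cust_email;

DUMP Y;
```

Positional dereferencing works here because a STRUCT carries a fixed schema into Pig, whereas the tuple returned by BagToTuple has no fixed number of fields known in advance.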
Hi,
I have a remote Hadoop cluster running version 2.2.0 and am running Pig
version 0.12.1 (on a separate VM).
It seems the Hadoop client packaged with this Pig version is not
compatible with the cluster's Hadoop version.
So I need to build Pig against Hadoop 2.2.0. Any pointers on what
variables do I need