Hi,

 

Hope this gets to the right list...

 

I'm fairly new to Pig and have been playing around with it for a couple of days.
Essentially I'm doing a bit of work to evaluate Pig and its ability to
simplify the use of Hadoop, basically to allow users without a massive
Java background to run Hadoop jobs. There are a couple of issues I've run
into, which are probably very simple, and even more probably documented
somewhere, but I can't find the documentation.

 

First, I'm using Pig 0.5.0 and Hadoop 0.20.

 

1. Dynamically assigning a variable - I can use

 

%declare my_count '0' as int;

 

But if I want to set this dynamically? 

 

e.g. 

 

A = GROUP srtdx ALL;

B = FOREACH A GENERATE COUNT(srtdx);

 

I want to assign the value in B (which is a long) to a variable. How do I
do that, or is it not possible? None of the following seems to work.

 

%declare my_count B as int 

%declare my_count 'B' as int;

%declare my_count `B` as int;

%declare my_count ` FOREACH A GENERATE COUNT(srtdx)` as int;

 

2. Is it possible to alter the datatype of an element in a tuple?

 

e.g.

 

A = LOAD 'my_file' as (c1:chararray, c2:int);

B = FOREACH A GENERATE c1*2;

 

throws an error.

 

3. Picking a specific row from a bag - is there a SQL-like 'rownum'
operator? If I have a bag of 50 elements, can I do something like...?

 

C = FILTER B BY (rownum <= 10);

 

Vaguely related: is there any way of attaching a count to 'rows' in a
bag, to generate a unique identifier for each row, perhaps to generate a
map with a unique set of keys for a list in a bag?

 

 

I understand that a lot of this can be done as a UDF, but I'm keen to
find any pure Pig solutions if possible.

 

 

 

Thanks,

 

Guy

 



