Answers inlined:

On Feb 2, 2010, at 3:15 AM, Guy Jeffery wrote:

Hi,



Hope this gets to the right list...



I'm fairly new to Pig and have been playing around with it for a couple of days.
Essentially I'm doing a bit of work to evaluate Pig and its ability to
simplify the use of Hadoop, basically to allow users without a massive
Java background to run Hadoop jobs. There are a couple of issues I've got,
which are probably very simple, and even more probably documented
somewhere, but I can't find the documentation.



First, I'm using Pig 0.5.0 and Hadoop 0.20.



1.      Dynamically assigning a variable - I can use



%declare my_count '0' as int;



But what if I want to set this dynamically?



e.g.



A = GROUP srtdx ALL;

B = FOREACH A GENERATE COUNT(srtdx);



I want to set this value B (which is a long) to a variable - how do I do
it, or is it not possible? None of the following seem to work.



%declare my_count B as int

%declare my_count 'B' as int;

%declare my_count `B` as int;

%declare my_count ` FOREACH A GENERATE COUNT(srtdx)` as int;

Pig Latin is a dataflow language, not a traditional procedural programming language, and it does not support variable declaration. %declare does not declare a variable; it is part of the pre-processor (somewhat analogous to #define in C), so its value is substituted into the script text before the script runs. The names on the left side of Pig Latin statements are relations (that is, collections of records), not scalar values, which is why B cannot be assigned to a %declare parameter.
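
As a rough sketch of the difference (check it against your release): a %declare value is either a constant string or the output of a backquoted shell command, never a Pig statement, and as far as I know it takes no type clause such as "as int". If you need the count inside the dataflow, one option is to keep it as a one-record relation and CROSS it with the data. The load path, the field names id and val, and the count_rows.sh script below are all made up for illustration.

```pig
-- Hypothetical script that prints a single number; it is run by the
-- pre-processor before any Pig Latin is parsed, so it cannot see A or B.
%declare my_count `./count_rows.sh`;

-- Hypothetical path and schema.
srtdx = LOAD 'srtdx.txt' AS (id:int, val:chararray);

-- $my_count is replaced textually with the command's output.
C = FILTER srtdx BY id <= $my_count;

-- Alternative: compute the count in Pig and carry it along as a field.
A = GROUP srtdx ALL;
B = FOREACH A GENERATE COUNT(srtdx) AS total;
-- B has exactly one record, so CROSS attaches total to every record of srtdx.
D = CROSS srtdx, B;
E = FOREACH D GENERATE id, val, total;
```

The CROSS is cheap here only because B holds a single record.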




2.      Is it possible to alter the datatype of an element in a tuple?



e.g.



A = LOAD 'my_file' as (c1:chararray, c2:int);

B = FOREACH A GENERATE c1*2;



throws an error.

Cast the value, so

A = load 'my_file' as (c1: chararray, c2:int);
B = foreach A generate (int)c1 * 2;

should work. (We only added casts from chararray to int recently, so this particular cast may not be in the release you're using.)
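
If the explicit cast is not available in your release, one alternative is to declare c1 as int in the load schema, so the conversion happens at load time. This assumes the first column of my_file really does hold integers. A minimal sketch (the name doubled is only illustrative):

```pig
-- Assumes every value in the first column parses as an integer;
-- to my knowledge, values that do not parse come back as null.
A = load 'my_file' as (c1:int, c2:int);
B = foreach A generate c1 * 2 as doubled;
```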




3.      Picking a specific row from a bag - is there a SQL-like 'rownum'
operator? If I have a bag of 50 elements can I do something like...?



C = FILTER B BY (rownum <= 10);

No. Since Pig executes in parallel, with task partitioning done at runtime, it is not generally possible to assign row numbers. You can build a UDF that generates unique row ids, but they will not be ordered across different maps.
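
If all you need is any 10 records, or the first 10 under some ordering, LIMIT may be enough. A sketch, assuming B has a field named score (a made-up name):

```pig
-- Any 10 records; which ones you get is not defined.
C = LIMIT B 10;

-- The 10 records with the smallest score: order first, then limit.
sorted = ORDER B BY score;
top10 = LIMIT sorted 10;
```

This does not give you a general row number, only the first N records of an ordered relation.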

Alan.
