Cory,
FOREACH is generally the way you transform tuples to generate new ones.
Here's an example script that may make things clearer (writing it free-hand,
there may be syntax errors :)
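raw_data = LOAD '/user/dmitriy/petshop/*.txt' USING PigStorage(',')
    AS (id:int, species:chararray, is_brown:int, can_swim:int);
-- assumes is_brown and can_swim are 0/1 flags; compute the combined flag
-- per row before grouping, while the fields are still top-level
flagged = FOREACH raw_data GENERATE species, is_brown, can_swim,
    ((is_brown == 1 AND can_swim == 1) ? 1 : 0) AS brown_swimmer;
by_species = GROUP flagged BY species;
-- after GROUP each tuple is (group, flagged: bag), so aggregate the bag fields
summary = FOREACH by_species GENERATE
    group AS species,
    COUNT(flagged) AS num_animals,
    SUM(flagged.is_brown) AS brown_ones,
    SUM(flagged.can_swim) AS swimming_ones,
    SUM(flagged.brown_swimmer) AS brown_swimming_pets;
STORE summary INTO '/user/dmitriy/petshop_summary';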
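For your specific question (one summary tuple for the whole file), here's a rough, untested sketch: GROUP ... ALL puts every record into a single group, so the FOREACH that follows emits exactly one tuple, which you can then STORE. The species value 'cat', the output path, and the DATE parameter are made up for illustration, and is_brown/can_swim are again assumed to be 0/1 ints.

```pig
-- hypothetical sketch; assumes is_brown and can_swim are 0/1 ints
pets = LOAD '/user/dmitriy/petshop/*.txt' USING PigStorage(',')
    AS (id:int, species:chararray, is_brown:int, can_swim:int);
-- 1 for cats, 0 otherwise ('cat' is an assumed species value);
-- summing a 0/1 flag is like SQL's SUM(CASE WHEN ... THEN 1 ELSE 0 END)
flagged = FOREACH pets GENERATE is_brown, can_swim,
    (species == 'cat' ? 1 : 0) AS is_cat;
-- GROUP ... ALL collects every tuple into one bag, i.e. a single group
everything = GROUP flagged ALL;
-- one output tuple: <DATE>,<cats>,<brown ones>,<swimming ones>
totals = FOREACH everything GENERATE
    '$DATE' AS run_date,
    SUM(flagged.is_cat) AS cats,
    SUM(flagged.is_brown) AS brown_ones,
    SUM(flagged.can_swim) AS swimming_ones;
-- PigStorage(',') writes comma-separated fields (the default is tab)
STORE totals INTO '/user/dmitriy/petshop_totals' USING PigStorage(',');
```

'$DATE' is filled in by Pig's parameter substitution, e.g. `pig -param DATE=2010-02-28 totals.pig` (the script name is hypothetical). A constant such as 'TOTAL CATS' can go in the GENERATE list the same way if you want labeled tuples like ('TOTAL CATS', 2L).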
On Sun, Feb 28, 2010 at 11:29 PM, Cory Radcliff <[email protected]> wrote:
> I'd be happy to put these together into a noob FAQ =).
>
> Please feel free to point me to the docs if I've missed this.
>
> How do I generate a simple tuple? I have a value, say a sum, and I want to
> just generate a tuple like ('TOTAL CATS', 2L). Basically, after all is
> said and done, I want my output file to look like this:
>
> <DATE>, <COUNT of one interesting value>, <COUNT of another interesting
> value>,<COUNT of a third interesting value>
>
> I've figured out how to get the interesting values into a single tuple, but I
> want to get to the point where I can create a tuple and then STORE it.
>
> I'm a reasonably well-trained SQL developer, and I think a lot of people
> coming at this will be SQL-conversant. It might be helpful (again, I'd help
> once I know what I'm doing) to have examples of CSV crunching
> for SQL-minded folk, no?
>
> Something to the effect of:
>
> Here's your CSV, here's how you break it into tuples, and here's a bunch of
> examples as if it were a table and you were trying to run reports.
>
> I think this would help people map it onto familiar territory faster.
>
> Anyway, thanks for listening to the rambling. I'm really digging this
> stuff!
>
> Cory
>