For simple cases at least, what's in the pseudo-code should work. I hope that someday soon we can also start using the new logical optimizer work (in the experimental package) to build rules for the MR optimizer, such as this combiner optimization, which should make them much easier to code. But it will be a while before we get there.

I don't think this will automatically make it work for split, because I think the optimizer will see the split in the plan and choose not to optimize.

Alan.

On Jun 2, 2010, at 4:18 PM, Dmitriy Ryaboy wrote:

It looks like right now, the combiner optimization does not kick in for a
script like this:

data = load 'foo' using PigStorage() as (a, b, c);
grouped = group data by a;
filtered = filter grouped by COUNT(data) < 1000;

Looking at the code in CombinerOptimizer, it seems the Filter case is only pseudo-coded in comments. Are there complications there other than what is
already noted, or is it just a matter of coding up the pseudo-code?
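
To make the idea concrete, here is a rough Python sketch of what a combiner buys for a filter on COUNT. It is not Pig's CombinerOptimizer code: the names combine and reduce_and_filter are made up for illustration, and it is simplified to carry only the key and its count, whereas the real filter would still have to emit the grouped bag.

```python
from collections import Counter

def combine(split):
    """Map-side partial count per key for one input split (the combiner step)."""
    # Assumption: the grouping key is the first field of each row, like 'a' above.
    return Counter(row[0] for row in split)

def reduce_and_filter(partials, threshold):
    """Sum the partial counts per key, then keep keys whose total is below threshold."""
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts rather than replacing them
    return {key: n for key, n in totals.items() if n < threshold}

# Two hypothetical input splits of (a, b, c) rows.
splits = [
    [("x", 1, 1), ("y", 2, 2), ("x", 3, 3)],
    [("x", 4, 4), ("z", 5, 5)],
]
result = reduce_and_filter([combine(s) for s in splits], threshold=3)
```

Only one small (key, count) pair per key leaves each split, rather than every row, which is the saving the combiner is meant to provide.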

On that note -- assuming the optimization was implemented for Filter
following group, would it automagically start working for Splits, as well?

-D
