For at least simple cases, what's in the pseudo-code should work. I
hope someday soon we can start using the new logical optimizer work
(in the experimental package) to build rules for the MR optimizer
(like this combiner stuff) as well, which should be much easier to
code. But it will be a while before we get there.
I don't think this will automatically make it work for split, because
I think it will see the split in the plan and that will make it choose
not to optimize.
Alan.
On Jun 2, 2010, at 4:18 PM, Dmitriy Ryaboy wrote:
It looks like right now, the combiner optimization does not kick in for a
script like this:
data = load 'foo' using PigStorage() as (a, b, c);
grouped = group data by a;
filtered = filter grouped by COUNT(data) < 1000;
Looking at the code in CombinerOptimizer, seems like the Filter bit is just
pseudo-coded in comments. Are there complications there other than what is
already noted, or is it just the matter of coding up the pseudo-code?
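For context on why the combiner matters here: COUNT can be computed as
partial counts on the map side that are summed on the reduce side, so
the filter only needs the summed count per key. Below is a minimal
Python sketch of that idea. It is not Pig's CombinerOptimizer code; the
function names, the sample rows, and the small limit are all
hypothetical, and it assumes later steps need only the group key and
its count, not the bag of rows.

```python
from collections import Counter


def map_side_partial_counts(records):
    """Combiner-style step: count rows per key 'a' within one map task."""
    partial = Counter()
    for a, _b, _c in records:
        partial[a] += 1
    return partial


def reduce_and_filter(partials, limit=1000):
    """Reduce step: sum the partial counts per key, keep keys with count < limit."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return {key: n for key, n in total.items() if n < limit}


# Two hypothetical map tasks, each holding a slice of 'foo' as (a, b, c) rows.
task1 = [("x", 1, 2), ("x", 3, 4), ("y", 5, 6)]
task2 = [("x", 7, 8), ("z", 9, 0)]

partials = [map_side_partial_counts(t) for t in (task1, task2)]
small_groups = reduce_and_filter(partials, limit=3)
```

Only one small (key, count) pair per key leaves each map task instead
of every row, which is the saving the optimization is after.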
On that note -- assuming the optimization was implemented for Filter
following group, would it automagically start working for Splits, as
well?
-D