Shared scan optimization

Todd Lipcon Tue, 13 Oct 2009 14:24:55 -0700

Hey all,

I'm running the following query:


EXPLAIN FROM mytable
INSERT OVERWRITE TABLE agg_table PARTITION(result_type="males")
  SELECT my_attr, COUNT(DISTINCT userid) WHERE gender='male' GROUP BY my_attr
INSERT OVERWRITE TABLE agg_table PARTITION(result_type="females")
  SELECT my_attr, COUNT(DISTINCT userid) WHERE gender='female' GROUP BY my_attr;

Never mind the fact that this is a cooked-up example and I could just group
by gender,my_attr ;-) Imagine the two queries have significantly more
complex WHERE clauses.

I'd think it should be able to share the table scan and do this in a single
mapreduce job. Instead I get the plan pasted at http://pastebin.com/f479232f4
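
To make concrete what "sharing the table scan" would mean for a query like
this one, here is a minimal Python sketch. It is not how Hive implements
anything: the function name shared_scan and the sample rows are hypothetical,
and it only shows one pass over the input feeding two independent
filter-and-aggregate branches.

```python
from collections import defaultdict


def shared_scan(rows, branches):
    """Read `rows` once and feed each row to every branch whose predicate matches.

    `branches` maps a result_type label to a predicate over a row.
    Returns {label: {my_attr: number of distinct userids}}.
    """
    # label -> my_attr -> set of distinct userids seen so far
    seen = {label: defaultdict(set) for label in branches}
    for row in rows:  # the single pass over the table
        for label, predicate in branches.items():
            if predicate(row):
                seen[label][row["my_attr"]].add(row["userid"])
    return {
        label: {attr: len(ids) for attr, ids in groups.items()}
        for label, groups in seen.items()
    }


# Hypothetical sample data standing in for mytable.
rows = [
    {"userid": 1, "gender": "male", "my_attr": "a"},
    {"userid": 1, "gender": "male", "my_attr": "a"},
    {"userid": 2, "gender": "male", "my_attr": "a"},
    {"userid": 3, "gender": "female", "my_attr": "a"},
    {"userid": 4, "gender": "female", "my_attr": "b"},
]

result = shared_scan(rows, {
    "males": lambda r: r["gender"] == "male",
    "females": lambda r: r["gender"] == "female",
})
```

Each branch keeps its own WHERE-style predicate and its own aggregation
state, so more complex filters would not change the number of passes over
the data.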

Is this a bug, or is this kind of shared scan optimization not in yet? I'm
running 0.4.0rc1-ish.

Thanks
-Todd
