Hey all, I'm running the following query:

    EXPLAIN
    FROM mytable
    INSERT OVERWRITE TABLE agg_table PARTITION(result_type="males")
      SELECT my_attr, COUNT(DISTINCT userid)
      WHERE gender='male'
      GROUP BY my_attr
    INSERT OVERWRITE TABLE agg_table PARTITION(result_type="females")
      SELECT my_attr, COUNT(DISTINCT userid)
      WHERE gender='female'
      GROUP BY my_attr;

Never mind the fact that this is a cooked-up example and I could just group by gender, my_attr ;-) Imagine the two queries have significantly more complex WHERE clauses.

I'd think it should be able to share the table scan and do this in a single mapreduce job. Instead I get the plan pasted at http://pastebin.com/f479232f4

Is this a bug, or is this kind of shared scan optimization not in yet? I'm running 0.4.0rc1-ish.

Thanks
-Todd
