Ido Hadanny created PIG-4662:
--------------------------------
Summary: New optimizer rule: filter nulls before inner joins
Key: PIG-4662
URL: https://issues.apache.org/jira/browse/PIG-4662
Project: Pig
Issue Type: Improvement
Reporter: Ido Hadanny
Priority: Minor
As stated in the docs, rewriting an inner join and filtering nulls from inputs
can be a big performance gain:
http://pig.apache.org/docs/r0.14.0/perf.html#nulls
We would like to add an optimizer rule which detects inner joins, and filters
nulls in all inputs:
A = filter A by t is not null;
B = filter B by x is not null;
C = join A by t, B by x;
see also:
http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)