Ido Hadanny created PIG-4662:
--------------------------------

             Summary: New optimizer rule: filter nulls before inner joins
                 Key: PIG-4662
                 URL: https://issues.apache.org/jira/browse/PIG-4662
             Project: Pig
          Issue Type: Improvement
            Reporter: Ido Hadanny
            Priority: Minor


As stated in the docs, filtering nulls out of the inputs of an inner join 
before the join runs can be a big performance gain: 
http://pig.apache.org/docs/r0.14.0/perf.html#nulls

We would like to add an optimizer rule that detects inner joins and filters 
rows with null join keys out of every input, so that the join is effectively 
executed as:
A = filter A by t is not null;
B = filter B by x is not null;
C = join A by t, B by x;
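
The rewrite should be safe because rows with a null key are always dropped 
by a standard inner join (nulls do not join to other nulls), as the linked 
docs describe. Below is a minimal Python sketch of that equivalence. It 
only models the join semantics and is not Pig code: inner_join and 
filter_not_null are hypothetical helpers, and the sample rows are made up.

```python
# Sketch only: models inner-join null semantics in plain Python.
# inner_join and filter_not_null are hypothetical helpers, not Pig APIs.

def inner_join(left, right, left_key, right_key):
    """Hash join in which a null (None) key never matches anything."""
    index = {}
    for row in right:
        key = row[right_key]
        if key is not None:
            index.setdefault(key, []).append(row)
    joined = []
    for row in left:
        key = row[left_key]
        if key is not None:
            for match in index.get(key, []):
                joined.append((row, match))
    return joined


def filter_not_null(rows, key):
    """Equivalent of: X = filter X by key is not null;"""
    return [row for row in rows if row[key] is not None]


# Made-up sample inputs with some null join keys.
A = [{"t": 1}, {"t": None}, {"t": 2}, {"t": None}]
B = [{"x": None}, {"x": 2}, {"x": 3}]

plain = inner_join(A, B, "t", "x")

A_filtered = filter_not_null(A, "t")
B_filtered = filter_not_null(B, "x")
rewritten = inner_join(A_filtered, B_filtered, "t", "x")

# Same result either way; only the number of rows fed to the join differs.
assert plain == rewritten
rows_before = len(A) + len(B)
rows_after = len(A_filtered) + len(B_filtered)
```

In this toy data the filtered inputs hand 4 rows to the join instead of 7. 
In Pig the saving would come from not sending the null-keyed rows to the 
join at all.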

See also: 
http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
