Rohini Palaniswamy created PIG-3850:
---------------------------------------

             Summary: Optimize join followed by order by using same key
                 Key: PIG-3850
                 URL: https://issues.apache.org/jira/browse/PIG-3850
             Project: Pig
          Issue Type: Sub-task
            Reporter: Rohini Palaniswamy


Possible optimizations:
    1) If it is a skewed join, we can fold the ordering into the join instead of
running an additional ORDER BY, since a skewed join already involves sampling.
    2) If it is a normal join, we can do the ORDER BY first and then the join, i.e.:
Current plan:
  Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4 (sampler), Vertex 5 (partitioner) -> Vertex 6 (order by)
New plan:
  Vertex 1 (load massive) -> Vertex 2 (sampler), Vertex 3 (partitioner) -> Vertex 4 (order by and join) <- Vertex 5 (load big)
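The data flow of the new plan can be sketched in a single process as below. This is a minimal Python illustration under stated assumptions, not Pig or Tez code: the function names (`sample_boundaries`, `range_partition`, `ordered_join`) are hypothetical, each vertex is modeled as a plain function, and range boundaries are taken as evenly spaced quantiles of a random sample. Pig's real sampler and partitioner, and the splitting of hot keys across partitions that a skewed join performs, are not modeled. Only the massive input is sampled. Both inputs are range-partitioned with the same boundaries, so all records sharing a key land in the same partition, and sorting and joining inside each partition gives globally ordered output without a separate sort after the join. For optimization 1, the boundaries would presumably come from the sample the skewed join already takes instead of from a separate sampler vertex.

```python
import bisect
import random
from collections import defaultdict


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Hypothetical sampler vertex: pick num_partitions - 1 range boundaries
    as evenly spaced quantiles of a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []  # empty input: every record goes to partition 0
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def range_partition(records, boundaries):
    """Hypothetical partitioner vertex: route each (key, value) record to the
    partition whose key range covers it. Equal keys always share a partition,
    and a higher key never lands in a lower-numbered partition."""
    parts = defaultdict(list)
    for key, value in records:
        parts[bisect.bisect_right(boundaries, key)].append((key, value))
    return parts


def ordered_join(massive, big, num_partitions=4, sample_size=100):
    """Hypothetical 'order by and join' vertex: sample only `massive`,
    range-partition both inputs with the same boundaries, then sort and join
    inside each partition. Emitting partitions in boundary order yields join
    output ordered by the join key."""
    boundaries = sample_boundaries([k for k, _ in massive],
                                   num_partitions, sample_size)
    left = range_partition(massive, boundaries)
    right = range_partition(big, boundaries)
    out = []
    for p in range(num_partitions):
        # Build a hash table over the `big` rows of this partition.
        lookup = defaultdict(list)
        for key, value in right.get(p, []):
            lookup[key].append(value)
        # Walk the `massive` rows in key order (stable sort) and emit matches.
        for key, left_value in sorted(left.get(p, []), key=lambda r: r[0]):
            for right_value in lookup.get(key, []):
                out.append((key, left_value, right_value))
    return out
```

Under these assumptions, `ordered_join(massive, big)` should return the same rows, in the same order, as a plain join followed by a stable sort on the key, which is what the current plan computes with its extra vertices.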




--
This message was sent by Atlassian JIRA
(v6.2#6252)
