Rohini Palaniswamy created PIG-3850:
---------------------------------------
Summary: Optimize join followed by order by using same key
Key: PIG-3850
URL: https://issues.apache.org/jira/browse/PIG-3850
Project: Pig
Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Possible optimizations:
1) If it is a skewed join, then we can combine the ordering into it instead of
doing an additional order by, since a skewed join already involves sampling.
2) If it is a normal join, then we can do the order by and then join, i.e.:
Current plan:
Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4
(sampler), Vertex 5 (Partitioner) -> Vertex 6 (order by)
New plan:
Vertex 1 (load massive) -> Vertex 2 (sampler), Vertex 3 (Partitioner) ->
Vertex 4 (order by and join) <- Vertex 5 (load big)
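
A rough sketch of why the new plan can still yield globally ordered output is
given below. It is a single-process Python illustration only, not Pig or Tez
code. The function names (sample_boundaries, range_partition,
order_by_and_join) are hypothetical, and it assumes an inner join, ascending
order, and rows stored as tuples whose first field is the join key. Only the
massive input is sampled, and both inputs are range-partitioned with the same
boundaries so that matching keys meet in the same partition.

```python
import bisect
import random


def sample_boundaries(rows, key, num_partitions, sample_size=100, seed=0):
    """Pick num_partitions - 1 split points from a random sample of keys."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    keys = sorted(key(r) for r in sample)
    if not keys:
        return []
    return [keys[(i * len(keys)) // num_partitions]
            for i in range(1, num_partitions)]


def range_partition(rows, key, boundaries):
    """Route each row to the key range it falls into."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        parts[bisect.bisect_right(boundaries, key(r))].append(r)
    return parts


def order_by_and_join(massive, big, num_partitions=3):
    """Sort and join per partition; partitions are visited in key order."""
    key = lambda r: r[0]
    # Sampler and partitioner steps: boundaries come from the massive input.
    boundaries = sample_boundaries(massive, key, num_partitions)
    massive_parts = range_partition(massive, key, boundaries)
    # The other input reuses the same boundaries, so equal keys co-locate.
    big_parts = range_partition(big, key, boundaries)

    out = []
    for m_part, b_part in zip(massive_parts, big_parts):
        lookup = {}
        for b in b_part:
            lookup.setdefault(key(b), []).append(b)
        # Sorting inside each range-partition gives a global order by key.
        for m in sorted(m_part, key=key):
            for b in lookup.get(key(m), []):
                out.append(m + b[1:])
    return out
```

For example, order_by_and_join([(3, "x"), (1, "y")], [(1, "p"), (3, "q")])
returns the joined rows ordered by key without a separate order by stage
after the join.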
--
This message was sent by Atlassian JIRA
(v6.2#6252)