Rohini Palaniswamy created TEZ-1265:
---------------------------------------

             Summary: Custom input to fetch source task inputs in order
                 Key: TEZ-1265
                 URL: https://issues.apache.org/jira/browse/TEZ-1265
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy


Consider the case of applying LIMIT m after an ORDER BY. A distributed 
order-by vertex produces data in sorted order across task0, task1, ..., taskn. 
Each task limits its output to m records (a task may also emit fewer than m). 
The limit vertex (parallelism 1) that follows the order-by vertex has to fetch 
the output of all n tasks, shuffle-merge its inputs (to maintain the order), 
and then limit to m records again. So we need an input that fetches from the 
source tasks in order and reads them in order. Since the data produced is 
already ordered from task0, task1, ..., taskn, it can be consumed without 
shuffle and sort. If the limit is hit early, the input can skip fetching the 
remaining task outputs. 
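
The idea can be illustrated with a minimal, self-contained Java sketch. It is 
not Tez code and uses no Tez APIs: the class OrderedLimitSketch, the method 
fetchInOrder, and the use of a Supplier to stand in for "fetch the output of 
one source task" are all hypothetical names chosen for illustration. It only 
assumes that the source tasks are listed in task-index order and that each 
task's output is already sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; these names are not part of Apache Tez.
public class OrderedLimitSketch {

    /** Records kept, plus how many source tasks were actually fetched. */
    static final class Result {
        final List<String> records;
        final int tasksFetched;

        Result(List<String> records, int tasksFetched) {
            this.records = records;
            this.tasksFetched = tasksFetched;
        }
    }

    /**
     * Reads source task outputs strictly in task-index order and stops as
     * soon as `limit` records have been collected.
     *
     * Assumption: sourceTasks.get(i).get() simulates fetching the (already
     * sorted) output of task i, and every record of task i sorts before
     * every record of task i+1, so no merge is needed.
     */
    static Result fetchInOrder(List<Supplier<List<String>>> sourceTasks, int limit) {
        List<String> out = new ArrayList<>();
        int fetched = 0;
        for (Supplier<List<String>> task : sourceTasks) {
            if (out.size() >= limit) {
                break; // limit already reached: skip the remaining fetches
            }
            List<String> taskOutput = task.get(); // the simulated fetch
            fetched++;
            for (String record : taskOutput) {
                if (out.size() >= limit) {
                    break; // stop reading this task's output mid-way
                }
                out.add(record);
            }
        }
        return new Result(out, fetched);
    }
}
```

With three source tasks and a limit of 3, where the first two tasks together 
hold at least three records, the third supplier is never invoked. That skipped 
fetch is the saving described above.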

More details in 
https://issues.apache.org/jira/browse/PIG-4049?focusedCommentId=14053217&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14053217



--
This message was sent by Atlassian JIRA
(v6.2#6252)
