[jira] [Commented] (TEZ-1265) Custom input to fetch source task inputs in order

Rohini Palaniswamy (JIRA) Mon, 07 Jul 2014 16:32:25 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054287#comment-14054287
 ]


Rohini Palaniswamy commented on TEZ-1265:
-----------------------------------------

[~sseth] had a better suggestion: make the input even more generic and 
controllable than fetching in order. See 
https://issues.apache.org/jira/browse/PIG-4049?focusedCommentId=14054266&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054266

A custom version of the Input which, instead of providing a unified view of the 
data, gives access to individual chunks along with meta-information (taskId 
etc). This could, additionally, be fully controlled by the user in terms of 
which chunks need to be fetched.
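
To make the idea concrete, here is a rough Java sketch of what a chunk-level 
input could look like. Every name in it (ChunkInfo, ChunkedInput, 
ChunkSelector) is hypothetical and made up for illustration; none of them are 
existing Tez classes, and a real implementation would deal with events, 
buffers and failures that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical meta-information about one chunk (one source task's output).
final class ChunkInfo {
    final int sourceTaskId;
    final long sizeBytes;

    ChunkInfo(int sourceTaskId, long sizeBytes) {
        this.sourceTaskId = sourceTaskId;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical input that exposes individual chunks instead of a unified view.
interface ChunkedInput<T> {
    // Meta-information for every chunk known to the input; nothing is fetched yet.
    List<ChunkInfo> availableChunks();

    // Fetches and returns the records of exactly one chunk.
    List<T> fetch(ChunkInfo chunk);
}

final class ChunkSelector {
    // The caller decides which chunks are fetched; unwanted chunks are never requested.
    static <T> List<T> fetchSelected(ChunkedInput<T> input, Predicate<ChunkInfo> wanted) {
        List<T> records = new ArrayList<>();
        for (ChunkInfo chunk : input.availableChunks()) {
            if (wanted.test(chunk)) {
                records.addAll(input.fetch(chunk));
            }
        }
        return records;
    }
}
```

With something of this shape, fetching in task order becomes just one policy 
the user can implement on top of the chunk list.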

> Custom input to fetch source task inputs in order
> -------------------------------------------------
>
>                 Key: TEZ-1265
>                 URL: https://issues.apache.org/jira/browse/TEZ-1265
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>
> Consider the case of having to LIMIT m records after an ORDER BY. A 
> distributed order-by vertex produces data in sorted order across 
> task0, task1, ..., taskn. Each task limits its output to m records (the 
> output count can also be less than m). The limit vertex (parallelism 1) 
> following the order-by vertex has to fetch the output of all n tasks, 
> shuffle-merge its inputs (to maintain the order), and then limit to m 
> records again. We therefore need an input that fetches from source tasks in 
> order and reads them in order. Since the data produced is already ordered 
> across task0, task1, ..., taskn, it can be consumed without a shuffle and 
> sort. If the limit is hit early, the input can skip fetching the remaining 
> task inputs. 
> More details in 
> https://issues.apache.org/jira/browse/PIG-4049?focusedCommentId=14053217&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14053217
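
The behaviour asked for in the description could be sketched roughly as 
follows. This is only an illustration under stated assumptions: 
OrderedLimitReader and the fetcher callback are hypothetical names, not Tez 
APIs, and each source task's output is modelled as an in-memory list that is 
already sorted and globally ordered by task index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper; not part of Tez.
final class OrderedLimitReader {
    // Reads source task outputs in task order (0, 1, ..., numTasks - 1) and stops
    // as soon as `limit` records have been collected. The fetcher is called only
    // for tasks that are actually needed, so later tasks are never fetched once
    // the limit is reached. No shuffle or merge is done: the outputs are assumed
    // to be already ordered across tasks.
    static <T> List<T> readWithLimit(int numTasks, int limit, IntFunction<List<T>> fetcher) {
        List<T> result = new ArrayList<>();
        for (int task = 0; task < numTasks && result.size() < limit; task++) {
            for (T record : fetcher.apply(task)) {
                if (result.size() == limit) {
                    break;
                }
                result.add(record);
            }
        }
        return result;
    }
}
```

For example, with three tasks producing [1, 2], [3, 4, 5] and [6, 7] and a 
limit of 4, only task0 and task1 are fetched and task2 is skipped entirely.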



--
This message was sent by Atlassian JIRA
(v6.2#6252)
