[ 
https://issues.apache.org/jira/browse/TEZ-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908905#comment-13908905
 ] 

Bikas Saha commented on TEZ-873:
--------------------------------

This jira as it stands is invalid. InputSplit is an MR concept, so in general 
it should not move into Tez APIs.

We have a design choice in Tez to make Inputs and Outputs independent of 
processors as far as the framework is concerned. That prevents any 
framework-induced binding on the input/output/processor code. Compatibility of 
inputs/outputs/processors is currently left to the user (e.g. do all of them 
use KeyValues) but may later be statically checked at compile time by the 
framework via annotations.

This usage of getting the file name via splits is probably a hack which we did 
not want to continue to support in MRInput, so we created MRInputLegacy to 
support it. If you feel that additions are needed to make this work with 
TezGroupedSplits then please go ahead and make the improvement to MRInputLegacy 
and change the jira title to reflect this.
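
As a rough illustration of the grouped-split case, here is a minimal sketch 
of recovering file names when a split may wrap several underlying splits. It 
is not Tez or MR code: Split, FileSplitStub, GroupedSplitStub and 
getWrappedSplits() are hypothetical stand-ins for InputSplit, FileSplit and 
TezGroupedSplit, used only so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitFileNames {
    // Hypothetical stand-in for an MR InputSplit.
    interface Split {}

    // Hypothetical stand-in for FileSplit: carries only a path string.
    static final class FileSplitStub implements Split {
        private final String path;
        FileSplitStub(String path) { this.path = path; }
        // Returns the last path component, e.g. "a.txt" for "/data/a.txt".
        String getName() { return path.substring(path.lastIndexOf('/') + 1); }
    }

    // Hypothetical stand-in for a grouped split that wraps other splits.
    static final class GroupedSplitStub implements Split {
        private final List<Split> wrapped;
        GroupedSplitStub(List<Split> wrapped) { this.wrapped = wrapped; }
        List<Split> getWrappedSplits() { return wrapped; }
    }

    // Collects file names from a plain file split or from every split
    // nested inside a grouped split.
    static List<String> fileNames(Split split) {
        List<String> names = new ArrayList<>();
        collect(split, names);
        return names;
    }

    private static void collect(Split split, List<String> out) {
        if (split instanceof GroupedSplitStub) {
            for (Split inner : ((GroupedSplitStub) split).getWrappedSplits()) {
                collect(inner, out);
            }
        } else if (split instanceof FileSplitStub) {
            out.add(((FileSplitStub) split).getName());
        }
        // Any other split type carries no file name and is skipped.
    }

    public static void main(String[] args) {
        Split grouped = new GroupedSplitStub(Arrays.asList(
                new FileSplitStub("/data/a.txt"),
                new FileSplitStub("/data/b.txt")));
        System.out.println(fileNames(grouped));
    }
}
```

The point of the sketch is only that a grouped split yields a list of file 
names rather than a single one, so any accessor added for this purpose has to 
account for more than one underlying split.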

> Allow Tez processor access to meta-data through input split
> -----------------------------------------------------------
>
>                 Key: TEZ-873
>                 URL: https://issues.apache.org/jira/browse/TEZ-873
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>
> Currently there is no way of getting the InputSplit from a TezProcessor. In 
> the current MR framework, there is a way to find out the filename through 
> FileSplit. For example, one common use is to get the filename in map:
> String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
> There is other metadata in InputSplit that could be used by existing MR 
> users.
> This JIRA is to add APIs that expose the InputSplit: 
> TezGroupedSplit.getWrapperSplit() and MRInput.getInputSplit().
> Although MRInputLegacy provides an API to get the InputSplit, it has a few 
> issues:
>  * Without TezGroupedSplit.getWrapperSplit() it is unusable.
>  * Since it is used in various use cases, I propose to move it from 
> MRInputLegacy to MRInput.
>  * Currently the APIs are named getNewInputSplit() and getOldInputSplit(). 
> These should be merged into one: getInputSplit(). The new/old API should be 
> handled internally.
> Please give your feedback.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
