[jira] [Commented] (TEZ-873) Allow MRInputLegacy to expose the individual input split

Siddharth Seth (JIRA) Tue, 11 Mar 2014 09:54:52 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930569#comment-13930569
 ]


Siddharth Seth commented on TEZ-873:
------------------------------------

bq.  I don't know that Hive gets the filename from the InputSplit. Can you
please give me a little more detail or a pointer?
AFAIK, they get splits (or used to get splits) to figure out the filename being
processed, so that they can determine the table/partition in use and load the
appropriate metadata.
I asked because this means checking the filename on each and every record. Is
that something you think is acceptable?



> Allow MRInputLegacy to expose the individual input split
> --------------------------------------------------------
>
>                 Key: TEZ-873
>                 URL: https://issues.apache.org/jira/browse/TEZ-873
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>         Attachments: TEZ-873.1.patch, TEZ-873.2.patch
>
>
> Currently there is no way of getting the InputSplit from TezProcessor. In the
> current MR framework, the filename can be found through FileSplit.
> For example, one common use is to get the filename in map:
> String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
> Other metadata in InputSplit could also be used by existing MR users.
> This JIRA is to add APIs that expose the InputSplit:
> TezGroupedSplit.getWrapperSplit() and MRInput.getInputSplit().
> Although MRInputLegacy provides an API to get the InputSplit, it has a few
> issues:
>  * Without TezGroupedSplit.getWrapperSplit() it is unusable.
>  * Since it is used in various use cases, I propose to move it from
> MRInputLegacy to MRInput.
>  * Currently the APIs are named getNewInputSplit() and getOldInputSplit().
> These should be merged into one: getInputSplit(). The new/old API should be
> handled internally.
> Please give your feedback.
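
The proposal to merge getNewInputSplit() and getOldInputSplit() into a single
getInputSplit() could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the patch attached to this issue:
FileSplitStub, InputStub, forNewApi(), forOldApi() and fileNameOf() are
hypothetical stand-ins invented here, and they do not use the real Hadoop or
Tez classes. In particular, the real old-API and new-API split types differ,
which this sketch glosses over by using one stub type for both.

```java
public class SplitAccessSketch {

    /** Hypothetical stand-in for a file-backed split (not Hadoop's FileSplit). */
    static final class FileSplitStub {
        private final String path;

        FileSplitStub(String path) {
            this.path = path;
        }

        /** Returns the last component of the path. */
        String getName() {
            int slash = path.lastIndexOf('/');
            return slash < 0 ? path : path.substring(slash + 1);
        }
    }

    /** Hypothetical stand-in for an input holding either an old-API or a new-API split. */
    static final class InputStub {
        private final FileSplitStub newSplit; // non-null when the new API is in use
        private final FileSplitStub oldSplit; // non-null when the old API is in use

        private InputStub(FileSplitStub newSplit, FileSplitStub oldSplit) {
            this.newSplit = newSplit;
            this.oldSplit = oldSplit;
        }

        static InputStub forNewApi(FileSplitStub split) {
            return new InputStub(split, null);
        }

        static InputStub forOldApi(FileSplitStub split) {
            return new InputStub(null, split);
        }

        /** Single accessor: the new/old distinction is resolved internally. */
        FileSplitStub getInputSplit() {
            return newSplit != null ? newSplit : oldSplit;
        }
    }

    /** Caller-side lookup, mirroring the map-side idiom quoted in the description. */
    static String fileNameOf(InputStub input) {
        return input.getInputSplit().getName();
    }

    public static void main(String[] args) {
        InputStub input = InputStub.forNewApi(new FileSplitStub("/warehouse/t1/part-00000"));
        System.out.println(fileNameOf(input));
    }
}
```

With one accessor, callers no longer need to know which API produced the
split; whether a common return type is practical for the real classes is a
separate question for the patch review.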



--
This message was sent by Atlassian JIRA
(v6.2#6252)
