[ https://issues.apache.org/jira/browse/TEZ-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917856#comment-13917856 ]
Mohammad Kamrul Islam commented on TEZ-873:
-------------------------------------------
The omission of the annotations was not intentional; I can add them.
You raise a broader and relevant question.
I can see that some classes are private and intended specifically for Hive. For example,
MRInputLegacy has this annotation:
{noformat}
@LimitedPrivate("Hive")
public class MRInputLegacy extends MRInput {
{noformat}
My current use case:
{noformat}
FileSplit fs =
(FileSplit)((TezGroupedSplit)input.getNewInputSplit()).getWrappedSplits().get(0);
String fileName = fs.getPath().getName();
{noformat}
In this case, the document name is needed to build the search indexes for a
large number of docs/files.
I don't think this applies only to FileSplit (which could be MR-specific). For any
InputSplit, a user might want some metadata about the split being processed.
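To make the use case concrete, here is a minimal, self-contained sketch. It uses stand-in types (InputSplit, FileSplit, GroupedSplit) defined inside the example, not the real Hadoop/Tez classes, and the helper firstFileName() is hypothetical. It only illustrates the idea of unwrapping a grouped split to reach the file name of an underlying split; it is not the patch itself.

```java
import java.util.List;

// Stand-in types for illustration only; these are NOT the real Hadoop/Tez classes.
final class SplitFileNameSketch {

    /** Minimal stand-in for an input split. */
    interface InputSplit {}

    /** Stand-in for a file-backed split that knows its path. */
    static final class FileSplit implements InputSplit {
        private final String path;
        FileSplit(String path) { this.path = path; }
        String getPath() { return path; }
    }

    /** Stand-in for a grouped split that wraps several underlying splits. */
    static final class GroupedSplit implements InputSplit {
        private final List<InputSplit> wrapped;
        GroupedSplit(List<InputSplit> wrapped) { this.wrapped = List.copyOf(wrapped); }
        List<InputSplit> getWrappedSplits() { return wrapped; }
    }

    /** Returns the file name of the first file-backed split, or null if there is none. */
    static String firstFileName(InputSplit split) {
        if (split instanceof GroupedSplit) {
            // Walk the wrapped splits until one yields a file name.
            for (InputSplit inner : ((GroupedSplit) split).getWrappedSplits()) {
                String name = firstFileName(inner);
                if (name != null) {
                    return name;
                }
            }
            return null;
        }
        if (split instanceof FileSplit) {
            String path = ((FileSplit) split).getPath();
            // Keep only the last path component.
            return path.substring(path.lastIndexOf('/') + 1);
        }
        // Not file-backed: no file name available.
        return null;
    }

    public static void main(String[] args) {
        InputSplit grouped = new GroupedSplit(List.of(
                new FileSplit("/data/docs/report-001.txt"),
                new FileSplit("/data/docs/report-002.txt")));
        System.out.println(firstFileName(grouped));
    }
}
```

The instanceof checks reflect the point above: a caller cannot assume every split is a FileSplit, so a split that is not file-backed simply yields no file name here.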
> Allow MRInputLegacy to expose the individual input split
> --------------------------------------------------------
>
> Key: TEZ-873
> URL: https://issues.apache.org/jira/browse/TEZ-873
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Mohammad Kamrul Islam
> Assignee: Mohammad Kamrul Islam
> Attachments: TEZ-873.1.patch, TEZ-873.2.patch
>
>
> Currently there is no way to get the InputSplit from TezProcessor. In the current
> MR framework, the filename can be found through FileSplit.
> For example, one common use is to get the filename in map:
> String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
> There is other metadata in InputSplit that existing MR users could make use
> of.
> This JIRA is to add APIs that expose the InputSplit:
> TezGroupedSplit.getWrapperSplit() and MRInput.getInputSplit().
> Although MRInputLegacy provides an API to get the InputSplit, it has a few
> issues:
> * Without TezGroupedSplit.getWrapperSplit() it is unusable.
> * Since it is used in various use cases, I propose to move it from
> MRInputLegacy to MRInput.
> * Currently the APIs are named getNewInputSplit() and getOldInputSplit().
> These should be merged into one: getInputSplit(). The new/old API distinction
> should be handled internally.
> Please give your feedback.
--
This message was sent by Atlassian JIRA
(v6.2#6252)