[ 
https://issues.apache.org/jira/browse/CRUNCH-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554054#comment-13554054
 ] 

Josh Wills commented on CRUNCH-143:
-----------------------------------

It's possible right now, just hacky: you end up calling 
((MapContext) getContext()).getInputSplit() in the DoFn. I would be good with 
making information about the input data that is currently being processed 
easier for the client to access. Thoughts on what the API should look like? 
Very MapReduce-y, or should we wrap it in some kind of abstraction that would 
also be valid for (say) in-memory pipelines?
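
To make the trade-off concrete, here is a minimal, self-contained sketch of the two options. It does not use the real Hadoop or Crunch classes: TaskContext, MapSideContext, InMemoryContext, and SplitAccessSketch are hypothetical stand-ins invented for illustration, and the split is modeled as a plain String rather than an InputSplit. The Optional-returning helper is only one possible shape for a wrapper, not a proposed Crunch API.

```java
import java.util.Optional;

// Hypothetical stand-in for the context type returned by getContext().
interface TaskContext {}

// Hypothetical stand-in for MapContext; the real getInputSplit() returns an
// InputSplit, modeled here as a String to keep the sketch dependency-free.
interface MapSideContext extends TaskContext {
  String getInputSplit();
}

// Hypothetical context with no input split, e.g. an in-memory pipeline.
final class InMemoryContext implements TaskContext {}

final class SplitAccessSketch {
  // The hacky route: a blind cast. Throws ClassCastException whenever the
  // context is not a map-side context.
  static String splitViaCast(TaskContext ctx) {
    return ((MapSideContext) ctx).getInputSplit();
  }

  // One possible wrapper: report the split only when one exists, so the same
  // caller code stays valid for contexts that have no split.
  static Optional<String> splitIfAvailable(TaskContext ctx) {
    if (ctx instanceof MapSideContext) {
      return Optional.of(((MapSideContext) ctx).getInputSplit());
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Made-up split description, for illustration only.
    MapSideContext mapCtx = () -> "example-split";
    System.out.println(splitViaCast(mapCtx));
    System.out.println(splitIfAvailable(new InMemoryContext()).isPresent());
  }
}
```

The cast-based helper mirrors the workaround described above and fails outside a map task; the wrapper trades that failure for an empty result the caller has to handle.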
                
> CrunchInputSplit should be public
> ---------------------------------
>
>                 Key: CRUNCH-143
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-143
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>            Priority: Minor
>
> Similar to MAPREDUCE-2226 - it's currently not possible to access the 
> underlying input split details, for instance the path on HDFS. 
> Is there a nice way to make this information available from DoFn instances 
> while keeping with the Crunch abstraction?
> MAPREDUCE-4923 might also be applicable to CrunchInputSplit.
