[
https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345736#comment-16345736
]
Ben Roling commented on CRUNCH-663:
-----------------------------------
The attached patch is a quick proof of concept. I wouldn't expect it to be
merged directly. The patch includes a modified WordCount example that
demonstrates leveraging this property. I should have just added a unit test to
show it, but haven't done that yet. If I get feedback that the general approach
is acceptable, I would certainly be happy to add one or more tests.
> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
> Key: CRUNCH-663
> URL: https://issues.apache.org/jira/browse/CRUNCH-663
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Ben Roling
> Assignee: Josh Wills
> Priority: Major
> Attachments: CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that
> each record being processed came from. It would be nice if this could be
> exposed to the DoFns in our pipelines.
>
> This same desire was expressed a little over a year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>
> Unfortunately, that thread dead-ended.
>
> I will use the comments section and a patch to propose a simple, albeit
> slightly hacky, solution. An alternative would be to create a new Source
> that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much
> effort that would take.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)