[ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346286#comment-16346286 ]
Josh Wills commented on CRUNCH-663:
-----------------------------------
So I like this; it's backwards-compatible with existing APIs and does
something that is certainly useful, if only for a relatively small fraction
of pipelines. I'm +1 and will be happy to commit the patch to master if no one
objects in the next day or so.
> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
> Key: CRUNCH-663
> URL: https://issues.apache.org/jira/browse/CRUNCH-663
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Ben Roling
> Assignee: Josh Wills
> Priority: Major
> Attachments: CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that
> each record being processed came from. It would be nice if this could be
> exposed to the DoFns in our pipelines.
>
> This same desire was expressed a little over a year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>
> Unfortunately, that thread dead-ended.
>
> I will use the comments section and a patch to propose a simple, albeit
> slightly hacky, solution. An alternative would be to create a new Source
> that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much
> effort that would take.
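
For readers unfamiliar with the shape the alternative describes, here is a minimal plain-Java sketch of the idea behind a PCollection<Pair<Path, Record>>: every record travels together with the path of the file it was read from. It is not Crunch code and not the attached patch. The class PathTaggedRecords and the method tagWithPath are hypothetical names, java.nio.file.Path stands in for the Hadoop Path type, a String stands in for Record, and an in-memory map stands in for the input files.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models "each record paired with its source path"
// using plain collections instead of Crunch's PCollection and Pair types.
public class PathTaggedRecords {

    // Flattens per-file record lists into (path, record) pairs, preserving
    // the iteration order of the input map and of each record list.
    static List<Map.Entry<Path, String>> tagWithPath(Map<Path, List<String>> recordsByFile) {
        List<Map.Entry<Path, String>> tagged = new ArrayList<>();
        for (Map.Entry<Path, List<String>> file : recordsByFile.entrySet()) {
            for (String record : file.getValue()) {
                tagged.add(new AbstractMap.SimpleImmutableEntry<>(file.getKey(), record));
            }
        }
        return tagged;
    }

    public static void main(String[] args) {
        // Assumed sample input; the paths and records are made up for the example.
        Map<Path, List<String>> input = new LinkedHashMap<>();
        input.put(Paths.get("in", "part-a.txt"), Arrays.asList("r1", "r2"));
        input.put(Paths.get("in", "part-b.txt"), Arrays.asList("r3"));

        // A downstream function can now see where each record came from.
        for (Map.Entry<Path, String> pair : tagWithPath(input)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

In this shape, the processing function receives the path as part of its input element instead of looking it up from job context, which is the trade-off the description weighs against the simpler patch.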