Ben Roling created CRUNCH-663:
---------------------------------
Summary: Expose Record-level File Path to Processing Functions
Key: CRUNCH-663
URL: https://issues.apache.org/jira/browse/CRUNCH-663
Project: Crunch
Issue Type: Improvement
Components: Core
Reporter: Ben Roling
Assignee: Josh Wills
We have some processing pipelines where we want to know the file path that each
record being processed came from. It would be nice if this could be exposed to
the DoFns in our pipelines.
This same desire was expressed a little over a year ago on the mailing list:
[http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
Unfortunately, that thread dead-ended.
I will use the comments section and a patch to propose a simple, albeit
slightly hacky, solution. An alternative would be to create a new Source
that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much
effort that would take.
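
To make the shape of that alternative concrete, here is a rough plain-Java
sketch of the idea: each record travels together with the path of the file it
was read from. It deliberately does not use any Crunch or Hadoop classes. The
class name PathTaggedRecords and the method readWithPaths are hypothetical,
java.nio.file.Path stands in for Hadoop's Path, and a List of Map.Entry values
stands in for PCollection<Pair<Path, Record>>. It is an illustration of the
data shape only, not a proposed implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Crunch APIs.
public class PathTaggedRecords {

    // Hypothetical stand-in for a Source yielding PCollection<Pair<Path, Record>>:
    // reads every input file and pairs each line (the "record") with the
    // path of the file it came from.
    static List<Map.Entry<Path, String>> readWithPaths(List<Path> inputs) throws IOException {
        List<Map.Entry<Path, String>> out = new ArrayList<>();
        for (Path input : inputs) {
            for (String line : Files.readAllLines(input)) {
                out.add(Map.entry(input, line));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Create two small input files in a temporary directory.
        Path dir = Files.createTempDirectory("crunch-663-sketch");
        Path a = Files.write(dir.resolve("a.txt"), List.of("r1", "r2"));
        Path b = Files.write(dir.resolve("b.txt"), List.of("r3"));

        // A downstream function can now see which file each record came from.
        for (Map.Entry<Path, String> e : readWithPaths(List.of(a, b))) {
            System.out.println(e.getKey().getFileName() + "\t" + e.getValue());
        }
    }
}
```

With this shape, a DoFn would receive the path as part of its input rather
than having to look it up from some side channel.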
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)