[
https://issues.apache.org/jira/browse/PIG-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210004#comment-13210004
]
Dmitriy V. Ryaboy commented on PIG-2541:
----------------------------------------
Good call.
If you are working on implementing this, can I ask that, instead of appending to
a tuple, you wrap the normal PigStorage tuple with one that adds the column on
demand?
That way, potential tuple memory optimizations that don't deal well with
strings will still work (not to mention that you only need to keep one copy of
each source tag in RAM, instead of one per tuple).
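Below is a minimal Java sketch of the wrapping approach described above. It does not use Pig's real Tuple interface, which has many more methods. Instead it assumes a hypothetical, simplified SimpleTuple interface with only size() and get(int), so it is not the actual PigStorage implementation. The wrapper reports one extra trailing column and returns the shared source-tag string for it. Every tuple from the same file therefore points at a single String instance instead of carrying its own copy.

```java
import java.util.Arrays;
import java.util.List;

public class SourceTaggedTupleDemo {

    /** Hypothetical minimal tuple interface; Pig's real Tuple interface is much larger. */
    interface SimpleTuple {
        int size();
        Object get(int index);
    }

    /** A plain list-backed tuple, standing in for what the loader would normally produce. */
    static final class ListTuple implements SimpleTuple {
        private final List<Object> fields;

        ListTuple(List<Object> fields) { this.fields = fields; }

        public int size() { return fields.size(); }

        public Object get(int index) { return fields.get(index); }
    }

    /**
     * Wraps an existing tuple and exposes the source tag as one extra trailing
     * column on demand, instead of appending a copy of the tag to each tuple.
     */
    static final class SourceTaggedTuple implements SimpleTuple {
        private final SimpleTuple inner;
        private final String sourceTag; // shared reference: one String per input file

        SourceTaggedTuple(SimpleTuple inner, String sourceTag) {
            this.inner = inner;
            this.sourceTag = sourceTag;
        }

        public int size() { return inner.size() + 1; }

        public Object get(int index) {
            // The last column is the source tag; everything else is delegated.
            if (index == inner.size()) {
                return sourceTag;
            }
            return inner.get(index);
        }
    }

    public static void main(String[] args) {
        String tag = "part-00000"; // hypothetical file name, reused for every record from that file
        SimpleTuple a = new SourceTaggedTuple(new ListTuple(Arrays.<Object>asList("alice", 30)), tag);
        SimpleTuple b = new SourceTaggedTuple(new ListTuple(Arrays.<Object>asList("bob", 25)), tag);
        System.out.println(a.size() + " " + a.get(0) + " " + a.get(2));
        System.out.println(a.get(2) == b.get(2));
    }
}
```

Running main prints "3 alice part-00000" followed by "true". The second line shows that both wrapped tuples return the same String instance for the tag column.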
> Automatic record provenance (source tagging) for PigStorage
> -----------------------------------------------------------
>
> Key: PIG-2541
> URL: https://issues.apache.org/jira/browse/PIG-2541
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.9.1
> Reporter: Richard Ding
>
> There is a lot of interest in knowing where the data comes from when
> loading from a directory (or a set of directories). One can do it manually
> (see https://cwiki.apache.org/confluence/display/PIG/FAQ), but it would be
> more convenient for users if we implemented this in PigStorage with a
> command line option (e.g., pig.source.tagging=true/false) to turn it on/off.
> By default it will be off.
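The on/off switch proposed in the issue can be sketched as a property lookup that defaults to false. This sketch makes two assumptions. First, it assumes the setting arrives in a java.util.Properties object. Second, it uses the name pig.source.tagging only because the issue gives it as an example; that is not a confirmed Pig property name.

```java
import java.util.Properties;

public class SourceTaggingConfig {
    // Example name taken from the issue text ("e.g."); not a confirmed Pig setting.
    static final String SOURCE_TAGGING_KEY = "pig.source.tagging";

    /** Returns true only if the property is set to "true" (case-insensitive); off by default. */
    static boolean isSourceTaggingEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(SOURCE_TAGGING_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(isSourceTaggingEnabled(props)); // unset, so tagging stays off
        props.setProperty(SOURCE_TAGGING_KEY, "true");
        System.out.println(isSourceTaggingEnabled(props)); // explicitly turned on
    }
}
```

Defaulting to "false" in getProperty keeps existing PigStorage behavior unchanged unless a user opts in.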