[ https://issues.apache.org/jira/browse/PIG-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214181#comment-13214181 ]
Daniel Dai commented on PIG-2541:
---------------------------------

Here are my thoughts:
1. As Dmitriy said, we need to include this column in getSchema.
2. Hive names this column INPUT__FILE__NAME; we can follow this convention rather than invent something new.
3. Users usually don't care about the input file name for intermediate files, so name conflicts should be rare. I would like to fail when this happens, to keep the patch simpler (I believe the parser already does the name conflict check).

> Automatic record provenance (source tagging) for PigStorage
> -----------------------------------------------------------
>
>                 Key: PIG-2541
>                 URL: https://issues.apache.org/jira/browse/PIG-2541
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.1
>            Reporter: Richard Ding
>            Assignee: Prashant Kommireddi
>         Attachments: PIG-2541.patch
>
>
> There is a lot of interest in knowing where the data comes from when
> loading from a directory (or a set of directories). One can do it manually
> (see https://cwiki.apache.org/confluence/display/PIG/FAQ). But it will be
> more convenient for users if we implement this in the PigStorage with a
> command line option (e.g., pig.source.tagging=true/false) to turn it on/off.
> By default it will be off.
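
For illustration only, the three points in the comment above (expose the extra column in the schema, reuse Hive's INPUT__FILE__NAME, and fail on a name conflict) could be sketched roughly as follows. This is not the attached patch and does not use Pig's API: the class SourceTagSchema, the method withSourceTag, the modelling of a schema as a plain list of field names, and the choice to append the column at the end are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: a schema is modelled as a list of field names.
class SourceTagSchema {
    // Column name borrowed from Hive, as suggested in the comment.
    static final String SOURCE_TAG_COLUMN = "INPUT__FILE__NAME";

    // Returns a new field list that includes the source-tag column.
    // Fails fast if the user's schema already uses that name (exact match).
    static List<String> withSourceTag(List<String> fieldNames) {
        if (fieldNames.contains(SOURCE_TAG_COLUMN)) {
            throw new IllegalArgumentException(
                "Field name conflicts with source tag column: " + SOURCE_TAG_COLUMN);
        }
        List<String> result = new ArrayList<>(fieldNames);
        result.add(SOURCE_TAG_COLUMN); // column position is an assumption of this sketch
        return result;
    }

    public static void main(String[] args) {
        // Prints the user fields followed by the source-tag column.
        System.out.println(withSourceTag(Arrays.asList("user", "url")));
    }
}
```

Failing on a conflict keeps the logic to a single check; the alternative of silently renaming the column would need extra rules for how the renamed column is reported to the user.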