[ https://issues.apache.org/jira/browse/PIG-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210058#comment-13210058 ]
Dmitriy V. Ryaboy commented on PIG-2541:
----------------------------------------
Prashant,
Nice start!
I think this needs more work to handle cases where people use a loaded
schema (the reported schema will be wrong).
Is appending to the end the right thing to do? Putting it in front ensures we
know where to expect it. Without a schema, PigStorage can produce
variable-length tuples depending on the number of entries it sees in a row.
That will make the source tag's ordinal float around from row to row.
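
To illustrate the ordinal concern, here is a minimal sketch in Python rather
than Pig's Java. It is not PigStorage code: the helpers parse_row, tag_at_end
and tag_in_front, and the file name part-00000, are hypothetical and exist only
for this example. It assumes a schema-less loader that emits one field per
delimited entry.

```python
def parse_row(line, delimiter="\t"):
    """Split a delimited line into a tuple with one field per entry.

    Without a schema the tuple length simply follows the row.
    """
    return tuple(line.split(delimiter))


def tag_at_end(fields, source):
    """Append the source tag as the last field."""
    return fields + (source,)


def tag_in_front(fields, source):
    """Prepend the source tag as the first field."""
    return (source,) + fields


# Two rows from the same (hypothetical) input file, with different widths.
rows = ["a\tb\tc", "d\te"]
source = "part-00000"

appended = [tag_at_end(parse_row(r), source) for r in rows]
prepended = [tag_in_front(parse_row(r), source) for r in rows]

# Position of the tag within each tuple.
appended_positions = [t.index(source) for t in appended]
prepended_positions = [t.index(source) for t in prepended]

print(appended_positions)   # the tag's index depends on the row width
print(prepended_positions)  # the tag is always at index 0
```

With the tag appended, its index changes with the number of entries in the
row; with the tag in front, it stays at index 0 regardless of row width.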
Minor code/style comments: please make static final vars all caps. Also please
make sure the indentation is strictly 4 spaces (a few lines look off -- tabs?).
Just set your IDE settings to enforce this.
Also the final patch should include the new parameters in Javadoc, not just in
populateValidOptions().
D
> Automatic record provenance (source tagging) for PigStorage
> -----------------------------------------------------------
>
> Key: PIG-2541
> URL: https://issues.apache.org/jira/browse/PIG-2541
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.9.1
> Reporter: Richard Ding
> Assignee: Prashant Kommireddi
> Attachments: PIG-2541.patch
>
>
> There is a lot of interest in knowing where the data comes from when
> loading from a directory (or a set of directories). One can do it manually
> (see https://cwiki.apache.org/confluence/display/PIG/FAQ). But it will be
> more convenient for users if we implement this in the PigStorage with a
> command line option (e.g., pig.source.tagging=true/false) to turn it on/off.
> By default it will be off.