[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668748#comment-13668748 ]

Alan Gates commented on PIG-3257:
---------------------------------

I don't see how records can be missing or redundant. Take the following query:
{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}
This won't reduce at all. For every record it is totally irrelevant what particular value its key is, because it's guaranteed to be unique for each record. So 1) this is a totally meaningless thing to do, and 2) if a particular map does get rerun or is used in speculative execution, it doesn't matter, because which particular key UUID generates is irrelevant.

The way this is intended to be used is something like this:
{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID() as uuid;
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}

> Add unique identifier UDF
> -------------------------
>
>                 Key: PIG-3257
>                 URL: https://issues.apache.org/jira/browse/PIG-3257
>             Project: Pig
>          Issue Type: Improvement
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.12
>
>         Attachments: PIG-3257.patch
>
>
> It would be good to have a Pig function to generate unique identifiers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira