[ https://issues.apache.org/jira/browse/DATAFU-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646720#comment-16646720 ]
Matthew Hayes edited comment on DATAFU-129 at 10/11/18 4:25 PM:
----------------------------------------------------------------

+1 Merged

was (Author: matterhayes):
Merged

> New macro - dedup
> -----------------
>
>                 Key: DATAFU-129
>                 URL: https://issues.apache.org/jira/browse/DATAFU-129
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>            Priority: Major
>              Labels: macro
>             Fix For: 1.5.0
>
>         Attachments: DATAFU-129-2.patch, DATAFU-129-bad.patch, DATAFU-129.patch
>
>
> Macro used to dedup (de-duplicate) a table, based on a key or keys and an
> ordering (typically a date updated field).
> One thing to consider: the implementation relies on the
> ExtremalTupleByNthField UDF in PiggyBank. I've added it to the test
> dependencies in order for the test to run. While I feel that anyone using Pig
> typically has PiggyBank in the classpath, this might not be true. Do we have
> an alternative? (Maybe adding it to the jarjar?)
> The macro's definition looks as follows:
> DEFINE dedup(relation, row_key, order_field) returns out {
>   relation    - relation to dedup
>   row_key     - field(s) for group by
>   order_field - the field for ordering (to find the most recent record)
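To make the described semantics concrete, here is a minimal in-memory sketch in Python. It assumes rows are dicts and that "most recent" means the largest value of `order_field`. It is not the macro itself (which, per the description above, relies on PiggyBank's ExtremalTupleByNthField); the `dedup` function below is a hypothetical equivalent of grouping by `row_key` and keeping the extremal row in each group.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def dedup(rows: Iterable[Dict[str, Any]],
          row_key: Sequence[str],
          order_field: str) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep one row per key, the one with the largest order_field."""
    best: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        # Composite key from one or more key fields (analogous to GROUP BY row_key).
        key = tuple(row[f] for f in row_key)
        current = best.get(key)
        # Assumption: strictly greater wins, so on ties the first row seen is kept.
        if current is None or row[order_field] > current[order_field]:
            best[key] = row
    return list(best.values())


rows = [
    {"id": 1, "name": "a", "date_updated": "2018-01-01"},
    {"id": 1, "name": "b", "date_updated": "2018-03-01"},
    {"id": 2, "name": "c", "date_updated": "2018-02-01"},
]
# ISO-8601 date strings compare correctly as plain strings.
print(dedup(rows, ["id"], "date_updated"))
```

How ties are broken in the actual macro is not stated in the issue; the sketch simply keeps the first row encountered.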