Eyal Allweil created DATAFU-129:
-----------------------------------

             Summary: New macro - dedup
                 Key: DATAFU-129
                 URL: https://issues.apache.org/jira/browse/DATAFU-129
             Project: DataFu
          Issue Type: New Feature
            Reporter: Eyal Allweil
            Assignee: Eyal Allweil

A macro to dedup (de-duplicate) a table, based on one or more keys and an ordering field (typically a date-updated field).

One thing to consider: the implementation relies on the ExtremalTupleByNthField UDF in PiggyBank. I've added it to the test dependencies so that the test can run. While I feel that anyone using Pig typically has PiggyBank on the classpath, this might not be true. Do we have an alternative? (Maybe adding it to the jarjar?)

The macro's definition looks as follows:

DEFINE dedup(relation, row_key, order_field) returns out {

relation    - relation to dedup
row_key     - field(s) for group by
order_field - the field for ordering (to find the most recent record)
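
To make the intended behaviour concrete, here is a rough Python sketch of the semantics described above: group rows by the key field(s) and keep, per group, the row with the largest ordering value. It is only an illustration, not the Pig macro's implementation. The dict-based row representation, the sample field names (id, updated, value) and the tie-breaking rule (first row seen wins) are assumptions made for the example.

```python
def dedup(rows, row_key, order_field):
    """Keep one row per key: the one with the largest order_field value.

    rows        - iterable of dicts (assumed stand-in for a Pig relation)
    row_key     - list of field names used for grouping
    order_field - field name used for ordering (largest = most recent)
    """
    latest = {}
    for row in rows:
        key = tuple(row[k] for k in row_key)
        current = latest.get(key)
        # On ties the first row seen is kept; this is an assumption,
        # the macro's tie behaviour is not stated in the issue.
        if current is None or row[order_field] > current[order_field]:
            latest[key] = row
    return list(latest.values())


# Hypothetical sample data: two versions of id 1, one version of id 2.
rows = [
    {"id": 1, "updated": "2017-01-01", "value": "a"},
    {"id": 1, "updated": "2017-03-01", "value": "b"},
    {"id": 2, "updated": "2017-02-01", "value": "c"},
]
result = dedup(rows, ["id"], "updated")
```

With this sample input, result holds one row per id, and for id 1 it is the row updated on 2017-03-01.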