[ https://issues.apache.org/jira/browse/DATAFU-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552460#comment-16552460 ]
Eyal Allweil commented on DATAFU-129:
-------------------------------------
I will prepare a new patch as soon as I can.
> New macro - dedup
> -----------------
>
> Key: DATAFU-129
> URL: https://issues.apache.org/jira/browse/DATAFU-129
> Project: DataFu
> Issue Type: New Feature
> Reporter: Eyal Allweil
> Assignee: Eyal Allweil
> Priority: Major
> Labels: macro
> Attachments: DATAFU-129.patch
>
>
> Macro used to dedup (de-duplicate) a table, based on a key or keys and an
> ordering (typically a date updated field).
> One thing to consider: the implementation relies on the
> ExtremalTupleByNthField UDF in PiggyBank. I've added it to the test
> dependencies so that the test can run. While I feel that anyone using Pig
> typically has PiggyBank on the classpath, this might not always be true. Do we
> have an alternative? (Maybe adding it to the jarjar?)
> The macro's definition looks as follows:
> DEFINE dedup(relation, row_key, order_field) returns out {
> relation - relation to dedup
> row_key - field(s) for group by
> order_field - the field for ordering (to find the most recent record)
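
The semantics described above (group by row_key, keep the most recent record per group) can be illustrated with a small Python sketch. This is not the macro's Pig implementation and does not use ExtremalTupleByNthField; the sample rows, the field names, and the tie-breaking rule (first record seen wins) are assumptions made only for illustration.

```python
def dedup(rows, row_key, order_field):
    """Keep, for each distinct combination of row_key values,
    the row with the largest order_field value.

    Illustrative only: ties keep the first row encountered, which
    is an assumption, not documented behavior of the Pig macro.
    """
    latest = {}
    for row in rows:
        # Build a composite key so that one or several key fields both work.
        key = tuple(row[k] for k in row_key)
        current = latest.get(key)
        if current is None or row[order_field] > current[order_field]:
            latest[key] = row
    return list(latest.values())


# Hypothetical sample data: two versions of id 1, one version of id 2.
rows = [
    {"id": 1, "updated": "2018-07-01", "value": "a"},
    {"id": 1, "updated": "2018-07-20", "value": "b"},
    {"id": 2, "updated": "2018-07-05", "value": "c"},
]

result = dedup(rows, ["id"], "updated")
```

Here row_key is passed as a list so that one or several key fields are handled the same way, mirroring the "key or keys" wording in the description.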