Eyal Allweil created DATAFU-117:
-----------------------------------

             Summary: New UDF - CountDistinctUpTo
                 Key: DATAFU-117
                 URL: https://issues.apache.org/jira/browse/DATAFU-117
             Project: DataFu
          Issue Type: New Feature
            Reporter: Eyal Allweil


A UDF that counts distinct tuples within a bag, but only up to a preset limit. 
If the bag contains more distinct tuples than the limit, the UDF returns the 
limit. 

Although the count is done in memory, this UDF can run reasonably well even on 
large bags if the chosen limit is small enough.
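
The behavior described above could be sketched in plain Java roughly as 
follows. This is only an illustration under stated assumptions, not the 
DataFu implementation: the class name CountDistinctUpToSketch, the method 
signature, and the handling of non-positive limits are all hypothetical, and 
the sketch works on any Iterable rather than on a Pig bag of tuples.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch for illustration; not the actual DataFu UDF.
final class CountDistinctUpToSketch {
    private CountDistinctUpToSketch() {}

    /**
     * Counts the distinct items in {@code items}, but stops as soon as
     * {@code limit} distinct items have been seen and returns {@code limit}.
     * Assumption: a non-positive limit yields 0.
     */
    static <T> int countDistinctUpTo(Iterable<T> items, int limit) {
        if (limit <= 0) {
            return 0;
        }
        // The set never holds more than `limit` elements, so memory use is
        // bounded by the limit rather than by the size of the input.
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            // add() returns true only for an item not seen before.
            if (seen.add(item) && seen.size() >= limit) {
                return limit; // limit reached: stop scanning early
            }
        }
        return seen.size();
    }
}
```

Because the scan stops once the limit is reached, the set of seen items stays 
small even when the input is large, which matches the point above about 
choosing a small limit.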

We use this UDF at PayPal for filtering, when we don't need the actual tuples 
afterward.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
