Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

2016-05-19 Thread Eyal Allweil


> On May 18, 2016, 10:19 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java, line 147
> > 
> >
> > What about clearing the set so we don't have to garbage collect?

I just reassigned it because the clear() method in HashSet uses Arrays.fill, and 
I thought that would be more expensive than just letting the old set be garbage 
collected and making a new one.
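
For illustration only, the two options discussed in this comment could be 
sketched as below. The class and method names are hypothetical and are not 
taken from the patch. The sketch only shows that both approaches leave the 
caller with an empty set; it does not measure which one is cheaper.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the code under review.
final class SetResetSketch {

    // Option A: keep the same HashSet instance and empty it in place.
    // The backing table is retained, so no new set is allocated.
    static Set<Object> resetByClear(Set<Object> seen) {
        seen.clear();
        return seen;
    }

    // Option B: ignore the old set (leaving it to the garbage collector)
    // and hand back a freshly allocated, empty HashSet.
    static Set<Object> resetByReassign(Set<Object> seen) {
        return new HashSet<Object>();
    }

    public static void main(String[] args) {
        Set<Object> seen = new HashSet<Object>();
        seen.add("a");
        seen.add("b");

        Set<Object> cleared = resetByClear(seen);
        System.out.println(cleared.isEmpty() + " " + (cleared == seen));

        seen.add("c");
        Set<Object> fresh = resetByReassign(seen);
        System.out.println(fresh.isEmpty() + " " + (fresh == seen));
    }
}
```

Which option is faster likely depends on how large the set's backing table 
has grown and on the JVM in use, so a benchmark would be needed to settle it.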


> On May 18, 2016, 10:19 p.m., Matthew Hayes wrote:
> > datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java, line 22
> > 
> >
> > We already have the testng assertEquals and this gives me a build 
> > error.  Can you confirm this command passes?
> > 
> > ./gradlew :datafu-pig:test -Dtest.single=BagTests
> > 
> > I'm getting an error with this.

I removed this line. I don't know why it worked for me (although I think it 
did).


- Eyal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review133803
---


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
> 
> (Updated April 27, 2016, 7:44 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> ---
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> ---
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java 
> PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 
> 28292db0c01a1967ea53d9cc3d316e9906d942a8 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>



[jira] [Updated] (DATAFU-117) New UDF - CountDistinctUpTo

2016-05-19 Thread Eyal Allweil (JIRA)

 [ https://issues.apache.org/jira/browse/DATAFU-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eyal Allweil updated DATAFU-117:

Attachment: DATAFU-117-3.patch

Incorporates changes from [review |https://reviews.apache.org/r/46701/]

> New UDF - CountDistinctUpTo
> ---
>
>          Key: DATAFU-117
>          URL: https://issues.apache.org/jira/browse/DATAFU-117
>      Project: DataFu
>   Issue Type: New Feature
>     Reporter: Eyal Allweil
>  Attachments: DATAFU-117-2.patch, DATAFU-117-3.patch, DATAFU-117.patch
>
>
> A UDF that counts distinct tuples within a bag, but only up to a preset 
> limit. If the bag contains more distinct tuples than the limit, the UDF 
> returns the limit. 
> This UDF can run reasonably well even on large bags if the chosen limit is 
> small enough, even though the count is done in memory.
> We use this UDF at PayPal for filtering, when we don't need to use the actual 
> tuples afterward.
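
As a rough illustration of the behaviour described in the issue, here is a 
minimal plain-Java sketch of counting distinct items up to a limit. It is not 
the code from the attached patches and does not use the Pig API: the class 
name, the method signature, the use of a generic Iterable in place of a Pig 
bag, and the handling of a non-positive limit are all assumptions made for 
the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the DataFu implementation.
final class CountDistinctUpToSketch {

    // Counts distinct items in 'bag', stopping as soon as 'limit' distinct
    // items have been seen. The set never holds more than 'limit' entries,
    // which is why a small limit keeps memory use low even for a large bag.
    static <T> int countDistinctUpTo(Iterable<T> bag, int limit) {
        if (limit <= 0) {
            return 0; // assumption: a non-positive limit yields 0
        }
        Set<T> seen = new HashSet<T>();
        for (T item : bag) {
            seen.add(item);
            if (seen.size() >= limit) {
                return limit; // limit reached, the rest of the bag is skipped
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> bag = Arrays.asList("a", "b", "a", "c", "b", "d");
        System.out.println(countDistinctUpTo(bag, 3));  // prints 3
        System.out.println(countDistinctUpTo(bag, 10)); // prints 4
    }
}
```

The early return is what makes this suitable for filtering: once the limit is 
hit, the exact count no longer matters and the remaining items are not read.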



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)