[ 
https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139536#comment-13139536
 ] 

Daniel Dai commented on PIG-2328:
---------------------------------

Here are some comments:
1. The javadoc sample is wrong:
define bb BuildBloom(100, 3, Hash.JENKINS_HASH); => define bb BuildBloom('jenkins', '100', '0.1');
C = filter B by Bloom(mybloom, z); => C = filter B by Bloom(z);
2. It should be trivial to convert it to use a scalar, so that we no longer 
have to figure out the symlink name:
{code}
define bb BuildBloom('jenkins', '10', '0.1');
small = load 'S' as (x, y, z);
grpd = group small all;
fltrd = foreach grpd generate bb(small.x) as a0;

large = load 'L' as (a, b, c);
flarge = filter large by Bloom(fltrd.a0, a);
joined = join small by x, flarge by a;
store joined into 'results';
{code}
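For readers unfamiliar with the technique, the script above can be sketched conceptually in plain Python. This is only an illustration of build-then-filter-then-join, not Pig code and not part of the patch: the BloomFilter class, its method names, the bit-vector size, and the sample data are all hypothetical, and the hashing does not model BuildBloom's 'jenkins' option or its false-positive parameter.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit vector probed by k salted hashes (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k bit positions by salting one hash function with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# Stand-ins for the 'S' and 'L' relations (made-up data).
small = [(1, "a"), (2, "b"), (3, "c")]
large = [(k, f"row{k}") for k in range(1000)]

# Build one filter over small.x, the role bb(small.x) plays in the script.
bloom = BloomFilter(num_bits=256, num_hashes=3)
for x, _ in small:
    bloom.add(x)

# Drop rows of large that cannot match, as "filter large by Bloom(...)" would.
flarge = [row for row in large if bloom.might_contain(row[0])]

# The join still runs afterwards and removes any false positives.
joined = [(s, l) for s in small for l in flarge if s[0] == l[0]]
```

The filter never drops a row whose key is in small, so the join result is unchanged; it only shrinks the amount of data that has to be moved for the join.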
Want me to upload a patch?
                
> Add builtin UDFs for building and using bloom filters
> -----------------------------------------------------
>
>                 Key: PIG-2328
>                 URL: https://issues.apache.org/jira/browse/PIG-2328
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.10
>
>         Attachments: PIG-bloom-2.patch, PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before 
> moving data for a join or other heavyweight operation.  Pig should add UDFs 
> to support building and using bloom filters.


        
