[ https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130904#comment-13130904 ]
Alan Gates commented on PIG-2328:
---------------------------------
This patch adds two new UDFs, BuildBloom and Bloom. BuildBloom builds a bloom
filter. It is intended for use in a "group all" script, where every record
passed to it is added to the filter. The resulting filter is stored as the
output of that group-all Pig job. Bloom can then be used to apply that filter
to inputs in subsequent queries, for example ahead of a join. See the tests in
the patch for examples.
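
To make the two-phase idea concrete, here is a rough Python sketch of the
general technique: build a filter from all keys of the small input, then use it
to pre-filter the large input before the join. This is not the code in the
patch. The class BloomFilter, the helper build_bloom, the SHA-256 based
hashing, and the default sizes are all assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter (illustrative only, not Pig's implementation)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def build_bloom(keys, num_bits=1024, num_hashes=3):
    """Phase 1 (the role BuildBloom plays): add every key to one filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    for key in keys:
        bloom.add(key)
    return bloom


# Hypothetical inputs: a small set of join keys and a larger relation.
small_keys = [1, 5, 9]
large_rows = [(1, "a"), (2, "b"), (5, "c"), (7, "d"), (9, "e")]

bloom = build_bloom(small_keys)

# Phase 2 (the role Bloom plays): drop rows whose key cannot match.
prefiltered = [row for row in large_rows if bloom.might_contain(row[0])]

# The join still does an exact match, which discards any false positives.
key_set = set(small_keys)
joined = [row for row in prefiltered if row[0] in key_set]
```

A bloom filter can report false positives but never false negatives, so the
pre-filter may let a few non-matching rows through, and the join itself remains
responsible for the exact match.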
> Add builtin UDFs for building and using bloom filters
> -----------------------------------------------------
>
> Key: PIG-2328
> URL: https://issues.apache.org/jira/browse/PIG-2328
> Project: Pig
> Issue Type: New Feature
> Components: internal-udfs
> Reporter: Alan Gates
> Assignee: Alan Gates
> Fix For: 0.10
>
> Attachments: PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before
> moving data for a join or other heavyweight operation. Pig should add UDFs
> to support building and using bloom filters.