[
https://issues.apache.org/jira/browse/CRUNCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201375#comment-15201375
]
Gabriel Reid commented on CRUNCH-596:
-------------------------------------
First off, sorry I missed your long-standing pull request for this -- I saw it
pass by a while back and didn't get on it.
This looks really good -- a great solution for something that I was pretty much
convinced wasn't possible.
Very good point in the javadoc about dealing with the deep-copying that will
occur when the two filters are used to split the right side up into two
PCollections. However, I think that we can get around that by putting the
"right" PCollection through a dummy parallelDo call with a DoFn that returns
true for {{disableDeepCopy()}}. This will stop the deep copying that would
happen otherwise before the values are passed to the two filter functions, and
then there wouldn't be any need to set DISABLE_DEEP_COPY globally any more to
get decent performance. Would you be able to update the patch with that little
change? Other than that, this looks like it's good to go.
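To illustrate why the dummy step should help, here is a toy model in plain Java. It is not Crunch code: the {{DoFn}} below is a stand-in class with only {{process}} and {{disableDeepCopy()}}, and {{fanOut}} is a hypothetical helper. It assumes that a function feeding more than one consumer has its output copied once per consumer unless {{disableDeepCopy()}} returns true. The Bloom filter check is replaced by a simple odd/even predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class DeepCopySketch {

    /** Stand-in for a Crunch-style DoFn; only the two methods this sketch needs. */
    abstract static class DoFn<S, T> {
        abstract void process(S input, Consumer<T> emitter);
        boolean disableDeepCopy() { return false; } // deep copying stays on by default
    }

    /** The "dummy" step: passes values through and optionally opts out of deep copying. */
    static class PassThroughFn<T> extends DoFn<T, T> {
        private final boolean noCopy;
        PassThroughFn(boolean noCopy) { this.noCopy = noCopy; }
        @Override void process(T input, Consumer<T> emitter) { emitter.accept(input); }
        @Override boolean disableDeepCopy() { return noCopy; }
    }

    /**
     * Toy fan-out (assumed behaviour): each emitted value goes to every filter,
     * and is cloned per filter when there is more than one filter and the
     * upstream function has not disabled deep copying. Returns the clone count.
     */
    static int fanOut(DoFn<int[], int[]> fn, List<int[]> input,
                      List<Predicate<int[]>> filters, List<List<int[]>> outputs) {
        boolean detach = !fn.disableDeepCopy() && filters.size() > 1;
        int[] copies = {0};
        for (int[] record : input) {
            fn.process(record, emitted -> {
                for (int i = 0; i < filters.size(); i++) {
                    int[] value = emitted;
                    if (detach) {
                        value = emitted.clone();
                        copies[0]++;
                    }
                    if (filters.get(i).test(value)) {
                        outputs.get(i).add(value);
                    }
                }
            });
        }
        return copies[0];
    }

    /** Splits three "right side" records with two complementary filters. */
    static int run(boolean noCopy) {
        List<int[]> right = Arrays.asList(new int[]{1}, new int[]{2}, new int[]{3});
        Predicate<int[]> inBloom = r -> r[0] % 2 == 1; // placeholder for a Bloom filter lookup
        List<Predicate<int[]>> filters = Arrays.asList(inBloom, inBloom.negate());
        List<List<int[]>> outputs = new ArrayList<>();
        outputs.add(new ArrayList<>());
        outputs.add(new ArrayList<>());
        return fanOut(new PassThroughFn<>(noCopy), right, filters, outputs);
    }

    public static void main(String[] args) {
        System.out.println("copies with default fn: " + run(false));
        System.out.println("copies with disableDeepCopy(): " + run(true));
    }
}
```

In this model the default function costs one copy per record per filter, while the pass-through function that returns true from {{disableDeepCopy()}} costs none. Skipping the copy is only safe here because neither filter modifies the records it receives.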
> Right and full outer join for Bloom filter strategy
> ---------------------------------------------------
>
> Key: CRUNCH-596
> URL: https://issues.apache.org/jira/browse/CRUNCH-596
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.13.0
> Reporter: Piotr Chromiec
> Assignee: Josh Wills
> Priority: Minor
> Labels: features, github-import, newbie
> Fix For: 0.14.0
>
>
> It seems that the current Bloom filter join strategy lacks support for right
> and full outer joins. At RTBHOUSE we recently found this useful and
> implemented it for an internal project. The code for this feature, with
> javadoc and tests, has been pushed to GitHub:
> [PullRequest#9|https://github.com/apache/crunch/pull/9]
> I'm a newbie here, so forgive me if this issue is somehow incomplete or buggy.