[
https://issues.apache.org/jira/browse/CRUNCH-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214465#comment-15214465
]
Josh Wills commented on CRUNCH-598:
-----------------------------------
I'm trying to figure out the best way to solve this, and I'd like to avoid
modifying the ShardingStrategy, since changing interfaces has more downstream
impact (clients don't compile against the new version, etc.). There's a lot
we can do in the constructor of ShardedJoinStrategy, including allowing a
scaleFactor argument in place of (or in addition to) the numReducers argument
we provide, or even letting someone pass in their own custom JoinStrategy
instead of the DefaultJoinStrategy that we use under the covers now. Would
either of those options work for your use case?
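
To make the two constructor options concrete, here is a rough sketch in plain Java. It does not use Crunch: JoinSketch, DefaultJoinSketch, and ShardedJoinSketch are hypothetical stand-in types, and the 1.0f placeholder is arbitrary rather than any Crunch default. It only shows the shape of the choice, an explicit scaleFactor argument versus a caller-supplied delegate strategy.

```java
// Stand-in types for illustration only; none of these are Crunch classes.
interface JoinSketch {
    // Hypothetical hint: estimated ratio of output size to input size.
    float scaleFactor();
}

final class DefaultJoinSketch implements JoinSketch {
    @Override
    public float scaleFactor() {
        return 1.0f; // arbitrary placeholder, not a Crunch default
    }
}

final class ShardedJoinSketch implements JoinSketch {
    private final int numShards;
    private final JoinSketch delegate;
    private final Float scaleFactorOverride; // null when no override was given

    // Option 1 (hypothetical): accept an explicit scaleFactor argument.
    ShardedJoinSketch(int numShards, float scaleFactor) {
        this(numShards, new DefaultJoinSketch(), Float.valueOf(scaleFactor));
    }

    // Option 2 (hypothetical): accept a caller-supplied delegate strategy.
    ShardedJoinSketch(int numShards, JoinSketch delegate) {
        this(numShards, delegate, null);
    }

    private ShardedJoinSketch(int numShards, JoinSketch delegate, Float override) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        this.numShards = numShards;
        this.delegate = delegate;
        this.scaleFactorOverride = override;
    }

    int numShards() {
        return numShards;
    }

    @Override
    public float scaleFactor() {
        // Prefer the explicit override; otherwise ask the delegate.
        if (scaleFactorOverride != null) {
            return scaleFactorOverride.floatValue();
        }
        return delegate.scaleFactor();
    }

    public static void main(String[] args) {
        JoinSketch custom = () -> 20.0f; // caller-defined strategy
        System.out.println(new ShardedJoinSketch(8, 50.0f).scaleFactor());
        System.out.println(new ShardedJoinSketch(8, custom).scaleFactor());
    }
}
```

Either option leaves the sharding interface untouched, so existing implementations of it would keep compiling.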
> scaleFactor for JoinStrategy
> ----------------------------
>
> Key: CRUNCH-598
> URL: https://issues.apache.org/jira/browse/CRUNCH-598
> Project: Crunch
> Issue Type: Improvement
> Reporter: Stefan De Smit
> Priority: Minor
>
> The scaleFactor method has a big influence on the planner.
> For joins, there currently isn't a clean way to set it, even though it is
> often required, since a join can multiply the size of its input considerably.
> For the DefaultJoinStrategy, it's possible to supply a custom JoinFn with a
> proper scaleFactor, or just extend the default InnerJoinFn with a scaleFactor.
> For the ShardedJoinStrategy, this isn't possible, even though it is often
> needed more there (ShardedJoin is especially handy for one-to-very-many joins).
> For the default ConstantShardingStrategy, it might make sense to also use
> numShards as the scaleFactor for the left side, since that's roughly what
> happens: every left entry is emitted numShards times.
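
The last point in the description, that each left entry is emitted numShards times, can be checked with a small stand-alone model. This is not Crunch code: ShardReplicationSketch and its helpers are hypothetical names, and the "key#shard" tagging is only an assumption about how the copies might be told apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone model, not Crunch code: it only mimics how a constant
// sharding scheme would replicate left-side records.
final class ShardReplicationSketch {

    // Hypothetical helper: emit each left key once per shard, tagging
    // every copy with its shard index.
    static List<String> replicateLeft(List<String> leftKeys, int numShards) {
        List<String> out = new ArrayList<>();
        for (String key : leftKeys) {
            for (int shard = 0; shard < numShards; shard++) {
                out.add(key + "#" + shard);
            }
        }
        return out;
    }

    // Ratio of emitted records to input records, i.e. the growth a
    // planner hint for the left side would need to describe.
    static float observedScaleFactor(int inputSize, int outputSize) {
        return inputSize == 0 ? 1.0f : (float) outputSize / inputSize;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b", "c");
        int numShards = 4;
        List<String> replicated = replicateLeft(left, numShards);
        System.out.println(replicated.size());
        System.out.println(observedScaleFactor(left.size(), replicated.size()));
    }
}
```

In this model the observed ratio for a non-empty left side is always numShards, which is the reasoning behind reusing numShards as the left-side scale factor.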