[ https://issues.apache.org/jira/browse/CRUNCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wills updated CRUNCH-525:
------------------------------
Attachment: CRUNCH-525b.patch
I played with a few different versions of [~spatel89]'s formula and felt that
the average of the two scale factors was the best default guess, on the
assumption that keys.size == values.size in this context. This patch reflects
that.
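
For reference, a minimal sketch of the formula proposed in the issue
description quoted below (1.0 + the scale factor of the wrapped MapFn). It is
not the Crunch code or the attached patch: SizedFn and ExtractKey are
hypothetical stand-ins so the example runs on its own, and the 0.25 factor is
an invented value. The averaged default mentioned above is not reproduced,
since this message does not give its exact form.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Crunch classes.
public class ScaleFactorSketch {

  /** Minimal stand-in for a function that reports an output/input size ratio. */
  interface SizedFn<S, T> {
    T map(S input);

    // Mirrors the 1.0 default that the issue says a MapFn uses.
    default float scaleFactor() {
      return 1.0f;
    }
  }

  /** Emits (key, value) pairs, keeping the whole input as the value. */
  static final class ExtractKey<K, V> implements SizedFn<V, Map.Entry<K, V>> {
    private final SizedFn<V, K> keyFn;

    ExtractKey(SizedFn<V, K> keyFn) {
      this.keyFn = keyFn;
    }

    @Override
    public Map.Entry<K, V> map(V input) {
      return new AbstractMap.SimpleImmutableEntry<>(keyFn.map(input), input);
    }

    // Reporter's proposal: the output holds the full input (1.0) plus the
    // extracted key (the wrapped function's own scale factor).
    @Override
    public float scaleFactor() {
      return 1.0f + keyFn.scaleFactor();
    }
  }

  public static void main(String[] args) {
    // Assumed example: a key function whose output is a quarter of its input size.
    SizedFn<String, Integer> length = new SizedFn<String, Integer>() {
      @Override
      public Integer map(String s) {
        return s.length();
      }

      @Override
      public float scaleFactor() {
        return 0.25f;
      }
    };

    ExtractKey<Integer, String> byLength = new ExtractKey<>(length);
    System.out.println(byLength.map("crunch"));  // prints 6=crunch
    System.out.println(byLength.scaleFactor());  // prints 1.25
  }
}
```

With the unmodified default, the same wrapper would report 1.0 regardless of
how large the extracted key is, which is the inaccuracy the issue describes.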
> The ExtractKeyFn has an incorrect scale factor
> ----------------------------------------------
>
> Key: CRUNCH-525
> URL: https://issues.apache.org/jira/browse/CRUNCH-525
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.12.0
> Reporter: Stephen Patel
> Assignee: Josh Wills
> Priority: Minor
> Attachments: CRUNCH-525.patch, CRUNCH-525b.patch
>
>
> The ExtractKeyFn[0] used by the by[1] method of the PCollectionImpl is using
> the default scale factor for a MapFn (1.0). It should be using 1.0 + the
> scale factor of the wrapped MapFn, in order to be accurate.
> [0]:
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/fn/ExtractKeyFn.java
> [1]:
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/dist/collect/PCollectionImpl.java#L270