[ https://issues.apache.org/jira/browse/CRUNCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556136#comment-14556136 ]
Stephen Patel commented on CRUNCH-525:
--------------------------------------
Yeah, I don't see how to get PairMapFn's scale to be correct. I would think it
should be the size-weighted average of the two wrapped scale factors,
something like:
((keyMapFn.scale * keys.size) + (valMapFn.scale * vals.size)) / (keys.size + vals.size)
but that's not possible.
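
A minimal sketch of the weighted average described above, assuming the key and value sizes were somehow known. The class and method names (WeightedScaleSketch, weightedScale) are hypothetical and not part of Crunch. The reason this is "not possible" in practice is presumably that a scale factor is reported without any information about the sizes of the data flowing through the function.

```java
// Hypothetical helper, not part of Crunch: illustrates the size-weighted
// average of two scale factors proposed in the comment.
final class WeightedScaleSketch {
    private WeightedScaleSketch() {}

    // keySize and valSize are assumed to be known sizes (e.g. in bytes);
    // Crunch does not expose them to a scale factor calculation.
    static float weightedScale(float keyScale, long keySize,
                               float valScale, long valSize) {
        if (keySize < 0 || valSize < 0 || keySize + valSize == 0) {
            throw new IllegalArgumentException("sizes must be non-negative and not both zero");
        }
        // Compute in double to limit rounding error before narrowing to float.
        double weighted = (double) keyScale * keySize + (double) valScale * valSize;
        return (float) (weighted / (keySize + valSize));
    }
}
```

For example, weightedScale(2.0f, 100L, 1.0f, 300L) evaluates to (200 + 300) / 400 = 1.25.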
> The ExtractKeyFn has an incorrect scale factor
> ----------------------------------------------
>
> Key: CRUNCH-525
> URL: https://issues.apache.org/jira/browse/CRUNCH-525
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.12.0
> Reporter: Stephen Patel
> Assignee: Josh Wills
> Priority: Minor
> Attachments: CRUNCH-525.patch
>
>
> The ExtractKeyFn[0] used by the by[1] method of the PCollectionImpl is using
> the default scale factor for a MapFn (1.0). It should be using 1.0 + the
> scale factor of the wrapped MapFn, in order to be accurate.
> [0]: https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/fn/ExtractKeyFn.java
> [1]: https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/dist/collect/PCollectionImpl.java#L270
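
The fix proposed in the issue description can be sketched as follows. This is an illustration only: SketchMapFn and SketchExtractKeyFn are hypothetical stand-ins written for this example, not the real Crunch classes, and the attached patch may differ. The idea is that a key-extracting function emits the original value plus a derived key, so its output is roughly the input (1.0) plus whatever the wrapped function produces.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical stand-in for a map function with a scale factor.
abstract class SketchMapFn<S, T> {
    abstract T map(S input);

    // Mirrors the default described in the issue: a scale factor of 1.0.
    float scaleFactor() {
        return 1.0f;
    }
}

// Hypothetical stand-in for a key-extracting function that wraps another
// map function and emits (key, original value) pairs.
final class SketchExtractKeyFn<K, V> extends SketchMapFn<V, Map.Entry<K, V>> {
    private final SketchMapFn<V, K> keyFn;

    SketchExtractKeyFn(SketchMapFn<V, K> keyFn) {
        this.keyFn = keyFn;
    }

    @Override
    Map.Entry<K, V> map(V input) {
        // The output carries both the derived key and the untouched value.
        return new SimpleImmutableEntry<>(keyFn.map(input), input);
    }

    @Override
    float scaleFactor() {
        // 1.0 for the value passed through, plus the wrapped function's
        // estimate for the size of the key it produces.
        return 1.0f + keyFn.scaleFactor();
    }
}
```

With a wrapped function that keeps the default scale of 1.0, the sketch reports 2.0; a wrapped function reporting 0.5 yields 1.5.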