[
https://issues.apache.org/jira/browse/CRUNCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriel Reid updated CRUNCH-673:
--------------------------------
Attachment: CRUNCH-673.2.patch
> Sort fails when using more reducers than records
> ------------------------------------------------
>
> Key: CRUNCH-673
> URL: https://issues.apache.org/jira/browse/CRUNCH-673
> Project: Crunch
> Issue Type: Bug
> Reporter: Gabriel Reid
> Priority: Minor
> Attachments: CRUNCH-673.2.patch, CRUNCH-673.patch
>
>
> We've run into an issue where Sort fails when it runs with more reducers
> than there are records to sort.
> This happens when a large PCollection is filtered down to almost nothing
> (say 10 records) and the filtered PCollection is passed to Sort. Sort
> configures n reducers for the small PCollection (because it doesn't realize
> that it has been filtered so aggressively), so there are, for example,
> 20 reducers configured. Reservoir sampling is used to build up the
> partition definitions for the TotalOrderPartitioner, but because the
> filtered PCollection contains only 10 records, only 10 partitions are
> defined. A precondition in TotalOrderPartitioner then fails, because the
> number of partitions in the partitions file doesn't match the number of
> configured reducers.
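
The mismatch described above can be reproduced in isolation with a minimal
sketch. It uses no Crunch or Hadoop classes: PartitionMismatchSketch,
reservoirSample and checkPartitions are hypothetical names, and the check is
a simplified stand-in for the TotalOrderPartitioner precondition, not its
actual code. It assumes, as in the description, that each sampled record
yields one partition definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of Crunch or Hadoop.
final class PartitionMismatchSketch {

    // Plain reservoir sampling (Algorithm R). The result never holds more
    // than `capacity` items, and never more than the number of records.
    static <T> List<T> reservoirSample(List<T> records, int capacity, Random rng) {
        List<T> reservoir = new ArrayList<>(capacity);
        for (int i = 0; i < records.size(); i++) {
            if (i < capacity) {
                // Fill phase: the first `capacity` records are always kept.
                reservoir.add(records.get(i));
            } else {
                // Replacement phase: keep record i with probability capacity / (i + 1).
                int j = rng.nextInt(i + 1);
                if (j < capacity) {
                    reservoir.set(j, records.get(i));
                }
            }
        }
        return reservoir;
    }

    // Simplified stand-in for the precondition: the number of defined
    // partitions must match the number of configured reducers.
    static void checkPartitions(int definedPartitions, int numReducers) {
        if (definedPartitions != numReducers) {
            throw new IllegalStateException("Partition count " + definedPartitions
                + " does not match configured reducers " + numReducers);
        }
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add(i); // the filtered PCollection: only 10 records
        }
        int numReducers = 20; // sized for the unfiltered input
        List<Integer> sample = reservoirSample(records, numReducers, new Random(0));
        // The sample can only contain 10 items, so this check throws.
        checkPartitions(sample.size(), numReducers);
    }
}
```

With 10 input records the fill phase ends before the reservoir reaches 20
entries, so the sample size is bounded by the record count and the check
fails regardless of the random seed.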