[
https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440956#comment-13440956
]
Rahul Sharma commented on CRUNCH-23:
------------------------------------
TotalOrderPartitioner in its current form is not usable with Avro;
MAPREDUCE-4574 reports the same problem. We would need to reimplement
TotalOrderPartitioner if we want to use it.
On second thought, though, do we want this to work with Avro data? In Avro the
sort order is imposed by the schema. If the user specifies an order in the
schema, Avro will make sure all data is loaded using that order. If none is
specified, Avro defaults to ascending order on each of the fields of the
record. It feels like Avro data is sorted out of the box.
> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
> Key: CRUNCH-23
> URL: https://issues.apache.org/jira/browse/CRUNCH-23
> Project: Crunch
> Issue Type: Bug
> Reporter: Gabriel Reid
> Assignee: Rahul Sharma
> Attachments: 0001-CRUNCH-23-fix-sorting.patch,
> CRUNCH-23-sorting-issue.patch,
> CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sorting that is
> performed is only per reducer, and not an absolute sort over all values. This
> means that the values are not in sorted order if they are iterated over on a
> materialized collection. It also means that the sorted files that are output
> from a sort operation cannot simply be concatenated to produce a single
> sorted file.
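
The behaviour described in the issue can be illustrated with a small, self-contained sketch. It is not Crunch or Hadoop code: the class TotalOrderSketch and all of its methods are hypothetical names made up for this illustration, and plain integers stand in for keys. It contrasts hash-style partitioning (each "reducer" sorts its own share, but the concatenation is unordered) with range partitioning on split points, which is the general idea a total-order partitioner relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from Crunch or Hadoop.
class TotalOrderSketch {

    // Hash-style partitioning: the partition has no relation to the key's sort order.
    static int hashPartition(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    // Range partitioning: splitPoints must be sorted ascending. Partition i
    // receives the keys below splitPoints[i]; the last partition gets the rest.
    static int rangePartition(int key, int[] splitPoints) {
        int i = 0;
        while (i < splitPoints.length && key >= splitPoints[i]) {
            i++;
        }
        return i;
    }

    // Sorts each partition on its own (as each reducer would) and then
    // concatenates the partition outputs in partition order.
    static List<Integer> sortAndConcatenate(List<List<Integer>> partitions) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            List<Integer> sorted = new ArrayList<>(partition);
            Collections.sort(sorted);
            out.addAll(sorted);
        }
        return out;
    }

    static List<List<Integer>> emptyPartitions(int n) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new ArrayList<>());
        }
        return partitions;
    }

    static List<Integer> sortWithHashPartitions(List<Integer> keys, int numPartitions) {
        List<List<Integer>> partitions = emptyPartitions(numPartitions);
        for (int key : keys) {
            partitions.get(hashPartition(key, numPartitions)).add(key);
        }
        return sortAndConcatenate(partitions);
    }

    static List<Integer> sortWithRangePartitions(List<Integer> keys, int[] splitPoints) {
        List<List<Integer>> partitions = emptyPartitions(splitPoints.length + 1);
        for (int key : keys) {
            partitions.get(rangePartition(key, splitPoints)).add(key);
        }
        return sortAndConcatenate(partitions);
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(5, 2, 9, 1, 8, 4);
        // Two "reducers", hash partitioned: sorted per partition only.
        System.out.println(sortWithHashPartitions(keys, 2));               // [2, 4, 8, 1, 5, 9]
        // Two "reducers", split at 5: the concatenation is fully sorted.
        System.out.println(sortWithRangePartitions(keys, new int[] {5}));  // [1, 2, 4, 5, 8, 9]
    }
}
```

In this sketch the split point is hard-coded. A real job would have to derive split points from the data (for example by sampling the keys) so that the partitions are reasonably balanced, and the key comparison would have to match the order used by the reducers' sort.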