[
https://issues.apache.org/jira/browse/CRUNCH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478274#comment-13478274
]
Josh Wills commented on CRUNCH-96:
----------------------------------
I ran into it on a machine learning project I was working on, and it seems to
come up fairly often in sessionization applications (e.g., group by user ID,
sort events by timestamp); see
https://www.google.com/search?q=mapreduce+secondary+sort
Your point on naming is well-taken: this isn't a total ordering on the keys;
it's just a sort on the values going into the reducer. Something more like
GroupByKeyWithSecondarySort would be more accurate (albeit more verbose).
Recommendations?
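To make the semantics concrete, here is a minimal in-memory sketch in plain
Java of what the sessionization case asks for: group events by user ID, then
sort each group's values by timestamp. It does not use the Crunch API and is
not the code in the attached patch. The class SecondarySortSketch, the Event
type, and the method name are all hypothetical, chosen only for illustration.
In a real MapReduce job the ordering would normally come from the shuffle
rather than from sorting each group in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    /** Hypothetical event record: user ID (the key), timestamp, and an action. */
    public static final class Event {
        public final String userId;
        public final long timestamp;
        public final String action;

        public Event(String userId, long timestamp, String action) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    /**
     * Groups events by user ID and sorts the values inside each group by
     * timestamp. A HashMap is used on purpose: no ordering is imposed on the
     * keys themselves, only on the values that belong to each key.
     */
    public static Map<String, List<Event>> groupByKeyWithSecondarySort(List<Event> events) {
        Map<String, List<Event>> grouped = new HashMap<>();
        for (Event e : events) {
            grouped.computeIfAbsent(e.userId, k -> new ArrayList<>()).add(e);
        }
        for (List<Event> group : grouped.values()) {
            // The "secondary" sort: order values within a key by timestamp.
            group.sort(Comparator.comparingLong((Event e) -> e.timestamp));
        }
        return grouped;
    }
}
```

Because only the per-key value lists are ordered, iterating over the returned
map gives the user IDs in no particular order, which is the distinction a name
like GroupByKeyWithSecondarySort is meant to convey.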
> Add secondary sort functionality to o.a.c.lib
> ---------------------------------------------
>
> Key: CRUNCH-96
> URL: https://issues.apache.org/jira/browse/CRUNCH-96
> Project: Crunch
> Issue Type: Improvement
> Components: Core, MapReduce Patterns
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.4.0
>
> Attachments: CRUNCH-96.patch
>
>
> I've been working on a problem that required a secondary sorting pattern that
> was very similar to the example that Alex Kozlov created in CRUNCH-78, so it
> would be good to extract the pattern from the example and move it to
> o.a.c.lib so it can be easily available to clients.