[
https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880063#comment-13880063
]
Chao Shi commented on CRUNCH-329:
---------------------------------
bq. use some kind of hashing algorithm on the class name to generate the
serialization codes for custom WritableComparable classes, and store the
mapping from serialization code to class name in the Configuration. With this
one we have to watch out for id collisions, but we could just fail fast if one
of those happens.
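
For concreteness, the hashing idea quoted above could look roughly like the sketch below. It is only an illustration under several assumptions: a plain Map stands in for the Hadoop Configuration, the key prefix "crunch.writable.code." and the class WritableCodes are hypothetical names, and String.hashCode() is used as the hash only because its result is specified and therefore the same on every JVM.

```java
import java.util.Map;

final class WritableCodes {
    // Hypothetical key prefix for this sketch; not an existing Crunch property.
    static final String PREFIX = "crunch.writable.code.";

    // Derive the serialization code from the class name.
    static int codeFor(String className) {
        return className.hashCode();
    }

    // Record code -> class name in the config, failing fast if a different
    // class already owns the same code.
    static int register(Map<String, String> conf, String className) {
        int code = codeFor(className);
        String key = PREFIX + code;
        String existing = conf.get(key);
        if (existing != null && !existing.equals(className)) {
            throw new IllegalStateException("Serialization code " + code
                + " collides: " + existing + " vs " + className);
        }
        conf.put(key, className);
        return code;
    }

    // Resolve a code read from a serialized tuple back to a class name.
    static String lookup(Map<String, String> conf, int code) {
        String className = conf.get(PREFIX + code);
        if (className == null) {
            throw new IllegalArgumentException("Unknown serialization code " + code);
        }
        return className;
    }
}
```

Registering the same class twice is harmless here; only two distinct class names that hash to the same code cause a failure at registration time, before any job is submitted.
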
Can we serialize the comparison function (with knowledge of the PType) into the
Configuration, and dynamically restore it at runtime inside the MR workers? With
such an approach, we could also support sorting on arbitrary types. Of course,
there would be some overhead in the PType's InputMapFn.
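
A rough sketch of what I mean, assuming the comparison function is a Serializable Comparator. A plain Map stands in for the Hadoop Configuration here, and the property name "crunch.sort.comparator", the class ComparatorConfig, and the example PairComparator are all made up for illustration. This is not how Crunch does it today, and the comparator class would still have to be on the workers' classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Comparator;
import java.util.Map;

final class ComparatorConfig {
    // Hypothetical property name, chosen for this sketch only.
    static final String KEY = "crunch.sort.comparator";

    // Example comparator: order int pairs by the first field, then the second.
    static final class PairComparator implements Comparator<int[]>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(int[] a, int[] b) {
            int c = Integer.compare(a[0], b[0]);
            return c != 0 ? c : Integer.compare(a[1], b[1]);
        }
    }

    // Client side: Java-serialize the comparator and store it as Base64 text.
    // Throws NotSerializableException (an IOException) if cmp is not Serializable.
    static void store(Map<String, String> conf, Comparator<?> cmp) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(cmp);
        }
        conf.put(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Worker side: decode and deserialize the comparator from the config.
    @SuppressWarnings("unchecked")
    static <T> Comparator<T> restore(Map<String, String> conf)
            throws IOException, ClassNotFoundException {
        String encoded = conf.get(KEY);
        if (encoded == null) {
            throw new IllegalStateException("No comparator stored under " + KEY);
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Comparator<T>) in.readObject();
        }
    }
}
```

The restored comparator works on deserialized values rather than raw bytes, which is where the extra cost of running the PType's InputMapFn during the sort would come from.
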
> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
> Key: CRUNCH-329
> URL: https://issues.apache.org/jira/browse/CRUNCH-329
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.10.0, 0.8.3
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.10.0, 0.8.3
>
> Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after
> we hacked the TupleWritable impl to make all of the fields BytesWritables
> (e.g., secondary IntWritable values will no longer be sorted correctly, even
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for
> each possible WritableComparable type in a pipeline that we can use to decode
> what Writable type each tuple field corresponds to. This allows us to keep
> the various fields sortable while still doing a reasonable job of minimizing
> the serialization required to pass the type information along.