[
https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880093#comment-13880093
]
Josh Wills commented on CRUNCH-329:
-----------------------------------
[~gabriel.reid] what if we made the writable codes explicit-- i.e., you had to
call something like Writables.setCode(Class<? extends Writable> clazz, int
code) and re-defining an existing code was a runtime error? We'd also probably
want to put a floor in place, like 16 or 32, so that we could have some room to
expand the base types if necessary. Ugly but explicit, and it would keep the
serialized codes small in the vast majority of cases.
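
A minimal sketch of what such an explicit registry could look like, for
illustration only. The class name WritableCodeRegistry, the lookup methods, and
the floor of 32 are assumptions, not existing Crunch API; the sketch takes
Class<?> rather than Class<? extends Writable> so that it compiles without
Hadoop on the classpath. It also assumes that re-registering the same
class/code pair is harmless and that only a conflicting registration fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry; names and the floor value are illustrative only.
final class WritableCodeRegistry {
    // Codes below the floor stay reserved for the built-in base types.
    static final int FLOOR = 32;

    private final Map<Integer, Class<?>> classByCode = new HashMap<>();
    private final Map<Class<?>, Integer> codeByClass = new HashMap<>();

    // In Crunch the parameter would be Class<? extends Writable>.
    synchronized void setCode(Class<?> clazz, int code) {
        if (code < FLOOR) {
            throw new IllegalArgumentException(
                "Code " + code + " is below the reserved floor " + FLOOR);
        }
        Class<?> existingClass = classByCode.get(code);
        if (existingClass != null && !existingClass.equals(clazz)) {
            // Re-defining an existing code is a runtime error.
            throw new IllegalStateException(
                "Code " + code + " already assigned to " + existingClass.getName());
        }
        Integer existingCode = codeByClass.get(clazz);
        if (existingCode != null && existingCode != code) {
            throw new IllegalStateException(
                clazz.getName() + " already has code " + existingCode);
        }
        classByCode.put(code, clazz);
        codeByClass.put(clazz, code);
    }

    synchronized Class<?> classFor(int code) {
        Class<?> clazz = classByCode.get(code);
        if (clazz == null) {
            throw new IllegalArgumentException("No class registered for code " + code);
        }
        return clazz;
    }

    synchronized int codeFor(Class<?> clazz) {
        Integer code = codeByClass.get(clazz);
        if (code == null) {
            throw new IllegalArgumentException("No code registered for " + clazz.getName());
        }
        return code;
    }
}
```

Failing fast on a conflicting registration means two libraries cannot silently
claim the same code, at the cost of users having to pick their codes by hand.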
[~stepinto] we could support custom sorts on Writable types relatively easily
that way-- to sort a primitive type in a tuple in a non-standard way, you would
associate a custom code with both the class and the comparator. But that would
still involve a Comparator that operated on Writable types, and I think you're
talking about using a regular Java Comparator on primitive types, right?
> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
> Key: CRUNCH-329
> URL: https://issues.apache.org/jira/browse/CRUNCH-329
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.10.0, 0.8.3
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.10.0, 0.8.3
>
> Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after
> we hacked the TupleWritable impl to make all of the fields BytesWritables
> (e.g., secondary IntWritable values will no longer be sorted correctly, even
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for
> each possible WritableComparable type in a pipeline that we can use to decode
> what Writable type each tuple field corresponds to. This allows us to keep
> the various fields sortable while still doing a reasonable job of minimizing
> the serialization required to pass the type information along.
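
To illustrate the idea in the description, here is a simplified sketch of why a
per-field type code restores sort order. It is not the TupleWritable
implementation: the class TaggedFieldSketch, the code values 1 and 2, and the
one-byte tag layout are all assumptions, and text is written without the length
prefix that Hadoop's Text uses. Integers are written big-endian, as
DataOutput.writeInt does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout: [1-byte type code][payload]. Illustrative only.
final class TaggedFieldSketch {
    static final byte INT_CODE = 1;   // assumed code for an int field
    static final byte TEXT_CODE = 2;  // assumed code for a text field

    static byte[] encodeInt(int value) {
        // ByteBuffer is big-endian by default, matching DataOutput.writeInt.
        return ByteBuffer.allocate(5).put(INT_CODE).putInt(value).array();
    }

    static byte[] encodeText(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + utf8.length).put(TEXT_CODE).put(utf8).array();
    }

    // Unsigned lexicographic comparison of the payloads, ignoring the tag.
    // This is how the fields sort if they are treated as opaque bytes.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 1; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Type-aware comparison: the code says how to decode the payload.
    static int compareTagged(byte[] a, byte[] b) {
        if (a[0] != b[0]) {
            return Byte.compare(a[0], b[0]);
        }
        if (a[0] == INT_CODE) {
            int left = ByteBuffer.wrap(a, 1, 4).getInt();
            int right = ByteBuffer.wrap(b, 1, 4).getInt();
            return Integer.compare(left, right);
        }
        return compareRaw(a, b);
    }
}
```

With this layout, compareRaw places encodeInt(1) before encodeInt(-1) because
the sign bit makes the first payload byte of -1 equal to 0xFF, while
compareTagged decodes both fields as ints and orders -1 first. The cost is one
extra byte per field in this sketch.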