[
https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880828#comment-13880828
]
Gabriel Reid commented on CRUNCH-329:
-------------------------------------
I've been thinking about this one some more, and I guess I've come back to
liking your idea of having a way of registering serialization codes explicitly
(but not doing it implicitly). The TupleWritables aren't reusable on their own
(without a PType) as it is, so I think I was worrying too much about being able
to read a previously-created file of TupleWritables.
To summarize the idea I've got in my head (which was actually your idea I
think):
* core primitive types are serialized as themselves (as it's currently
implemented in the patch)
* you can explicitly register a code for a custom type, but you're not required
to, and it doesn't happen implicitly. Registration could require a
Configuration object, so that the custom-registered types are stored in the
Configuration rather than in a static field in Writables.
* types that don't have a registered code are just serialized as BytesWritable
implicitly
* trying to configure a TupleWritableComparator on a tuple field that is
implicitly serialized as BytesWritable fails fast with a message about the use
of Writables.setCode()
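
To make the four points above concrete, here is a rough sketch of how the
lookup and the fail-fast check could fit together. It is only an illustration:
a plain Map stands in for the Hadoop Configuration so it runs on its own, and
the class name, key prefix, code values, and the setCode() signature are all
assumptions on my part rather than what the patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map<String, String> stands in for a Hadoop
// Configuration so that the example runs without any Hadoop dependency.
class WritableCodeSketch {
    // Assumed key prefix and code values; none of these come from the patch.
    static final String KEY_PREFIX = "crunch.writable.code.";
    static final int BYTES_FALLBACK = 0;
    static final int FIRST_CUSTOM_CODE = 100;

    // Core primitive types get fixed codes and are serialized as themselves.
    static final Map<String, Integer> CORE_CODES = new HashMap<>();
    static {
        CORE_CODES.put("org.apache.hadoop.io.IntWritable", 1);
        CORE_CODES.put("org.apache.hadoop.io.LongWritable", 2);
        CORE_CODES.put("org.apache.hadoop.io.Text", 3);
    }

    // Explicit, opt-in registration, kept in the configuration (not a static).
    static void setCode(Map<String, String> conf, int code, String typeName) {
        if (code < FIRST_CUSTOM_CODE) {
            throw new IllegalArgumentException(
                "Custom codes must be >= " + FIRST_CUSTOM_CODE);
        }
        conf.put(KEY_PREFIX + typeName, Integer.toString(code));
    }

    // Core types first, then registered custom types, else the bytes fallback.
    static int codeFor(Map<String, String> conf, String typeName) {
        Integer core = CORE_CODES.get(typeName);
        if (core != null) {
            return core;
        }
        String custom = conf.get(KEY_PREFIX + typeName);
        return custom == null ? BYTES_FALLBACK : Integer.parseInt(custom);
    }

    // Fail fast when a comparator is configured on a field that would be
    // implicitly serialized as BytesWritable.
    static void checkSortable(Map<String, String> conf, String... fieldTypes) {
        for (int i = 0; i < fieldTypes.length; i++) {
            if (codeFor(conf, fieldTypes[i]) == BYTES_FALLBACK) {
                throw new IllegalStateException("Tuple field " + i + " ("
                    + fieldTypes[i] + ") has no registered code; register one"
                    + " with Writables.setCode()");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String custom = "com.example.MyWritable"; // hypothetical custom type
        System.out.println(codeFor(conf, custom)); // unregistered: fallback
        setCode(conf, 100, custom);
        System.out.println(codeFor(conf, custom)); // registered code
        checkSortable(conf, "org.apache.hadoop.io.IntWritable", custom);
    }
}
```

Keeping the custom codes in the configuration means they travel with the job
instead of living in JVM-wide static state, which is the main reason I'd
prefer it over a static in Writables.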
Sound good?
> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
> Key: CRUNCH-329
> URL: https://issues.apache.org/jira/browse/CRUNCH-329
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.10.0, 0.8.3
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.10.0, 0.8.3
>
> Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after
> we hacked the TupleWritable impl to make all of the fields BytesWritables
> (e.g., secondary IntWritable values will no longer be sorted correctly, even
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for
> each possible WritableComparable type in a pipeline that we can use to decode
> what Writable type each tuple field corresponds to. This allows us to keep
> the various fields sortable while still doing a reasonable job of minimizing
> the serialization required to pass the type information along.