[ 
https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880828#comment-13880828
 ] 

Gabriel Reid commented on CRUNCH-329:
-------------------------------------

I've been thinking about this one some more, and I've come back around to 
your idea of having a way to register serialization codes explicitly (but 
never implicitly). TupleWritables aren't reusable on their own (without a 
PType) as it is, so I think I was worrying too much about being able to read 
a previously created file of TupleWritables.

To summarize the idea as I have it in my head (which I think was actually 
your idea):
* core primitive types are serialized as themselves (as it's currently 
implemented in the patch)
* you can explicitly register a code for a custom type, but you're not required 
to, and it never happens implicitly. Registration could require a 
Configuration object, so that custom-registered types are stored in the 
Configuration instead of in a static field in Writables.
* types that don't have a registered code are just serialized as BytesWritable 
implicitly
* trying to configure a TupleWritableComparator on a tuple field that is 
implicitly serialized as BytesWritable fails fast with a message about the use 
of Writables.setCode()
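
The rules above could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not what the patch does: a plain Map 
stands in for Hadoop's Configuration, types are identified by plain strings, 
and the class name, key prefix, code values, and method names are all made up 
for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Crunch.
final class CodeRegistrySketch {
    // Made-up key prefix under which custom codes are stored.
    static final String PREFIX = "sketch.writable.code.";
    // Code meaning "no registered code, field is stored as BytesWritable".
    static final int BYTES_FALLBACK = 0;
    // Codes up to this value are reserved for core types.
    static final int RESERVED_MAX = 2;

    // Fixed codes for core types (values chosen arbitrarily here).
    static final Map<String, Integer> CORE = new HashMap<>();
    static {
        CORE.put("IntWritable", 1);
        CORE.put("Text", 2);
    }

    // Explicit registration: the code lives in the per-job configuration,
    // not in a static field.
    static void register(Map<String, String> conf, String typeName, int code) {
        if (code <= RESERVED_MAX) {
            throw new IllegalArgumentException("code " + code + " is reserved");
        }
        conf.put(PREFIX + typeName, Integer.toString(code));
    }

    // Core types use their fixed code, registered types use the configured
    // code, and everything else falls back to the BytesWritable code.
    static int codeFor(Map<String, String> conf, String typeName) {
        Integer core = CORE.get(typeName);
        if (core != null) {
            return core;
        }
        String custom = conf.get(PREFIX + typeName);
        return custom == null ? BYTES_FALLBACK : Integer.parseInt(custom);
    }

    // Fail fast when a comparator is requested for a field that would be
    // implicitly serialized as BytesWritable.
    static void requireSortable(Map<String, String> conf, String typeName) {
        if (codeFor(conf, typeName) == BYTES_FALLBACK) {
            throw new IllegalStateException(typeName
                + " has no registered code; register one explicitly"
                + " (e.g. via Writables.setCode())");
        }
    }
}
```

With this shape, an unregistered custom type still serializes fine, and the 
error only shows up when someone asks for a sort on that field.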

Sound good?

> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-329
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-329
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after 
> we hacked the TupleWritable impl to make all of the fields BytesWritables 
> (e.g., secondary IntWritable values will no longer be sorted correctly, even 
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for 
> each possible WritableComparable type in a pipeline that we can use to decode 
> what Writable type each tuple field corresponds to. This allows us to keep 
> the various fields sortable while still doing a reasonable job of minimizing 
> the serialization required to pass the type information along.


