[ 
https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880443#comment-13880443
 ] 

Gabriel Reid commented on CRUNCH-329:
-------------------------------------

I think if we wanted to throw out the serialization codes completely, then 
we'd need to store the TupleWritable schema in the Configuration all the time 
(not just for the key used in the shuffle), wouldn't we? Assuming that 
isn't too much of a problem, I guess it could all work somehow.

Am I correct in thinking that forcing Avro for all shuffles would bring us 
full circle to the same problem that started this ticket? Avros.writables 
stores the bytes of the Writable directly, so Avro will sort based on the 
raw serialized bytes of the Writable, right?
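
To make the concern concrete, here is a minimal sketch of how a raw byte 
comparison can disagree with numeric order. It does not use Hadoop or Avro: 
serializeInt is a hypothetical stand-in that assumes an IntWritable is written 
as four big-endian bytes, and compareRaw is a hypothetical stand-in for a 
lexicographic comparison of unsigned bytes, which is how I understand both 
BytesWritable and Avro's bytes type to be ordered.

```java
import java.nio.ByteBuffer;

public class RawByteOrderDemo {
    // Hypothetical stand-in for IntWritable serialization:
    // assumes the value is written as 4 big-endian bytes.
    public static byte[] serializeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    public static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        // Equal on the common prefix: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Numeric order: -1 sorts before 1.
        System.out.println(Integer.compare(-1, 1)); // prints -1
        // Raw order: ff ff ff ff sorts after 00 00 00 01.
        System.out.println(compareRaw(serializeInt(-1), serializeInt(1))); // prints 1
    }
}
```

Non-negative ints happen to come out in the right order under this encoding, 
so the mismatch only shows up once negative values (or variable-length 
encodings) are involved.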



> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-329
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-329
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after 
> we hacked the TupleWritable impl to make all of the fields BytesWritables 
> (e.g., secondary IntWritable values will no longer be sorted correctly, even 
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for 
> each possible WritableComparable type in a pipeline that we can use to decode 
> what Writable type each tuple field corresponds to. This allows us to keep 
> the various fields sortable while still doing a reasonable job of minimizing 
> the serialization required to pass the type information along.
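
The type-code idea in the description above can be sketched roughly as 
follows. This is not the attached patch and does not use the Crunch or Hadoop 
classes: the codes INT_CODE and LONG_CODE, the one-byte prefix layout, and the 
method names are all assumptions made for illustration. The point is only that 
a small code in front of each field lets a comparator decode the field back to 
its real type before comparing.

```java
import java.nio.ByteBuffer;

public class TypeCodeDemo {
    // Hypothetical codes; a real pipeline would assign one per Writable type.
    static final byte INT_CODE = 0;
    static final byte LONG_CODE = 1;

    // Field layout assumed here: [1-byte type code][big-endian payload].
    public static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(1 + 4).put(INT_CODE).putInt(v).array();
    }

    public static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(1 + 8).put(LONG_CODE).putLong(v).array();
    }

    // Compare two encoded fields by decoding them according to their code.
    public static int compareFields(byte[] a, byte[] b) {
        if (a[0] != b[0]) {
            // Different types: fall back to ordering by code.
            return Byte.compare(a[0], b[0]);
        }
        switch (a[0]) {
            case INT_CODE:
                return Integer.compare(
                    ByteBuffer.wrap(a, 1, 4).getInt(),
                    ByteBuffer.wrap(b, 1, 4).getInt());
            case LONG_CODE:
                return Long.compare(
                    ByteBuffer.wrap(a, 1, 8).getLong(),
                    ByteBuffer.wrap(b, 1, 8).getLong());
            default:
                throw new IllegalArgumentException("Unknown type code: " + a[0]);
        }
    }

    public static void main(String[] args) {
        // Decoded comparison respects numeric order for negative values.
        System.out.println(compareFields(encodeInt(-1), encodeInt(1))); // prints -1
    }
}
```

The cost in this sketch is one extra byte per field, which is the kind of 
trade-off the description refers to when it talks about minimizing the 
serialization needed to pass type information along.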



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
