[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480190#comment-13480190 ]
Jonathan Coveney commented on PIG-2975:
---------------------------------------

Koji,

That is super reasonable. I hate bugs like this, so let's kill it :) I'm responsible for trying to usher in a new pig-0.11 release, both internally and externally, which is why I'm so gung ho about it.

Here is what I would say:

0. It'd be nice to have some tests focused on just this.

1. I was thinking that since compareBinInterSedesDatum has a handle on the ByteBuffer, instead of reading in the full byte[], we could just do the comparison via calls to .get(). ByteBuffer is buffered, so I think that in the general case this will be a win (but I could be wrong; it'd be quick to implement and see). I put this before #2 because if we can bring the times in, it'd be nice to leverage the same code path.

2. We could just make a custom WritableComparator for this case. It would not be hard at all. We know the byte layout of how NullableBytesWritable is serialized, so we can just leverage that directly (right now it is going to be TUPLE_1 / {TINYBYTEARRAY, SMALLBYTEARRAY, BYTEARRAY} / SIZE / and so on). Hadoop's BytesWritable is actually a key resource here; we just need to tailor it to Pig. It can just be switch based, and if the object is something other than a bytearray, we can default to another comparator. If you want it to be fast in all cases, you could copy the switch that BinInterSedesRawComparator uses and go from there. I put this after #1, though, because it seems lame to pull out all that logic: in BinInterSedesRawComparator we are in fact making the decision to wrap the bytes in a ByteBuffer, so if that is introducing a severe speed penalty, we need to be aware of it. And then go from there.

Seem reasonable?
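To make option 1 concrete, here is a minimal sketch of comparing two serialized bytearray payloads through ByteBuffer.get() without first copying them into byte[] arrays. The class and method names (ByteBufferCompareSketch, compareBytes) are hypothetical and not part of Pig, and the real signature of compareBinInterSedesDatum is not reproduced. The sketch assumes both buffers are positioned at the first payload byte with the lengths already decoded, and it assumes unsigned lexicographic order (the order Hadoop's WritableComparator.compareBytes uses); whether that matches Pig's ordering for bytearrays would need to be checked.

```java
import java.nio.ByteBuffer;

public class ByteBufferCompareSketch {

    /**
     * Compares len1 bytes of b1 with len2 bytes of b2, starting at each
     * buffer's current position, in unsigned lexicographic order.
     * Both buffers are left positioned just past their payloads so a
     * caller could continue with the next field.
     */
    static int compareBytes(ByteBuffer b1, int len1, ByteBuffer b2, int len2) {
        int p1 = b1.position();
        int p2 = b2.position();
        int n = Math.min(len1, len2);

        // If one payload is a prefix of the other, the shorter one sorts first.
        int result = Integer.compare(len1, len2);

        for (int i = 0; i < n; i++) {
            // Absolute get(): reads in place, no intermediate byte[] copy.
            int x = b1.get(p1 + i) & 0xff;
            int y = b2.get(p2 + i) & 0xff;
            if (x != y) {
                result = x < y ? -1 : 1;
                break;
            }
        }

        // Skip both payloads regardless of where the comparison stopped.
        b1.position(p1 + len1);
        b2.position(p2 + len2);
        return result;
    }
}
```

Timing this against the current read-into-byte[] path on the same data would show whether the ByteBuffer wrapping itself is the cost, which is the question raised in option 2.
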
> TestTypedMap.testOrderBy failing with incorrect result
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at
> {noformat}
> junit.framework.AssertionFailedError
> 	at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main -x local test/orderby.pig
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}