[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480190#comment-13480190
 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Koji,

That is super reasonable. I hate bugs like this, so let's kill it :) I'm 
responsible for trying to usher in a new pig-0.11 release, both internally and 
externally, which is why I'm so gung-ho about it.

Here is what I would say:

0. It'd be nice to have some tests focused on just this.

1. I was thinking that since compareBinInterSedesDatum has a handle on the 
ByteBuffer, instead of reading in the full byte[], we could just do the 
comparison via calls to .get(). ByteBuffer is buffered, so I think that in the 
general case this will be a win (but I could be wrong; it'd be quick to 
implement and see). I put this before #2 because if we can bring the times down, 
it'd be nice to leverage the same code path.

2. We could just make a custom WritableComparator for this case. It would not 
be hard at all. We know the byte layout of how NullableBytesWritable is 
implemented, so we can just leverage that directly (right now it is going to be 
TUPLE_1, then one of {TINYBYTEARRAY, SMALLBYTEARRAY, BYTEARRAY}, then SIZE, and 
so on). Hadoop's BytesWritable is actually a key reference here; we just need to 
tailor it to Pig. It can just be switch-based, and if it is an object other than 
a bytearray, we can default to another comparator. If you want it to be fast in 
all cases, you could copy the switch that BinInterSedesRawComparator uses, and 
go from there. I put this after #1, though, because it seems lame to pull out 
all that logic: in BinInterSedesRawComparator we are in fact making the decision 
to wrap the bytes in a ByteBuffer, so if that is introducing a severe speed 
penalty, we need to be aware of that.
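A minimal sketch of point 2, written standalone so it runs without Hadoop on the classpath. Everything here is an assumption: the class name, the hypothetical `Fallback` interface (standing in for "default to another comparator"), and especially the type-code values, which are placeholders rather than the real constants in BinInterSedes. It also assumes the key starts directly with the TUPLE_1 marker and ignores any null/index bytes NullableBytesWritable may write. The `compare` signature mirrors Hadoop's `RawComparator.compare(byte[], int, int, byte[], int, int)`, so the body could be moved into a WritableComparator subclass.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: switch-based raw comparator for keys holding one bytearray.
final class BytearrayRawComparatorSketch {
    static final byte TUPLE_1 = 1;        // HYPOTHETICAL value, not Pig's real constant
    static final byte TINYBYTEARRAY = 2;  // HYPOTHETICAL: 1-byte length field
    static final byte SMALLBYTEARRAY = 3; // HYPOTHETICAL: 2-byte length field
    static final byte BYTEARRAY = 4;      // HYPOTHETICAL: 4-byte length field

    /** Same shape as RawComparator.compare; used for anything that is not a bytearray. */
    interface Fallback {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    private final Fallback fallback;

    BytearrayRawComparatorSketch(Fallback fallback) {
        this.fallback = fallback;
    }

    /** Width of the length field for a bytearray type code, or -1 for any other type. */
    private static int sizeWidth(byte type) {
        switch (type) {
            case TINYBYTEARRAY:  return 1;
            case SMALLBYTEARRAY: return 2;
            case BYTEARRAY:      return 4;
            default:             return -1;
        }
    }

    /** Reads a big-endian length field of the given width. */
    private static int readSize(byte[] b, int off, int width) {
        int n = 0;
        for (int i = 0; i < width; i++) {
            n = (n << 8) | (b[off + i] & 0xff);
        }
        return n;
    }

    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int w1 = (l1 >= 2 && b1[s1] == TUPLE_1) ? sizeWidth(b1[s1 + 1]) : -1;
        int w2 = (l2 >= 2 && b2[s2] == TUPLE_1) ? sizeWidth(b2[s2 + 1]) : -1;
        if (w1 < 0 || w2 < 0) {
            return fallback.compare(b1, s1, l1, b2, s2, l2); // not a single bytearray
        }
        int len1 = readSize(b1, s1 + 2, w1);
        int len2 = readSize(b2, s2 + 2, w2);
        int p1 = s1 + 2 + w1;
        int p2 = s2 + 2 + w2;
        // Unsigned lexicographic comparison directly on the serialized payload.
        int n = Math.min(len1, len2);
        for (int i = 0; i < n; i++) {
            int a = b1[p1 + i] & 0xff;
            int b = b2[p2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return len1 - len2;
    }

    /** Test helper: serializes data in the assumed layout, using the smallest length field. */
    static byte[] encode(byte[] data) {
        byte type = data.length < (1 << 8) ? TINYBYTEARRAY
                  : data.length < (1 << 16) ? SMALLBYTEARRAY : BYTEARRAY;
        int width = sizeWidth(type);
        byte[] out = new byte[2 + width + data.length];
        out[0] = TUPLE_1;
        out[1] = type;
        for (int i = 0; i < width; i++) {
            out[2 + i] = (byte) (data.length >>> (8 * (width - 1 - i)));
        }
        System.arraycopy(data, 0, out, 2 + width, data.length);
        return out;
    }

    public static void main(String[] args) {
        BytearrayRawComparatorSketch cmp = new BytearrayRawComparatorSketch(
            (b1, s1, l1, b2, s2, l2) -> {
                throw new UnsupportedOperationException("non-bytearray key");
            });
        List<byte[]> keys = new ArrayList<>();
        for (String v : new String[] {"1", "3", "22"}) { // values from the PIG-2975 repro
            keys.add(encode(v.getBytes(StandardCharsets.UTF_8)));
        }
        keys.sort((x, y) -> cmp.compare(x, 0, x.length, y, 0, y.length));
        for (byte[] k : keys) {
            // Short values use the tiny encoding: 3 header bytes before the payload.
            System.out.println(new String(k, 3, k.length - 3, StandardCharsets.UTF_8));
        }
    }
}
```

Running `main` sorts the three values from the repro quoted below and prints 1, 22, 3, one per line: bytearrays compare byte-wise, so "22" sorts before "3". The point of the switch is that the common case never builds a DataByteArray or a ByteBuffer; anything it does not recognize goes to the fallback.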


And then go from there. Seem reasonable?
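For point 1, a minimal sketch of the comparison done in place with absolute `ByteBuffer.get(int)` calls, next to the copy-into-byte[] baseline it would replace. The class and method names are hypothetical, and it assumes the caller has already decoded both lengths and positioned each buffer at the start of its bytes. Both variants leave the buffers just past the data, as a sequential read would. Whether the in-place version is actually faster is exactly the open question, so the two should be timed on realistic keys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: compare serialized bytearrays without materializing byte[] copies.
final class ByteBufferCompareSketch {

    /** In-place unsigned lexicographic comparison; leaves both buffers positioned after their data. */
    static int compareInPlace(ByteBuffer bb1, int len1, ByteBuffer bb2, int len2) {
        int p1 = bb1.position();
        int p2 = bb2.position();
        int result = len1 - len2;           // used when one value is a prefix of the other
        int n = Math.min(len1, len2);
        for (int i = 0; i < n; i++) {
            int a = bb1.get(p1 + i) & 0xff; // absolute get: no copy, position unchanged
            int b = bb2.get(p2 + i) & 0xff;
            if (a != b) {
                result = a - b;
                break;
            }
        }
        bb1.position(p1 + len1);
        bb2.position(p2 + len2);
        return result;
    }

    /** Copy-based baseline: read both values into new arrays, then compare them. */
    static int compareByCopy(ByteBuffer bb1, int len1, ByteBuffer bb2, int len2) {
        byte[] x = new byte[len1];
        byte[] y = new byte[len2];
        bb1.get(x); // relative bulk get advances the position by len1
        bb2.get(y);
        int n = Math.min(len1, len2);
        for (int i = 0; i < n; i++) {
            int a = x[i] & 0xff;
            int b = y[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return len1 - len2;
    }

    public static void main(String[] args) {
        byte[] k1 = "1".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "22".getBytes(StandardCharsets.UTF_8);
        int inPlace = compareInPlace(ByteBuffer.wrap(k1), k1.length, ByteBuffer.wrap(k2), k2.length);
        int copied = compareByCopy(ByteBuffer.wrap(k1), k1.length, ByteBuffer.wrap(k2), k2.length);
        System.out.println(Integer.signum(inPlace) + " " + Integer.signum(copied));
    }
}
```

`main` prints `-1 -1`: both variants agree that "1" sorts before "22", and the only difference is whether a byte[] is allocated per value.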
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, 
> pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
