[
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480190#comment-13480190
]
Jonathan Coveney commented on PIG-2975:
---------------------------------------
Koji,
That is super reasonable. I hate bugs like this, so let's kill it :) I'm
responsible for trying to usher in a new pig-0.11 release, both internally and
externally, which is why I'm so gung ho about it.
Here is what I would say:
0. It'd be nice to have some tests focused on just this.
1. I was thinking that since compareBinInterSedesDatum has a handle on the
ByteBuffer, instead of reading in the full byte[], we could just do the
comparison via calls to .get(). ByteBuffer is buffered, so I think that in the
general case, this will be a win (but I could be wrong -- it'd be quick to
implement and see). I put this before #2 because if we can bring the times in,
it'd be nice to leverage the same code path.
2. We could just make a custom WritableComparator for this case. It would not
be hard at all. We know the byte layout of how NullableBytesWritable is
implemented, so we can just leverage that directly (right now it is going to be
TUPLE_1 / {TINYBYTEARRAY, SMALLBYTEARRAY, BYTEARRAY} / SIZE / and so on).
Hadoop's BytesWritable is actually a key resource; we just need to tailor it to
Pig. It can just be switch based, and if it is an object other than a
bytearray, we can default to another comparator. If you want it to be fast in
all cases, you could copy the switch that BinInterSedesRawComparator uses, and
go from there. I put this after #1 though because it seems lame to pull out all
that logic, since in BinInterSedesRawComparator we are in fact making the
decision to wrap it in a ByteBuffer, so if that is introducing a severe speed
penalty, we need to be aware of that.
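To make #1 concrete, here is a rough sketch of comparing two byte ranges
directly through ByteBuffer.get(int) instead of copying each side into a
byte[] first. The class and method names are hypothetical and this is not
existing Pig code. It assumes the ordering we want is unsigned lexicographic
with the shorter range sorting first on a shared prefix (the same ordering
Hadoop's WritableComparator.compareBytes gives), and that the caller has
already worked out the offset and length of each bytearray payload.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of Pig: compares byte ranges in place. */
final class ByteRangeCompare {
    private ByteRangeCompare() {}

    /**
     * Lexicographically compares len1 bytes of b1 starting at off1 with
     * len2 bytes of b2 starting at off2, treating each byte as unsigned.
     * Only absolute get(int) is used, so neither buffer's position moves
     * and no intermediate byte[] is allocated.
     */
    static int compare(ByteBuffer b1, int off1, int len1,
                       ByteBuffer b2, int off2, int len2) {
        int n = Math.min(len1, len2);
        for (int i = 0; i < n; i++) {
            int x = b1.get(off1 + i) & 0xff;
            int y = b2.get(off2 + i) & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // The shared prefix is equal: the shorter range sorts first.
        return Integer.compare(len1, len2);
    }
}
```

For example, comparing the one-byte range "3" against the two-byte range "22"
returns a positive value, because the first bytes already differ (0x33 > 0x32).
Whether this actually beats reading the full byte[] is something we would
still have to measure.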
And then go from there. Seem reasonable?
> TestTypedMap.testOrderBy failing with incorrect result
> -------------------------------------------------------
>
> Key: PIG-2975
> URL: https://issues.apache.org/jira/browse/PIG-2975
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: 0.11
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Blocker
> Fix For: 0.11
>
> Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt,
> pig-2975-trunk_v02-broken.txt
>
>
> Looked at
> {noformat}
> junit.framework.AssertionFailedError
> at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main -x local test/orderby.pig
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}