[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480378#comment-13480378
 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

As a side note, Koji, if you make a new JIRA specifically about improving 
BinInterSedesRawComparator's handling of DataByteArrays, I will review and 
commit it. And if you want to learn Pig, you could make another JIRA about 
improving the performance in general. IMHO BinInterSedes (and the whole code 
path that touches it) could probably be significantly improved.

W.r.t. this issue, I think we should either directly compare the bytes 
(currently leaning towards this), or have a lightweight comparator that 
special-cases DataByteArrays and delegates to BinInterSedesRawComparator 
otherwise. Either way we wouldn't need the complexity of the union approach, 
and we should get correctness, speed, and a stable bytearray sort order.
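
To make the second option concrete, here is a rough sketch of what such a 
special-casing comparator might look like. It is not Pig's code: the 
RawBytesComparator interface, the BYTEARRAY_TAG value, and the "one tag byte 
followed by the raw payload" layout are all assumptions made up for 
illustration, and the real BinInterSedes encoding is more involved.

```java
public class ByteArrayComparatorSketch {

    /** Hypothetical stand-in for a raw comparator over serialized records. */
    interface RawBytesComparator {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    /** Hypothetical type tag marking a serialized bytearray (not Pig's real constant). */
    static final byte BYTEARRAY_TAG = 1;

    /** Lexicographic comparison of two byte ranges, treating each byte as unsigned. */
    static int compareUnsigned(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a < b ? -1 : 1;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return Integer.compare(l1, l2);
    }

    /**
     * Wraps a fallback comparator. When both records carry the (assumed)
     * bytearray tag, the payloads are compared byte by byte; anything else
     * is handed to the fallback unchanged.
     */
    static RawBytesComparator specialCasing(RawBytesComparator fallback) {
        return (b1, s1, l1, b2, s2, l2) -> {
            boolean bothByteArrays = l1 > 0 && l2 > 0
                    && b1[s1] == BYTEARRAY_TAG && b2[s2] == BYTEARRAY_TAG;
            if (bothByteArrays) {
                // Skip the one-byte tag and compare the payloads directly.
                return compareUnsigned(b1, s1 + 1, l1 - 1, b2, s2 + 1, l2 - 1);
            }
            return fallback.compare(b1, s1, l1, b2, s2, l2);
        };
    }
}
```

Under this byte-wise ordering the three values from the report would sort as 
1, 22, 3, which is a consistent global order even though it is not numeric.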

That said, IF we decide to preserve byte array sort order, I think we should 
make a decision now about whether or not we want to define that semantic. If 
not, then just directly comparing the bytes should be a-ok, since all that is 
important for bytearrays currently is that a global ordering exists, not what 
that global ordering is.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
> pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
> pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
