[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480103#comment-13480103
 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. I don't think we need to sacrifice performance if we use 
BinInterSedes.BinInterSedesRawComparator.

I tried sorting uncompressed text with and without your patch and compared the 
map time of one of the sort-phase jobs (1 mapper, 1 reducer).

Without the patch
1st 6mins, 24sec
2nd 6mins, 21sec

With the patch
1st 12mins, 41sec
2nd 12mins, 40sec

So there is a performance hit: the map time roughly doubles with the patch.

If you look at BinInterSedes.BinInterSedesRawComparator, it eventually comes 
down to:
{noformat}
private int compareBinInterSedesDatum(ByteBuffer bb1, ByteBuffer bb2, boolean[] asc) throws IOException {
    ...
    case BinInterSedes.TINYBYTEARRAY:
    case BinInterSedes.SMALLBYTEARRAY:
    case BinInterSedes.BYTEARRAY: {
        type1 = DataType.BYTEARRAY;
        type2 = getGeneralizedDataType(dt2);
        if (type1 == type2) {
            int basz1 = readSize(bb1, dt1);
            int basz2 = readSize(bb2, dt2);
            byte[] ba1 = new byte[basz1];
            byte[] ba2 = new byte[basz2];
            bb1.get(ba1);
            bb2.get(ba2);
            rc = DataByteArray.compare(ba1, ba2);
        }
        break;
    ...
{noformat}

Taking out these extra copies for bytearray comparisons would probably improve 
the time.  Trying that now.
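
For illustration, a minimal sketch of what comparing without the copies could 
look like. It is not the actual patch: the class InPlaceByteCompare and the 
method compareInPlace are hypothetical names, and it assumes the desired order 
is unsigned lexicographic with the shorter array first on a shared prefix. 
Whether that matches DataByteArray.compare exactly would need to be checked 
against the Pig source.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Pig: compares two byte ranges directly
// inside their ByteBuffers instead of copying them into temporary arrays.
public class InPlaceByteCompare {

    // Compares len1 bytes at bb1's current position with len2 bytes at
    // bb2's current position. Assumption: unsigned lexicographic order,
    // with the shorter range sorting first when one is a prefix of the other.
    static int compareInPlace(ByteBuffer bb1, int len1, ByteBuffer bb2, int len2) {
        int p1 = bb1.position();
        int p2 = bb2.position();
        int n = Math.min(len1, len2);
        int rc = 0;
        for (int i = 0; i < n && rc == 0; i++) {
            // Absolute get(int) reads a byte without moving the position.
            int a = bb1.get(p1 + i) & 0xff;
            int b = bb2.get(p2 + i) & 0xff;
            rc = Integer.compare(a, b);
        }
        if (rc == 0) {
            rc = Integer.compare(len1, len2);
        }
        // Advance both buffers past the compared ranges, as the copying
        // version does through bb.get(byte[]).
        bb1.position(p1 + len1);
        bb2.position(p2 + len2);
        return rc;
    }

    public static void main(String[] args) {
        ByteBuffer x = ByteBuffer.wrap("abc".getBytes());
        ByteBuffer y = ByteBuffer.wrap("abd".getBytes());
        System.out.println(compareInPlace(x, 3, y, 3)); // prints -1
    }
}
```

The sketch allocates nothing per comparison, which is the point of removing 
the two "new byte[...]" calls; it still leaves both buffer positions where the 
copying version would, so the code after the bytearray case should see the 
same state.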

(Separately, I'm trying out a C-union-like approach for DataByteArray 
and Tuple.)

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, 
> pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

