[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480103#comment-13480103 ]
Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. I don't think we need to sacrifice performance if we use BinInterSedes.BinInterSedesRawComparator.

I tried sorting uncompressed text with and without your patch and compared the map time of one of the sort-phase jobs (1 mapper, 1 reducer).

Without the patch
1st 6mins, 24sec
2nd 6mins, 21sec

With the patch
1st 12mins, 41sec
2nd 12mins, 40sec

So there is a performance hit: the map time roughly doubles.

If you look at BinInterSedes.BinInterSedesRawComparator, it eventually comes down to
{noformat}
private int compareBinInterSedesDatum(ByteBuffer bb1, ByteBuffer bb2, boolean[] asc) throws IOException {
    ...
        case BinInterSedes.TINYBYTEARRAY:
        case BinInterSedes.SMALLBYTEARRAY:
        case BinInterSedes.BYTEARRAY: {
            type1 = DataType.BYTEARRAY;
            type2 = getGeneralizedDataType(dt2);
            if (type1 == type2) {
                int basz1 = readSize(bb1, dt1);
                int basz2 = readSize(bb2, dt2);
                byte[] ba1 = new byte[basz1];
                byte[] ba2 = new byte[basz2];
                bb1.get(ba1);
                bb2.get(ba2);
                rc = DataByteArray.compare(ba1, ba2);
            }
            break;
{noformat}

Taking out these extra copies for bytearray comparisons would probably improve the time. Trying that now.

(Separately, I'm trying out a C-union-like approach for DataByteArray and Tuple.)

> TestTypedMap.testOrderBy failing with incorrect result
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main -x local test/orderby.pig
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}
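
For illustration only, here is a minimal sketch of what "taking out the extra copies" in the bytearray branch quoted in the comment could look like. It is not the actual patch. The class and method names (InPlaceByteCompare, compareInPlace, compareArrays) are hypothetical, and compareArrays is a stand-in for DataByteArray.compare that assumes signed byte-wise ordering with a shorter prefix sorting first. The idea is to compare the two ByteBuffer regions with absolute reads instead of allocating ba1 and ba2, and then advance both positions the same way the bulk get() calls would have.

```java
import java.nio.ByteBuffer;

final class InPlaceByteCompare {

    // Stand-in for DataByteArray.compare(byte[], byte[]).
    // ASSUMPTION: signed byte-wise comparison, shorter prefix sorts first.
    static int compareArrays(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Compares len1 bytes starting at bb1.position() with len2 bytes starting
    // at bb2.position() without allocating temporary arrays. Both buffers are
    // advanced past the compared regions, mirroring what bb.get(byte[]) does.
    static int compareInPlace(ByteBuffer bb1, int len1, ByteBuffer bb2, int len2) {
        int p1 = bb1.position();
        int p2 = bb2.position();
        int n = Math.min(len1, len2);
        int rc = 0;
        for (int i = 0; i < n; i++) {
            byte x = bb1.get(p1 + i); // absolute read, position unchanged
            byte y = bb2.get(p2 + i);
            if (x != y) {
                rc = x < y ? -1 : 1;
                break;
            }
        }
        if (rc == 0) {
            rc = Integer.compare(len1, len2);
        }
        // Skip the full regions so the caller can keep reading the next field.
        bb1.position(p1 + len1);
        bb2.position(p2 + len2);
        return rc;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 4};
        ByteBuffer bb1 = ByteBuffer.wrap(a);
        ByteBuffer bb2 = ByteBuffer.wrap(b);
        System.out.println(compareArrays(a, b));                          // -1
        System.out.println(compareInPlace(bb1, a.length, bb2, b.length)); // -1
    }
}
```

Whether this recovers the lost map time would still have to be measured with the same sort job as above. The sketch only shows that the comparison result and the buffer positions can be kept the same without the two allocations per comparison.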