[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480103#comment-13480103 ]
Koji Noguchi commented on PIG-2975:
-----------------------------------
bq. I don't think we need to sacrifice performance if we use BinInterSedes.BinInterSedesRawComparator.
I tried sorting uncompressed text with and without your patch and compared the map time of one of the sort-phase jobs (1 mapper, 1 reducer).
Without the patch:
  1st run: 6 min 24 sec
  2nd run: 6 min 21 sec
With the patch:
  1st run: 12 min 41 sec
  2nd run: 12 min 40 sec
So there is a performance hit: with the patch, the map time roughly doubles.
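As a quick check of the numbers above, here is a small sketch that converts each timing to seconds and computes the with/without ratio. The class and method names are illustrative only.

```java
// Converts the reported map times to seconds and computes the slowdown ratio.
public class SortSlowdown {
    // Convert a "minutes, seconds" reading into total seconds.
    static int toSeconds(int minutes, int seconds) {
        return minutes * 60 + seconds;
    }

    // Ratio of the time with the patch to the time without it.
    static double slowdown(int withoutPatchSec, int withPatchSec) {
        return (double) withPatchSec / withoutPatchSec;
    }

    public static void main(String[] args) {
        int[] withoutPatch = {toSeconds(6, 24), toSeconds(6, 21)};
        int[] withPatch = {toSeconds(12, 41), toSeconds(12, 40)};
        for (int i = 0; i < withoutPatch.length; i++) {
            System.out.printf("run %d: %d s -> %d s, slowdown %.2fx%n",
                    i + 1, withoutPatch[i], withPatch[i],
                    slowdown(withoutPatch[i], withPatch[i]));
        }
    }
}
```

Both runs come out at about 2x (384 s vs 761 s, and 381 s vs 760 s).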
If you look at BinInterSedes.BinInterSedesRawComparator, it eventually comes down to
{noformat}
private int compareBinInterSedesDatum(ByteBuffer bb1, ByteBuffer bb2, boolean[] asc) throws IOException {
    // ...
    case BinInterSedes.TINYBYTEARRAY:
    case BinInterSedes.SMALLBYTEARRAY:
    case BinInterSedes.BYTEARRAY: {
        type1 = DataType.BYTEARRAY;
        type2 = getGeneralizedDataType(dt2);
        if (type1 == type2) {
            int basz1 = readSize(bb1, dt1);
            int basz2 = readSize(bb2, dt2);
            // Every comparison allocates two new arrays and copies both values into them.
            byte[] ba1 = new byte[basz1];
            byte[] ba2 = new byte[basz2];
            bb1.get(ba1);
            bb2.get(ba2);
            rc = DataByteArray.compare(ba1, ba2);
        }
        break;
    // ...
{noformat}
Taking out these extra copies for bytearray comparisons would probably improve the time. Trying that.
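A minimal sketch of a copy-free version of the bytearray branch. It assumes `DataByteArray.compare` orders values byte by byte with bytes treated as unsigned, and treats a value that is a prefix of another as smaller (the same contract as Hadoop's `WritableComparator.compareBytes`). That assumption should be checked against the actual implementation. `compareRegions` is a hypothetical helper, not an existing Pig method. It reads with the absolute `ByteBuffer.get(int)`, so no temporary arrays are allocated. It then advances both positions past the values, matching the side effect of the original `bb.get(byte[])` calls.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferCompare {
    /**
     * Hypothetical helper (not part of Pig): compares len1 bytes starting at
     * bb1's position with len2 bytes starting at bb2's position, treating bytes
     * as unsigned and the shorter value as smaller when one is a prefix of the
     * other. No temporary arrays are allocated.
     * Both positions are advanced past the compared values, like bb.get(byte[]).
     */
    static int compareRegions(ByteBuffer bb1, int len1, ByteBuffer bb2, int len2) {
        int p1 = bb1.position();
        int p2 = bb2.position();
        int n = Math.min(len1, len2);
        int rc = 0;
        for (int i = 0; i < n && rc == 0; i++) {
            // Absolute get(int) reads a byte without moving the position.
            rc = (bb1.get(p1 + i) & 0xff) - (bb2.get(p2 + i) & 0xff);
        }
        if (rc == 0) {
            // Equal common prefix: the shorter value sorts first.
            rc = len1 - len2;
        }
        // Skip past both values so the next field is read from the right offset.
        bb1.position(p1 + len1);
        bb2.position(p2 + len2);
        return rc;
    }

    public static void main(String[] args) {
        ByteBuffer left = ByteBuffer.wrap("22".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer right = ByteBuffer.wrap("3".getBytes(StandardCharsets.US_ASCII));
        int rc = compareRegions(left, 2, right, 1);
        System.out.println(rc < 0 ? "22 < 3" : "22 >= 3"); // prints "22 < 3"
        System.out.println(left.position() + " " + right.position()); // prints "2 1"
    }
}
```

In the comparator, the lines from `new byte[basz1]` through `DataByteArray.compare` would then collapse to `rc = compareRegions(bb1, basz1, bb2, basz2);`. As with any comparator, only the sign of the result is meaningful. For heap buffers, another option is to pass `bb.array()` with `bb.arrayOffset() + bb.position()` straight to `WritableComparator.compareBytes`.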
(Separately, I'm trying out a C-union-like approach for DataByteArray and Tuple.)
> TestTypedMap.testOrderBy failing with incorrect result
> -------------------------------------------------------
>
> Key: PIG-2975
> URL: https://issues.apache.org/jira/browse/PIG-2975
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: 0.11
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Blocker
> Fix For: 0.11
>
> Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt,
> pig-2975-trunk_v02-broken.txt
>
>
> Looked at
> {noformat}
> junit.framework.AssertionFailedError
> at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with an incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main -x local test/orderby.pig
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}
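The `dump c` output above breaks the basic invariant of any sort: the output must be a permutation of the input. Below is a small sketch of that check on the reported values. It also shows what a byte-wise sort of the same values would produce, assuming the map values stay untyped bytearrays compared byte by byte, so that "22" sorts before "3". `OrderByInvariant` and its methods are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OrderByInvariant {
    // Unsigned byte-wise lexicographic order (assumed here for bytearray keys).
    static final Comparator<byte[]> BYTES = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) return d;
        }
        return x.length - y.length;
    };

    // A correct sort returns a permutation of its input: same values, same counts.
    static boolean isPermutation(List<String> input, List<String> output) {
        List<String> a = new ArrayList<>(input);
        List<String> b = new ArrayList<>(output);
        a.sort(null); // natural String order, only used to compare contents
        b.sort(null);
        return a.equals(b);
    }

    // Sorts the values as raw bytes, the way a bytearray key would be ordered.
    static List<String> sortAsByteArrays(List<String> values) {
        List<byte[]> keys = new ArrayList<>();
        for (String v : values) keys.add(v.getBytes(StandardCharsets.UTF_8));
        keys.sort(BYTES);
        List<String> out = new ArrayList<>();
        for (byte[] k : keys) out.add(new String(k, StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) {
        List<String> dumpB = Arrays.asList("1", "3", "22");
        List<String> dumpC = Arrays.asList("1", "1", "22");
        System.out.println(isPermutation(dumpB, dumpC)); // prints "false": (3) was lost
        System.out.println(sortAsByteArrays(dumpB));     // prints "[1, 22, 3]"
    }
}
```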
--
This message is automatically generated by JIRA.