[ https://issues.apache.org/jira/browse/PIG-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123053#comment-16123053 ]
Adam Szita commented on PIG-5288:
---------------------------------

[^PIG-5288-1.patch] looks good to me, +1

> Improve performance of PigTextRawBytesComparator
> ------------------------------------------------
>
>                 Key: PIG-5288
>                 URL: https://issues.apache.org/jira/browse/PIG-5288
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.18.0
>
>         Attachments: PIG-5288-1.patch
>
>
> Came across this stack trace for a group by while investigating a different
> performance issue.
> {code}
> "TezChild" #22 daemon prio=5 os_prio=0 tid=0x00007fa935495000 nid=0x7c3e runnable [0x00007fa91d354000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:412)
>         at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:579)
>         at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:802)
>         at org.apache.hadoop.io.Text.decode(Text.java:412)
>         at org.apache.hadoop.io.Text.decode(Text.java:389)
>         at org.apache.hadoop.io.Text.toString(Text.java:280)
>         at org.apache.pig.impl.io.NullableText.getValueAsPigType(NullableText.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextRawComparator.compare(PigTextRawComparator.java:95)
>         at org.apache.tez.runtime.library.common.ValuesIterator.readNextKey(ValuesIterator.java:188)
>         at org.apache.tez.runtime.library.common.ValuesIterator.access$300(ValuesIterator.java:47)
>         at org.apache.tez.runtime.library.common.ValuesIterator$1$1.next(ValuesIterator.java:143)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.getNextTuple(POShuffleTezLoad.java:218)
> {code}
> Converting each key to a String and then comparing is wasteful (a result of
> extending from PigTextRawBytesComparator, which is used in sorting).
> PigCharArrayWritableComparator, the equivalent used in MapReduce, does not do
> this; it compares the values directly as Text.
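
For readers who want to see the difference the description is pointing at, here is a minimal sketch in plain Java. It is not the code from PIG-5288-1.patch and it uses no Pig or Hadoop classes; the class and method names (RawTextCompareSketch, compareViaString, compareBytes) are hypothetical. The only assumption carried over from the stack trace is that the keys are UTF-8 bytes that get decoded to a String before being compared.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the Pig patch.
class RawTextCompareSketch {

    // Slow path, analogous to what the stack trace shows: decode both
    // UTF-8 byte ranges into Strings, then compare the Strings.
    static int compareViaString(byte[] b1, int s1, int l1,
                                byte[] b2, int s2, int l2) {
        String x = new String(b1, s1, l1, StandardCharsets.UTF_8);
        String y = new String(b2, s2, l2, StandardCharsets.UTF_8);
        return x.compareTo(y);
    }

    // Fast path: compare the serialized bytes directly, treating each
    // byte as unsigned, with no decoding and no String allocation.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal, so the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] a = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] b = "apricot".getBytes(StandardCharsets.UTF_8);

        int viaString = compareViaString(a, 0, a.length, b, 0, b.length);
        int viaBytes = compareBytes(a, 0, a.length, b, 0, b.length);

        // Only the sign of a comparator result is meaningful.
        System.out.println("via String: " + Integer.signum(viaString));
        System.out.println("via bytes:  " + Integer.signum(viaBytes));
    }
}
```

Both paths agree on equality, which is what a group by needs. The orderings can differ for some inputs, because String.compareTo orders by UTF-16 code units while the byte comparison orders by Unicode code point, so supplementary characters may sort differently.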