[ https://issues.apache.org/jira/browse/PIG-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4656:
------------------------------------
    Fix Version/s:     (was: 0.16.0)
                   0.17.0

> Improve String serialization and comparator performance in BinInterSedes
> ------------------------------------------------------------------------
>
>                 Key: PIG-4656
>                 URL: https://issues.apache.org/jira/browse/PIG-4656
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-4656-1.patch
>
>
> Two major optimizations can be done:
>    - PIG-1472 added multiple data types to store different sizes (byte,
> short, int). This can be simplified using WritableUtils.writeVInt. There is
> no difference for byte and short compared to the current approach, but with
> int it could be beneficial where many numbers could be written with 3 bytes
> instead of 4. For example, 32768 is written using 3 bytes with
> WritableUtils.writeVInt, whereas currently 4 bytes (int) are used.
>    - String comparison in BinInterSedesTupleRawComparator instantiates
> Strings for comparison. It should instead compare the bytes directly, as
> Text.Comparator does.
> {code}
> str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
> str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
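
The two optimizations described in the issue can be illustrated with a small standalone sketch. It is not taken from the attached patch and does not use Pig or Hadoop classes. The class VIntAndBytesSketch and its methods are hypothetical names; vintSize assumes a Hadoop-style variable-length encoding (one byte for small values, otherwise a length byte plus the minimal number of magnitude bytes), and compareBytes assumes a plain unsigned lexicographic comparison of the serialized bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Pig or Hadoop.
final class VIntAndBytesSketch {

    // Number of bytes a value would occupy under the assumed variable-length
    // encoding: values in [-112, 127] take one byte, anything else takes a
    // length byte plus the minimal number of magnitude bytes.
    static int vintSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1;
        }
        if (value < 0) {
            value = ~value; // one's complement, so only magnitude bits are counted
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1;
    }

    // Lexicographic comparison of two byte ranges, treating each byte as
    // unsigned. No String objects are allocated.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        // 32768 needs two magnitude bytes plus one length byte.
        System.out.println(vintSize(32768));

        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apricot".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareBytes(x, 0, x.length, y, 0, y.length) < 0);
    }
}
```

One design point to keep in mind: unsigned comparison of UTF-8 bytes orders strings by Unicode code point, which can differ from String.compareTo (UTF-16 code unit order) for characters outside the Basic Multilingual Plane.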