[
https://issues.apache.org/jira/browse/PIG-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4656:
------------------------------------
Attachment: PIG-4656-1.patch
> Improve String serialization and comparator performance in BinInterSedes
> ------------------------------------------------------------------------
>
> Key: PIG-4656
> URL: https://issues.apache.org/jira/browse/PIG-4656
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4656-1.patch
>
>
> Two major optimizations can be done:
> - PIG-1472 added multiple data types to store different sizes (byte,
> short, int). This can be simplified using WritableUtils.writeVInt. There is no
> difference for byte and short compared to the current approach, but for int it
> could be beneficial where a lot of numbers could be written with 3 bytes
> instead of 4. For example, 32768 is written using 3 bytes with
> WritableUtils.writeVInt, whereas currently 4 bytes (int) are used.
> - String comparison in BinInterSedesTupleRawComparator constructs String
> objects for comparison. It should instead compare the raw bytes, like
> Text.Comparator.
> {code}
> str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
> str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
> {code}
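The two ideas in the description can be illustrated with a minimal, self-contained sketch. It is not the attached patch. The class name BinInterSedesSketch and the methods vIntSize and compareBytes are hypothetical names chosen for illustration. vIntSize is assumed to mirror the size rule of Hadoop's variable-length integer format (one header byte plus as many data bytes as the magnitude needs), and compareBytes is an unsigned, lexicographic byte comparison in the spirit of what Text.Comparator does, written here without any Hadoop dependency.

```java
import java.nio.charset.StandardCharsets;

public class BinInterSedesSketch {

    /**
     * Number of bytes the variable-length encoding would need for a value.
     * Assumption: this mirrors the format written by WritableUtils.writeVInt,
     * where values in [-112, 127] take one byte and everything else takes
     * one header byte plus the minimal number of data bytes.
     */
    public static int vIntSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1; // the value itself fits in the single byte
        }
        if (value < 0) {
            value ^= -1L; // negatives are stored as their one's complement
        }
        int dataBytes = 0;
        while (value != 0) {
            value >>= 8;
            dataBytes++;
        }
        return 1 + dataBytes; // header byte + data bytes
    }

    /**
     * Compares two byte ranges lexicographically, treating each byte as
     * unsigned, without building String objects.
     */
    public static int compareBytes(byte[] b1, int s1, int l1,
                                   byte[] b2, int s2, int l2) {
        int common = Math.min(l1, l2);
        for (int k = 0; k < common; k++) {
            int a = b1[s1 + k] & 0xff;
            int b = b2[s2 + k] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // the shorter range sorts first on a shared prefix
    }

    public static void main(String[] args) {
        // Encoded sizes around the boundaries mentioned in the description.
        long[] samples = {127, 128, 32767, 32768, 65535, 65536, 16777216};
        for (long v : samples) {
            System.out.println(v + " -> " + vIntSize(v) + " byte(s)");
        }

        // Comparing serialized UTF-8 bytes directly instead of decoding them.
        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apricot".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.signum(
                compareBytes(x, 0, x.length, y, 0, y.length)));
    }
}
```

Two caveats apply to this sketch. Under the size rule assumed here, values of 2^24 and above need 5 bytes, one more than a fixed 4-byte int, so the saving depends on how the data is distributed. Also, comparing UTF-8 bytes orders strings by Unicode code point, while String.compareTo orders by UTF-16 code unit, so the two can disagree for strings containing supplementary characters.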