[ 
https://issues.apache.org/jira/browse/PIG-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4656:
------------------------------------
    Fix Version/s:     (was: 0.16.0)
                   0.17.0

> Improve String serialization and comparator performance in BinInterSedes
> ------------------------------------------------------------------------
>
>                 Key: PIG-4656
>                 URL: https://issues.apache.org/jira/browse/PIG-4656
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-4656-1.patch
>
>
> Two major optimizations can be done:
>   -  PIG-1472 added multiple data types to store different sizes (byte, 
> short, int). This can be simplified by using WritableUtils.writeVInt. There 
> is no difference for byte and short compared to the current approach, but 
> for int it could be beneficial, since a lot of numbers could be written 
> with 3 bytes instead of 4. For example, 32768 is written using 3 bytes 
> with WritableUtils.writeVInt, whereas currently 4 bytes (int) are used.
>   -  String comparison in BinInterSedesTupleRawComparator initializes a 
> String for each side of the comparison. It should instead compare the 
> bytes directly, like Text.Comparator.
> {code}
> str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
> str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
> {code}
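
Both points in the description can be sketched in plain Java without a Hadoop dependency. This is an illustration only, not the attached patch: the class name BinInterSedesSketch and the method names vintSize and compareBytes are hypothetical. vintSize is assumed to mirror the size rule of Hadoop's variable-length integer encoding (one byte for values in [-112, 127], otherwise one length byte plus the minimal magnitude bytes), and compareBytes is an unsigned lexicographic byte comparison in the style the description attributes to Text.Comparator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Pig or Hadoop.
class BinInterSedesSketch {

    // Number of bytes a value would occupy in the assumed vint format:
    // values in [-112, 127] fit in one byte; anything else needs one
    // length/sign byte plus the minimal number of magnitude bytes.
    static int vintSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1;
        }
        if (value < 0) {
            value ^= -1L; // one's complement, so the magnitude is non-negative
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1;
    }

    // Compares two byte ranges lexicographically, treating each byte as
    // unsigned, without building String objects. Ties on the common prefix
    // are broken by length, so the shorter range sorts first.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        // Encoded size around the boundaries mentioned in the description.
        for (int v : new int[] {127, 128, 32767, 32768, 65536, 1 << 24}) {
            System.out.println(v + " -> " + vintSize(v) + " byte(s)");
        }
        // Byte-level comparison of two UTF-8 encoded strings.
        byte[] a = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] b = "banana".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareBytes(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```

Two caveats about this sketch. Under the assumed size rule, values of 2^24 and above take 5 bytes, one more than a fixed 4-byte int, so the saving applies to smaller lengths. Comparing UTF-8 bytes orders strings by Unicode code point, whereas String.compareTo orders by UTF-16 code unit; the two agree for most text but can differ when supplementary characters are involved.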



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
