[ 
https://issues.apache.org/jira/browse/HIVE-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-266:
----------------------------

    Attachment: HIVE-266.8.patch

Because of the last change (making it compatible with Java primitive class UDFs), 
we can now calculate the hash code from String (just as before) instead of Text.
This reverts the changes to sample*.q.out. Here is the updated patch.


> Improve SerDe performance by using Text instead of String
> ---------------------------------------------------------
>
>                 Key: HIVE-266
>                 URL: https://issues.apache.org/jira/browse/HIVE-266
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Critical
>             Fix For: 0.4.0
>
>         Attachments: HIVE-266.1.patch, HIVE-266.2.patch, HIVE-266.3.patch, 
> HIVE-266.4.patch, HIVE-266.5.patch, HIVE-266.6.patch, HIVE-266.7.patch, 
> HIVE-266.8.patch
>
>
> A recent performance study showed that two places in the Hive code account 
> for a large percentage of CPU usage:
> 1. String.getBytes() (UTF-8 encoding)
> 2. String.split()
> We should replace String with Text objects to:
> 1. Avoid UTF-8 decoding and encoding
> 2. Reuse the Text object instead of creating new objects for each column in 
> each row, as String.split() does
> This is expected to give a big (20%+) performance improvement to Hive.
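
The two ideas in the description above can be sketched in plain Java: find the
column boundaries directly in the UTF-8 bytes of a row, and reuse one holder
object for each field instead of allocating a new String per column. This is
only an illustration under stated assumptions: the separator is a single ASCII
byte, and FieldScanner and FieldRef are hypothetical names standing in for a
reused Text object. It is not the code in the attached patches.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not Hive's actual SerDe code. */
public class FieldScanner {

    /** Reusable view over one field of a row (stands in for a reused Text). */
    static final class FieldRef {
        byte[] data;
        int start;
        int length;

        /** Point this view at another field without allocating anything. */
        void set(byte[] data, int start, int length) {
            this.data = data;
            this.start = start;
            this.length = length;
        }

        /** Decode to a String only when a caller really needs one. */
        String decode() {
            return new String(data, start, length, StandardCharsets.UTF_8);
        }
    }

    /**
     * Finds field boundaries in the raw bytes of a row without decoding them.
     * On return, field i occupies bytes [starts[i], starts[i + 1] - 1).
     * The starts array must hold at least (number of fields + 1) entries.
     *
     * @return the number of fields found
     */
    static int scan(byte[] row, int rowLength, byte separator, int[] starts) {
        int count = 0;
        starts[0] = 0;
        for (int i = 0; i < rowLength; i++) {
            if (row[i] == separator) {
                count++;
                starts[count] = i + 1;
            }
        }
        count++;
        // Sentinel entry, as if one more separator followed the last field.
        starts[count] = rowLength + 1;
        return count;
    }

    public static void main(String[] args) {
        byte[] row = "1,h\u00e9llo,,x".getBytes(StandardCharsets.UTF_8);
        int[] starts = new int[16];      // allocated once, reused for every row
        FieldRef ref = new FieldRef();   // allocated once, reused for every field

        int n = scan(row, row.length, (byte) ',', starts);
        for (int i = 0; i < n; i++) {
            ref.set(row, starts[i], starts[i + 1] - starts[i] - 1);
            System.out.println(i + ": [" + ref.decode() + "]");
        }
    }
}
```

Scanning bytes for an ASCII separator is safe on UTF-8 input because every
byte of a multi-byte character has its high bit set, so it can never be
mistaken for the separator. The decode() call is there only to print the
result; a SerDe following this approach would pass the byte range along and
decode only when a String is actually required.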

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
