[ 
https://issues.apache.org/jira/browse/HIVE-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728997#action_12728997
 ] 

Zheng Shao commented on HIVE-553:
---------------------------------

The plan is to have 2 SerDes:

A. LazyBinarySerDe: with properties 1, 2, 3
B. BinarySortableSerDe: with properties 2, 3, 4.

There is no way to accommodate both property 1 and property 4 in the same serialization format.
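
A rough illustration of that tension, assuming property 4 means "binary sortable" (comparing the serialized bytes gives the same order as comparing the values), which the issue title suggests but the quoted description below does not spell out. The class and method names are hypothetical and not taken from the patch. The sketch compares a compact variable-length integer encoding with a fixed-width, sign-flipped big-endian encoding:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; not code from HIVE-553.
class SortableEncodingSketch {

    // Compact encoding: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last. Non-negative input only.
    static byte[] encodeVarint(int value) {
        byte[] buf = new byte[5];
        int n = 0;
        while ((value & ~0x7F) != 0) {
            buf[n++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[n++] = (byte) value;
        return Arrays.copyOf(buf, n);
    }

    // Sortable encoding: always 4 bytes, big-endian, with the sign bit
    // flipped so that negative values sort before non-negative ones.
    static byte[] encodeSortable(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ 0x80000000).array();
    }

    // Lexicographic comparison of bytes treated as unsigned values.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // The varint form is shorter, but byte order does not follow value order.
        System.out.println("varint   255 vs 256: "
                + compareBytes(encodeVarint(255), encodeVarint(256)));
        // The fixed-width form is longer, but byte order follows value order.
        System.out.println("sortable 255 vs 256: "
                + compareBytes(encodeSortable(255), encodeSortable(256)));
    }
}
```

In this sketch the varint form of 255 takes 2 bytes but compares greater than the varint form of 256, while the sortable form always takes 4 bytes and compares in value order. This is only one example of the trade-off, not a proof of the statement above.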


> Add BinarySortableSerDe to Hive
> -------------------------------
>
>                 Key: HIVE-553
>                 URL: https://issues.apache.org/jira/browse/HIVE-553
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HIVE-553.2.patch
>
>
> Currently the most popular SerDe in Hive is LazySimpleSerDe. LazySimpleSerDe 
> has the benefit of being simple (it uses a text format to store data), but its 
> performance may suffer in the following cases:
> 1. For double values, we are storing them in text format which is very 
> space-inefficient, and both serialization and deserialization are slow;
> 2. For complex-type columns with many nesting levels, we scan the buffer 
> once per level, which is very inefficient.
> We should add a binary serde format that stores the data in binary format. 
> The format should have the following properties:
> 1. Compact: it should be space-efficient;
> 2. Fast: it should be efficient to deserialize the data, especially for 
> double values and complex types;
> 3. It should support serializing NULL values.
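
To make case 1 of the description above concrete, here is a minimal sketch comparing the size of a double stored as text with its raw 8-byte binary form. The class and method names are hypothetical; this is not how LazySimpleSerDe or the proposed SerDes are implemented, only an illustration of the space difference.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not code from Hive.
class DoubleEncodingSketch {

    // Text form: the decimal string, as a text-based format would store it.
    static byte[] encodeAsText(double value) {
        return Double.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Binary form: the raw 8-byte IEEE 754 representation.
    static byte[] encodeAsBinary(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Reading text back requires parsing the decimal digits.
    static double decodeText(byte[] bytes) {
        return Double.parseDouble(new String(bytes, StandardCharsets.UTF_8));
    }

    // Reading binary back is a fixed-width read of 8 bytes.
    static double decodeBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        double value = Math.PI;
        System.out.println("text bytes:   " + encodeAsText(value).length);
        System.out.println("binary bytes: " + encodeAsBinary(value).length);
    }
}
```

For a value with many significant digits, such as Math.PI, the text form needs more than twice the 8 bytes of the binary form, and decoding it involves parsing decimal digits instead of a fixed-width read.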

