[ https://issues.apache.org/jira/browse/HIVE-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728997#action_12728997 ]
Zheng Shao commented on HIVE-553:
---------------------------------

The plan is to have 2 SerDes:
A. LazyBinarySerDe: with properties 1, 2, 3
B. BinarySortableSerDe: with properties 2, 3, 4.

There is no way to accommodate 1 and 4 in the same serialization format.

> Add BinarySortableSerDe to Hive
> -------------------------------
>
>                 Key: HIVE-553
>                 URL: https://issues.apache.org/jira/browse/HIVE-553
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HIVE-553.2.patch
>
>
> Currently the most popular SerDe in Hive is LazySimpleSerDe. LazySimpleSerDe
> has the benefit of being simple (it uses a text format to store data), but its
> performance may suffer in the following cases:
> 1. Double values are stored in text format, which is very
> space-inefficient, and both serialization and deserialization are slow;
> 2. For columns of complex types with many nesting levels, we scan
> the buffer once per level, which is very inefficient.
> We should add a binary SerDe that stores the data in binary format.
> The format should have the following properties:
> 1. Compact: it should be space-efficient;
> 2. Fast: it should be efficient to deserialize the data, especially for
> double values and complex types.
> 3. It should support serializing NULL values.
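
To make the point about double values concrete, here is a minimal Java sketch comparing a text encoding with a fixed-width binary encoding of the same value. It is an illustration only and is not taken from the HIVE-553 patch: the class name DoubleEncodingSketch and its helper methods are hypothetical, and neither encoding is claimed to match the byte layout of LazySimpleSerDe, LazyBinarySerDe, or BinarySortableSerDe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Hive.
class DoubleEncodingSketch {

    // Text encoding: the decimal string of the value, as UTF-8 bytes.
    static byte[] encodeAsText(double v) {
        return Double.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: the 8-byte IEEE 754 representation (big-endian).
    static byte[] encodeAsBinary(double v) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(v).array();
    }

    // Decoding text requires parsing a decimal string.
    static double decodeText(byte[] bytes) {
        return Double.parseDouble(new String(bytes, StandardCharsets.UTF_8));
    }

    // Decoding binary is a fixed-width read with no parsing.
    static double decodeBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        double v = Math.PI;
        System.out.println("text bytes:   " + encodeAsText(v).length);
        System.out.println("binary bytes: " + encodeAsBinary(v).length);
    }
}
```

The binary form is always 8 bytes, while the text form grows with the number of significant digits and has to be parsed on every read, which is the cost described in case 1 of the issue description.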