[ https://issues.apache.org/jira/browse/HIVE-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747755#action_12747755 ]
Zheng Shao commented on HIVE-708:
---------------------------------

The new test failed somehow. I had to revert the new test for now.

> Add TypedBytes SerDe for transform
> ----------------------------------
>
>                 Key: HIVE-708
>                 URL: https://issues.apache.org/jira/browse/HIVE-708
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.5.0
>
>         Attachments: hive.708.1.patch, hive.708.2.patch, hive.708.3.patch,
>                      hive.708.4.patch
>
>
> Currently, LazySimpleSerDe is used to send data to the user transformation
> functions; it would be useful to let the user specify the format of the data.
> Specifically, it would be very easy and useful to accommodate the following
> (cut and pasted from Venky's mail):
> Here's the typedbytes stuff that Dumbo uses.
> http://issues.apache.org/jira/browse/HADOOP-1722
> From:
> http://static.last.fm/johan/huguk-20090414/klaas-hadoop-1722.pdf
> Timings for IP count program on 300 GB of weblogs:
> Java: 8 minutes
> Dumbo with typed bytes: 10 minutes
> Hive: 13 minutes
> Dumbo without typed bytes: 16 minutes
> They also have a fast Python decoder for this, which is apparently 25% faster
> than the original Python version.
> http://github.com/klbostee/ctypedbytes/tree/master
> http://dumbotics.com/2009/05/31/dumbo-on-clouderas-distribution/
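The typed bytes format linked above (HADOOP-1722) writes each value as a one-byte type code followed by a big-endian payload. Strings and raw bytes carry a 4-byte length prefix. As a rough illustration of what a transform script would have to handle, here is a minimal Python sketch that encodes and decodes a subset of the types: bytes, boolean, int, long, double, and string. It assumes the type codes of the Hadoop typed bytes format (0 = bytes, 2 = bool, 3 = int, 4 = long, 6 = double, 7 = string). The functions `encode`, `decode`, and `decode_all` are hypothetical helpers. This is not Hive's TypedBytes SerDe or the ctypedbytes API, and how Hive frames rows on the transform stream is not shown.

```python
import struct

# Type codes from the Hadoop typed bytes format (HADOOP-1722); only a subset is handled here.
BYTES, BOOL, INT, LONG, DOUBLE, STRING = 0, 2, 3, 4, 6, 7


def encode(value):
    """Encode one Python value as a typed bytes item: a type-code byte plus a big-endian payload."""
    if isinstance(value, bool):  # test bool before int, since bool is a subclass of int
        return struct.pack(">BB", BOOL, 1 if value else 0)
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return struct.pack(">Bi", INT, value)  # 32-bit signed int
        return struct.pack(">Bq", LONG, value)  # 64-bit signed long
    if isinstance(value, float):
        return struct.pack(">Bd", DOUBLE, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">Bi", STRING, len(data)) + data  # 4-byte length prefix
    if isinstance(value, bytes):
        return struct.pack(">Bi", BYTES, len(value)) + value
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf, pos=0):
    """Decode one typed bytes item starting at buf[pos]; return (value, next_pos)."""
    code = buf[pos]
    pos += 1
    if code == BOOL:
        return buf[pos] != 0, pos + 1
    if code == INT:
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if code == LONG:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if code == DOUBLE:
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if code in (STRING, BYTES):
        (length,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        raw = bytes(buf[pos:pos + length])
        value = raw.decode("utf-8") if code == STRING else raw
        return value, pos + length
    raise ValueError(f"unsupported type code: {code}")


def decode_all(buf):
    """Decode a buffer holding a sequence of typed bytes items into a list."""
    values, pos = [], 0
    while pos < len(buf):
        value, pos = decode(buf, pos)
        values.append(value)
    return values
```

For example, encoding the row `["10.0.0.1", 42]` produces two items. The first is a string item (code 7, length 8, then the UTF-8 text). The second is an int item (code 3, then four bytes). Passing the concatenated bytes to `decode_all` returns the original list. Because the payload is binary and typed, the reader never has to split on delimiters or re-parse numbers from text. That is plausibly where the speedup in the timings above comes from, though the sketch does not measure it.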