[ https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721942#action_12721942 ]
Zheng Shao commented on HIVE-396:
---------------------------------

Q: Why is the Hive program faster than the Hadoop app for the first query?

A: This is quite possible in many situations. In this particular case it is mainly because Hive's implementation of LIKE operates on Text, while the Hadoop app's implementation used String.find(). We used the Hadoop code from the SIGMOD 2009 paper so that the comparison would be consistent.

While the Hadoop code could be improved for this particular case, some optimizations are very hard to replicate in each and every Hadoop application. For example, the map-side join (HIVE-195) joins a very small table with any other table much more efficiently, without using a reducer. Another example is that the object model in Hive differs from Hadoop's: Hive reuses the same object across different rows. Details are in the org.apache.hadoop.hive.serde package.

> Hive performance benchmarks
> ---------------------------
>
>                 Key: HIVE-396
>                 URL: https://issues.apache.org/jira/browse/HIVE-396
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>         Attachments: hive_benchmark_2009-06-18.pdf, hive_benchmark_2009-06-18.tar.gz
>
>
> We need some performance benchmark to measure and track the performance improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix
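To illustrate the comment's first point (Hive's LIKE working on Text rather than String), here is a minimal sketch. It assumes the gain comes from matching a '%needle%' pattern directly on UTF-8 bytes, as a Text-style buffer allows, instead of decoding every row into a java.lang.String first. The class and method names are hypothetical; this is not Hive's actual LIKE code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: substring match for LIKE '%needle%' done two ways.
final class LikeSketch {

    // Decode-then-search: allocates a new String (UTF-8 -> UTF-16) per row.
    static boolean containsViaString(byte[] rowUtf8, String needle) {
        String row = new String(rowUtf8, StandardCharsets.UTF_8);
        return row.indexOf(needle) >= 0;
    }

    // Byte-level search on the raw UTF-8 bytes, with no per-row allocation.
    // 'length' is the number of valid bytes, since a reused buffer (like the
    // array behind Hadoop's Text) may be longer than the current row.
    // Correct for valid UTF-8: a match can only start at a character boundary.
    static boolean containsBytes(byte[] row, int length, byte[] needle) {
        if (needle.length == 0) {
            return true;
        }
        outer:
        for (int i = 0; i + needle.length <= length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (row[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

In a real job the needle would be encoded once up front, so the per-row cost of the byte-level version is only the comparison loop.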
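The map-side join mentioned in the comment (HIVE-195) avoids the reducer by loading the small table into memory and joining each row of the large table as it streams through the mapper. Below is a minimal in-memory sketch of that idea as a hash join. The Row type, the tab-separated output, and the method names are hypothetical and do not reflect Hive's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side (broadcast) hash join.
final class MapJoinSketch {

    record Row(String key, String value) {}

    // Build phase: index the small table by join key (done once per mapper).
    static Map<String, List<String>> buildHashTable(List<Row> smallTable) {
        Map<String, List<String>> table = new HashMap<>();
        for (Row r : smallTable) {
            table.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        return table;
    }

    // Probe phase: each mapper joins its split of the big table locally,
    // so no shuffle and no reducer are needed.
    static List<String> probe(Iterable<Row> bigTableSplit,
                              Map<String, List<String>> smallTable) {
        List<String> out = new ArrayList<>();
        for (Row r : bigTableSplit) {
            List<String> matches = smallTable.get(r.key());
            if (matches == null) {
                continue; // inner join: rows without a match are dropped
            }
            for (String m : matches) {
                out.add(r.key() + "\t" + r.value() + "\t" + m);
            }
        }
        return out;
    }
}
```

The approach only pays off when the small table fits comfortably in each mapper's memory, which is why the comment restricts it to joining a very small table.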
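The comment's last point, that Hive reuses the same object across rows, can be sketched as follows. One mutable row is refilled for every record instead of allocating a new object per record. The MutableRow class and the "name<TAB>count" record format are hypothetical; Hive's real object model lives in org.apache.hadoop.hive.serde and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of row-object reuse.
final class ReuseSketch {

    // Mutable row holder; assumes each line looks like "name<TAB>count".
    static final class MutableRow {
        final StringBuilder name = new StringBuilder();
        long count;

        void set(String line) {
            int tab = line.indexOf('\t');
            name.setLength(0);           // clear previous contents in place
            name.append(line, 0, tab);
            count = Long.parseLong(line.substring(tab + 1));
        }
    }

    // One MutableRow is allocated for the whole input, not one per line.
    static long sumCounts(List<String> lines) {
        MutableRow row = new MutableRow();
        long total = 0;
        for (String line : lines) {
            row.set(line);
            total += row.count;
        }
        return total;
    }

    // Pitfall of reuse: keeping references without copying means every
    // entry aliases the same object, which holds only the last row.
    static List<MutableRow> collectWithoutCopy(List<String> lines) {
        MutableRow row = new MutableRow();
        List<MutableRow> kept = new ArrayList<>();
        for (String line : lines) {
            row.set(line);
            kept.add(row);
        }
        return kept;
    }
}
```

The trade-off is that any operator that needs to keep a row beyond the current iteration must copy it explicitly, as collectWithoutCopy shows by getting it wrong.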