[jira] Commented: (HIVE-396) Hive performance benchmarks

Zheng Shao (JIRA) Fri, 19 Jun 2009 12:16:33 -0700

    [ https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721942#action_12721942 ]


Zheng Shao commented on HIVE-396:
---------------------------------

Q: Why is the Hive program faster than the Hadoop app for the first query?
A: This is quite possible in many situations.
In this particular case, it is mainly because Hive's implementation of LIKE 
works on Text, while the Hadoop app's implementation used String.find(). We 
used the Hadoop code from the SIGMOD 2009 paper so that the comparison would be 
consistent.
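To illustrate why the string representation matters, here is a minimal sketch that evaluates a `LIKE '%sub%'`-style containment test directly on UTF-8 bytes (the form in which Hadoop's Text stores its data), without building a java.lang.String for every row. The class and method names are hypothetical, and this is not Hive's actual LIKE code; it only shows the general idea.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: substring matching on raw UTF-8 bytes, similar to the
// representation org.apache.hadoop.io.Text keeps internally. Not Hive's code.
public class ByteLikeSketch {

    // Returns the first index of `pattern` within bytes[0, length), or -1.
    // Naive O(n*m) scan; a production version might use a faster algorithm.
    static int find(byte[] bytes, int length, byte[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        outer:
        for (int i = 0; i + pattern.length <= length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (bytes[i + j] != pattern[j]) {
                    continue outer; // mismatch: try the next start offset
                }
            }
            return i;
        }
        return -1;
    }

    // Equivalent of `col LIKE '%sub%'` with no per-row String allocation.
    static boolean likeContains(byte[] rowBytes, int length, byte[] sub) {
        return find(rowBytes, length, sub) >= 0;
    }

    public static void main(String[] args) {
        byte[] pattern = "XYZ".getBytes(StandardCharsets.UTF_8);
        byte[] row = "abcXYZdef".getBytes(StandardCharsets.UTF_8);
        System.out.println(likeContains(row, row.length, pattern)); // true
    }
}
```

Matching bytes instead of characters is safe here because UTF-8 is self-synchronizing: a valid UTF-8 pattern can only match valid UTF-8 text at a character boundary. The sketch deliberately covers only the `%sub%` case; the `_` wildcard and anchored patterns would need more logic.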
While it is possible to improve the Hadoop code in this particular case, there 
are cases where it is very hard to apply the same optimization to each and 
every Hadoop application. For example, the map-side join (HIVE-195) is much 
more efficient for joining a very small table with any other table, since it 
joins without using a reducer. Another case is Hive's object model, which 
differs from Hadoop's: we reuse the same object across different rows. Details 
are in the org.apache.hadoop.hive.serde package.
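A minimal, Hadoop-free sketch of the two ideas in the paragraph above: a map-side (hash) join that loads the small table into an in-memory map once and probes it for each row of the large table as it streams past, so no shuffle or reducer is involved; and a single mutable output object that is reused for every row instead of allocating a new one. All names are hypothetical, the sketch assumes the small table has unique join keys, and it does not reproduce the code from HIVE-195 or the serde package.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of a map-side hash join with output-object reuse.
public class MapSideJoinSketch {

    // One mutable output row, reused for every joined record.
    static final class JoinedRow {
        String key;
        String bigValue;
        String smallValue;

        @Override
        public String toString() {
            return key + "," + bigValue + "," + smallValue;
        }
    }

    // Assumes unique keys in the small table; duplicates would need a List per key.
    private final Map<String, String> smallTable = new HashMap<>();
    private final JoinedRow reused = new JoinedRow();

    // "Setup" phase: load the whole small table into memory once per mapper.
    MapSideJoinSketch(List<String[]> smallRows) {
        for (String[] r : smallRows) {
            smallTable.put(r[0], r[1]);
        }
    }

    // "Map" phase: probe the hash table for each big-table row (inner join).
    // The same JoinedRow instance is handed to `out` every time, so a consumer
    // that wants to keep a row must copy it.
    void map(String key, String bigValue, Consumer<JoinedRow> out) {
        String smallValue = smallTable.get(key);
        if (smallValue == null) {
            return; // no match: an inner join drops the row
        }
        reused.key = key;
        reused.bigValue = bigValue;
        reused.smallValue = smallValue;
        out.accept(reused);
    }

    public static void main(String[] args) {
        List<String[]> small = new ArrayList<>();
        small.add(new String[] {"1", "US"});
        small.add(new String[] {"2", "FR"});
        MapSideJoinSketch join = new MapSideJoinSketch(small);

        List<String> results = new ArrayList<>();
        // Copy each row via toString(), because the JoinedRow object is reused.
        Consumer<JoinedRow> collect = row -> results.add(row.toString());
        join.map("1", "alice", collect);
        join.map("3", "bob", collect);
        join.map("2", "carol", collect);
        System.out.println(results); // [1,alice,US, 2,carol,FR]
    }
}
```

Reusing one output object avoids a per-row allocation, at the cost of a contract the caller must respect: anything that holds on to a row past the next call has to copy it first.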


> Hive performance benchmarks
> ---------------------------
>
>                 Key: HIVE-396
>                 URL: https://issues.apache.org/jira/browse/HIVE-396
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>         Attachments: hive_benchmark_2009-06-18.pdf, 
> hive_benchmark_2009-06-18.tar.gz
>
>
> We need some performance benchmarks to measure and track the performance 
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
