[jira] Commented: (HIVE-600) Running TPC-H queries on Hive

Aaron Kimball (JIRA) Tue, 11 Aug 2009 18:22:45 -0700

    [ https://issues.apache.org/jira/browse/HIVE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742169#action_12742169 ]


Aaron Kimball commented on HIVE-600:
------------------------------------

Yuntao,

Thanks. I took a look through this file and have some questions:

1) {{mapred.reduce.tasks}} isn't set in hadoop-site.xml, and none of the 
scripts sets it explicitly either, so it is left at its default value of 1. 
A single reducer is necessary for anything with an {{ORDER BY}} clause, but it 
slows down everything else (you could set this to 40 on your cluster wherever 
you don't need total ordering). Could some of these queries be refactored to 
use multiple reducers in their intermediate stages? 

2) Your writeup says that you've got 4 HDDs per machine, but {{dfs.data.dir}} 
and {{mapred.local.dir}} each reference only a single path. Are you doing 
something unusual in your filesystem to spread this across all 4 disks? Or are 
three of them going unused?
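To make the trade-off in point 1 concrete, here is a rough Python simulation (not Hadoop or Hive code) of what happens to sorted output as the reducer count changes. It assumes non-negative integer keys assigned to reducers by key modulo the reducer count, as a stand-in for Hadoop's default hash partitioning; each simulated reducer sorts only its own partition, and the per-reducer output files are then concatenated.

```python
# Rough simulation (not Hadoop/Hive code) of a job's reduce phase.
# Assumption: non-negative integer keys, assigned to reducers by
# key % num_reducers, standing in for Hadoop's default hash partitioning.

def partition(keys, num_reducers):
    """Split map output keys into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for k in keys:
        buckets[k % num_reducers].append(k)
    return buckets


def run_job(keys, num_reducers):
    """Each reducer sorts only its own bucket; the output files are concatenated."""
    output = []
    for bucket in partition(keys, num_reducers):
        output.extend(sorted(bucket))
    return output


keys = [17, 3, 42, 8, 25, 1, 30, 12]
print(run_job(keys, 1))  # [1, 3, 8, 12, 17, 25, 30, 42]
print(run_job(keys, 4))  # [8, 12, 1, 17, 25, 30, 42, 3]
```

With one reducer the concatenated output is globally sorted; with four, each reducer's file is sorted but the files interleave. That is why total ordering needs a single reducer (or a range-based partitioner, which this sketch does not model), while stages that only group or join by key can spread the work across many reducers.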

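For point 2: as far as I know, both {{dfs.data.dir}} and {{mapred.local.dir}} accept a comma-separated list of directories, so each disk can get its own entry. The Python sketch below only builds such a hadoop-site.xml fragment to show the shape of the values; the mount points and subdirectory names are hypothetical placeholders, not taken from your setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount points for the four disks; substitute the real ones.
DISKS = ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"]


def hadoop_site(disks):
    """Build a <configuration> element with one directory per disk."""
    conf = ET.Element("configuration")
    props = {
        # DataNode block storage: comma-separated list, one entry per disk.
        # The "/hdfs/data" suffix is a hypothetical naming choice.
        "dfs.data.dir": ",".join(d + "/hdfs/data" for d in disks),
        # Local scratch space for map output and spills, also one per disk.
        # The "/mapred/local" suffix is likewise hypothetical.
        "mapred.local.dir": ",".join(d + "/mapred/local" for d in disks),
    }
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return conf


conf = hadoop_site(DISKS)
print(ET.tostring(conf, encoding="unicode"))
```

The printed <property> entries can be pasted into the existing <configuration> block of hadoop-site.xml on each node.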
Thank you
- Aaron

> Running TPC-H queries on Hive
> -----------------------------
>
>                 Key: HIVE-600
>                 URL: https://issues.apache.org/jira/browse/HIVE-600
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Yuntao Jia
>            Assignee: Yuntao Jia
>         Attachments: TPC-H_on_Hive_2009-08-11.pdf, TPC-H_on_Hive_2009-08-11.tar.gz
>
>
> The goal is to run all TPC-H (http://www.tpc.org/tpch/) benchmark queries on 
> Hive for two reasons. First, through those queries, we would like to find the 
> new features that we need to put into Hive so that Hive supports common SQL 
> queries. Second, we would like to measure the performance of Hive to find out 
> what Hive is not good at. We can then improve Hive based on that 
> information. 
> For queries that Hive does not currently support, I will try to rewrite them 
> as one or more Hive-supported queries. 

