[ https://issues.apache.org/jira/browse/HIVE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742169#action_12742169 ]
Aaron Kimball commented on HIVE-600:
------------------------------------

Yuntao,

Thanks. I took a look through this file and have some questions:

1) {{mapred.reduce.tasks}} isn't set in hadoop-site.xml, nor do any of the scripts explicitly set it. This means it's left at the default value of '1'. A single reducer is necessary for anything with an {{ORDER BY}} clause, but it slows down everything else (you could set this to 40 on your cluster for any situation where you don't need total ordering). Could some of these queries be refactored to make use of multiple reducers in the middle?

2) Your writeup says that you've got 4 HDDs per machine, but {{dfs.data.dir}} and {{mapred.local.dir}} each reference just a single path. Are you doing something unusual in your filesystem to spread the data across all 4 disks? Or could three of them be going unused?

Thank you
- Aaron

> Running TPC-H queries on Hive
> -----------------------------
>
>                 Key: HIVE-600
>                 URL: https://issues.apache.org/jira/browse/HIVE-600
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Yuntao Jia
>            Assignee: Yuntao Jia
>         Attachments: TPC-H_on_Hive_2009-08-11.pdf, TPC-H_on_Hive_2009-08-11.tar.gz
>
>
> The goal is to run all TPC-H (http://www.tpc.org/tpch/) benchmark queries on
> Hive for two reasons. First, through those queries, we would like to find the
> new features that we need to put into Hive so that Hive supports common SQL
> queries. Second, we would like to measure the performance of Hive to find out
> what Hive is not good at. We can then improve Hive based on that
> information.
> For queries that are not supported now in Hive, I will try to rewrite them to
> one or more Hive-supported queries.
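
For reference, the two configuration points raised in the comment could be sketched roughly as the hadoop-site.xml fragment below. This is only an illustration, not the benchmark's actual configuration: the {{/disk1}} to {{/disk4}} mount points are hypothetical placeholders, and the value of 40 reducers is simply the figure suggested in the comment. It assumes that {{dfs.data.dir}} and {{mapred.local.dir}} each accept a comma-separated list of directories, so that I/O can be spread over several disks.

```xml
<configuration>
  <!-- Hypothetical mount points: one DataNode storage directory per physical disk. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data</value>
  </property>

  <!-- Hypothetical mount points: intermediate map output spread over the same disks. -->
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local,/disk4/mapred/local</value>
  </property>

  <!-- Default reducer count; 40 is the value suggested in the comment, not a measured optimum. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>40</value>
  </property>
</configuration>
```

Queries that need a total order through {{ORDER BY}} would still have to run with a single reducer, so the reducer count might be better overridden per query (for example with {{set mapred.reduce.tasks=40;}} in the Hive session) than fixed cluster-wide.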