We are still building infrastructure to make performance optimization easier, but for now all measurements are fairly manual. In particular, we don't yet have a good tool that reports performance at the component/operation level.
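As a rough illustration of what such a manual measurement can look like, here is a minimal Java sketch that records the CPU time of one piece of work and compares it with a previously recorded baseline. It is not Hive code: the class name CpuTimeBenchmark, both helper methods, and the loop standing in for the workload are hypothetical. It only measures the current thread inside one JVM, so it says nothing about time spent elsewhere in a Hadoop job.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Hive: measures per-thread CPU time
// around a task and compares it with a baseline number.
public class CpuTimeBenchmark {

    /** Returns the CPU time, in nanoseconds, the current thread spent running the task. */
    static long measureCpuNanos(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported on this JVM");
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        long start = threads.getCurrentThreadCpuTime();
        task.run();
        return threads.getCurrentThreadCpuTime() - start;
    }

    /** Relative change of the candidate against the baseline; negative means less CPU time. */
    static double relativeChange(long baselineNanos, long candidateNanos) {
        return (candidateNanos - baselineNanos) / (double) baselineNanos;
    }

    public static void main(String[] args) {
        // Assumed stand-in workload; a real run would execute the code path under test.
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                System.out.println(sum); // keeps the loop from being optimized away
            }
        };

        long baseline = measureCpuNanos(workload);   // number recorded before a change
        long candidate = measureCpuNanos(workload);  // number recorded after a change
        System.out.printf("baseline=%d ns, candidate=%d ns, change=%.1f%%%n",
                baseline, candidate, 100.0 * relativeChange(baseline, candidate));
    }
}
```

CPU time is used here instead of wall-clock time because it tends to vary less from run to run; both runs still need to happen in the same environment for the comparison to mean anything.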
What we do now is select a few typical "benchmark" queries that cover simple use cases. We have baseline performance numbers for them (we focus on CPU cycles since they are relatively stable). We then run a simple Java profiler to see which components can be optimized, implement the improvement, and rerun the same set of "benchmark" queries in the same environment to verify whether we see the improvement we expect. We try to isolate Hive's execution performance from factors introduced by Hadoop. We do care about Hadoop cluster performance in the context of Hive queries, and we optimize it separately.

-----Original Message-----
From: prafulla.tekaw...@gmail.com [mailto:prafulla.tekaw...@gmail.com] On Behalf Of Prafulla Tekawade
Sent: Monday, November 01, 2010 11:06 PM
To: hive-...@hadoop.apache.org
Subject: Anyway in hive to measure query performance.

Hi,

I was wondering if there is any way in Hive to measure the performance of the various components/operations of a single query run. E.g., a query typically involves various operations like table scan, joins, aggregation, order by, etc. Can I get how much time was required for each of these?

Also, how do you measure Hadoop cluster performance as far as a Hive query/load run is concerned?

--
Best Regards,
Prafulla V Tekawade