We are still building infrastructure to make performance optimization easier, but
for now all the measurements are fairly manual. In particular, we don't yet have
a good tool for measuring at the component/operation level.

What we do now is select a few typical "benchmark" queries that cover some
simple use cases and record baseline performance numbers for them. We focus on
CPU cycles, since they are relatively stable. We then run a simple Java profiler
to see which components can be optimized, implement the improvement, rerun the
same set of "benchmark" queries (in the same environment), and verify whether we
see the improvement we expected.
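As a rough illustration of the "baseline, change, rerun, compare" loop described
above, here is a minimal sketch that compares CPU time (rather than wall-clock
time) for a "before" and an "after" version of some code. It is not Hive's
tooling: the class and method names are hypothetical, the two workloads are
stand-ins for a real component, and it measures only the current thread of a
single JVM, whereas a real Hive query spreads its work across many task JVMs on
the cluster. The only real API it relies on is the JDK's ThreadMXBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

/**
 * Hypothetical sketch of a "baseline vs. candidate" CPU-time comparison.
 * The two workloads in main() are stand-ins, not real Hive components.
 */
public class BenchmarkCompare {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // Written by the workloads so the JIT cannot discard their loops as dead code.
    static volatile long sink;

    /** CPU time (ns) used by the current thread while running the task. */
    static long cpuNanos(Runnable task) {
        long start = THREADS.getCurrentThreadCpuTime();
        task.run();
        return THREADS.getCurrentThreadCpuTime() - start;
    }

    /** Median CPU time over several runs, to damp run-to-run noise. */
    static long medianCpuNanos(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = cpuNanos(task);
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /** Percent change from baseline to candidate; negative means less CPU was used. */
    static double percentChange(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    /** True if the candidate cut CPU time by at least expectedPct percent. */
    static boolean meetsExpectation(long baseline, long candidate, double expectedPct) {
        return percentChange(baseline, candidate) <= -expectedPct;
    }

    public static void main(String[] args) {
        if (!THREADS.isCurrentThreadCpuTimeSupported()) {
            System.err.println("This JVM cannot report per-thread CPU time.");
            return;
        }
        THREADS.setThreadCpuTimeEnabled(true);

        // Hypothetical "before" stand-in: boxed arithmetic may allocate Long objects.
        Runnable baseline = () -> {
            Long sum = 0L;
            for (int i = 0; i < 2_000_000; i++) {
                sum += i;
            }
            sink = sum;
        };
        // Hypothetical "after" stand-in: the same computation on a primitive long.
        Runnable candidate = () -> {
            long sum = 0L;
            for (int i = 0; i < 2_000_000; i++) {
                sum += i;
            }
            sink = sum;
        };

        // Warm up so JIT compilation is not counted against either version.
        medianCpuNanos(baseline, 3);
        medianCpuNanos(candidate, 3);

        long before = medianCpuNanos(baseline, 5);
        long after = medianCpuNanos(candidate, 5);
        System.out.printf("baseline: %d ns, candidate: %d ns, change: %.1f%%%n",
                before, after, percentChange(before, after));
        System.out.println("Met the 10% improvement target: "
                + meetsExpectation(before, after, 10.0));
    }
}
```

Using CPU time instead of elapsed time mirrors the reasoning above: it is less
sensitive to other load on the machine, though it still varies between runs,
which is why the sketch takes a median over several repetitions and keeps the
environment fixed between the baseline and the rerun.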

We try to isolate Hive's execution performance from factors introduced by Hadoop.
We do care about Hadoop cluster performance in the context of Hive queries, but
we optimize it separately.

-----Original Message-----
From: prafulla.tekaw...@gmail.com [mailto:prafulla.tekaw...@gmail.com] On 
Behalf Of Prafulla Tekawade
Sent: Monday, November 01, 2010 11:06 PM
To: hive-...@hadoop.apache.org
Subject: Any way in Hive to measure query performance?

Hi,
I was wondering if there is any way in Hive to measure the
performance of the various components/operations of a
single query run.
E.g.,
a typical query involves various operations like table scan, joins,
aggregation, order by, etc. Can I get how much time was spent on
each of these?

Also, how do you measure Hadoop cluster performance as far as Hive
query/load runs are concerned?

-- 
Best Regards,
Prafulla V Tekawade
