By the way, just for clarification, these queries are used for gathering 
performance data.

Zheng

From: Zheng Shao
Sent: Monday, June 22, 2009 10:37 PM
To: 'pig-dev@hadoop.apache.org'
Subject: asking for comments on benchmark queries

Hi Pig team,

We'd like to get your feedback on a set of queries we implemented in Pig.

We've attached the Hadoop configuration and Pig queries to this email. We start 
the queries by issuing "pig xxx.pig". The queries are from a SIGMOD 2009 paper. 
More details are at https://issues.apache.org/jira/browse/HIVE-396 (Shall we 
open a JIRA on Pig for this?)


One improvement is that we are going to change Hadoop to use LZO as the 
intermediate compression algorithm very soon. Previously we used gzip for all 
performance tests, including Hadoop, Hive and Pig.

The reason we specify the number of reducers in the query is to match the 
number of reducers that Hive automatically suggested. Please let us know the 
best way to set the number of reducers in Pig.

Are there any other improvements we can make to the Pig queries and the Hadoop 
configuration?

Thanks,
Zheng
