Todd, Here is the job info.
Counter                                   Map          Reduce       Total
File Systems
  HDFS bytes read                         199,115,508  0            199,115,508
  HDFS bytes written                      0            9,665,472    9,665,472
  Local bytes read                        0            321,210,205  321,210,205
  Local bytes written                     204,404,812  321,210,205  525,615,017
Job Counters
  Launched reduce tasks                   0            0            1
  Rack-local map tasks                    0            0            614
  Launched map tasks                      0            0            37,130
  Data-local map tasks                    0            0            36,516
org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
  PASSED                                  0            10,572       10,572
  FILTERED                                0            217,305      217,305
org.apache.hadoop.hive.ql.exec.MapOperator$Counter
  DESERIALIZE_ERRORS                      0            0            0
Map-Reduce Framework
  Reduce input groups                     0            429,557      429,557
  Combine output records                  0            0            0
  Map input records                       429,557      0            429,557
  Reduce output records                   0            0            0
  Map output bytes                        201,425,848  0            201,425,848
  Map input bytes                         199,115,508  0            199,115,508
  Map output records                      429,557      0            429,557
  Combine input records                   0            0            0
  Reduce input records                    0            429,557      429,557

Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947

________________________________
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, December 17, 2009 12:18 PM
To: hive-user@hadoop.apache.org
Subject: Re: Throttling hive queries

Hi Lee,

The MapReduce framework in general makes it hard for you to assign fewer mappers than there are blocks in the input data when using FileInputFormat. Is your input set about 42 GB with a 64 MB block size, or 84 GB with a 128 MB block size?

-Todd

On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com> wrote:

Here is the query that I am running, just in case someone has an idea of how to improve it.

SELECT CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
       CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
       CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
       CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
       CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
       CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
       ...
FROM FCT_PRSS PRSS
FULL OUTER JOIN FCT_PRSC PRSC
  ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
WHERE (PRSS.date_key >= '2009121600' AND PRSS.date_key < '2009121700')
   OR (PRSC.date_key >= '2009121600' AND PRSC.date_key < '2009121700')

Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, December 17, 2009 11:03 AM
To: hive-user@hadoop.apache.org
Subject: Re: Throttling hive queries

You should be able to:

hive> set mapred.map.tasks=1000;
hive> set mapred.reduce.tasks=5;

In some cases the number of mappers is controlled by the number of input files (pre Hadoop 0.20).

On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee <ls...@shopping.com> wrote:
> Is there a way to throttle hive queries?
>
> For example, I want to tell hive to not use more than 1000 mappers and
> 5 reducers for a particular query (or session).
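Todd's point — that with FileInputFormat the mapper count is driven by the input layout rather than by mapred.map.tasks — can be sketched with rough arithmetic. This is a minimal illustration (not part of the thread), assuming the classic pre-CombineFileInputFormat behavior: every file gets at least one map task, and a file larger than a block gets roughly one task per block.

```python
import math

def estimate_mappers(file_sizes, block_size):
    """Rough mapper-count estimate for FileInputFormat:
    each file yields at least one map task, and larger files
    yield about one task per HDFS block."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)

# A single 42 GB file with a 64 MB block size -> 672 mappers.
print(estimate_mappers([42 * 1024**3], 64 * 1024**2))  # 672

# Many small files dominate: 1000 tiny files still need 1000 mappers,
# which is why setting mapred.map.tasks lower has no effect here.
print(estimate_mappers([1024] * 1000, 64 * 1024**2))  # 1000
```

This also suggests why a job reading only ~190 MB could launch tens of thousands of mappers: a large number of small input files, each forcing its own map task.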