Todd, Here is the job info.
 

  Counter                           Map       Reduce        Total

File Systems
  HDFS bytes read           199,115,508            0  199,115,508
  HDFS bytes written                  0    9,665,472    9,665,472
  Local bytes read                    0  321,210,205  321,210,205
  Local bytes written       204,404,812  321,210,205  525,615,017

Job Counters
  Launched reduce tasks               0            0            1
  Rack-local map tasks                0            0          614
  Launched map tasks                  0            0       37,130
  Data-local map tasks                0            0       36,516

org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
  PASSED                              0       10,572       10,572
  FILTERED                            0      217,305      217,305

org.apache.hadoop.hive.ql.exec.MapOperator$Counter
  DESERIALIZE_ERRORS                  0            0            0

Map-Reduce Framework
  Reduce input groups                 0      429,557      429,557
  Combine output records              0            0            0
  Map input records             429,557            0      429,557
  Reduce output records               0            0            0
  Map output bytes          201,425,848            0  201,425,848
  Map input bytes           199,115,508            0  199,115,508
  Map output records            429,557            0      429,557
  Combine input records               0            0            0
  Reduce input records                0      429,557      429,557
 

Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947

 

________________________________

From: Todd Lipcon [mailto:t...@cloudera.com] 
Sent: Thursday, December 17, 2009 12:18 PM
To: hive-user@hadoop.apache.org
Subject: Re: Throttling hive queries


Hi Lee,

The MapReduce framework generally makes it hard to assign fewer mappers
than there are blocks in the input data when using FileInputFormat. Is
your input set about 42 GB with a 64 MB block size, or 84 GB with a
128 MB block size?
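
(A rough sketch of the arithmetic behind that question, assuming one
input split per HDFS block: FileInputFormat puts a floor under the
map-task count of roughly ceil(input bytes / block size), and both
combinations above give the same floor.)

    42 GB / 64 MB block size  = 43,008 MB / 64 MB  = 672 splits
    84 GB / 128 MB block size = 86,016 MB / 128 MB = 672 splits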

-Todd


On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com> wrote:


        Here is the query that I am running, just in case someone has
        an idea of how to improve it.
        
        SELECT
            CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
            CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
            CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
            CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
            CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
            CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
            ...
        FROM
            FCT_PRSS PRSS
            FULL OUTER JOIN FCT_PRSC PRSC
                ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
        WHERE (PRSS.DATE_KEY >= '2009121600' AND
               PRSS.DATE_KEY <  '2009121700') OR
              (PRSC.DATE_KEY >= '2009121600' AND
               PRSC.DATE_KEY <  '2009121700')
        


        Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947
        
        -----Original Message-----
        From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
        Sent: Thursday, December 17, 2009 11:03 AM
        To: hive-user@hadoop.apache.org
        Subject: Re: Throttling hive queries
        
        
        You should be able to:

        hive> set mapred.map.tasks=1000;
        hive> set mapred.reduce.tasks=5;

        In some cases the number of mappers is controlled by the input
        files (pre Hadoop 0.20).
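
        (One related knob, assuming the old-API FileInputFormat split
        math of max(mapred.min.split.size, min(total size / requested
        maps, block size)): raising the minimum split size above the
        HDFS block size makes each split span several blocks, which
        lowers the mapper count. The 256 MB value below is illustrative,
        not a recommendation.)

        hive> -- minimum split size in bytes (256 MB here, illustrative)
        hive> set mapred.min.split.size=268435456;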
        
        
        On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee <ls...@shopping.com> wrote:
        > Is there a way to throttle Hive queries?
        >
        > For example, I want to tell Hive not to use more than 1,000
        > mappers and 5 reducers for a particular query (or session).

