Yes, that's true. I have a process that runs every hour and pulls 3
weblog files from each of 10 servers, so 10 * 3 * 24 = 720 files per day
(not all hours have all the files).
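
(A sketch of one way to shrink that file count; weblogs_raw,
weblogs_compact, and request_id below are hypothetical placeholders, not
names from this thread. Rewriting a day's data through a query that
forces a reduce phase makes the number of output files equal the number
of reducers:

hive> set mapred.reduce.tasks=10;
hive> INSERT OVERWRITE TABLE weblogs_compact
    > SELECT * FROM weblogs_raw
    > DISTRIBUTE BY request_id;

Any column with reasonably uniform values works for the DISTRIBUTE BY.)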
 

Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
Cell: 718-930-7947 

 

________________________________

From: Todd Lipcon [mailto:t...@cloudera.com] 
Sent: Thursday, December 17, 2009 4:24 PM
To: hive-user@hadoop.apache.org
Subject: Re: Throttling hive queries


Hi Sagi,

Any chance you're running on a directory that has 614 small files?
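
One quick way to check, if your CLI build supports it, is from Hive
itself, which passes dfs commands straight through to HDFS (the
warehouse path below is a guess at the default layout, with fct_prss
being the table from your query):

hive> dfs -ls /user/hive/warehouse/fct_prss;

The number of entries in the listing is the number of files backing the
table.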

-Todd


On Thu, Dec 17, 2009 at 2:30 PM, Sagi, Lee <ls...@shopping.com> wrote:


        
Todd, here is the job info:
Counter                        Map           Reduce        Total
File Systems
  HDFS bytes read              199,115,508   0             199,115,508
  HDFS bytes written           0             9,665,472     9,665,472
  Local bytes read             0             321,210,205   321,210,205
  Local bytes written          204,404,812   321,210,205   525,615,017
Job Counters
  Launched reduce tasks        0             0             1
  Rack-local map tasks         0             0             614
  Launched map tasks           0             0             37,130
  Data-local map tasks         0             0             36,516
org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
  PASSED                       0             10,572        10,572
  FILTERED                     0             217,305       217,305
org.apache.hadoop.hive.ql.exec.MapOperator$Counter
  DESERIALIZE_ERRORS           0             0             0
Map-Reduce Framework
  Reduce input groups          0             429,557       429,557
  Combine output records       0             0             0
  Map input records            429,557       0             429,557
  Reduce output records        0             0             0
  Map output bytes             201,425,848   0             201,425,848
  Map input bytes              199,115,508   0             199,115,508
  Map output records           429,557       0             429,557
  Combine input records        0             0             0
  Reduce input records         0             429,557       429,557
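
(A sanity check on those numbers, not in the original message:
199,115,508 bytes of HDFS input spread over 37,130 launched map tasks is
only about 5 KB per mapper, which points at the input being tens of
thousands of tiny files rather than a few large ones.)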
         

        Lee Sagi | Data Warehouse Tech Lead & Architect | Work:
650-616-6575 | Cell: 718-930-7947 

         

________________________________

        From: Todd Lipcon [mailto:t...@cloudera.com] 
        Sent: Thursday, December 17, 2009 12:18 PM 

        To: hive-user@hadoop.apache.org
        Subject: Re: Throttling hive queries
        

        Hi Lee,
        
        The MapReduce framework in general makes it hard to assign fewer
        mappers than there are blocks in the input data when using
        FileInputFormat. Is your input set about 42GB with a 64MB block
        size, or 84GB with a 128MB block size?
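
        (Beyond what's above, a sketch: if the inputs are a few large
        files, raising the minimum split size can cut the mapper count,
        e.g.

        hive> set mapred.min.split.size=268435456;

        for roughly 256MB splits. It won't help with lots of small
        files, though, since FileInputFormat splits never span file
        boundaries.)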
        
        -Todd
        
        
        On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com>
wrote:
        

                Here is the query that I am running, just in case
                someone has an idea of how to improve it.
                
                SELECT
                     CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
                     CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
                     CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
                     CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
                     CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
                     CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
                     ...
                     ...
                     ...
                 FROM
                     FCT_PRSS PRSS FULL OUTER JOIN FCT_PRSC PRSC ON
                (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
                 WHERE (PRSS.date_key >= '2009121600' AND
                       PRSS.date_key < '2009121700') OR
                      (PRSC.date_key >= '2009121600' AND
                       PRSC.date_key < '2009121700')
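
                Not something raised in the thread, but one sketch worth
                trying: push the date_key filters into each side before
                the FULL OUTER JOIN so each table is pruned up front,
                instead of filtering after the join. (Note this is not
                exactly equivalent: a row inside the window that joins
                only to rows outside it comes back NULL-extended instead
                of paired.)

                SELECT
                     ... -- same CONCAT'd column list as above
                 FROM
                     (SELECT * FROM FCT_PRSS
                       WHERE date_key >= '2009121600'
                         AND date_key <  '2009121700') PRSS
                     FULL OUTER JOIN
                     (SELECT * FROM FCT_PRSC
                       WHERE date_key >= '2009121600'
                         AND date_key <  '2009121700') PRSC
                     ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)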
                


                Lee Sagi | Data Warehouse Tech Lead & Architect | Work:
650-616-6575 |
                Cell: 718-930-7947
                
                -----Original Message-----
                From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
                Sent: Thursday, December 17, 2009 11:03 AM
                To: hive-user@hadoop.apache.org
                Subject: Re: Throttling hive queries
                
                
                You should be able to:

                hive> set mapred.map.tasks=1000;
                hive> set mapred.reduce.tasks=5;

                In some cases the number of mappers is controlled by the
                number of input files (pre Hadoop 0.20).
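
                If your Hive build has them (an assumption, not
                confirmed in this thread), two related knobs cap
                reducers globally rather than per query:

                hive> set hive.exec.reducers.max=5;
                hive> set hive.exec.reducers.bytes.per.reducer=1000000000;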
                
                
                On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee
<ls...@shopping.com> wrote:
                > Is there a way to throttle hive queries?
                >
                > For example, I want to tell hive to not use more than
                > 1000 mappers and 5 reducers for a particular query (or
                > session).
                >
                


