Re: MRBench Maps strange behaviour

2012-08-29 Thread Bejoy KS
Hi Gaurav

You can get the number of map tasks in the job from the JobTracker (JT) web
UI itself.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: Gaurav Dasgupta gdsay...@gmail.com
Date: Wed, 29 Aug 2012 13:14:11 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how I can calculate, or confirm from the counters, what the
exact number of maps should be?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala yhema...@gmail.com wrote:

 Hi,

 The number of maps specified to any MapReduce program (including
 those that are part of MRBench) is generally only a hint, and the actual
 number of maps is typically influenced by the amount of data
 being processed. You can take a look at this wiki page to understand
 more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

 In the examples below, since the data you've generated is different,
 the number of mappers is different. To be able to judge your
 benchmark results, you'd need to benchmark against the same data (or
 at least the same kind of data, i.e. the same size and type).

 The number of maps printed at the end is straight from the input
 specified and doesn't reflect what the job actually ran with. The
 information from the counters is the right one.

 Thanks
 Hemanth

 On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta gdsay...@gmail.com
 wrote:
  Hi All,
 
  I executed the MRBench program from hadoop-test.jar in my 12-node CDH3
  cluster. After executing, I had some strange observations regarding the
  number of Maps it ran.
 
  First I ran the command:
  hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
  And I could see that the actual number of Maps it ran was 201 (for all 3
  runs) instead of 200 (though the end report displays the number launched as
  200). Here is the console report:
 
 
  12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
  12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
  12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
  12/08/28 04:34:35 INFO mapred.JobClient: Launched reduce tasks=200
  12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=617209
  12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  12/08/28 04:34:35 INFO mapred.JobClient: Rack-local map tasks=137
  12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201
  12/08/28 04:34:35 INFO mapred.JobClient: Data-local map tasks=64
  12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=1756882
 
 
 
  Again, I ran the MRBench for just 10 Maps and 10 Reduces:
 
  hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
 
 
 
  This time the actual number of Maps was only 2, and again the end report
  displays Maps Launched as 10. The console output:
 
 
 
  12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
  12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
  12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
  12/08/28 05:05:35 INFO mapred.JobClient: Launched reduce tasks=20
  12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6648
  12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
  12/08/28 05:05:35 INFO mapred.JobClient: Data-local map tasks=2
  12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=163257
  12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
  12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_READ=407
  12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_READ=258
  12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1072596
  12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3
  12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
  12/08/28 05:05:35 INFO mapred.JobClient: Map input records=1
  12/08/28 05:05:35 INFO mapred.JobClient: Reduce shuffle bytes=647
  12/08/28 05:05:35 INFO mapred.JobClient: Spilled Records=2
  12/08/28 05:05:35 INFO mapred.JobClient: Map output bytes=5
  12/08/28 05:05:35 INFO mapred.JobClient: CPU time spent (ms)=17070
  12/08/28 05:05:35 INFO mapred.JobClient: Total committed heap usage (bytes)=6218842112
  12/08/28 05:05:35 INFO mapred.JobClient: Map input bytes=2
  12/08/28 05:05:35 INFO mapred.JobClient: Combine input records=0
  12/08/28 05:05:35 INFO mapred.JobClient

Re: MRBench Maps strange behaviour

2012-08-29 Thread praveenesh kumar
Then the question arises of how MRBench uses its parameters.
According to the mail he sent, he is running MRBench with the following
parameters:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is expecting MRBench to launch 10 mappers and 10 reducers.
But he is getting different results, which are visible in the counters,
and we can use all our map and input-split logic to justify the counter
outputs.

The question that arises here is: how should we use MRBench, and what does it
provide? How can we control it to run with different parameters to do some
benchmarking? Can someone explain how to use MRBench and what exactly it
does?

Regards,
Praveenesh

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala yhema...@gmail.com wrote:

 I assume you are asking about the exact number of maps launched.
 If so, the output of the MRBench run prints the counter
 "Launched map tasks". That is the exact number of maps launched.
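A minimal sketch of pulling that counter out of saved console output, assuming the JobClient log format shown in this thread ("... mapred.JobClient: <counter name>=<value>"). The function name parse_counters and the SAMPLE text are illustrative, and counters whose names are wrapped across two lines by a mail client would not be matched.

```python
import re

# Matches lines such as:
#   12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for one job's console output."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# Sample lines copied from the second run quoted in this thread.
SAMPLE = """\
12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
12/08/28 05:05:35 INFO mapred.JobClient: Launched reduce tasks=20
12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
12/08/28 05:05:35 INFO mapred.JobClient: Data-local map tasks=2
"""

counters = parse_counters(SAMPLE)
print(counters["Launched map tasks"])  # 2
```

If the output contains several jobs (for example with -numRuns 3), later values overwrite earlier ones, so split the log per job first.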

 Thanks
 Hemanth

 On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta gdsay...@gmail.com
 wrote:
  Hi Hemanth,
 
  Thanks for the reply.
  Can you tell me how I can calculate, or confirm from the counters, what the
  exact number of maps should be?
  Thanks,
  Gaurav Dasgupta

Re: MRBench Maps strange behaviour

2012-08-28 Thread Hemanth Yamijala
Hi,

The number of maps specified to any MapReduce program (including
those that are part of MRBench) is generally only a hint, and the actual
number of maps is typically influenced by the amount of data
being processed. You can take a look at this wiki page to understand
more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
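A rough sketch of why the requested count is only a hint, assuming a single, non-empty, splittable input file and the split rule of the old-API FileInputFormat: split size = max(minimum split size, min(total size / requested maps, block size)), with the last split allowed to run about 10% over. This approximates that logic and is not a copy of the Hadoop source. The function name, the 64 MB block size and the 20050-byte file are illustrative assumptions; the 2-byte input corresponds to the "Map input bytes=2" counter of the second quoted run.

```python
def count_splits(file_size, requested_maps,
                 block_size=64 * 1024 * 1024,  # assumed HDFS block size
                 min_split_size=1,             # assumed minimum split size
                 split_slop=1.1):              # last split may be ~10% larger
    """Approximate the number of input splits (= map tasks) for one file.

    Assumes file_size > 0; empty files are handled differently by Hadoop.
    """
    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > split_slop:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly short) split.
    if remaining > 0:
        splits += 1
    return splits


# 2 bytes of input with 10 maps requested: goal size rounds down to 0,
# so the minimum split size of 1 byte applies and only 2 splits fit.
print(count_splits(2, 10))        # 2

# Hypothetical 20050-byte file with 200 maps requested: 200 splits of
# 100 bytes, plus a 50-byte remainder that becomes a 201st split.
print(count_splits(20050, 200))   # 201
```

Under these assumptions, a remainder larger than about 10% of the split size is enough to produce one more map than was requested, and a very small input caps the map count regardless of the -maps value.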

In the examples below, since the data you've generated is different,
the number of mappers is different. To be able to judge your
benchmark results, you'd need to benchmark against the same data (or
at least the same kind of data, i.e. the same size and type).
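One way to keep the data comparable across runs is to hold the input options fixed and vary only the task counts. The sketch below only builds the command lines, using the options that appear in the quoted commands (-numRuns, -maps, -reduces, -inputLines, -inputType); the helper name mrbench_command and the chosen default values are illustrative assumptions.

```python
def mrbench_command(maps, reduces, input_lines=1024, input_type="random",
                    num_runs=3, jar="/usr/lib/hadoop-0.20/hadoop-test.jar"):
    """Build an MRBench command line with fixed input size and type."""
    return ["hadoop", "jar", jar, "mrbench",
            "-numRuns", str(num_runs),
            "-maps", str(maps),
            "-reduces", str(reduces),
            "-inputLines", str(input_lines),
            "-inputType", input_type]


# Same input size and type for every run; only the task counts change.
for tasks in (10, 50, 200):
    print(" ".join(mrbench_command(tasks, tasks)))
```

The printed commands can then be run one after another, and the "Launched map tasks" counter compared across them.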

The number of maps printed at the end is straight from the input
specified and doesn't reflect what the job actually ran with. The
information from the counters is the right one.

Thanks
Hemanth

On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta gdsay...@gmail.com wrote:
 Hi All,

 I executed the MRBench program from hadoop-test.jar in my 12-node CDH3
 cluster. After executing, I had some strange observations regarding the
 number of Maps it ran.

 First I ran the command:
 hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
 And I could see that the actual number of Maps it ran was 201 (for all 3
 runs) instead of 200 (though the end report displays the number launched as
 200). Here is the console report:


 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035

 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28

 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters

 12/08/28 04:34:35 INFO mapred.JobClient: Launched reduce tasks=200

 12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=617209

 12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

 12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

 12/08/28 04:34:35 INFO mapred.JobClient: Rack-local map tasks=137

 12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201

 12/08/28 04:34:35 INFO mapred.JobClient: Data-local map tasks=64

 12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=1756882



 Again, I ran the MRBench for just 10 Maps and 10 Reduces:

 hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10



 This time the actual number of Maps was only 2, and again the end report
 displays Maps Launched as 10. The console output:



 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
 12/08/28 05:05:35 INFO mapred.JobClient: Launched reduce tasks=20
 12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6648
 12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
 12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
 12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
 12/08/28 05:05:35 INFO mapred.JobClient: Data-local map tasks=2
 12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=163257
 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
 12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_READ=407
 12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_READ=258
 12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1072596
 12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3
 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
 12/08/28 05:05:35 INFO mapred.JobClient: Map input records=1
 12/08/28 05:05:35 INFO mapred.JobClient: Reduce shuffle bytes=647
 12/08/28 05:05:35 INFO mapred.JobClient: Spilled Records=2
 12/08/28 05:05:35 INFO mapred.JobClient: Map output bytes=5
 12/08/28 05:05:35 INFO mapred.JobClient: CPU time spent (ms)=17070
 12/08/28 05:05:35 INFO mapred.JobClient: Total committed heap usage (bytes)=6218842112
 12/08/28 05:05:35 INFO mapred.JobClient: Map input bytes=2
 12/08/28 05:05:35 INFO mapred.JobClient: Combine input records=0
 12/08/28 05:05:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=254
 12/08/28 05:05:35 INFO mapred.JobClient: Reduce input records=1
 12/08/28 05:05:35 INFO mapred.JobClient: Reduce input groups=1
 12/08/28 05:05:35 INFO mapred.JobClient: Combine output records=0
 12/08/28 05:05:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=3348828160
 12/08/28 05:05:35 INFO mapred.JobClient: Reduce output records=1
 12/08/28 05:05:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=22955810816
 12/08/28 05:05:35 INFO mapred.JobClient: Map output records=1
 DataLines Maps Reduces AvgTime (milliseconds)
 120 20   17451

 Can some one please help me understand this behaviour of Hadoop in