When done, HADOOP-3387 would allow you to do that. In our implementation we can tell Hadoop the exact number of maps and it will group splits if necessary.
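A minimal sketch of that split-grouping idea (this is not the actual HADOOP-3387 patch; "capped.max.maps" is a made-up property name, and the old org.apache.hadoop.mapred API is assumed): merge adjacent splits of the same file until at most the requested number remain.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Sketch: cap the number of map tasks by merging adjacent FileSplits
    // of the same file until at most "capped.max.maps" splits remain.
    public class CappedMapsInputFormat extends TextInputFormat {

      @Override
      public InputSplit[] getSplits(JobConf job, int numSplits)
          throws IOException {
        // "capped.max.maps" is a hypothetical knob for this sketch.
        int maxMaps = job.getInt("capped.max.maps", numSplits);
        List<FileSplit> splits = new ArrayList<FileSplit>();
        for (InputSplit s : super.getSplits(job, numSplits)) {
          splits.add((FileSplit) s);
        }
        boolean merged = true;
        while (splits.size() > maxMaps && merged) {
          merged = false;
          for (int i = 0; i + 1 < splits.size() && splits.size() > maxMaps; i++) {
            FileSplit a = splits.get(i);
            FileSplit b = splits.get(i + 1);
            // Only contiguous byte ranges of the same file can be folded
            // into a single plain FileSplit. If nothing is contiguous
            // (e.g. many small files), the cap stays best-effort.
            if (a.getPath().equals(b.getPath())
                && a.getStart() + a.getLength() == b.getStart()) {
              splits.set(i, new FileSplit(a.getPath(), a.getStart(),
                  a.getLength() + b.getLength(), a.getLocations()));
              splits.remove(i + 1);
              merged = true;
            }
          }
        }
        return splits.toArray(new InputSplit[splits.size()]);
      }
    }

A job would opt in with conf.setInputFormat(CappedMapsInputFormat.class) and conf.setInt("capped.max.maps", 2). Note that a merged split spans several blocks, so some data locality is traded away for the lower map count.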
On Fri, Aug 1, 2008 at 5:25 AM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> Well, the only way I've found to reliably fix the number of map tasks is
> by using compressed input files; that forces Hadoop to assign one and only
> one file to a map task ;)
>
> Andreas
>
> On Thursday 31 July 2008 21:30:33 Gopal Gandhi wrote:
>> Thank you, finally someone has an interest in my questions =)
>> My cluster contains more than one machine. Please don't get me wrong :-).
>> I don't want to limit the total mappers on one node (by mapred.map.tasks).
>> What I want is to limit the total mappers for one job. The motivation is
>> that I have 2 jobs to run at the same time; they have the same input data
>> in Hadoop. I found that one job has to wait until the other finishes its
>> mapping. Because the 2 jobs are submitted by 2 different people, I don't
>> want one job to starve. So I want to limit the first job's total mappers
>> so that the 2 jobs will launch simultaneously.
>>
>> ----- Original Message ----
>> From: "Goel, Ankur" <[EMAIL PROTECTED]>
>> To: core-user@hadoop.apache.org
>> Cc: [EMAIL PROTECTED]
>> Sent: Wednesday, July 30, 2008 10:17:53 PM
>> Subject: RE: How can I control Number of Mappers of a job?
>>
>> How big is your cluster? Assuming you are running a single-node cluster:
>> hadoop-default.xml has a parameter 'mapred.map.tasks' that is set to 2,
>> so by default, no matter how many map tasks the framework calculates,
>> only 2 map tasks will execute on a single-node cluster.
>>
>> -----Original Message-----
>> From: Gopal Gandhi [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, July 31, 2008 4:38 AM
>> To: core-user@hadoop.apache.org
>> Cc: [EMAIL PROTECTED]
>> Subject: How can I control Number of Mappers of a job?
>>
>> The motivation is to control the maximum number of mappers of a job. For
>> example, the input data is 246 MB, which divided by the 64 MB block size
>> gives 4 blocks, so by default 4 mappers will be launched, one per block.
>> What I want is to set its maximum number of mappers to 2, so that 2
>> mappers are launched first and, when they complete on the first 2 blocks,
>> another 2 mappers start on the remaining 2 blocks. Does Hadoop provide a
>> way?
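For reference, a driver-side sketch (assuming the old org.apache.hadoop.mapred API of the 0.17/0.18 era) of the two knobs discussed above. mapred.map.tasks / setNumMapTasks() is only a hint; raising the minimum split size is what actually reduces the split count for FileInputFormat-based jobs. With 246 MB of input and 64 MB blocks, a 128 MB minimum split size yields 2 splits (128 MB + 118 MB) instead of 4. Note this gives 2 maps total, each covering two blocks, rather than the 2-at-a-time waves asked for:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Sketch of a driver that steers the map count; runs as an identity
    // job since no mapper/reducer classes are set.
    public class TwoMapperJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TwoMapperJob.class);
        conf.setJobName("two-mapper-job");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Hint only: the framework may still create one map per block.
        conf.setNumMapTasks(2);

        // Effective lever: with 246 MB of input and 64 MB blocks, a
        // 128 MB minimum split size yields 2 splits instead of 4.
        conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);

        JobClient.runJob(conf);
      }
    }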