What is the use case for this? Especially since you have 0 reducers.

Thanks,
Amogh
-----Original Message-----
From: Saptarshi Guha [mailto:saptarshi.g...@gmail.com]
Sent: Friday, July 31, 2009 12:08 PM
To: core-u...@hadoop.apache.org
Subject: Re: Running 145K maps, zero reduces - does Hadoop scale?

In this particular example, the record reader emits a single number per split as both key and value.

Regards
S

On Fri, Jul 31, 2009 at 1:55 AM, Saptarshi Guha <saptarshi.g...@gmail.com> wrote:
> Hello,
> Does Hadoop scale well for 100K+ input splits?
> I have not tried with sequence files. My custom InputFormat generates
> 145K splits. The record reader emits about 15 bytes as key and 8 bytes
> as value. It doesn't do anything else; in fact, it doesn't read from
> disk (basically it emits splitbeginning ... splitend for every split).
> So essentially, my InputFormat is creating 145K InputSplit objects (see
> below).
>
> However, I got this:
>
> 09/07/31 01:41:41 INFO mapred.JobClient: Running job: job_200907251335_0005
> 09/07/31 01:41:42 INFO mapred.JobClient: map 0% reduce 0%
> 09/07/31 01:43:06 INFO mapred.JobClient: Job complete: job_200907251335_0005
>
> And the job does not end! It hangs here.
>
> Very strange. The JobTracker does not respond to web requests.
> This is on Hadoop 0.20, though I am using the 0.19.1 API.
> The master is 64-bit with 4 cores and 16 GB RAM, and is not running any
> tasktrackers.
>
> Any pointers would be appreciated.
>
> Regards
> Saptarshi
>
> // Basically FileInputFormat's getSplits, reworded
> public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
>     long n = the_length_of_something; // == 145K
>     long chunkSize = n / (numSplits == 0 ? 1 : numSplits);
>     InputSplit[] splits = new InputSplit[numSplits];
>     for (int i = 0; i < numSplits; i++) {
>         MySplit split;
>         if ((i + 1) == numSplits)
>             // last split absorbs the remainder when numSplits doesn't divide n
>             split = new MySplit(i * chunkSize, n);
>         else
>             split = new MySplit(i * chunkSize, (i * chunkSize) + chunkSize);
>         splits[i] = split;
>     }
>     return splits;
> }
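[For context, a minimal sketch of what the split and record reader described above might look like in the old 0.19-era org.apache.hadoop.mapred API. This is not code from the thread: MySplit, its getBegin()/getEnd() accessors, and the choice of LongWritable for both key and value are assumptions, chosen to match "emits a single number per split as both key and value".]

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.RecordReader;

    // Hypothetical split carrying only a [begin, end) range; no backing file.
    class MySplit implements InputSplit {
        private long begin, end;

        public MySplit() { }  // no-arg constructor required for deserialization
        public MySplit(long begin, long end) { this.begin = begin; this.end = end; }

        public long getBegin() { return begin; }
        public long getEnd()   { return end; }

        public long getLength() { return end - begin; }
        public String[] getLocations() { return new String[0]; } // no data locality

        // InputSplit extends Writable: splits are serialized to the tasktrackers.
        public void write(DataOutput out) throws IOException {
            out.writeLong(begin);
            out.writeLong(end);
        }
        public void readFields(DataInput in) throws IOException {
            begin = in.readLong();
            end = in.readLong();
        }
    }

    // Emits exactly one (begin, end) record per split, then signals end-of-input.
    class MyRecordReader implements RecordReader<LongWritable, LongWritable> {
        private final long begin, end;
        private boolean done = false;

        public MyRecordReader(MySplit split) {
            this.begin = split.getBegin();
            this.end = split.getEnd();
        }

        public boolean next(LongWritable key, LongWritable value) {
            if (done) return false;  // only one record per split
            key.set(begin);
            value.set(end);
            done = true;
            return true;
        }

        public LongWritable createKey()    { return new LongWritable(); }
        public LongWritable createValue()  { return new LongWritable(); }
        public long getPos()               { return done ? 1 : 0; }
        public float getProgress()         { return done ? 1.0f : 0.0f; }
        public void close() { }
    }

[Note that each map task still pays full task-startup cost to process a single tiny record, which is why 145K splits stresses the JobTracker far more than the same data in fewer, larger splits would.]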