What is your input data? Some types of files are not splittable because of non-splittable compression codecs (like gzip). Could that be the case here?
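For reference, on 0.20.x FileInputFormat treats any compressed input file as non-splittable, so a single .gz file always ends up as a single map task. A rough sketch of how to check whether a codec is registered for a given file (class name and input path are placeholders, not from the original thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class SplittableCheck {
      public static void main(String[] args) {
        // Placeholder: point this at one of your actual input files.
        Path input = new Path(args[0]);
        CompressionCodec codec =
            new CompressionCodecFactory(new Configuration()).getCodec(input);
        if (codec == null) {
          System.out.println("No compression codec: FileInputFormat can split this file");
        } else {
          // On 0.20.x a compressed file is handed to a single mapper,
          // so e.g. GzipCodec here means exactly one map task for that file.
          System.out.println("Compressed with " + codec.getClass().getSimpleName()
              + ": will not be split");
        }
      }
    }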
Friso

On 25 okt. 2011, at 21:43, Artem Yankov wrote:

It looks like the input data is not being split correctly. It always generates only one map task and gives it to one of the nodes. I tried to pass parameters like -D mapred.max.split.size, but it doesn't seem to have any effect. So the question would be: how do I specify the maximum number of input records each mapper can receive?

On Tue, Oct 25, 2011 at 10:56 AM, Artem Yankov <artem.yan...@gmail.com> wrote:

Hey,

I set up a Hadoop cluster on EC2 using this documentation: http://wiki.apache.org/hadoop/AmazonEC2

OS: Linux Fedora 8
Hadoop version: 0.20.203.0
Java version: "1.7.0_01"
Heap size: 1 GB (stats always show it uses only 4% of this)

I use the mongo-hadoop plugin to get data from MongoDB. Everything seems to work perfectly with small chunks of data: calculations are fast, I get the results, and tasks seem to be distributed normally among the slaves.

Then I try to load a huge amount of data (22 million records) and everything hangs. The first slave receives a map task and the other slaves do not. In the logs I constantly see this:

INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from x.x.x.x:50010, blocks: 2, processing time: 0 m

I tried using different numbers of slaves (at most I ran 25 nodes), but it doesn't help, because it seems that once the first slave receives a job it blocks everything else. (Again, everything works fine with small chunks of data.) There is no significant CPU or memory load on the master.

Any ideas on what the reason for this could be?

Artem.
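Regarding the -D mapred.max.split.size attempt mentioned above: two things are worth checking. First, generic -D options are only picked up if the job driver runs through ToolRunner/GenericOptionsParser; a plain main() that builds its own Configuration will silently ignore them. Second, that property is only consulted by InputFormats that compute file-based splits; as far as I know the mongo-hadoop input format builds its splits from the MongoDB collection itself, so an HDFS split-size setting may simply not apply, and the split behaviour would have to be configured on the mongo-hadoop side instead. A rough sketch of a driver that does parse -D options (class and job names are placeholders, not from the original thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJobDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides parsed by ToolRunner.
        Configuration conf = getConf();
        // The same value could also be set programmatically, e.g. 64 MB:
        // conf.setLong("mapred.max.split.size", 64L * 1024 * 1024);
        Job job = new Job(conf, "example job");
        job.setJarByClass(MyJobDriver.class);
        // ... set mapper, reducer, input format, and paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyJobDriver(), args));
      }
    }

Invoked, for example, as: hadoop jar myjob.jar MyJobDriver -D mapred.max.split.size=67108864 <other args>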