I have an MR task which runs well with a single input file or an input directory with dozens of 50MB input files.
When the data is in a single input file of 1 GB or more, the mapper never gets past 0%. There are no errors, but when I look at the cluster, the CPUs are spending huge amounts of time in a wait state. The job runs when the input is 800MB and can complete even with a number of 500MB files as input.

The cluster (0.02) has 8 nodes with 8 CPUs per node. Block size is 64MB.

Any bright ideas?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com