I'm using the machine running the namenode to run maps as well - could that be a source of my problem? The load is fairly high, essentially no idle time. 8 cores per machine, so I've got 8 maps running. I'm guessing I'd be better off running 80 smaller machines instead of 20 larger ones for the same price, but we haven't been approved for more than 20 instances yet. Given that I'm not seeing any idle time, I'm assuming that I'm CPU-bound, not IO-bound.
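One way to check the CPU-bound assumption is to separate time spent idle from time spent waiting on IO, since monitoring tools report the two separately and a node with no idle time can still be spending part of its time in iowait. The sketch below is only an illustration, assuming a Linux node where the first line of /proc/stat lists user, nice, system, idle and iowait counters in that order; the function name cpu_shares and the sample numbers are made up for the example.

```python
def cpu_shares(before: str, after: str) -> dict:
    """Fractions of CPU time spent busy, idle and in iowait between two
    samples of the aggregate 'cpu' line from /proc/stat (Linux only)."""

    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Order: user nice system idle iowait irq softirq steal.
        values = [int(x) for x in parts[1:9]]
        # Very old kernels report fewer columns; treat missing ones as 0.
        return values + [0] * (8 - len(values))

    delta = [a - b for b, a in zip(fields(before), fields(after))]
    total = sum(delta)
    if total <= 0:
        raise ValueError("samples must be taken at two different times")

    idle = delta[3] / total
    iowait = delta[4] / total
    return {"busy": 1.0 - idle - iowait, "idle": idle, "iowait": iowait}


# Hypothetical samples taken a few seconds apart (made-up numbers).
sample_before = "cpu 100 0 50 800 50 0 0 0"
sample_after = "cpu 900 0 150 810 140 0 0 0"
shares = cpu_shares(sample_before, sample_after)
print({name: round(value, 2) for name, value in shares.items()})
```

On a real node the two strings would be the first line of /proc/stat read twice while the maps are running. A high busy share with little iowait would support the CPU-bound reading; a noticeable iowait share would suggest the disks or network matter too.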
fwiw, I have not found the large or xlarge EC2 instances to be proportionally faster with Hadoop, so we run many small instances instead, which is cheaper.
btw, the email notifying you that you have been approved may lag the actual approval (mine did, by days). It might be worth trying to launch a larger cluster to see whether the higher limit is already in effect.
Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/