I'm trying to run a MapReduce job against a cluster of 4 DataNodes with 4 cores each. My input data is 4GB in total, split across 100MB files. The configuration is at defaults, so the HDFS block size is 64MB.
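Here's the split arithmetic I'm assuming (a rough sketch; I'm guessing the default TextInputFormat cuts each file into one split per HDFS block):

```python
import math

# My job's numbers (assumptions: default block size, splits follow block boundaries)
total_mb = 4 * 1024   # 4GB of input
file_mb = 100         # each input file is 100MB
block_mb = 64         # default dfs.block.size

num_files = total_mb // file_mb                   # roughly 40 files
splits_per_file = math.ceil(file_mb / block_mb)   # 100MB file -> a 64MB + a 36MB split
num_mappers = num_files * splits_per_file

print(num_files, splits_per_file, num_mappers)    # 40 2 80
```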
If I understand it correctly, each 100MB file spans two 64MB blocks, so Hadoop should run about 80 mappers (2 splits per file, ~40 files). I'm running a simple record-counting MapReduce job and it takes about 30 minutes to complete. That seems like way too long, doesn't it? Is there any tuning you'd recommend trying to see an improvement in performance? Thanks, Pony