Aaron Kimball wrote:
(Note: this is a tasktracker setting, not a job setting. You'll need to set
this on every node, then restart the MapReduce cluster for it to take
effect.)
Ok, and here is my mistake: I set this to 16 only on the main node, not on
the data nodes as well. Thanks a lot!
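For reference, the setting in question is presumably
mapred.tasktracker.map.tasks.maximum (and its reduce counterpart); a minimal
mapred-site.xml sketch, assuming those are the properties meant:

  <!-- mapred-site.xml on every TaskTracker node;
       restart the TaskTrackers afterwards -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>16</value>  <!-- max concurrent map tasks on this node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>16</value>  <!-- max concurrent reduce tasks on this node -->
  </property>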
Of course, you need to have enough RAM to make sure that all these tasks can
run concurrently without swapping.
No problem!
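As a rough capacity check (just a sketch, assuming the per-task JVM heap is
still the stock mapred.child.java.opts default of -Xmx200m):

  <!-- mapred-site.xml: heap given to each task's child JVM -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>
    <!-- e.g. 16 map + 16 reduce slots * 200 MB ~= 6.4 GB of heap,
         plus DataNode/TaskTracker daemon overhead -->
  </property>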
If your individual records require around a minute each to process, as you
claimed earlier, you're nowhere near in danger of hitting that particular
performance bottleneck.
I was thinking that if I am under the recommended value of 64 MB, Hadoop
cannot properly calculate the number of tasks.
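For what it's worth, the task count comes from the input splits rather than
from any hard 64 MB requirement. A sketch of the relevant property, assuming
the classic FileInputFormat behavior where the split size defaults to the
HDFS block size:

  <!-- hdfs-site.xml: HDFS block size,
       which is also the default input split size -->
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>  <!-- 64 MB in bytes -->
  </property>

With that behavior, each file produces roughly ceil(file size / split size)
map tasks, and a file smaller than one block still yields exactly one map
task, so the count stays well-defined either way.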