Hi,
First, it sounds like you have two 6-core CPUs, which gives you 12 cores, not 24.
Even though the OS reports 24 cores, the extra 12 are hyper-threads, not real
cores.
This matters when it comes to tuning.
To answer your question ...
You have a single 1TB HD per node. That's going to be a major bottleneck in
terms of performance. You usually want at least one drive per core; with a
12-core box, that's 12 spindles.
How large is your Hadoop job's jar? It gets pushed out to all of the nodes,
and bigger jars take longer to distribute and process.
Having said that, the startup time you're seeing isn't out of whack.
It depends on which job you're launching and what you're doing within the job.
Remember that the tasks have to report back to the JobTracker (JT).
Do you have Ganglia up and running?
You will probably see a high load on the CPUs and then a lot of Wait IOs.
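If you don't have Ganglia up yet, here's a rough sketch of how to eyeball the
I/O wait on a node straight from /proc/stat on Linux (field 6 of the "cpu"
line is cumulative iowait ticks; the exact threshold that counts as "high"
depends on your workload):

```shell
# Print the share of CPU time spent in iowait since boot (Linux only).
# On the "cpu" line of /proc/stat, $6 is the cumulative iowait tick count.
awk '/^cpu /{total=0; for(i=2;i<=NF;i++) total+=$i; printf "iowait %.1f%%\n", 100*$6/total}' /proc/stat
```

A persistently high figure here while a job runs would back up the
single-spindle theory.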
HTH
-Mike
On Mar 20, 2012, at 5:40 AM, praveenesh kumar wrote:
I have a 10-node cluster (around 24 CPUs, 48 GB RAM, and a 1 TB HDD per node,
with a 10 Gb Ethernet connection).
After triggering any MR job, it's taking 3-5 seconds to launch (I mean the
time until I can see any MR job completion % on the screen).
I know that internally it's trying to launch the job, initialize mappers,
load data, etc.
What I want to know is: is this default/expected Hadoop behavior, or are
there ways I can decrease this startup time?
I also feel like my Hadoop jobs should run faster, but I'm still not able to
make them as fast as I think they should be.
I did some tuning as well; these are the parameters I'm playing around with
these days, but I still feel there's something missing that I could use:
dfs.block.size
mapred.compress.map.output
mapred.map/reduce.tasks.speculative.execution
mapred.tasktracker.map/reduce.tasks.maximum
mapred.child.java.opts
io.sort.mb
io.sort.factor
mapred.reduce.parallel.copies
mapred.job.reuse.jvm.num.tasks
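For what it's worth, the last one on that list is the most directly tied to
startup time. These settings live in mapred-site.xml; the fragment below is
only a sketch of the syntax with illustrative values, not tuned
recommendations for this cluster:

```xml
<!-- Illustrative mapred-site.xml fragment: values are placeholders, not
     recommendations for any particular cluster. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- -1 means reuse each task JVM for an unlimited number of tasks,
       which avoids paying the JVM startup cost per task -->
  <value>-1</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <!-- compress intermediate map output to cut disk and network I/O -->
  <value>true</value>
</property>
```

With only one spindle per node, cutting intermediate I/O (compression) and
JVM churn (reuse) are the cheapest wins to try first.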
Thanks,
Praveenesh