Hemanth,
<snip>
Just FYI, at Yahoo! we've set Torque to allocate separate nodes for the number specified to HOD. In other words, the number corresponds to the number of nodes, not processors. This has proved simpler to manage. I forget the details right now, but I think you can make Torque behave like this (that is, to not treat processors as individual nodes).
Thanks - I think it's a Maui directive, either on the job level or globally. I'm looking into this currently.
However, on inspection of the Jobtracker UI, it tells us that node19 has "Max Map Tasks" and "Max Reduce Tasks" both set to 2, when for node19, it should only be allowed one map task.
While HOD does not do this automatically, please note that since you are bringing up a Map/Reduce cluster on the allocated nodes, you can submit map/reduce parameters with which to bring up the cluster when allocating jobs. The relevant options are --gridservice-mapred.server-params (or -M in shorthand). Please refer to http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop for details.
I was aware of this, but the issue is that unless you obtain dedicated nodes (as above), this option is not suitable, as it isn't set on a per-node basis. I think it would be /fairly/ straightforward to add to HOD, as I detailed in my initial email, so that it "does the correct thing" out of the box.
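To make the per-node idea concrete, here is a rough sketch of the calculation I have in mind. It is not HOD code: it assumes the resource manager hands over a node file with one line per allocated processor (as I believe Torque's PBS_NODEFILE does), and the function names and the sample allocation are made up for illustration. The two property names are the standard task tracker limits.

```python
from collections import Counter


def slots_per_host(nodefile_lines):
    """Count how many processors were allocated on each host.

    Assumes one hostname per line, repeated once per allocated processor.
    """
    hosts = [line.strip() for line in nodefile_lines if line.strip()]
    return Counter(hosts)


def tasktracker_params(allocated_procs):
    """Derive per-node task limits from the processors allocated on that node.

    Hypothetical policy: one map slot and one reduce slot per allocated processor.
    """
    slots = max(1, allocated_procs)
    return {
        "mapred.tasktracker.map.tasks.maximum": str(slots),
        "mapred.tasktracker.reduce.tasks.maximum": str(slots),
    }


# Made-up allocation: two processors on node18, one on node19.
nodefile = ["node18", "node18", "node19"]
per_host = {
    host: tasktracker_params(count)
    for host, count in slots_per_host(nodefile).items()
}
```

With an allocation like this, node19 would be limited to a single map task while node18 keeps two, which is the behaviour I was expecting to see in the Jobtracker UI.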
(2) In our InputFormat, we use the numSplits to tell us how many map tasks the job's files should be split into. However, HOD does not override the mapred.map.tasks property (nor the mapred.reduce.tasks), while they should be set dependent on the number of available task trackers and/or nodes in the HOD session.
Can this not be submitted via the Hadoop job's configuration? Again, HOD cannot currently do this automatically. But you could use hod.client-params to set up a client-side hadoop-site.xml that would work like this for all jobs submitted to the cluster.
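For reference, a client-side hadoop-site.xml carrying these two properties could be generated along the following lines. This is only a sketch: the helper name is hypothetical, and the values 20 and 4 are placeholders rather than recommendations.

```python
import xml.etree.ElementTree as ET


def build_hadoop_site(props):
    """Render a dict of Hadoop properties as hadoop-site.xml content."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder values, chosen only to show the shape of the file.
site_xml = build_hadoop_site({"mapred.map.tasks": 20, "mapred.reduce.tasks": 4})
```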
According to hadoop-default.xml, the number of maps is "Typically set to a prime several times greater than number of available hosts." If we relax this recommendation to read "Typically set to a NUMBER several times greater than number of available hosts", then it should be straightforward for HOD to set it automatically, shouldn't it?
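As a sketch of the kind of rule I mean (not something HOD does today), the value could be derived from the number of hosts in the session. The function names and the factor of 3 are my own assumptions; "several times" is not defined any more precisely in hadoop-default.xml.

```python
def is_prime(n):
    """Trial-division primality check, adequate for small task counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggested_map_tasks(num_hosts, factor=3, prefer_prime=False):
    """Suggest mapred.map.tasks as a multiple of the available hosts.

    With prefer_prime=True, round up to the next prime, following the
    original wording of the hadoop-default.xml description.
    """
    target = max(1, num_hosts * factor)
    if not prefer_prime:
        return target
    candidate = max(2, target)
    while not is_prime(candidate):
        candidate += 1
    return candidate
```

For a 4-host session, the relaxed rule gives 12 map tasks, and the prime variant rounds that up to 13.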

Craig
