Hi Hemanth,

While HOD does not do this automatically, please note that since you are bringing up a Map/Reduce cluster on the allocated nodes, you can submit map/reduce parameters with which to bring up the cluster when allocating jobs. The relevant options are --gridservice-mapred.server-params (or -M in shorthand). Please refer to http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop for details.
I was aware of this, but the issue is that unless you obtain dedicated nodes (as above), this option is not suitable, as it isn't set on a per-node basis. I think it would be /fairly/ straightforward to add to HOD, as I detailed in my initial email, so that it "does the correct thing" out of the box.
True, I did assume you obtained dedicated nodes. It has been considerably simpler to operate HOD in this manner, and if I understand correctly, it would help to meet your requirement as well.
I think it's a Maui change (or qos directive) to obtain dedicated nodes - I'm looking into it presently, but I'm not sure that I have the exact incantation correct.
-W x="NACCESSPOLICY=SINGLETASK"

For mixed job environments (e.g. universities), where users have non-HOD jobs that often use single CPUs, requesting dedicated nodes gives a job more complicated requirements, so it will take longer to reach the head of the queue.

According to hadoop-default.xml, the number of maps is "Typically set to a prime several times greater than number of available hosts." Say that we relax this recommendation to read "Typically set to a NUMBER several times greater than number of available hosts"; should it then be straightforward for HOD to set it automatically?
Actually, AFAIK, the number of maps for a job is determined more or less exclusively by the M/R framework based on the number of splits. I've seen messages on this list before about how the documentation for this configuration item is misleading. So, this might actually not make a difference at all, whatever is specified.
The reason we were asking is that mapred.map.tasks is provided as the "hint" to the input split. We were using this number to generate the number of maps. I think it's just that FileInputFormat doesn't exactly honour the hint, from what I can see. Pig's InputFormat ignores the hint.
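
As a rough illustration of why the hint is not honoured exactly, here is a simplified Python sketch of the split arithmetic as I understand it from FileInputFormat. It is not the Hadoop code itself: the function names are made up for this example, the 64 MB block size, minimum split size of 1 byte and 1.1 slop factor are assumptions, and it only handles splittable, non-empty files.

```python
MIB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    # The hint only influences goal_size; the block size caps the result.
    return max(min_size, min(goal_size, block_size))


def count_splits(file_sizes, num_splits_hint,
                 block_size=64 * MIB, min_size=1, split_slop=1.1):
    """Estimate how many input splits (and hence maps) a job would get.

    file_sizes      -- sizes in bytes of the (non-empty) input files
    num_splits_hint -- the value given as mapred.map.tasks
    The defaults are assumptions made for this sketch.
    """
    total_size = sum(file_sizes)
    goal_size = total_size // max(num_splits_hint, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits = 0
    for length in file_sizes:
        remaining = length
        # Carve off full-sized splits while clearly more than one remains.
        while remaining / split_size > split_slop:
            splits += 1
            remaining -= split_size
        # Whatever is left over in this file becomes one final split.
        if remaining != 0:
            splits += 1
    return splits


if __name__ == "__main__":
    for hint in (4, 100):
        print(hint, count_splits([1024 * MIB], hint))
    print(2, count_splits([10 * MIB] * 3, 2))
```

Under these assumptions, a single 1 GB file with a hint of 4 still produces 16 splits, because the block size caps the split size, and three 10 MB files with a hint of 2 produce 3 splits, because a split never spans files. A hint larger than the number of blocks (100 for the same 1 GB file) is followed, which is why it behaves as a hint rather than a setting.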



Craig
