Hi Hemanth,
While HOD does not do this automatically, please note that since you
are bringing up a Map/Reduce cluster on the allocated nodes, you can
submit map/reduce parameters with which to bring up the cluster when
allocating jobs. The relevant option is
--gridservice-mapred.server-params (or -M in shorthand). Please
refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop
for details.
I was aware of this, but the issue is that unless you obtain
dedicated nodes (as above), this option is not suitable, as it isn't
set on a per-node basis. I think it would be /fairly/ straightforward
to add this to HOD, as I detailed in my initial email, so that it "does
the correct thing" out of the box.
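To illustrate why a single cluster-wide value breaks down on shared nodes, here is a small sketch. It assumes a Torque-style node file that lists a host once per CPU allocated to the job, and it uses mapred.tasktracker.map.tasks.maximum as an example of a parameter that ought to vary by node. Neither detail comes from this thread, and PerNodeSlots is a hypothetical helper, not part of HOD.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of HOD): derives a per-node slot count from a
// Torque-style node file, assuming each host appears once per allocated CPU.
public class PerNodeSlots {

    // Counts the CPUs allocated on each host, preserving first-seen order.
    static Map<String, Integer> slotsPerHost(List<String> nodeFileLines) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (String line : nodeFileLines) {
            String host = line.trim();
            if (host.isEmpty()) {
                continue; // skip blank lines
            }
            slots.merge(host, 1, Integer::sum);
        }
        return slots;
    }

    // True when every host received the same number of CPUs, i.e. when one
    // cluster-wide value (as passed via -M) would be right for every node.
    static boolean uniform(Map<String, Integer> slots) {
        return slots.values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Example allocation on shared nodes: node01 got 2 CPUs, node02 got 1.
        List<String> nodeFile = List.of("node01", "node01", "node02");
        Map<String, Integer> slots = slotsPerHost(nodeFile);
        for (Map.Entry<String, Integer> e : slots.entrySet()) {
            // The value each node's tasktracker would ideally be started with.
            System.out.println(e.getKey()
                    + ": mapred.tasktracker.map.tasks.maximum=" + e.getValue());
        }
        System.out.println("single -M value suffices: " + uniform(slots));
    }
}
```

When the counts differ, as they can on shared nodes, no single -M value is correct for every tasktracker; with dedicated nodes the counts are uniform and the problem disappears.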
True, I did assume you had obtained dedicated nodes. It has been
simpler to operate HOD in this manner, and, if I understand correctly,
doing so would also address the requirement you have.
I think obtaining dedicated nodes requires a Maui change (or a QOS
directive). I'm looking into it presently, but I'm not sure the exact
incantation is correct:
-W x="NACCESSPOLICY=SINGLETASK"
In mixed job environments (e.g. universities), where many users run
non-HOD jobs that often use a single CPU, requesting dedicated nodes
gives a job more complicated requirements, so it will take longer to
reach the head of the queue.
According to hadoop-default.xml, the number of maps is "Typically set
to a prime several times greater than number of available hosts."
Suppose we relax this recommendation to read "Typically set to a
NUMBER several times greater than number of available hosts". Would it
then be straightforward for HOD to set it automatically?
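Under the relaxed wording, the hint is just a multiple of the host count, which HOD knows once the nodes are allocated. A minimal sketch follows; the multiplier of 4 and the class name MapTasksHint are assumptions for illustration only. The prime variant is included to show that the original wording would be just as easy to compute.

```java
// Hypothetical sketch: how HOD could pick a mapred.map.tasks hint from the
// number of allocated hosts. The multiplier is an assumed value, not a HOD default.
public class MapTasksHint {

    // Relaxed wording: "a NUMBER several times greater than number of available hosts".
    static int plainHint(int hosts, int multiplier) {
        return hosts * multiplier;
    }

    // Original wording: the smallest prime >= hosts * multiplier.
    static int primeHint(int hosts, int multiplier) {
        int n = Math.max(2, hosts * multiplier);
        while (!isPrime(n)) {
            n++;
        }
        return n;
    }

    // Simple trial division; adequate for cluster-sized numbers.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int hosts = 10;     // nodes allocated by HOD
        int multiplier = 4; // "several times": an assumed value
        System.out.println("mapred.map.tasks=" + plainHint(hosts, multiplier)); // 40
        System.out.println("prime variant: " + primeHint(hosts, multiplier));   // 41
    }
}
```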
Actually, AFAIK, the number of maps for a job is determined more or
less exclusively by the M/R framework based on the number of splits.
I've seen messages on this list before about how the documentation for
this configuration item is misleading. So this setting might not make
any difference at all, whatever value is specified.
The reason we were asking is that mapred.map.tasks is passed to the
InputFormat as the "hint" for computing the input splits.
We were using this number to generate the number of maps. From what I
can see, it's just that FileInputFormat doesn't honour the hint
exactly. Pig's InputFormat ignores the hint.
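Why FileInputFormat only loosely follows the hint can be seen in a simplified model of its split computation. This is a sketch based on my reading of the old-API org.apache.hadoop.mapred.FileInputFormat.getSplits, not Hadoop's actual code: the 64 MB block size, the minimum split size of 1 and the 1.1 "slop" factor are assumptions about that version's defaults, and block locations, compression, non-splittable files and empty files are ignored.

```java
import java.util.Arrays;

// Simplified model (not Hadoop's code) of how the old-API FileInputFormat
// turns the mapred.map.tasks hint into a number of splits, i.e. map tasks.
public class SplitHint {

    // Lets the last split of a file be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static int countSplits(long[] fileSizes, int numSplitsHint, long blockSize, long minSize) {
        long totalSize = 0;
        for (long size : fileSizes) {
            totalSize += size;
        }
        // The hint only sets a target split size across the whole input...
        long goalSize = totalSize / (numSplitsHint == 0 ? 1 : numSplitsHint);
        // ...which is then clamped between minSize and the block size.
        long splitSize = Math.max(Math.max(1, minSize), Math.min(goalSize, blockSize));

        int splits = 0;
        // Each file is split independently, so every non-empty file costs at least one map.
        for (long size : fileSizes) {
            long remaining = size;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                splits++;
                remaining -= splitSize;
            }
            if (remaining != 0) {
                splits++; // the (possibly short) tail of the file
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed dfs.block.size
        long minSize = 1;         // assumed effective minimum split size

        // One large file: the block-size cap yields more maps than requested.
        System.out.println(countSplits(new long[] {1000 * mb}, 7, blockSize, minSize)); // 16
        // Many small files: one map per file, regardless of the hint.
        long[] small = new long[20];
        Arrays.fill(small, 10 * mb);
        System.out.println(countSplits(small, 4, blockSize, minSize));                  // 20
        // Evenly divisible input: here the hint is honoured exactly.
        System.out.println(countSplits(new long[] {640 * mb}, 10, blockSize, minSize)); // 10
    }
}
```

In this model the hint is honoured only when the input divides evenly into blocks; otherwise the block-size cap and the per-file splitting decide the number of maps.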
Craig