Re: HOD questions

2008-12-19 Thread Craig Macdonald

Hi Hemanth,

While HOD does not do this automatically, please note that since you 
are bringing up a Map/Reduce cluster on the allocated nodes, you can 
submit map/reduce parameters with which to bring up the cluster when 
allocating jobs. The relevant option is 
--gridservice-mapred.server-params (or -M in shorthand). Please 
refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop 
for details.
I was aware of this, but the issue is that unless you obtain 
dedicated nodes (as above), this option is not suitable, as it isn't 
set on a per-node basis. I think it would be /fairly/ straightforward 
to add to HOD, as I detailed in my initial email, so that it does 
the correct thing out of the box.
True, I did assume you obtained dedicated nodes. It has proved fairly 
simple to operate HOD in this manner, and if I understand correctly, it 
would help to meet your requirement as well.
I think it's a Maui change (or qos directive) to obtain dedicated nodes 
- I'm looking into it presently, but I'm not yet sure of the exact 
incantation. Possibly:

-W x=NACCESSPOLICY=SINGLETASK
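
(Untested; some Maui/Moab documentation writes the extension with a 
colon rather than an equals sign, passed through qsub along the lines 
of:

    qsub -l nodes=8 -W x=NACCESSPOLICY:SINGLETASK

so both forms may be worth trying.)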

For mixed job environments [e.g. universities] - where users also have 
non-HOD jobs, often using single CPUs - requesting dedicated nodes 
gives a job more complicated requirements, and it will hence take 
longer to reach the head of the queue.


According to hadoop-default.xml, the number of maps is "Typically set 
to a prime several times greater than number of available hosts." - 
Say that we relax this recommendation to read "Typically set to a 
NUMBER several times greater than number of available hosts" - then it 
should be straightforward for HOD to set it automatically?
Actually, AFAIK, the number of maps for a job is determined more or 
less exclusively by the M/R framework based on the number of splits. 
I've seen messages on this list before about how the documentation for 
this configuration item is misleading. So, this might actually not 
make a difference at all, whatever is specified.
The reason we were asking is that mapred.map.tasks is provided as a 
hint to the input split calculation. We were using this number to 
generate the number of maps. I think it's just that FileInputFormat 
doesn't exactly honour the hint, from what I can see. Pig's 
InputFormat ignores the hint.
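
For what it's worth, the hint appears to be treated purely as a goal. 
Roughly, FileInputFormat.getSplits() does the following (paraphrased 
from the 0.19 source - a sketch, not a verbatim quote):

    // numSplits is the mapred.map.tasks hint passed to getSplits()
    long goalSize = totalSize / (numSplits == 0 ? 1 : numSplits);
    // the block size and the configured minimum split size can both
    // override the goal, so the actual number of maps can differ
    // from the hint
    long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));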




Craig


Re: HOD questions

2008-12-18 Thread Craig Macdonald

Hemanth,
snip
Just FYI, at Yahoo! we've set Torque to allocate separate nodes for 
the number specified to HOD. In other words, the number corresponds to 
the number of nodes, not processors. This has proved simpler to 
manage. I forget right now, but I think you can make Torque behave 
like this (i.e. not treat processors as individual nodes).
Thanks - I think it's a Maui directive, either at the job level or 
globally. I'm looking into this currently.
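
(If memory serves, the global form is Maui's JOBNODEMATCHPOLICY 
scheduler parameter - e.g. in maui.cfg:

    JOBNODEMATCHPOLICY EXACTNODE

which should make -l nodes=X mean X distinct nodes rather than X 
virtual processors. Unverified on our setup as yet.)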
However, on inspection of the JobTracker UI, it tells us that node19 
has "Max Map Tasks" and "Max Reduce Tasks" both set to 2, when I 
think that node19 should only be allowed one map task.
While HOD does not do this automatically, please note that since you 
are bringing up a Map/Reduce cluster on the allocated nodes, you can 
submit map/reduce parameters with which to bring up the cluster when 
allocating jobs. The relevant option is 
--gridservice-mapred.server-params (or -M in shorthand). Please refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop 
for details.
I was aware of this, but the issue is that unless you obtain dedicated 
nodes (as above), this option is not suitable, as it isn't set on a 
per-node basis. I think it would be /fairly/ straightforward to add to 
HOD, as I detailed in my initial email, so that it does the correct 
thing out of the box.
(2) In our InputFormat, we use the numSplits to tell us how many map 
tasks the job's files should be split into. However, HOD does not 
override the mapred.map.tasks property (nor mapred.reduce.tasks), 
whereas these should be set depending on the number of available task 
trackers and/or nodes in the HOD session.
Can this not be submitted via the Hadoop job's configuration? Again, 
HOD cannot currently do this automatically. But you could use the 
hod.client-params option to set up a client-side hadoop-site.xml that 
would work like this for all jobs submitted to the cluster.
According to hadoop-default.xml, the number of maps is "Typically set 
to a prime several times greater than number of available hosts." - 
Say that we relax this recommendation to read "Typically set to a 
NUMBER several times greater than number of available hosts" - then it 
should be straightforward for HOD to set it automatically?


Craig


HOD questions

2008-12-17 Thread Craig Macdonald

Hello,

We have two HOD questions:

(1) For our current Torque PBS setup, the number of nodes requested by 
HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, 
these CPUs can be spread across various partially-used or empty nodes. 
Unfortunately, HOD does not appear to honour the number of processors 
actually allocated by Torque PBS to that job.


For example, a current running HOD session can be viewed in qstat as:
104544.trmaster  user parallel HOD   4178 8  ----  288:0 R 01:48
  node29/2+node29/1+node29/0+node17/2+node17/1+node18/2+node18/1
  +node19/1

However, on inspection of the JobTracker UI, it tells us that node19 
has "Max Map Tasks" and "Max Reduce Tasks" both set to 2, when I 
think that node19 should only be allowed one map task.


I believe that, for each node, HOD should determine (using the 
information in $PBS_NODEFILE) how many CPUs on that node are allocated 
to the HOD job, and then set mapred.tasktracker.map.tasks.maximum 
appropriately on each node.
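
Since $PBS_NODEFILE contains one line per allocated virtual processor, 
something as simple as:

    sort $PBS_NODEFILE | uniq -c

would give the per-node CPU counts from which each node's 
mapred.tasktracker.map.tasks.maximum could be derived (an illustrative 
sketch - I have not tried wiring this into HOD itself).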


(2) In our InputFormat, we use the numSplits to tell us how many map 
tasks the job's files should be split into. However, HOD does not 
override the mapred.map.tasks property (nor mapred.reduce.tasks), 
whereas these should be set depending on the number of available task 
trackers and/or nodes in the HOD session.


Craig


Re: HOD questions

2008-12-17 Thread Hemanth Yamijala

Craig,

Hello,

We have two HOD questions:

(1) For our current Torque PBS setup, the number of nodes requested by 
HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, 
these CPUs can be spread across various partially-used or empty nodes. 
Unfortunately, HOD does not appear to honour the number of processors 
actually allocated by Torque PBS to that job.


Just FYI, at Yahoo! we've set Torque to allocate separate nodes for the 
number specified to HOD. In other words, the number corresponds to the 
number of nodes, not processors. This has proved simpler to manage. I 
forget right now, but I think you can make Torque behave like this 
(i.e. not treat processors as individual nodes).

For example, a current running HOD session can be viewed in qstat as:
104544.trmaster  user parallel HOD   4178 8  ----  288:0 R 01:48
  node29/2+node29/1+node29/0+node17/2+node17/1+node18/2+node18/1
  +node19/1

However, on inspection of the JobTracker UI, it tells us that node19 
has "Max Map Tasks" and "Max Reduce Tasks" both set to 2, when I 
think that node19 should only be allowed one map task.


While HOD does not do this automatically, please note that since you are 
bringing up a Map/Reduce cluster on the allocated nodes, you can submit 
map/reduce parameters with which to bring up the cluster when allocating 
jobs. The relevant option is --gridservice-mapred.server-params (or -M 
in shorthand). Please refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop 
for details.
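
For instance, on a cluster of dedicated single-CPU nodes, something 
along these lines (illustrative and untested - please check the guide 
above for the exact syntax) would bring up every TaskTracker with one 
map and one reduce slot:

    hod allocate -d ~/hod-clusters/test -n 8 \
        -M mapred.tasktracker.map.tasks.maximum=1,mapred.tasktracker.reduce.tasks.maximum=1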


I believe that, for each node, HOD should determine (using the 
information in $PBS_NODEFILE) how many CPUs on that node are allocated 
to the HOD job, and then set mapred.tasktracker.map.tasks.maximum 
appropriately on each node.


(2) In our InputFormat, we use the numSplits to tell us how many map 
tasks the job's files should be split into. However, HOD does not 
override the mapred.map.tasks property (nor mapred.reduce.tasks), 
whereas these should be set depending on the number of available task 
trackers and/or nodes in the HOD session.


Can this not be submitted via the Hadoop job's configuration? Again, 
HOD cannot currently do this automatically. But you could use the 
hod.client-params option to set up a client-side hadoop-site.xml that 
would work like this for all jobs submitted to the cluster.
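
For example, something like the following (illustrative and untested; 
the property values are just placeholders, and the exact option syntax 
is in the user guide) should generate a client-side hadoop-site.xml 
that carries these values into every job submitted to the cluster:

    hod allocate -d ~/hod-clusters/test -n 8 \
        --hod.client-params=mapred.map.tasks=32,mapred.reduce.tasks=8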


Hope this helps some.

Thanks
Hemanth


Re: More HOD questions 0.16.0 - debug log enclosed - help with how to debug

2008-02-26 Thread Hemanth Yamijala

Jason Venner wrote:

My Hadoop jobs don't start.
This is configured to use an existing DFS and to unpack a tarball with 
a cut-down 0.16.0 config.
I have looked in the mom logs on the client machines and am not 
getting anything meaningful.


What is your hod command line? Specifically, how did you provide the 
tarball option?
Can you attach the log of the hod command, like you did the hodrc? 
There are some lines in the output that don't seem complete.
Set your debug option in the [ringmaster] section to 4, and rerun hod. 
Under the log-dir specified in the [ringmaster] section you will be 
able to see a log file corresponding to your jobid. Can you attach 
that too? The ringmaster node is the first one allocated by Torque for 
the job, that is, the mother superior for the job.
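
For reference, the relevant hodrc fragment would look something like 
this (the log-dir path is just an example):

    [ringmaster]
    debug = 4
    log-dir = /tmp/hod-logs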
How is your tarball built? Can you check that there's no hadoop-env.sh 
with pre-filled values in it? Look at HADOOP-2860.
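
(One quick, illustrative check with GNU tar - extract the script to 
stdout and look for uncommented lines; the tarball name here is just 
an example:

    tar -xzf hadoop-0.16.0.tar.gz --wildcards -O '*/conf/hadoop-env.sh' \
        | grep -vE '^(#|[[:space:]]*$)'

any output indicates values baked into the tarball.)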


Thanks
Hemanth