Hello,
We have two HOD questions:
(1) In our current Torque PBS setup, the number of nodes requested by
HOD (-l nodes=X) corresponds to the number of CPUs allocated; however,
these CPUs can be spread across several partially used or empty
physical nodes. Unfortunately, HOD does not appear to honour the number
of processors that Torque PBS actually allocated to the job on each node.
For example, a currently running HOD session appears in qstat as:
104544.trmaster user parallel HOD 4178 8 -- -- 288:0 R 01:48
node29/2+node29/1+node29/0+node17/2+node17/1+node18/2+node18/1
+node19/1
However, the JobTracker UI reports that node19 has "Max Map Tasks" and
"Max Reduce Tasks" both set to 2, whereas I think node19 should be
allowed only one map task, since only one of its CPUs (node19/1) is
allocated to this job.
I believe that HOD should determine, using the information in the
$PBS_NODEFILE, how many CPUs on each node are allocated to the HOD job,
and then set mapred.tasktracker.map.tasks.maximum accordingly on each
node.
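As a rough sketch of what I mean (this is not HOD code, and the function
name count_slots is made up for illustration): Torque writes one line
to $PBS_NODEFILE per allocated processor, so counting how often each
hostname appears gives the per-node CPU count. The sketch assumes the
hostnames in the file match the names the tasktrackers use.

```python
import os
from collections import Counter


def count_slots(lines):
    """Count allocated processors per host.

    `lines` is any iterable of node-file lines; each non-empty line is
    assumed to hold one hostname, repeated once per allocated processor.
    """
    hosts = (line.strip() for line in lines)
    return Counter(host for host in hosts if host)


if __name__ == "__main__":
    # PBS_NODEFILE is set by Torque inside a running job.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as f:
            slots = count_slots(f)
        for host, n in sorted(slots.items()):
            # Hypothetical output: the value HOD could set per node.
            print("%s mapred.tasktracker.map.tasks.maximum=%d" % (host, n))
```

For the session shown above, this would count 3 for node29, 2 each for
node17 and node18, and 1 for node19.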
(2) In our InputFormat, we use numSplits to decide how many map tasks
the job's files should be split into. However, HOD does not override
the mapred.map.tasks property (nor mapred.reduce.tasks), although both
should be set according to the number of available task trackers
and/or nodes in the HOD session.
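To illustrate, here is one possible policy, sketched in Python. The
policy itself is only an assumption on my part (one map task per
allocated CPU, one reduce task per task tracker), and the function name
suggested_job_defaults is hypothetical, not part of HOD.

```python
def suggested_job_defaults(slots_by_host):
    """Derive job-level defaults from a {hostname: allocated CPUs} mapping.

    Assumed policy, for illustration only: one map task per allocated
    CPU and one reduce task per host running a task tracker.
    """
    return {
        "mapred.map.tasks": sum(slots_by_host.values()),
        "mapred.reduce.tasks": len(slots_by_host),
    }


if __name__ == "__main__":
    # Allocation taken from the qstat listing in question (1).
    session = {"node29": 3, "node17": 2, "node18": 2, "node19": 1}
    for name, value in sorted(suggested_job_defaults(session).items()):
        print("%s=%d" % (name, value))
```

For the session in question (1), this policy would give 8 map tasks and
4 reduce tasks.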
Craig