On Nov 19, 2007, at 4:05 PM, Daniel Andrzejewski wrote:
I'd like to add that the name of the compute node is node10.local,
not node10.local:1 as shown in the error message.
So, possibly the PBS nodefile is coming out in a different format
than expected, thus causing trouble in the following loop:
    hosts=\`cat \$PBS_NODEFILE\`;
    counter=0
    while test \$counter -lt $count; do
        for host in \$hosts; do
            if test \$counter -lt $count; then
                $remote_shell \$host "/bin/sh $cmd_script_name" < $stdin &
                counter=\`expr \$counter + 1\`
            else
                break
            fi
        done
    done
That winds up in the submit file to PBS. You can add a line like:
system("cp $pbs_job_script_name /tmp/ws.gram.job");
right before the line reading:
chomp($job_id = `$qsub < $pbs_job_script_name $errfile`);
Then you can edit the /tmp/ws.gram.job file to see what fix is required.
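If the copied job script confirms that the nodefile entries carry a
suffix such as ":1", one possible fix is to strip it before the loop
runs. The following is only a sketch under that assumption: the helper
name normalize_hosts is hypothetical, and the ":N" format is inferred
from the ssh error message, not confirmed from the nodefile itself.

```shell
#!/bin/sh
# Hypothetical helper: remove a trailing ":<digits>" from every line.
# Reads the files given as arguments, or stdin when there are none.
# The ":N" suffix format is an assumption based on the error message.
normalize_hosts() {
    sed 's/:[0-9][0-9]*$//' "$@"
}

# Demo: a temporary file stands in for $PBS_NODEFILE.
nodefile=$(mktemp)
printf 'node10.local:1\nnode11.local\n' > "$nodefile"

hosts=$(normalize_hosts "$nodefile")
echo "$hosts"

rm -f "$nodefile"
```

Host names without a suffix pass through unchanged. If the same sed
call is placed in pbs.pm, its $ and backticks would need to be escaped
the same way the existing loop escapes them.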
Charles
The following is the relevant piece of the
${GLOBUS_LOCATION}/lib/perl/Globus/GRAM/JobManager/pbs.pm file:
----------------------
my ($mpirun, $mpiexec, $qsub, $qstat, $qdel, $cluster,
    $cpu_per_node, $remote_shell);
BEGIN
{
    $mpiexec = 'no';
    $mpirun = '/usr/local/bin/mpirun';
    $qsub = '/usr/local/bin/qsub';
    $qstat = '/usr/local/bin/qstat';
    $qdel = '/usr/local/bin/qdel';
    $cluster = 1;
    $cpu_per_node = 1;
    $remote_shell = '/usr/local/bin/ssh';
    $softenv_dir = '';
    $soft_msc = "$softenv_dir/bin/soft-msc";
    $softenv_load = "$softenv_dir/etc/softenv-load.sh";
}
----------------------
If I change $cluster to 0, I don't get any errors, but I also don't get
as many resources as I request in the job description file (e.g.
<count>10</count>).
Thank you,
--
Daniel
Charles Bacon wrote:
Your client sends its hostname to the container. Are you
submitting from a machine named node10.local? If so, you should
set GLOBUS_HOSTNAME to the publicly visible name of your machine
instead.
If myhost.com is really node10.local, then you should set
GLOBUS_HOSTNAME in its environment to its publicly visible name.
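As a minimal sketch, the variable can be set in the shell that runs the
client before submitting. The name client.example.org is a hypothetical
placeholder, not a host from this thread; substitute the real
externally resolvable name of the submitting machine.

```shell
#!/bin/sh
# client.example.org is a hypothetical placeholder for the name under
# which other hosts can reach the submitting machine.
GLOBUS_HOSTNAME=client.example.org
export GLOBUS_HOSTNAME

echo "GLOBUS_HOSTNAME is set to $GLOBUS_HOSTNAME"

# Then rerun the submission from this same shell, for example:
# globusrun-ws -submit -s -F https://myhost.com:8443/wsrf/services/ManagedJobFactoryService -Ft PBS -c /bin/ls
```

The export only affects the current shell and the processes started
from it, so the job has to be submitted from that same session.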
Charles
On Nov 19, 2007, at 2:57 PM, Daniel Andrzejewski wrote:
Hi all,
When I submit the following job I get no errors, but no output
either:
globusrun-ws -submit -F https://myhost.com:8443/wsrf/services/ManagedJobFactoryService -Ft PBS -c /bin/ls
When I add the -s option, I get the following error:
ssh: node10.local:1: Name or service not known
/var/torque/mom_priv/jobs/179.myhost-head.SC: line 37: [: too many arguments
I don't have any problems with ssh keys and I use Torque/Maui.
Thanks in advance.
--Daniel Andrzejewski
student IT Administrator
Electrical Engineering and Computer Science
University of Tennessee
(865) 974 - 4388 (work)
"Investment in knowledge always pays the best interest" Benjamin
Franklin
--