On Nov 19, 2007, at 4:05 PM, Daniel Andrzejewski wrote:

I'd like to add that the name of the compute node is node10.local, not node10.local:1 as it appears in the error message.

So, possibly the PBS nodefile is coming out in a different format than expected, thus causing trouble in the following loop:
hosts=\`cat \$PBS_NODEFILE\`;
counter=0
while test \$counter -lt $count; do
    for host in \$hosts; do
        if test \$counter -lt $count; then
            $remote_shell \$host "/bin/sh $cmd_script_name" < $stdin &
            counter=\`expr \$counter + 1\`
        else
            break
        fi
    done
done


That winds up in the submit file to PBS.  You can add a line like:
system("cp $pbs_job_script_name /tmp/ws.gram.job");

right before the line reading:
chomp($job_id = `$qsub < $pbs_job_script_name $errfile`);

Then you can edit the /tmp/ws.gram.job file to see what fix is required.
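
If the node file entries really do carry a ":N" suffix, one possible fix to try in the copied script is to strip that suffix before the loop runs. This is only a sketch under that assumption: strip_port_suffix is a hypothetical helper name, not something in pbs.pm, and a temporary file stands in for $PBS_NODEFILE.

```shell
# Hypothetical helper: print each line of the given node file with any
# trailing ":<digits>" removed (e.g. "node10.local:1" -> "node10.local").
strip_port_suffix() {
    sed 's/:[0-9][0-9]*$//' "$1"
}

# Stand-in for $PBS_NODEFILE so the sketch can be run outside of PBS.
nodefile=$(mktemp)
printf 'node10.local:1\nnode11.local\n' > "$nodefile"

hosts=$(strip_port_suffix "$nodefile")
echo "$hosts"   # prints node10.local and node11.local, one per line

rm -f "$nodefile"
```

In the copied job script the equivalent change would be to apply the same sed expression to $PBS_NODEFILE on the hosts= line instead of using plain cat.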


Charles


The following is the relevant piece of the ${GLOBUS_LOCATION}/lib/perl/Globus/GRAM/JobManager/pbs.pm file:

----------------------
my ($mpirun, $mpiexec, $qsub, $qstat, $qdel, $cluster, $cpu_per_node, $remote_shell);

BEGIN
{
    $mpiexec        = 'no';
    $mpirun         = '/usr/local/bin/mpirun';
    $qsub           = '/usr/local/bin/qsub';
    $qstat          = '/usr/local/bin/qstat';
    $qdel           = '/usr/local/bin/qdel';
    $cluster        = 1;
    $cpu_per_node   = 1;
    $remote_shell   = '/usr/local/bin/ssh';
    $softenv_dir    = '';
    $soft_msc       = "$softenv_dir/bin/soft-msc";
    $softenv_load   = "$softenv_dir/etc/softenv-load.sh";

}
----------------------

If I change $cluster to 0, I don't get any errors, but I don't get as many resources as I request in the job description file (e.g. <count>10</count>).

Thank you,

--
Daniel

Charles Bacon wrote:
Your client sends its hostname to the container. Are you submitting from a machine named node10.local? If so, you should set GLOBUS_HOSTNAME to the publicly visible name of your machine instead. If myhost.com is really node10.local, then you should set GLOBUS_HOSTNAME in its environment to its publicly visible name.
Charles
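
For example, in a Bourne-style shell on the affected machine this could look like the following. Here myhost.com is just the placeholder name used in this thread and should be replaced by the machine's real public name.

```shell
# Assumption: myhost.com stands in for the machine's publicly visible hostname.
GLOBUS_HOSTNAME=myhost.com
export GLOBUS_HOSTNAME

# Confirm the value that Globus tools started from this shell will see.
echo "GLOBUS_HOSTNAME is set to $GLOBUS_HOSTNAME"
```

The variable only affects processes started from that environment, so it has to be set before the client or container is launched.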
On Nov 19, 2007, at 2:57 PM, Daniel Andrzejewski wrote:
Hi all,

When I submit the following job I get no problems, but no output either.

globusrun-ws -submit -F https://myhost.com:8443/wsrf/services/ManagedJobFactoryService -Ft PBS -c /bin/ls

When I add the -s option, I get the following error:

ssh: node10.local:1: Name or service not known
/var/torque/mom_priv/jobs/179.myhost-head.SC: line 37: [: too many arguments

I don't have any problems with ssh keys and I use Torque/Maui.

Thanks in advance.

--
Daniel Andrzejewski
student IT Administrator
Electrical Engineering and Computer Science
University of Tennessee
(865) 974 - 4388 (work)

"Investment in knowledge always pays the best interest" Benjamin Franklin
--



