Re: [OMPI users] Running mpirun with grid

2020-06-01 Thread Ralph Castain via users
Afraid I don't have much to offer. I suspect the problem is here:

> Unmatched ".
> Unmatched ".
> Unmatched ".

Something may be eating a quote, or mistakenly adding one, to the cmd line. You 
might try upping the verbosity: --mca plm_base_verbose 5
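
For example, a minimal sketch (./your_app and its arguments are only placeholders
for your actual command line):

   $ mpirun --mca plm_base_verbose 5 -np 5 ./your_app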



> On May 31, 2020, at 2:49 PM, Kulshrestha, Vipul 
>  wrote:
> 
> Hi Ralph,
> 
> Thanks for your response.
> 
> I added the option "--mca plm_rsh_no_tree_spawn 1" to mpirun command line, 
> but I get a similar error. (pasted below).
> 
> Regards,
> Vipul
> 
> Got 14 slots.
> tmpdir is /tmp/194954128.1.all.q
> pe_hostfile is /var/spool/sge/has2/active_jobs/194954128.1/pe_hostfile
> has2.org.com 2 al...@has2.org.com 
> has6.org.com 2 al...@has6.org.com 
> cod4.org.com 2 al...@cod4.org.com 
> cod6.org.com 2 al...@cod6.org.com 
> cod5.org.com 2 al...@cod5.org.com 
> hpb12.org.com 2 al...@hpb12.org.com 
> has4.org.com 2 al...@has4.org.com 
> [has2:08703] [[24953,0],0] plm:rsh: using "/grid2/sge/bin/lx-amd64/qrsh 
> -inherit -nostdin -V -verbose" for launching
> [has2:08703] [[24953,0],0] plm:rsh: final template argv:
>/grid2/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose   
>set path = ( /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( 
> $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) 
> setenv LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
> $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH 
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if ( 
> $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 0 
> ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
> $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH 
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;   
> /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca 
> ess_base_jobid "1635319808" -mca ess_base_vpid "" -mca 
> ess_base_num_procs "7" -mca orte_node_regex 
> "has[1:2,6],cod[1:4,6,5],hpb[2:12],has[1:4]@0(7)" -mca orte_hnp_uri 
> "1635319808.0;tcp://139.181.79.58:57879" --mca routed "direct" --mca 
> orte_base_help_aggregate "0" --mca plm_base_verbose "1" --mca 
> plm_rsh_no_tree_spawn "1" -mca plm "rsh" -mca orte_output_filename 
> "./veloce.log/velsyn/dvelsyn:nojobid,nocopy" -mca hwloc_base_binding_policy 
> "none" -mca pmix "^s1,s2,cray,isolated"
> Starting server daemon at host "cod5"Starting server daemon at host 
> "cod6"Starting server daemon at host "has4"Starting server daemon at host "co
> d4"
> 
> 
> 
> Starting server daemon at host "hpb12"Starting server daemon at host "has6"
> 
> Server daemon successfully started with task id "1.cod4"
> Server daemon successfully started with task id "1.cod5"
> Server daemon successfully started with task id "1.cod6"
> Server daemon successfully started with task id "1.has6"
> Server daemon successfully started with task id "1.hpb12"
> Server daemon successfully started with task id "1.has4"
> Unmatched ".
> Unmatched ".
> Unmatched ".
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>  settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>  Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>  Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>  (e.g., on Cray). Please check your configure cmd line and consider using
>  one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>  lack of common network interfaces and/or no route found between
>  them. Please check network connectivity (including firewalls
>  and network routing requirements).
> --
> --
> 
> 
> 
> 
> 
> 
> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ralph 
> Castain via users
> Sent: Sunday, May 31, 2020 10:50 AM
> To: Open MPI Users 
> Cc: Ralph Castain 
> Subject: Re: [OMPI users] Running mpirun with grid
> 
> The messages about the daemons are coming from two different sources. Grid is 
> saying it was able to spawn the orted - then the orted is saying it doesn't 
> know how to communicate and fails.
> 
> I think the root of the problem lies in the plm output that shows the qrsh 
> command it will use to start the job. For some reason, mpirun is still trying 
> to "tree spawn", which (IIRC) isn't allowed on grid (all the daemons have to 
> be launched in one shot by mpirun using qrsh).
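> 
> (A quick check, sketched here with the install prefix from your output: confirm 
> that gridengine support was built into this Open MPI at all - the gridengine 
> components should show up in the listing:
> 
>    $ /build/openmpi/openmpi-4.0.1/rhel6/bin/ompi_info | grep gridengine
> )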

Re: [OMPI users] Running mpirun with grid

2020-06-01 Thread Jeff Squyres (jsquyres) via users
On top of what Ralph said, I think that this output is unexpected:

> Starting server daemon at host "cod5"Starting server daemon at host 
> "cod6"Starting server daemon at host "has4"Starting server daemon at host "co
> d4"
> 
> 
> 
> Starting server daemon at host "hpb12"Starting server daemon at host "has6"
> 
> Server daemon successfully started with task id "1.cod4"
> Server daemon successfully started with task id "1.cod5"
> Server daemon successfully started with task id "1.cod6"
> Server daemon successfully started with task id "1.has6"
> Server daemon successfully started with task id "1.hpb12"
> Server daemon successfully started with task id "1.has4"

I don't think that's coming from Open MPI.

My guess is that something is trying to parse (or run?) that output, getting 
confused because the output is unexpected, and then you get these errors:

> Unmatched ".
> Unmatched ".
> Unmatched ".

And the Open MPI helper daemon doesn't actually start.  Therefore you get this 
error:

> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
...etc.
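
One way to check where those "Starting server daemon ..." lines come from - a 
sketch, not something I have verified against your SGE install - is to confirm 
the string is not in the Open MPI tree and then look for it on the SGE side 
instead (it may live in qrsh itself or in another utility under $SGE_ROOT):

   $ grep -rl "Starting server daemon" /build/openmpi/openmpi-4.0.1/rhel6/
   $ strings /grid2/sge/bin/lx-amd64/qrsh | grep -i "server daemon"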

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Running mpirun with grid

2020-06-01 Thread Kulshrestha, Vipul via users
Thanks, Jeff & Ralph, for your responses.

I tried changing the verbose level to 5 using the option suggested by Ralph, 
but there was no difference in the output (no additional information appeared).

I also tried replacing the grid submission script with a command-line qsub job 
submission, but got the same issue. Without the job submission script, the qsub 
command looks like below. This uses the mpirun option "--N 1" to ensure that 
only 1 process is launched by mpirun on each host.

Do you have some suggestion on how I can go about investigating the root cause 
of the problem I am facing? I am able to run mpirun successfully if I specify 
the same set of hosts (as allocated by grid) using an mpirun host file. I have 
also pasted the verbose output with the host file; the orted command looks very 
similar to the one generated for the grid submission (except that it uses 
/usr/bin/ssh instead of /grid2/sge/bin/lx-amd64/qrsh).
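
(For reference, host.txt is a plain Open MPI host file, roughly of the form 
below; the host names and slot counts here are only illustrative:

   has2.org.com slots=2
   has6.org.com slots=2
   cod4.org.com slots=2
)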

Thanks,
Vipul


qsub -N velsyn -pe orte2 10 -V -b y -cwd -j y -o $cwd/a -l "os=redhat6.7*" -q 
all /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --N 1  -x 
LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
--merge-stderr-to-stdout --output-filename 
./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca orte_base_help_aggregate 
0 --mca plm_base_verbose 5 --mca plm_rsh_no_tree_spawn 1 


$ /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --hostfile host.txt -x 
VMW_HOME=$VMW_HOME -x VMW_BIN=$VMW_BIN -x 
LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
--merge-stderr-to-stdout --output-filename 
./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca orte_base_help_aggregate 
0 --mca plm_base_verbose 5 --mca plm_rsh_no_tree_spawn 1 
  
[sox3:24416] [[26562,0],0] plm:rsh: final template argv:
/usr/bin/ssh  set path = ( 
/build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) 
set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH 
/build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( $?OMPI_have_llp == 1 ) setenv 
LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if ( 
$?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 0 ) 
setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
$?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH 
/build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;   
/build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca ess_base_jobid 
"1740767232" -mca ess_base_vpid "" -mca ess_base_num_procs "6" -mca 
orte_node_regex "sox[1:3],bos[1:3],bos[2:15],bos[1:9],bos[2:12],bos[1:7]@0(6)" 
-mca orte_hnp_uri "1740767232.0;tcp://147.34.216.21:54496" --mca 
orte_base_help_aggregate "0" --mca plm_base_verbose "5" --mca 
plm_rsh_no_tree_spawn "1" -mca plm "rsh" -mca orte_output_filename 
"./veloce.log/velsyn/dvelsyn:nojobid,nocopy" -mca pmix "^s1,s2,cray,isolated"   
   
[sox3:24416] [[26562,0],0] complete_setup on job [26562,1]
[sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
[[26562,0],5]
[sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
[26562,1]
[sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
[[26562,0],4]
[sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
[26562,1]
[sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
[[26562,0],1]
[sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
[26562,1]
[sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
[[26562,0],2]
[sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
[26562,1]
[sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
[[26562,0],3]
[sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
[26562,1]

-Original Message-
From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] 
Sent: Monday, June 1, 2020 4:15 PM
To: Open MPI User's List 
Cc: Kulshrestha, Vipul 
Subject: Re: [OMPI users] Running mpirun with grid

On top of what Ralph said, I think that this output is unexpected:

> Starting server daemon at host "cod5"Starting server daemon at host 
> "cod6"Starting server daemon at host "has4"Starting server daemon at host "co 
> d4"
> 
> 
> 
> Starting server daemon at host "hpb12"Starting server daemon at host "has6"
> 
> Server daemon successfully started with task id "1.cod4"
> Server daemon successfully started with task id "1.cod5"
> Server daemon successfully started with task id "1.cod6"
> Server daemon successfully started with task id "1.has6"
> Server daemon successfully started with task id "1.hpb12"
> Server daemon successfully started with task id "1.has4"

I don't think that's coming from Open MPI.

My guess is that something is trying to parse (or run?) that output, getting 
confused because the output is unexpected, and then you get these

Re: [OMPI users] Running mpirun with grid

2020-06-01 Thread Ralph Castain via users
Afraid I have no real ideas here. Best I can suggest is taking the qrsh cmd 
line from the prior debug output and trying to run it manually. This might give 
you a chance to manipulate it and see if you can identify what is causing the 
issue, if anything. Without mpirun executing, the daemons will bark about being 
unable to connect back, so you might need to use some other test program for 
this purpose.
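
For example, from a shell inside the same SGE job allocation, something along 
these lines (a sketch - the host name is illustrative, and orted is swapped for 
a harmless command such as hostname, since qrsh -inherit only works for hosts 
that were granted to the job):

   $ /grid2/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose cod4.org.com hostname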

I agree with Jeff - you should check to see where these messages are coming 
from:


>> Server daemon successfully started with task id "1.cod4"
>> Server daemon successfully started with task id "1.cod5"
>> Server daemon successfully started with task id "1.cod6"
>> Server daemon successfully started with task id "1.has6"
>> Server daemon successfully started with task id "1.hpb12"
>> Server daemon successfully started with task id "1.has4"
> 
>> Unmatched ".
>> Unmatched ".
>> Unmatched ".
> 


Could be a clue as to what is actually happening.


> On Jun 1, 2020, at 1:57 PM, Kulshrestha, Vipul via users 
>  wrote:
> 
> Thanks, Jeff & Ralph, for your responses.
> 
> I tried changing the verbose level to 5 using the option suggested by Ralph, 
> but there was no difference in the output (no additional information appeared).
> 
> I also tried replacing the grid submission script with a command-line qsub job 
> submission, but got the same issue. Without the job submission script, the 
> qsub command looks like below. This uses the mpirun option "--N 1" to ensure 
> that only 1 process is launched by mpirun on each host.
> 
> Do you have some suggestion on how I can go about investigating the root 
> cause of the problem I am facing? I am able to run mpirun successfully if I 
> specify the same set of hosts (as allocated by grid) using an mpirun host 
> file. I have also pasted the verbose output with the host file; the orted 
> command looks very similar to the one generated for the grid submission 
> (except that it uses /usr/bin/ssh instead of /grid2/sge/bin/lx-amd64/qrsh).
> 
> Thanks,
> Vipul
> 
> 
> qsub -N velsyn -pe orte2 10 -V -b y -cwd -j y -o $cwd/a -l "os=redhat6.7*" -q 
> all /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --N 1  -x 
> LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
> --merge-stderr-to-stdout --output-filename 
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca 
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca 
> plm_rsh_no_tree_spawn 1 
> 
> 
> $ /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --hostfile host.txt -x 
> VMW_HOME=$VMW_HOME -x VMW_BIN=$VMW_BIN -x 
> LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
> --merge-stderr-to-stdout --output-filename 
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca 
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca 
> plm_rsh_no_tree_spawn 1 
> 
> [sox3:24416] [[26562,0],0] plm:rsh: final template argv:
>/usr/bin/ssh  set path = ( 
> /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 
> ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH 
> /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( $?OMPI_have_llp == 1 ) setenv 
> LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if 
> ( $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 
> 0 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
> $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH 
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;   
> /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca 
> ess_base_jobid "1740767232" -mca ess_base_vpid "" -mca 
> ess_base_num_procs "6" -mca orte_node_regex 
> "sox[1:3],bos[1:3],bos[2:15],bos[1:9],bos[2:12],bos[1:7]@0(6)" -mca 
> orte_hnp_uri "1740767232.0;tcp://147.34.216.21:54496" --mca 
> orte_base_help_aggregate "0" --mca plm_base_verbose "5" --mca 
> plm_rsh_no_tree_spawn "1" -mca plm "rsh" -mca orte_output_filename 
> "./veloce.log/velsyn/dvelsyn:nojobid,nocopy" -mca pmix "^s1,s2,cray,isolated" 
>  
> [sox3:24416] [[26562,0],0] complete_setup on job [26562,1]
> [sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
> [[26562,0],5]
> [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
> [26562,1]
> [sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
> [[26562,0],4]
> [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
> [26562,1]
> [sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
> [[26562,0],1]
> [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
> [26562,1]
> [sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
> [[26562,0],2]
> [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
> [26562,1]
> [sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
> [[26562,0],3]
> [sox3:24416] [[2

Re: [OMPI users] Running mpirun with grid

2020-06-01 Thread John Hearns via users
As a suggestion, can we see the configuration of your Parallel Environment?

qconf -spl

qconf -sp orte2
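
For a tight integration the PE usually needs to look roughly like the sketch 
below (the values are illustrative, not taken from your site) - in particular 
control_slaves TRUE so that qrsh -inherit is permitted, and job_is_first_task 
FALSE:

   pe_name            orte2
   slots              999
   allocation_rule    $fill_up
   control_slaves     TRUE
   job_is_first_task  FALSE
   start_proc_args    /bin/true
   stop_proc_args     /bin/true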

On Mon, 1 Jun 2020 at 22:20, Ralph Castain via users <
users@lists.open-mpi.org> wrote:

> Afraid I have no real ideas here. Best I can suggest is taking the qrsh
> cmd line from the prior debug output and trying to run it manually. This
> might give you a chance to manipulate it and see if you can identify what
> is causing the issue, if anything. Without mpirun executing, the daemons
> will bark about being unable to connect back, so you might need to use some
> other test program for this purpose.
>
> I agree with Jeff - you should check to see where these messages are
> coming from:
>
>
> >> Server daemon successfully started with task id "1.cod4"
> >> Server daemon successfully started with task id "1.cod5"
> >> Server daemon successfully started with task id "1.cod6"
> >> Server daemon successfully started with task id "1.has6"
> >> Server daemon successfully started with task id "1.hpb12"
> >> Server daemon successfully started with task id "1.has4"
> >
> >> Unmatched ".
> >> Unmatched ".
> >> Unmatched ".
> >
>
>
> Could be a clue as to what is actually happening.
>
>
> > On Jun 1, 2020, at 1:57 PM, Kulshrestha, Vipul via users <
> users@lists.open-mpi.org> wrote:
> >
> > Thanks, Jeff & Ralph, for your responses.
> >
> > I tried changing the verbose level to 5 using the option suggested by
> Ralph, but there was no difference in the output (no additional information
> appeared).
> >
> > I also tried replacing the grid submission script with a command-line
> qsub job submission, but got the same issue. Without the job submission
> script, the qsub command looks like below. This uses the mpirun option
> "--N 1" to ensure that only 1 process is launched by mpirun on each host.
> >
> > Do you have some suggestion on how I can go about investigating the root
> cause of the problem I am facing? I am able to run mpirun successfully if I
> specify the same set of hosts (as allocated by grid) using an mpirun host
> file. I have also pasted the verbose output with the host file; the orted
> command looks very similar to the one generated for the grid submission
> (except that it uses /usr/bin/ssh instead of /grid2/sge/bin/lx-amd64/qrsh).
> >
> > Thanks,
> > Vipul
> >
> >
> > qsub -N velsyn -pe orte2 10 -V -b y -cwd -j y -o $cwd/a -l
> "os=redhat6.7*" -q all /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --N 1
> -x LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH
> --merge-stderr-to-stdout --output-filename
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca
> plm_rsh_no_tree_spawn 1 
> >
> >
> > $ /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --hostfile host.txt -x
> VMW_HOME=$VMW_HOME -x VMW_BIN=$VMW_BIN -x
> LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH
> --merge-stderr-to-stdout --output-filename
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca
> plm_rsh_no_tree_spawn 1 
> >
> > [sox3:24416] [[26562,0],0] plm:rsh: final template argv:
> >/usr/bin/ssh  set path = (
> /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH ==
> 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv
> LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if (
> $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if (
> $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH ==
> 0 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if (
> $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;
>  /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca
> ess_base_jobid "1740767232" -mca ess_base_vpid "" -mca
> ess_base_num_procs "6" -mca orte_node_regex
> "sox[1:3],bos[1:3],bos[2:15],bos[1:9],bos[2:12],bos[1:7]@0(6)" -mca
> orte_hnp_uri "1740767232.0;tcp://147.34.216.21:54496" --mca
> orte_base_help_aggregate "0" --mca plm_base_verbose "5" --mca
> plm_rsh_no_tree_spawn "1" -mca plm "rsh" -mca orte_output_filename
> "./veloce.log/velsyn/dvelsyn:nojobid,nocopy" -mca pmix
> "^s1,s2,cray,isolated"
> > [sox3:24416] [[26562,0],0] complete_setup on job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command
> from [[26562,0],5]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for
> job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command
> from [[26562,0],4]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for
> job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command
> from [[26562,0],1]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for
> job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state c