Re: [OMPI users] Running mpirun with grid

2020-06-02 Thread Kulshrestha, Vipul via users
Thanks.

$  qconf -spl
OpenMP
dist
make
mc4
oneper
orte
orte2
perf
run15
run25
run5
run50
thread
turbo
$  qconf -sp orte2
pe_name              orte2
slots                9
used_slots           0
bound_slots          0
user_lists           NONE
xuser_lists          NONE
start_proc_args      NONE
stop_proc_args       NONE
per_pe_task_prolog   NONE
per_pe_task_epilog   NONE
allocation_rule      2
control_slaves       TRUE
job_is_first_task    FALSE
urgency_slots        min
accounting_summary   FALSE
daemon_forks_slaves  FALSE
master_forks_slaves  FALSE

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of John Hearns 
via users
Sent: Tuesday, June 2, 2020 2:25 AM
To: Open MPI Users 
Cc: John Hearns 
Subject: Re: [OMPI users] Running mpirun with grid

As a suggestion, can we see the configuration of your Parallel Environment?

qconf -spl

qconf -sp orte2

On Mon, 1 Jun 2020 at 22:20, Ralph Castain via users 
<users@lists.open-mpi.org> wrote:
Afraid I have no real ideas here. Best I can suggest is taking the qrsh cmd 
line from the prior debug output and try running it manually. This might give 
you a chance to manipulate it and see if you can identify what is causing it an 
issue, if anything. Without mpirun executing, the daemons will bark about being 
unable to connect back, so you might need to use some other test program for 
this purpose.

I agree with Jeff - you should check to see where these messages are coming 
from:


>> Server daemon successfully started with task id "1.cod4"
>> Server daemon successfully started with task id "1.cod5"
>> Server daemon successfully started with task id "1.cod6"
>> Server daemon successfully started with task id "1.has6"
>> Server daemon successfully started with task id "1.hpb12"
>> Server daemon successfully started with task id "1.has4"
>
>> Unmatched ".
>> Unmatched ".
>> Unmatched ".
>


Could be a clue as to what is actually happening.


> On Jun 1, 2020, at 1:57 PM, Kulshrestha, Vipul via users 
> <users@lists.open-mpi.org> wrote:
>
> Thanks Jeff & Ralph for your responses.
>
> I tried changing the verbose level to 5 using the option suggested by Ralph, 
> but there was no difference in the output (so no additional information in 
> the output).
>
> I also tried to replace the grid submission script with a command-line qsub job 
> submission, but got the same issue. Without the job submission script, the 
> qsub command looks like the one below. This uses the mpirun option "--N 1" to 
> ensure that only 1 process is launched by mpirun on each host.
>
> Do you have any suggestions on how I can go about investigating the root 
> cause of the problem I am facing? I am able to run mpirun successfully if I 
> specify the same set of hosts (as allocated by grid) using an mpirun host file. 
> I have also pasted the verbose output with the host file; the orted command 
> looks very similar to the one generated for the grid submission (except that it 
> uses /usr/bin/ssh instead of /grid2/sge/bin/lx-amd64/qrsh).
>
> Thanks,
> Vipul
>
>
> qsub -N velsyn -pe orte2 10 -V -b y -cwd -j y -o $cwd/a -l "os=redhat6.7*" -q 
> all /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --N 1  -x 
> LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
> --merge-stderr-to-stdout --output-filename 
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca 
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca 
> plm_rsh_no_tree_spawn 1 
>
>
> $ /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --hostfile host.txt -x 
> VMW_HOME=$VMW_HOME -x VMW_BIN=$VMW_BIN -x 
> LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
> --merge-stderr-to-stdout --output-filename 
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca 
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca 
> plm_rsh_no_tree_spawn 1 
>
> [sox3:24416] [[26562,0],0] plm:rsh: final template argv:
>/usr/bin/ssh  set path = ( 
> /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 
> ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH 
> /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( $?OMPI_have_llp == 1 ) setenv 
> LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if 
> ( $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 
> 0 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
> $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH 
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;   
> /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca 
> ess_base_jobid "1740767232" -mca ess_base_vpid "" -mca 
> ess_base_num_procs "6" -mca orte_node_regex 
> "sox[1:3],bos[1:3],bos[2:15],bos[1:9],bos[2:12],bos[1:7]@0(6)" -mca 
> orte_hnp_uri 
> "1740767232.0;tcp://147.34.216.21:54496" --mca 
> orte_base_help_aggregate "0" --mca plm_base_verbose "5" --mca 
> plm_rsh_no_tree_spawn "1" 

Re: [hwloc-users] One more silly warning squash

2020-06-02 Thread Samuel Thibault
Balaji, Pavan, on Tue 02 Jun 2020 09:31:29, wrote:
> > On Jun 1, 2020, at 4:11 AM, Balaji, Pavan via hwloc-users wrote:
> >> On Jun 1, 2020, at 4:10 AM, Balaji, Pavan wrote:
> >>> On Jun 1, 2020, at 4:06 AM, Samuel Thibault wrote:
> >>> could you check whether the attached patch avoids the warning?
> >>> (we should really not need a cast to const char*)
> >> 
> >> The attached patch is basically the same as what we are using, isn't it?  
> >> It does avoid the warning.
> > 
> > Oh, sorry, I see now that you skipped the extra cast in that case.  Let me 
> > try it out and get back to you.
> 
> I've verified that the patch works.

Ok, I pushed the fix to master, thanks!

Samuel


Re: [hwloc-users] One more silly warning squash

2020-06-02 Thread Balaji, Pavan via hwloc-users


> On Jun 1, 2020, at 4:11 AM, Balaji, Pavan via hwloc-users wrote:
>> On Jun 1, 2020, at 4:10 AM, Balaji, Pavan wrote:
>>> On Jun 1, 2020, at 4:06 AM, Samuel Thibault wrote:
>>> could you check whether the attached patch avoids the warning?
>>> (we should really not need a cast to const char*)
>> 
>> The attached patch is basically the same as what we are using, isn't it?  It 
>> does avoid the warning.
> 
> Oh, sorry, I see now that you skipped the extra cast in that case.  Let me 
> try it out and get back to you.

I've verified that the patch works.

Thanks,

  -- Pavan



Re: [OMPI users] Coordinating (non-overlapping) local stores with remote puts form using passive RMA synchronization

2020-06-02 Thread Joseph Schuchart via users

Hi Stephen,

Let me try to answer your questions inline (I don't have extensive 
experience with the separate model; in my experience, most 
implementations support the unified model, with some exceptions):


On 5/31/20 1:31 AM, Stephen Guzik via users wrote:

Hi,

I'm trying to get a better understanding of coordinating 
(non-overlapping) local stores with remote puts when using passive 
synchronization for RMA.  I understand that the window should be locked 
for a local store, but can it be a shared lock?


Yes. There is no reason why that cannot be a shared lock.

In my example, each 
process retrieves and increments an index (indexBuf and indexWin) from a 
target process and then stores its rank into an array (dataBuf and 
dataWin) at that index on the target.  If the target is local, a local 
store is attempted:


/* indexWin on indexBuf, dataWin on dataBuf */
std::vector<int> myvals(numProc);
const int one = 1;  /* increment for the index fetch */
MPI_Win_lock_all(0, indexWin);
MPI_Win_lock_all(0, dataWin);
for (int tgtProc = 0; tgtProc != numProc; ++tgtProc)
  {
    MPI_Fetch_and_op(&one, &myvals[tgtProc], MPI_INT, tgtProc, 0,
                     MPI_SUM, indexWin);
    MPI_Win_flush_local(tgtProc, indexWin);
    // Put our rank into the right location of the target
    if (tgtProc == procID)
      {
        dataBuf[myvals[procID]] = procID;
      }
    else
      {
        MPI_Put(&procID, 1, MPI_INT, tgtProc, myvals[tgtProc], 1,
                MPI_INT, dataWin);
      }
  }
MPI_Win_flush_all(dataWin);  /* Force completion and time synchronization */
MPI_Barrier(MPI_COMM_WORLD);
/* Proceed with local loads and unlock windows later */

I believe this is valid for a unified memory model but would probably 
fail for a separate model (unless a separate model very cleverly merges 
a private and public window?)  Is this understanding correct?  And if I 
instead use MPI_Put for the local write, then it should be valid for 
both memory models?


Yes, if you use RMA operations even on local memory it is valid for both 
memory models.
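
As an illustration (just a sketch reusing the names from your example, not 
a prescription), the local branch of your first listing can then become a 
put to self, at which point both branches collapse into the same call:

      // Put our rank into the right location of the target; this also
      // works when tgtProc == procID, under both memory models
      MPI_Put(&procID, 1, MPI_INT, tgtProc, myvals[tgtProc], 1,
              MPI_INT, dataWin);

The MPI_Win_flush_all(dataWin) you already issue after the loop then also 
completes the put to self.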


The MPI standard on page 455 (S3) states that "a store to process memory 
to a location in a window must not start once a put or accumulate update 
to that target window has started, until the put or accumulate update 
becomes visible in process memory." So there is no clever merging and it 
is up to the user to ensure that there are no puts and stores happening 
at the same time.




Another approach is specific locks.  I don't like this because it seems 
to require excessive synchronization.  But if I really want to mix local 
stores and remote puts, is this the only way using locks?


/* indexWin on indexBuf, dataWin on dataBuf */
std::vector<int> myvals(numProc);
const int one = 1;  /* increment for the index fetch */
for (int tgtProc = 0; tgtProc != numProc; ++tgtProc)
  {
    MPI_Win_lock(MPI_LOCK_SHARED, tgtProc, 0, indexWin);
    MPI_Fetch_and_op(&one, &myvals[tgtProc], MPI_INT, tgtProc, 0,
                     MPI_SUM, indexWin);
    MPI_Win_unlock(tgtProc, indexWin);
    // Put our rank into the right location of the target
    if (tgtProc == procID)
      {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, tgtProc, 0, dataWin);
        dataBuf[myvals[procID]] = procID;
        MPI_Win_unlock(tgtProc, dataWin);  /*(A)*/
      }
    else
      {
        MPI_Win_lock(MPI_LOCK_SHARED, tgtProc, 0, dataWin);
        MPI_Put(&procID, 1, MPI_INT, tgtProc, myvals[tgtProc], 1,
                MPI_INT, dataWin);
        MPI_Win_unlock(tgtProc, dataWin);
      }
  }
/* Proceed with local loads */

I believe this is also valid for both memory models?  An unlock must 
have followed the last access to the local window, before the exclusive 
lock is gained.  That should have synchronized the windows and another 
synchronization should happen at (A).  Is that understanding correct? 


That is correct for both memory models, yes. It is likely to be slower 
because locking and unlocking involves some effort. You are better off 
using put instead.


If you really want to use local stores, you can check the window's memory 
model attribute (MPI_WIN_MODEL) for MPI_WIN_UNIFIED and fall back to using 
puts only for the separate model.
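
For example, a minimal sketch of such a check (the helper name is mine, 
not from your code):

    #include <mpi.h>

    /* Returns 1 if the window uses the unified memory model, in which case
     * a plain local store under a shared lock is fine; otherwise fall back
     * to MPI_Put even for the local target. */
    static int win_is_unified(MPI_Win win)
    {
      int *model = NULL;
      int flag   = 0;
      MPI_Win_get_attr(win, MPI_WIN_MODEL, &model, &flag);
      return flag && *model == MPI_WIN_UNIFIED;
    }

In the loop of your first example, the local branch would then do the 
store only if win_is_unified(dataWin) is true and otherwise issue the same 
MPI_Put as the remote branch.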


> If so, how does one ever get into a situation where MPI_Win_sync must 
be used?


You can think of a synchronization scheme where each process takes a 
shared lock on a window, stores data to a local location, calls 
MPI_Win_sync and signals to other processes that the data is now 
available, e.g., through a barrier or a send. In that case processes 
keep the lock and use some non-RMA synchronization instead.
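
A minimal sketch of that scheme (names are illustrative, not from your 
code; error handling omitted):

    #include <mpi.h>

    /* Keep the shared lock open, store locally, make the store visible with
     * MPI_Win_sync, then use a non-RMA synchronization (here barriers)
     * around the remote accesses. */
    void publish_rank(MPI_Win win, int *localBuf, int rank)
    {
      MPI_Win_lock_all(0, win);     /* every process holds a shared lock */

      localBuf[0] = rank;           /* local store into the window memory */
      MPI_Win_sync(win);            /* synchronize private and public copies */

      MPI_Barrier(MPI_COMM_WORLD);  /* signal: the data is now available */

      /* ... other processes may now read the value, e.g. with MPI_Get,
         while the passive-target epoch stays open ... */

      MPI_Barrier(MPI_COMM_WORLD);  /* make sure remote accesses are done */
      MPI_Win_unlock_all(win);
    }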




Final question.  In the first example, let's say there is a lot of 
computation in the loop and I want the MPI_Puts to immediately make 
progress.  Would it be sensible to follow the MPI_Put with a 
MPI_Win_flush_local to get things moving?  Or is it best to avoid any 
unnecessary synchronizations?


That is highly implementation-specific. Some implementations may buffer 
the puts and delay the transfer until the flush, some may initiate it 
immediately, and some may treat a local flush similar to a regular 
flush. I would not make any assumptions about the underlying implementation.

Re: [OMPI users] Running mpirun with grid

2020-06-02 Thread Gilles Gouaillardet via users
Vipul,

You can also use the launch_agent to debug that.

Long story short,
mpirun --mca orte_launch_agent /.../agent.sh a.out
will run
qrsh ... /.../agent.sh 
instead of
qrsh ... orted 

As a first step, you can write a trivial agent that simply dumps the command line.
You might also want to dump the environment and run ldd /.../orted to
make sure there is not an accidental mix of libraries.

Cheers,

Gilles

On Tue, Jun 2, 2020 at 6:20 AM Ralph Castain via users wrote:
>
> Afraid I have no real ideas here. Best I can suggest is taking the qrsh cmd 
> line from the prior debug output and try running it manually. This might give 
> you a chance to manipulate it and see if you can identify what is causing it 
> an issue, if anything. Without mpirun executing, the daemons will bark about 
> being unable to connect back, so you might need to use some other test 
> program for this purpose.
>
> I agree with Jeff - you should check to see where these messages are coming 
> from:
>
>
> >> Server daemon successfully started with task id "1.cod4"
> >> Server daemon successfully started with task id "1.cod5"
> >> Server daemon successfully started with task id "1.cod6"
> >> Server daemon successfully started with task id "1.has6"
> >> Server daemon successfully started with task id "1.hpb12"
> >> Server daemon successfully started with task id "1.has4"
> >
> >> Unmatched ".
> >> Unmatched ".
> >> Unmatched ".
> >
>
>
> Could be a clue as to what is actually happening.
>
>
> > On Jun 1, 2020, at 1:57 PM, Kulshrestha, Vipul via users wrote:
> >
> > Thanks Jeff & Ralph for your responses.
> >
> > I tried changing the verbose level to 5 using the option suggested by 
> > Ralph, but there was no difference in the output (so no additional 
> > information in the output).
> >
> > I also tried to replace the grid submission script with a command-line qsub 
> > job submission, but got the same issue. Without the job submission script, 
> > the qsub command looks like the one below. This uses the mpirun option "--N 1" 
> > to ensure that only 1 process is launched by mpirun on each host.
> >
> > Do you have any suggestions on how I can go about investigating the root 
> > cause of the problem I am facing? I am able to run mpirun successfully if 
> > I specify the same set of hosts (as allocated by grid) using an mpirun host 
> > file. I have also pasted the verbose output with the host file; the orted 
> > command looks very similar to the one generated for the grid submission 
> > (except that it uses /usr/bin/ssh instead of /grid2/sge/bin/lx-amd64/qrsh).
> >
> > Thanks,
> > Vipul
> >
> >
> > qsub -N velsyn -pe orte2 10 -V -b y -cwd -j y -o $cwd/a -l "os=redhat6.7*" 
> > -q all /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --N 1  -x 
> > LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
> > --merge-stderr-to-stdout --output-filename 
> > ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca 
> > orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca 
> > plm_rsh_no_tree_spawn 1 
> >
> >
> > $ /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --hostfile host.txt -x 
> > VMW_HOME=$VMW_HOME -x VMW_BIN=$VMW_BIN -x 
> > LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH 
> > --merge-stderr-to-stdout --output-filename 
> > ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca 
> > orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca 
> > plm_rsh_no_tree_spawn 1 
> >
> > [sox3:24416] [[26562,0],0] plm:rsh: final template argv:
> >/usr/bin/ssh  set path = ( 
> > /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH == 
> > 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv 
> > LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
> > $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH 
> > /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if ( 
> > $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 
> > 0 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( 
> > $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH 
> > /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;   
> > /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca 
> > ess_base_jobid "1740767232" -mca ess_base_vpid "" -mca 
> > ess_base_num_procs "6" -mca orte_node_regex 
> > "sox[1:3],bos[1:3],bos[2:15],bos[1:9],bos[2:12],bos[1:7]@0(6)" -mca 
> > orte_hnp_uri "1740767232.0;tcp://147.34.216.21:54496" --mca 
> > orte_base_help_aggregate "0" --mca plm_base_verbose "5" --mca 
> > plm_rsh_no_tree_spawn "1" -mca plm "rsh" -mca orte_output_filename 
> > "./veloce.log/velsyn/dvelsyn:nojobid,nocopy" -mca pmix 
> > "^s1,s2,cray,isolated"
> > [sox3:24416] [[26562,0],0] complete_setup on job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command from 
> > [[26562,0],5]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for job 
> > [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update 

Re: [OMPI users] Running mpirun with grid

2020-06-02 Thread John Hearns via users
As a suggestion, can we see the configuration of your Parallel Environment?

qconf -spl

qconf -sp orte2

On Mon, 1 Jun 2020 at 22:20, Ralph Castain via users <
users@lists.open-mpi.org> wrote:

> Afraid I have no real ideas here. Best I can suggest is taking the qrsh
> cmd line from the prior debug output and try running it manually. This
> might give you a chance to manipulate it and see if you can identify what
> is causing it an issue, if anything. Without mpirun executing, the daemons
> will bark about being unable to connect back, so you might need to use some
> other test program for this purpose.
>
> I agree with Jeff - you should check to see where these messages are
> coming from:
>
>
> >> Server daemon successfully started with task id "1.cod4"
> >> Server daemon successfully started with task id "1.cod5"
> >> Server daemon successfully started with task id "1.cod6"
> >> Server daemon successfully started with task id "1.has6"
> >> Server daemon successfully started with task id "1.hpb12"
> >> Server daemon successfully started with task id "1.has4"
> >
> >> Unmatched ".
> >> Unmatched ".
> >> Unmatched ".
> >
>
>
> Could be a clue as to what is actually happening.
>
>
> > On Jun 1, 2020, at 1:57 PM, Kulshrestha, Vipul via users <
> users@lists.open-mpi.org> wrote:
> >
> > Thanks Jeff & Ralph for your responses.
> >
> > I tried changing the verbose level to 5 using the option suggested by
> Ralph, but there was no difference in the output (so no additional
> information in the output).
> >
> > I also tried to replace the grid submission script with a command-line
> qsub job submission, but got the same issue. Without the job submission
> script, the qsub command looks like the one below. This uses the mpirun
> option "--N 1" to ensure that only 1 process is launched by mpirun on each
> host.
> >
> > Do you have any suggestions on how I can go about investigating the root
> cause of the problem I am facing? I am able to run mpirun successfully if
> I specify the same set of hosts (as allocated by grid) using an mpirun host
> file. I have also pasted the verbose output with the host file; the orted
> command looks very similar to the one generated for the grid submission
> (except that it uses /usr/bin/ssh instead of /grid2/sge/bin/lx-amd64/qrsh).
> >
> > Thanks,
> > Vipul
> >
> >
> > qsub -N velsyn -pe orte2 10 -V -b y -cwd -j y -o $cwd/a -l
> "os=redhat6.7*" -q all /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --N 1
> -x LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH
> --merge-stderr-to-stdout --output-filename
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca
> plm_rsh_no_tree_spawn 1 
> >
> >
> > $ /build/openmpi/openmpi-4.0.1/rhel6/bin/mpirun --hostfile host.txt -x
> VMW_HOME=$VMW_HOME -x VMW_BIN=$VMW_BIN -x
> LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib -x PATH=$PATH
> --merge-stderr-to-stdout --output-filename
> ./veloce.log/velsyn/dvelsyn:nojobid,nocopy -np 5 --mca
> orte_base_help_aggregate 0 --mca plm_base_verbose 5 --mca
> plm_rsh_no_tree_spawn 1 
> >
> > [sox3:24416] [[26562,0],0] plm:rsh: final template argv:
> >/usr/bin/ssh  set path = (
> /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH ==
> 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv
> LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if (
> $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if (
> $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH ==
> 0 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if (
> $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH
> /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ;
>  /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca ess "env" -mca
> ess_base_jobid "1740767232" -mca ess_base_vpid "" -mca
> ess_base_num_procs "6" -mca orte_node_regex
> "sox[1:3],bos[1:3],bos[2:15],bos[1:9],bos[2:12],bos[1:7]@0(6)" -mca
> orte_hnp_uri "1740767232.0;tcp://147.34.216.21:54496" --mca
> orte_base_help_aggregate "0" --mca plm_base_verbose "5" --mca
> plm_rsh_no_tree_spawn "1" -mca plm "rsh" -mca orte_output_filename
> "./veloce.log/velsyn/dvelsyn:nojobid,nocopy" -mca pmix
> "^s1,s2,cray,isolated"
> > [sox3:24416] [[26562,0],0] complete_setup on job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command
> from [[26562,0],5]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for
> job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command
> from [[26562,0],4]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for
> job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state command
> from [[26562,0],1]
> > [sox3:24416] [[26562,0],0] plm:base:receive got update_proc_state for
> job [26562,1]
> > [sox3:24416] [[26562,0],0] plm:base:receive update proc state