[slurm-dev] Re: untracked processes

2013-02-21 Thread Moe Jette

The message below should read epilog rather than prolog.

Slurm only tracks the processes that its daemons launch (most MPI
implementations can launch their tasks using Slurm). Anything launched
outside of Slurm can be killed as part of a job epilog, but accounting
and job step management are outside of Slurm's control.

The epilog can check whether the user still has a Slurm job allocated to
the node and, if not, kill all processes owned by that user.
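
As a rough sketch (assuming the epilog runs as root and that
SLURM_JOB_USER is set in the epilog environment), that check could look
something like:

  #!/bin/bash
  # Epilog sketch: if the job's owner has no other running jobs on this
  # node, kill everything they still have running here.
  if [ -n "$SLURM_JOB_USER" ]; then
      remaining=$(squeue -h -t running -u "$SLURM_JOB_USER" -w "$(hostname -s)" | wc -l)
      if [ "$remaining" -eq 0 ]; then
          pkill -9 -u "$SLURM_JOB_USER"
      fi
  fi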


Quoting Moe Jette je...@schedmd.com:


 Slurm only tracks the processes that its daemons launch (most MPI
 implementations can launch their tasks using slurm). Anything launched
 outside of Slurm can be killed as part of a job prolog, but accounting
 and job step management are outside of Slurm's control.

 Quoting Michael Colonno mcolo...@stanford.edu:


  SLURM gurus ~

  I'm trying to configure a commercial MPI code to run through SLURM.
 I can launch this code through either srun or sbatch without any
 issues (the good), but the processes manage to run completely
 disconnected from SLURM's notice (the bad); i.e., the job is running
 just fine but SLURM thinks it has completed and hence does not report
 anything running. I'm guessing this is because the tool runs a
 pre-processing-type executable and then launches sub-processes to
 solve (MPI on a local system) without connecting the process IDs(?)
 In any event, I'm guessing I'm not the first person to run into this.
 Is there a recommended solution for configuring SLURM to track codes
 like this?

  Thanks,
  ~Mike C.







[slurm-dev] Re: untracked processes

2013-02-21 Thread Ryan Cox

This may not be exactly what you're looking for but it could be a start.

We're looking at modifying ssh_config and sshd_config to propagate
SLURM_JOB_ID for jobs that use ssh to spawn processes (credit to our
sysadmin Lloyd Brown for that one).  Then we will use something like a
script in /etc/profile.d to add the process to the correct cgroup if
it's launched via ssh and has $SLURM_JOB_ID set.  We're not using
cgroups yet (we still have some CentOS 5), so I don't have exact
implementation details at this point.  The cgroups should then handle
resource control and, I assume, accounting if the correct plugin is used.
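
Roughly, the idea would be something like this (untested; the script
name and the cgroup path are guesses and depend on how the hierarchy is
set up locally):

  # ssh_config (client side): pass the job id along
  SendEnv SLURM_JOB_ID

  # sshd_config (server side): accept it
  AcceptEnv SLURM_JOB_ID

  # /etc/profile.d/slurm_cgroup.sh: if this shell came in over ssh from
  # within a job, move it into that job's cgroup (assumes the tasks file
  # is writable by the user, e.g. via a root-owned helper)
  if [ -n "$SSH_CONNECTION" ] && [ -n "$SLURM_JOB_ID" ]; then
      cg=/sys/fs/cgroup/cpuset/slurm/uid_$(id -u)/job_$SLURM_JOB_ID
      [ -d "$cg" ] && echo $$ > "$cg/tasks" 2>/dev/null
  fi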

This may not catch 100% of everything, but we would probably have
something look for all user processes that are not part of a cgroup and
add them to the user's cgroup.  I don't think accounting could work in
that case, but it would help catch and control rogue processes that
aren't accounted for under SLURM.  An epilog or a cron job could clean
up all of a user's processes once they no longer have jobs on the node.
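
For example, something along these lines could run from cron (just a
sketch; the UID cutoff and the choice of signal are site decisions):

  #!/bin/bash
  # Cron sketch: kill processes of any regular user who no longer has a
  # job on this node according to squeue.
  node=$(hostname -s)
  for user in $(ps -e -o user= | sort -u); do
      uid=$(id -u "$user" 2>/dev/null) || continue
      [ "$uid" -lt 1000 ] && continue    # skip system accounts
      if [ "$(squeue -h -u "$user" -w "$node" | wc -l)" -eq 0 ]; then
          pkill -9 -u "$user"
      fi
  done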

I don't know if SLURM has something like Torque's tm_adopt, but
something like that could handle accounting in lieu of cgroups if you
don't happen to use them.  tm_adopt let you attach an arbitrary process
to a job so it was accounted for under Torque, even if it wasn't
launched through Torque.  We used to have a wrapper script for ssh that
did just that when we used Torque and Moab.

Ryan

P.S. We've only been using SLURM for a few weeks so you might want to 
double-check the accuracy and viability of my statements :)


On 02/21/2013 12:57 PM, Moe Jette wrote:
 Slurm only tracks the processes that its daemons launch (most MPI
 implementations can launch their tasks using slurm). Anything launched
 outside of Slurm can be killed as part of a job prolog, but accounting
 and job step management are outside of Slurm's control.

 Quoting Michael Colonno mcolo...@stanford.edu:

  SLURM gurus ~

  I'm trying to configure a commercial MPI code to run through SLURM.
 I can launch this code through either srun or sbatch without any
 issues (the good) but the processes manage to run completely
 disconnected from SLURM's notice (the bad). i.e. the job is running
 just fine but SLURM thinks it's completed and hence does not report
 anything running. I'm guessing this is due to the fact that this
 tool runs a pre-processing-type executable and then launches
 sub-processes to solve (MPI on a local system) without connecting
 the process IDs(?) In any event, I'm guessing I'm not the first
 person to run into this. Is there a recommended solution to
 configure SLURM to track codes like this?

  Thanks,
  ~Mike C.



-- 
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University


[slurm-dev] RE: untracked processes

2013-02-21 Thread Lloyd Brown

I believe you're understanding the problem correctly.  Especially with
MPI, you need some mechanism to launch processes on remote hosts.  With
scheduler integration of some kind (e.g. srun with SLURM, the TM API
with Torque, etc.), the MPI implementation can work with the scheduling
tool to do this, and everyone is happy.  Lacking that, the MPI
implementation still needs a fallback, which is usually SSH.

Having said that, in my experience most commercial codes that use MPI
just package some MPI implementation that's already out there.  Many of
those MPI implementations do have SLURM integration.  It would be
helpful to know what the commercial software is and, perhaps more
importantly, which MPI implementation it uses.

Also, some commercial codes have a way to specify (via a command-line
parameter) a drop-in replacement for ssh.  In theory, you could probably
create a simple wrapper around srun that makes the syntax more like ssh,
and use that.
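
For example (purely hypothetical, and assuming the launcher calls it as
plain "ssh <host> <command...>" with no extra flags):

  #!/bin/bash
  # ssh-like wrapper: run the command on the named node via srun,
  # inside the current allocation; extra ssh options are not handled.
  host="$1"
  shift
  exec srun -N1 -n1 -w "$host" "$@"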

There are probably other solutions out there too.


Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 02/21/2013 02:06 PM, Michael Colonno wrote:
 
   Thanks for the reply. I'm not 100% clear on the below, so let me be more 
 specific. I'm launching the code via srun (for example). The code launches, 
 runs a few different executables in order, and eventually launches a few MPI 
 processes through its own MPI implementation. I have no control over the 
 source code or over the syntax used to launch the sub-processes. srun 
 launches these processes and then reports the job completed; this is the only 
 tool that behaves this way (others seem to track processes even if not 
 launched through SLURM). Is the conclusion that, if the sub-processes are not 
 launched explicitly via SLURM (but are child processes of a SLURM-launched 
 process), there is nothing that can be done at the SLURM level to prevent 
 SLURM from relinquishing the resources before the job is completed? 
 
   Thanks,
   ~Mike C. 
 
 -Original Message-
 From: Moe Jette [mailto:je...@schedmd.com] 
 Sent: Thursday, February 21, 2013 11:00 AM
 To: slurm-dev; Michael Colonno
 Subject: Re: [slurm-dev] untracked processes
 
 Slurm only tracks the processes that its daemons launch (most MPI 
 implementations can launch their tasks using slurm). Anything launched 
 outside of Slurm can be killed as part of a job prolog, but accounting and 
 job step management are outside of Slurm's control.
 
 Quoting Michael Colonno mcolo...@stanford.edu:
 

  SLURM gurus ~

  I'm trying to configure a commercial MPI code to run through SLURM.  
 I can launch this code through either srun or sbatch without any 
 issues (the good) but the processes manage to run completely 
 disconnected from SLURM's notice (the bad). i.e. the job is running 
 just fine but SLURM thinks it's completed and hence does not report 
 anything running. I'm guessing this is due to the fact that this tool 
 runs a pre-processing-type executable and then launches sub-processes 
 to solve (MPI on a local system) without connecting the process IDs(?) 
 In any event, I'm guessing I'm not the first person to run into this. 
 Is there a recommended solution to configure SLURM to track codes like 
 this?

  Thanks,
  ~Mike C.