[slurm-dev] Re: untracked processes
The message below should read "epilog" rather than "prolog". Slurm only tracks the processes that its daemons launch (most MPI implementations can launch their tasks using Slurm). Anything launched outside of Slurm can be killed as part of a job epilog, but accounting and job step management are outside of Slurm's control. The epilog can check whether the user still has a Slurm job allocated to the node and, if not, kill all processes owned by that user.

Quoting Moe Jette je...@schedmd.com:

Slurm only tracks the processes that its daemons launch (most MPI implementations can launch their tasks using Slurm). Anything launched outside of Slurm can be killed as part of a job prolog, but accounting and job step management are outside of Slurm's control.

Quoting Michael Colonno mcolo...@stanford.edu:

SLURM gurus ~ I'm trying to configure a commercial MPI code to run through SLURM. I can launch this code through either srun or sbatch without any issues (the good), but the processes manage to run completely disconnected from SLURM's notice (the bad). That is, the job is running just fine, but SLURM thinks it has completed and hence does not report anything running. I'm guessing this is because the tool runs a pre-processing-type executable and then launches sub-processes to solve (MPI on a local system) without connecting the process IDs(?). In any event, I'm guessing I'm not the first person to run into this. Is there a recommended solution for configuring SLURM to track codes like this? Thanks, ~Mike C.
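The epilog check Moe describes might look roughly like the sketch below. This assumes the SLURM_JOB_USER variable that slurmd exports to Prolog/Epilog scripts; the function name epilog_cleanup is made up for this sketch, and a real epilog would need site-specific exclusions.

```shell
# Hypothetical helper for a node Epilog script. Assumes slurmd exports
# SLURM_JOB_USER to the Epilog; the function name is illustrative only.
epilog_cleanup() {
    local user="$1"
    # Never touch system accounts.
    if [ "$user" = "root" ]; then
        return 0
    fi
    # If the user still has a job allocated to this node, leave them alone.
    if squeue --noheader --user="$user" --nodelist="$(hostname -s)" | grep -q .; then
        return 0
    fi
    # No remaining jobs on this node: kill everything the user still owns.
    pkill -9 -u "$user"
    return 0
}

# An Epilog script would end with something like:
#   epilog_cleanup "$SLURM_JOB_USER"
```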
[slurm-dev] Re: untracked processes
This may not be exactly what you're looking for, but it could be a start. We're looking at modifying ssh_config and sshd_config to propagate SLURM_JOB_ID for jobs that use ssh to spawn processes (credit to our sysadmin Lloyd Brown for that one). Then we will use something like a script in /etc/profile.d to add the process to the correct cgroup if it's launched via ssh and has $SLURM_JOB_ID set. We're not using cgroups yet (we still have some CentOS 5), so I don't have exact implementation details at this point. The cgroups should then work for resource control and, I assume, accounting if the correct plugin is used.

This may not catch 100% of everything, but we would probably have something look for all user processes that are not part of a cgroup and add them to the user's cgroup. I don't think accounting could work in that case, but it would help catch and control rogue processes that aren't accounted for under SLURM. An epilog or a cron job could clean up all of a user's processes once they no longer have jobs on the node.

I don't know if SLURM has something like Torque's tm_adopt, but that could work in lieu of cgroups for accounting if you don't happen to use cgroups. tm_adopt let you add an arbitrary process to be accounted for under Torque, even if it wasn't launched under Torque. We used to have a wrapper script for ssh that did just that when we used Torque and Moab.

Ryan

P.S. We've only been using SLURM for a few weeks, so you might want to double-check the accuracy and viability of my statements :)

On 02/21/2013 12:57 PM, Moe Jette wrote: Slurm only tracks the processes that its daemons launch (most MPI implementations can launch their tasks using Slurm). Anything launched outside of Slurm can be killed as part of a job prolog, but accounting and job step management are outside of Slurm's control.

--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
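The scheme Ryan describes could be sketched as below, under stated assumptions: SendEnv/AcceptEnv are standard OpenSSH options; the cgroup path mirrors the freezer hierarchy that Slurm's proctrack/cgroup plugin creates (slurm/uid_<uid>/job_<jobid>), which may differ between versions; and the function name adopt_into_job_cgroup is made up for illustration.

```shell
# Sketch of /etc/profile.d/slurm_adopt.sh. Requires ssh to carry the job id:
#   client-side /etc/ssh/ssh_config:   SendEnv SLURM_JOB_ID
#   node-side   /etc/ssh/sshd_config:  AcceptEnv SLURM_JOB_ID

# Move a PID into the job's freezer cgroup so Slurm's proctrack plugin
# can find and signal it. Path layout is assumed, not guaranteed.
adopt_into_job_cgroup() {
    local jobid="$1" uid="$2" pid="$3" root="${4:-/sys/fs/cgroup/freezer}"
    local tasks="$root/slurm/uid_${uid}/job_${jobid}/tasks"
    # Only adopt if slurmd has already created the job's cgroup.
    if [ -w "$tasks" ]; then
        echo "$pid" > "$tasks"
    fi
}

# In a login shell spawned via ssh with SLURM_JOB_ID set:
#   adopt_into_job_cgroup "$SLURM_JOB_ID" "$(id -u)" $$
```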
[slurm-dev] RE: untracked processes
I believe you're understanding the problem correctly. Especially with MPI, you need some mechanism to launch processes on remote hosts. With scheduler integration of some kind (e.g. srun with SLURM, the TM API with Torque, etc.), the MPI implementation can work with the scheduling tool to do this, and everyone is happy. Lacking that, the MPI implementation still needs a fallback, which is usually SSH.

Having said that, in my experience most commercial codes that use MPI just package or utilize some MPI implementation that's already out there, and many of those MPI implementations do have SLURM integration. It would be helpful to know what the commercial software is and, perhaps more importantly, which MPI implementation it is using.

Also, some commercial codes have a way to specify (via a CLI parameter) a drop-in replacement for ssh. In theory, you could probably create a simple wrapper around srun to make the syntax more like ssh, and use that. There are probably other solutions out there too.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 02/21/2013 02:06 PM, Michael Colonno wrote: Thanks for the reply. I'm not 100% clear on the below, so let me be more specific. I'm launching the code via srun (for example). The code launches, runs a few different executables in order, and eventually launches a few MPI processes through its own MPI implementation. I have no control over the source code or the syntax used to launch the sub-processes. srun launches these processes and then reports the job completed; this is the only tool that behaves this way (others seem to track processes even if they're not launched through SLURM). Is the conclusion that, if the sub-processes are not launched explicitly via SLURM (but are child processes of a SLURM-launched process), there is nothing that can be done at the SLURM level to prevent SLURM from relinquishing the resources before the job is completed?
Thanks, ~Mike C.

-----Original Message-----
From: Moe Jette [mailto:je...@schedmd.com]
Sent: Thursday, February 21, 2013 11:00 AM
To: slurm-dev; Michael Colonno
Subject: Re: [slurm-dev] untracked processes
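Lloyd's idea of a drop-in ssh replacement built on srun could be sketched like this. The srun_ssh name and the handled option set are illustrative; real MPI launchers pass varying ssh flags, so a production wrapper would need to absorb whatever options your particular launcher emits.

```shell
# Hypothetical ssh-compatible front end for srun. A commercial code that
# accepts a "remote shell" override could be pointed at a script calling
# this, so remote ranks start under slurmd and get tracked and accounted.
srun_ssh() {
    # Skip ssh-style options the MPI launcher may pass.
    while [ $# -gt 0 ]; do
        case "$1" in
            -o|-l|-p) shift 2 ;;   # options that take an argument
            -*)       shift ;;     # boolean flags: ignore
            *)        break ;;     # first non-option is the hostname
        esac
    done
    local host="$1"
    shift
    # Run the command on that node inside the current allocation so
    # slurmstepd tracks the remote processes as a job step.
    srun --nodelist="$host" -N1 -n1 "$@"
}

# Installed as e.g. /usr/local/bin/srun-ssh, the script body would be:
#   srun_ssh "$@"
```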