[slurm-dev] Re: Job Submit Lua Plugin
Nathan,

I have very much appreciated the job_submit.lua plugin for helping educate users on what is an acceptable job. It is one of my favorite features of SLURM and has been invaluable in helping students submit valid job requirements.

If a user specifies some absurd amount of memory or some other sbatch or srun parameter, or does not choose a parameter at all, I like to notify the user of what they have done wrong. For example, I require all users to specify a QoS when they submit a job.

== BEGIN EXAMPLE job_submit.lua ==
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    --[[ Nothing to check on modify; report success explicitly ]]--
    return slurm.SUCCESS
end

function slurm_job_submit(job_desc, part_list, submit_uid)

    --[[ Start with an error count of 0 ]]--
    local asc_error = 0
    local asc_error_verbose = ""

    --[[ Pretend if statement ]]--
    asc_error = asc_error + 1
    asc_error_verbose = string.format("%s\nERROR: Job requested something we don't like.\n", asc_error_verbose)
    --[[ End Pretend if statement ]]--

    --[[ Pretend if statement ]]--
    asc_error = asc_error + 1
    asc_error_verbose = string.format("%s\nERROR: More bad stuff.\n", asc_error_verbose)
    --[[ End Pretend if statement ]]--

    if asc_error > 0 then
        slurm.log_user("\n%s", asc_error_verbose)
        return slurm.ERROR
    end

    --[[ Want to return slurm.SUCCESS if the entire script runs to end ]]--
    return slurm.SUCCESS
end
== END EXAMPLE job_submit.lua ==

This is the method that I worked out: it collects all of the errors in asc_error_verbose and dumps them out at the end with return slurm.ERROR. If you use the file above as is, it will reject every job with the errors above. This is a great way to check that job_submit.lua is working on your system. If you have any current jobs, though, it will kill them all... so use this in a development environment for testing.
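As a hedged illustration of the collect-then-reject pattern above, here is a minimal sketch with a real condition (the QoS requirement mentioned earlier) in place of the pretend if statements. It gathers messages in a table instead of a counter plus a string, which is a small variation and not the original method. The stub `slurm` table and its numeric values are assumptions that exist only so the file can be run with a plain Lua interpreter; inside slurmctld the plugin supplies the real `slurm` table and a real job description.

```lua
-- Stand-in for the table slurmctld normally provides. ASSUMPTION: used only
-- when this file is run with a plain Lua interpreter for testing.
slurm = slurm or {
    SUCCESS = 0,
    ERROR = -1,
    log_user = function(fmt, ...) print(string.format(fmt, ...)) end,
}

function slurm_job_submit(job_desc, part_list, submit_uid)
    local errors = {}   -- one entry per problem found

    -- A real check in place of a "pretend if statement": require a QoS.
    if job_desc.qos == nil then
        errors[#errors + 1] = "ERROR: Job must request a QoS using the --qos= flag."
    end

    -- Report everything at once, then reject the job.
    if #errors > 0 then
        slurm.log_user("%s", table.concat(errors, "\n"))
        return slurm.ERROR
    end

    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

-- Quick check outside slurmctld with hand-built (hypothetical) job tables.
print(slurm_job_submit({ qos = "normal" }, {}, 1000) == slurm.SUCCESS)
print(slurm_job_submit({}, {}, 1000) == slurm.ERROR)
```

Collecting messages in a table keeps each check to a single line and lets the count come from `#errors`, so the counter and the message text cannot drift apart.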
My example for making a user specify a QoS:

    local asc_qos = job_desc.qos
    if asc_qos == nil then
        asc_error = asc_error + 1
        asc_error_verbose = string.format("%s\nJob must request a QoS using the --qos= flag.\n", asc_error_verbose)
        asc_qos = "invalid"
    end

I'd be more than happy to share my job_submit.lua if anyone is interested. I only ask that you share yours back.

--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
[slurm-dev] Re: Job Submit Lua Plugin
Nathan and Darby,

For you and anyone else using Lua, see https://bugs.schedmd.com/show_bug.cgi?id=3815 with regard to --mem vs. --mem-per-cpu starting in 17.02.

Ryan

--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
[slurm-dev] Re: Job Submit Lua Plugin
Darby,

The missing "JobSubmitPlugins=lua" line in slurm.conf was indeed the issue. When compiling slurm I only got the "yes lua" line without the flags, but that seems to be just a difference between OSes.

Now that I have debugging feedback I should be good to go!

Thanks,
Nathan
[slurm-dev] Re: Job Submit Lua Plugin
We recently started using a lua job submit plugin as well. You have to have the Lua development package installed when you compile slurm. It looks like you do (we use RHEL, where the package name is lua-devel), but confirm that you see something like these in config.log:

configure:24784: result: yes lua
pkg_cv_lua_LIBS='-llua -lm -ldl '
lua_CFLAGS=' -DLUA_COMPAT_ALL'
lua_LIBS='-llua -lm -ldl '

Do you have this in your slurm.conf?

JobSubmitPlugins=lua

I'm guessing not, given you don't see anything in the logs. Before I got all the errors worked out, I would see errors like this in slurmctld_log:

error: Couldn't find the specified plugin name for job_submit/lua looking at all files
error: cannot find job_submit plugin for job_submit/lua
error: cannot create job_submit context for job_submit/lua
failed to initialize job_submit plugin

After getting everything working, you should see this:

job_submit.lua: initialized

As well as any other slurm.log_info messages you put in your lua script.
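The slurm.conf check described above can also be scripted. The following is a minimal Lua sketch, not part of Slurm: it scans the text of a slurm.conf for a JobSubmitPlugins entry that lists lua. The function name and the sample strings are hypothetical, the path /etc/slurm/slurm.conf is the one from this thread, and files pulled in through Include directives are not followed.

```lua
-- Return true if the slurm.conf text lists "lua" in JobSubmitPlugins.
-- Hypothetical helper for illustration; slurmctld does its own parsing.
local function lua_job_submit_enabled(conf_text)
    for line in conf_text:gmatch("[^\r\n]+") do
        local uncommented = line:gsub("#.*$", "")   -- strip trailing comments
        local key, value = uncommented:match("^%s*([%w_]+)%s*=%s*(.-)%s*$")
        -- Parameter names are compared case-insensitively.
        if key and key:lower() == "jobsubmitplugins" then
            for plugin in value:gmatch("[^,%s]+") do
                if plugin:lower() == "lua" then
                    return true
                end
            end
        end
    end
    return false
end

-- Made-up sample configs: one with the line enabled, one commented out.
print(lua_job_submit_enabled("ClusterName=test\nJobSubmitPlugins=lua\n"))
print(lua_job_submit_enabled("ClusterName=test\n#JobSubmitPlugins=lua\n"))

-- Check the real file if it exists at the path used in this thread.
local f = io.open("/etc/slurm/slurm.conf", "r")
if f then
    print("lua plugin enabled:", lua_job_submit_enabled(f:read("*a")))
    f:close()
end
```

If this reports false, the "cannot find job_submit plugin" errors would not appear either, because slurmctld never tries to load the plugin; that matches the symptom of an entirely silent log.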
[slurm-dev] Re: slurm-dev Announce: Node status tool "pestat" for Slurm updated to version 0.50
On 26-06-2017 17:20, Adrian Sevcenco wrote:
> On 06/22/2017 01:34 PM, Ole Holm Nielsen wrote:
> > I'm announcing an updated version 0.50 of the node status tool
> > "pestat" for Slurm. I discovered how to obtain the node Free Memory
> > with sinfo, so now we can do nice things with memory usage!
>
> Hi! Thank you for the great tool! I don't know if this is intended, but:
>
> [Monday 26.06.17 18:12] adrian@sev : ~ $ sinfo -N -t idle -o "%N %P %C %O %m %e %t" | column -t
> NODELIST   PARTITION  CPUS(A/I/O/T)  CPU_LOAD  MEMORY  FREE_MEM  STATE
> localhost  local*     0/8/0/8        0.03      14984   201       idle
>
> [Monday 26.06.17 18:13] adrian@sev : ~ $ free -m
>         total   used   free   shared   buff/cache   available
> Mem:    14984    392    182      134        14409       14081
> Swap:    8191      0   8191
>
> [Monday 26.06.17 18:13] adrian@sev : ~ $ pestat
> Hostname   Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
>                       State  Use/Tot           (MB)     (MB)     JobId User ...
> localhost  local*     idle   0   8    0.03     14984    201*
>
> While it is clear that the reported free memory is what free reports
> as "free", one might argue that buffers/cache is memory available for
> use, as it will shrink as applications use memory... Maybe FREE_MEM
> should be reported as (free + cached)?

The pestat tool simply reports the free_mem value provided by sinfo. I'm not sure I understand your point, but only SchedMD can change Slurm's reporting.

/Ole
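To put numbers on the (free + cached) suggestion, here is a small Lua sketch that parses the "Mem:" line of the free -m output quoted in this message and adds the free and buff/cache columns. It is not part of pestat or Slurm; the function name is made up for illustration, and the column layout is assumed to be the six-column one shown in the quoted output.

```lua
-- Parse the "Mem:" line of `free -m` output and return its fields as numbers.
-- ASSUMPTION: six-column layout (total, used, free, shared, buff/cache, available).
local function parse_free_mem_line(line)
    local total, used, free, shared, buff_cache, available =
        line:match("^Mem:%s+(%d+)%s+(%d+)%s+(%d+)%s+(%d+)%s+(%d+)%s+(%d+)")
    if not total then
        return nil, "not a recognised 'Mem:' line"
    end
    return {
        total = tonumber(total),
        used = tonumber(used),
        free = tonumber(free),
        shared = tonumber(shared),
        buff_cache = tonumber(buff_cache),
        available = tonumber(available),
    }
end

-- Values copied from the free -m output quoted in this message.
local mem = assert(parse_free_mem_line("Mem: 14984 392 182 134 14409 14081"))
local free_plus_cache = mem.free + mem.buff_cache

print(string.format("free=%d MB, free+buff/cache=%d MB, available=%d MB",
                    mem.free, free_plus_cache, mem.available))
```

For the quoted node, free plus buff/cache is 182 + 14409 = 14591 MB, while the "available" column says 14081 MB, so the two candidate replacements for FREE_MEM would not report the same figure.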
[slurm-dev] Job Submit Lua Plugin
Hello all!

I've been working on getting off the ground with Lua plugins. The goal is to implement Torque's routing queues for SLURM, but so far I have been unable to get SLURM to even call my plugin.

What I have tried:

1) Copied contrib/lua/job_submit.lua to /etc/slurm/ (the same directory as slurm.conf)
2) Restarted slurmctld and verified that no functionality was broken
3) Added slurm.log_info("I got here") to several points in the script. After restarting slurmctld and submitting a job, grep "I got here" -R /var/log found no results.
4) In case there was a problem with the log file, I added os.execute("touch /home/myUser/slurm_job_submitted") to the top of the slurm_job_submit method. Restarting slurmctld and submitting a job still produced no evidence that my plugin was called.
5) In case there were permission issues, I made job_submit.lua executable. Nothing. Even grep "job_submit" -R /var/log (in case there was an error calling the script) comes up dry.

Relevant information:

OS: Ubuntu 16.04
Lua: lua5.2 and liblua5.2-dev (I can use Lua interactively)
SLURM version: 17.02.5, compiled from source (after installing Lua) using ./configure --prefix=/usr --sysconfdir=/etc/slurm

Any guidance to get me up and running would be greatly appreciated!

Thanks,
Nathan