[slurm-dev] Re: Job Submit Lua Plugin

2017-06-27 Thread Nicholas McCollum
Nathan,

I have very much appreciated the job_submit.lua plugin for helping
educate users on what is an acceptable job.  It is one of my favorite
features about SLURM and has been invaluable in assisting students in
submitting valid job requirements.  

If a user specifies some absurd amount of memory or another bad sbatch
or srun parameter, or leaves out a parameter that I require, I like to
tell the user what they have done wrong.  For example, I require all
users to specify a QoS when they submit a job.

== BEGIN EXAMPLE job_submit.lua ==

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
  --[[ Nothing is checked on modify; report success ]]--
  return slurm.SUCCESS
end

function slurm_job_submit(job_desc, part_list, submit_uid)

  --[[ Start with an error count of 0 ]]--
  local asc_error = 0
  local asc_error_verbose = ""

  --[[ Pretend if statement ]]--
  asc_error = asc_error + 1
  asc_error_verbose = string.format(
    "%s\nERROR: Job requested something we don't like.\n",
    asc_error_verbose)
  --[[ End Pretend if statement ]]--

  --[[ Pretend if statement ]]--
  asc_error = asc_error + 1
  asc_error_verbose = string.format(
    "%s\nERROR: More bad stuff.\n",
    asc_error_verbose)
  --[[ End Pretend if statement ]]--

  --[[ Report every collected error to the user and reject the job ]]--
  if asc_error > 0 then
    slurm.log_user("\n%s", asc_error_verbose)
    return slurm.ERROR
  end

  --[[ Want to return slurm.SUCCESS if the entire script runs to end ]]--
  return slurm.SUCCESS
end

== END EXAMPLE job_submit.lua ==

This is the method that I worked out: it collects all of the
errors in asc_error_verbose and prints them at the end, along with return
slurm.ERROR.   If you use the file above as is, it will reject every
job with both of those errors.  This would be a great way to check that
job_submit.lua is working on your system.  If you have any current jobs
though, it will kill them all... so use this on a development
environment for testing.

My example for making a user specify a QoS:

  local asc_qos = job_desc.qos
  if asc_qos == nil then
    asc_error = asc_error + 1
    asc_error_verbose = string.format(
      "%s\nJob must request a QoS using the --qos= flag.\n",
      asc_error_verbose)
    asc_qos = "invalid"
  end
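
For anyone who wants to try the pattern outside slurmctld, here is a minimal sketch that folds the QoS check into the collect-then-reject approach described above. The checks list and the check_qos helper are hypothetical names, not part of the original script, and the slurm table at the top is only a stand-in so the file runs under a plain Lua interpreter; inside slurmctld the real table comes from the plugin.

```lua
-- Stand-in for the table slurmctld provides. ASSUMPTION: only used when
-- this file is run with a plain Lua interpreter for testing.
slurm = slurm or {
  SUCCESS = 0,
  ERROR = -1,
  log_user = function(fmt, ...) print(string.format(fmt, ...)) end,
}

-- Hypothetical helper: one function per rule, returning nil when the job
-- passes or a message string when it does not.
local function check_qos(job_desc)
  if job_desc.qos == nil then
    return "Job must request a QoS using the --qos= flag."
  end
  return nil
end

-- Add further rule functions to this list as needed.
local checks = { check_qos }

function slurm_job_submit(job_desc, part_list, submit_uid)
  local errors = {}
  -- Run every rule so the user sees all problems at once.
  for _, check in ipairs(checks) do
    local msg = check(job_desc)
    if msg then
      errors[#errors + 1] = "ERROR: " .. msg
    end
  end
  if #errors > 0 then
    slurm.log_user("%s", table.concat(errors, "\n"))
    return slurm.ERROR
  end
  return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
  return slurm.SUCCESS
end
```

Keeping each rule in its own function means a new requirement is one more entry in the list, and the reject logic at the end never changes.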


I'd be more than happy to share my job_submit.lua if anyone is
interested.  I only ask that you share yours back.

-- 
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority


[slurm-dev] Re: Job Submit Lua Plugin

2017-06-27 Thread Ryan Cox

Nathan and Darby,

For you and anyone else using Lua, see 
https://bugs.schedmd.com/show_bug.cgi?id=3815 regarding --mem vs. 
--mem-per-cpu starting in 17.02.


Ryan


--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University



[slurm-dev] Re: Job Submit Lua Plugin

2017-06-27 Thread Nathan Vance
Darby,

The missing "JobSubmitPlugins=lua" line in slurm.conf was indeed the issue.
When compiling slurm I only got the "yes lua" line without the flags, but
that seems to be just a difference between OSes.

Now that I have debugging feedback I should be good to go!

Thanks,
Nathan



[slurm-dev] Re: Job Submit Lua Plugin

2017-06-27 Thread Vicker, Darby (JSC-EG311)
We recently started using a Lua job submit plugin as well.  You have to have 
the lua-devel package installed when you compile slurm.  It looks like you do 
(we use RHEL, where the package name is lua-devel), but confirm that you see 
something like this in config.log:

configure:24784: result: yes lua
pkg_cv_lua_LIBS='-llua -lm -ldl  '
lua_CFLAGS='  -DLUA_COMPAT_ALL'
lua_LIBS='-llua -lm -ldl  '

Do you have this in your slurm.conf?

JobSubmitPlugins=lua

I'm guessing not, given that you don't see anything in the logs. Before I got 
all the errors worked out, I would see errors like this in slurmctld_log:

error: Couldn't find the specified plugin name for job_submit/lua looking at all files
error: cannot find job_submit plugin for job_submit/lua
error: cannot create job_submit context for job_submit/lua
failed to initialize job_submit plugin


After getting everything working, you should see this:

job_submit.lua: initialized

You should also see any other slurm.log_info messages you put in your Lua script.
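
As a rough illustration of the logging check described above, a stripped-down job_submit.lua could do nothing but log each call. This is a sketch, not the contrib script: the message texts are made up, and the slurm table at the top is a stand-in (an assumption for testing) so the file also runs under a plain Lua interpreter.

```lua
-- Stand-in for the table slurmctld provides. ASSUMPTION: only used when
-- this file is run with a plain Lua interpreter for testing.
slurm = slurm or {
  SUCCESS = 0,
  log_info = function(fmt, ...) print(string.format(fmt, ...)) end,
}

-- Log every submission so a grep of the slurmctld log shows whether the
-- plugin is being called at all.
function slurm_job_submit(job_desc, part_list, submit_uid)
  slurm.log_info("slurm_job_submit called by uid %d", submit_uid)
  return slurm.SUCCESS
end

-- Same idea for job modifications.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
  slurm.log_info("slurm_job_modify called by uid %d", modify_uid)
  return slurm.SUCCESS
end

-- Runs once when the script is loaded.
slurm.log_info("script loaded")
```

If none of these messages reach the slurmctld log after a restart and a test submission, the plugin is probably not being loaded at all. The contrib/lua/job_submit.lua example ends with a file-level return slurm.SUCCESS; if you start from that file, keep that line last.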




[slurm-dev] Re: slurm-dev Announce: Node status tool "pestat" for Slurm updated to version 0.50

2017-06-27 Thread Ole Holm Nielsen


On 26-06-2017 17:20, Adrian Sevcenco wrote:


On 06/22/2017 01:34 PM, Ole Holm Nielsen wrote:


I'm announcing an updated version 0.50 of the node status tool 
"pestat" for Slurm.  I discovered how to obtain the node Free Memory 
with sinfo, so now we can do nice things with memory usage!


Hi! Thank you for the great tool! I don't know if this is intended, but:

[Monday 26.06.17 18:12] adrian@sev : ~  $
sinfo -N -t idle -o "%N %P %C %O %m %e %t" | column -t
NODELIST   PARTITION  CPUS(A/I/O/T)  CPU_LOAD  MEMORY  FREE_MEM  STATE
localhost  local*     0/8/0/8        0.03      14984   201       idle

[Monday 26.06.17 18:13] adrian@sev : ~  $
free -m
       total   used   free  shared  buff/cache  available
Mem:   14984    392    182     134       14409      14081
Swap:   8191      0   8191

[Monday 26.06.17 18:13] adrian@sev : ~  $
pestat
Hostname   Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
                      State  Use/Tot           (MB)     (MB)     JobId User ...
localhost  local*     idle   0   8    0.03     14984    201*


While it is clear that the reported free memory is what free reports 
as "free", one might argue that buffers/cache is also memory available 
for use, since it shrinks as applications need more memory.

Maybe FREE_MEM should be reported as (free + cached)?


The pestat tool simply reports the free_mem value provided by sinfo.
I'm not sure I understand your point, but only SchedMD can change 
Slurm's reporting.
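
To make the numbers in that suggestion concrete, here is a small sketch that redoes the arithmetic from the free -m output quoted above. The free_plus_cache function is a hypothetical name for the proposed calculation; it is not something Slurm or pestat provides.

```lua
-- Values in MB, copied from the `free -m` output quoted above.
local mem = { total = 14984, used = 392, free = 182,
              shared = 134, buff_cache = 14409, available = 14081 }
local slurm_free_mem = 201  -- FREE_MEM column from the sinfo output

-- The proposal from the thread: count buffers/cache as usable memory.
function free_plus_cache(m)
  return m.free + m.buff_cache
end

print(string.format("free:               %6d MB", mem.free))
print(string.format("free + buff/cache:  %6d MB", free_plus_cache(mem)))
print(string.format("available:          %6d MB", mem.available))
print(string.format("Slurm FREE_MEM:     %6d MB", slurm_free_mem))
```

With these numbers the sum is 14591 MB, against 182 MB in the free column and 14081 MB in the available column. The available figure is the kernel's own estimate of memory usable without swapping, and it is typically lower than the plain sum because not all of the cache can be reclaimed.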


/Ole


[slurm-dev] Job Submit Lua Plugin

2017-06-27 Thread Nathan Vance
Hello all!

I've been working on getting off the ground with Lua plugins. The goal is
to implement Torque's routing queues for SLURM, but so far I have been
unable to get SLURM to even call my plugin.

What I have tried:
1) Copied contrib/lua/job_submit.lua to /etc/slurm/ (the same directory as
slurm.conf)
2) Restarted slurmctld and verified that no functionality was broken
3) Added slurm.log_info("I got here") to several points in the script.
After restarting slurmctld and submitting a job,
grep "I got here" -R /var/log found no results.
4) In case there was a problem with the log file, I added
os.execute("touch /home/myUser/slurm_job_submitted") to the top of the
slurm_job_submit function. Restarting slurmctld and submitting a job still
produced no evidence that my plugin was called.
5) In case there were permission issues, I made job_submit.lua executable.
Nothing. Even grep "job_submit" -R /var/log (in case there was an error
calling the script) comes up dry.

Relevant information:
OS: Ubuntu 16.04
Lua: lua5.2 and liblua5.2-dev (I can use Lua interactively)
SLURM version: 17.02.5, compiled from source (after installing Lua) using
./configure --prefix=/usr --sysconfdir=/etc/slurm

Any guidance to get me up and running would be greatly appreciated!

Thanks,
Nathan
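
As a rough idea of what the routing-queue goal mentioned above might look like once the plugin loads, the sketch below picks a partition from the requested time limit when the user did not name one. Everything specific here is an assumption: the partition names and limits are made up, pick_partition is a hypothetical helper, the code assumes job_desc.time_limit is in minutes and equals slurm.NO_VAL when unset, and the slurm table at the top is a stand-in so the file runs under a plain Lua interpreter.

```lua
-- Stand-in for the table slurmctld provides. ASSUMPTION: only used when
-- this file is run with a plain Lua interpreter for testing.
slurm = slurm or {
  SUCCESS = 0,
  ERROR = -1,
  NO_VAL = 0xfffffffe,
  log_info = function(fmt, ...) print(string.format(fmt, ...)) end,
  log_user = function(fmt, ...) print(string.format(fmt, ...)) end,
}

-- Hypothetical routing table: destination partitions ordered by their
-- walltime limit in minutes. Names and limits are invented.
local routes = {
  { partition = "short",  max_minutes = 60 },
  { partition = "medium", max_minutes = 24 * 60 },
  { partition = "long",   max_minutes = 7 * 24 * 60 },
}

-- Return the first partition whose limit covers the request, or nil if
-- the request is longer than every route allows.
function pick_partition(time_limit)
  if time_limit == nil or time_limit == slurm.NO_VAL then
    return routes[1].partition  -- no walltime requested: use the first route
  end
  for _, route in ipairs(routes) do
    if time_limit <= route.max_minutes then
      return route.partition
    end
  end
  return nil
end

function slurm_job_submit(job_desc, part_list, submit_uid)
  -- Only route jobs that did not ask for a partition explicitly.
  if job_desc.partition == nil then
    local target = pick_partition(job_desc.time_limit)
    if target == nil then
      slurm.log_user("ERROR: requested time limit exceeds every partition.")
      return slurm.ERROR
    end
    job_desc.partition = target
    slurm.log_info("routed job from uid %d to partition %s", submit_uid, target)
  end
  return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
  return slurm.SUCCESS
end
```

Jobs that name a partition are left alone, so users can still override the routing when they need to.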