Re: [slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-17 Thread Kevin Buckley

On 2020/12/17 11:34, Chris Samuel wrote:

On 16/12/20 6:21 pm, Kevin Buckley wrote:


The skip is occurring, in src/lua/slurm_lua.c, because of this trap


That looks right to me, that's Doug's code which is checking whether the
file has been updated since slurmctld last read it in.  If it has then
it'll reload it, but if it hasn't then it'll skip it (and if you've got
debugging up high then you'll see that message).


OK. That makes sense.


So if you see that message then the lua has been read in to slurmctld
and should get called.  You might want to check the log for when it last
read it in, just in case there was some error detected at that point.


Well, in the log snippet I provided, the implication is: Success.


You can also use luac to run a check over the script you've got like this:

luac -p /etc/opt/slurm/job_submit.lua


There's no luac in the Cray SDB images by default, only the
supporting libs, but the functionality is clearly there, viz.
the very first loading had already picked up a "missing end",
hence my assumption that the Success seen afterwards implied
"deep joy".

Will keep playing: cheers for the info,
Kevin

--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre



Re: [slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-16 Thread Chris Samuel

On 16/12/20 6:21 pm, Kevin Buckley wrote:


The skip is occurring, in src/lua/slurm_lua.c, because of this trap


That looks right to me, that's Doug's code which is checking whether the 
file has been updated since slurmctld last read it in.  If it has then 
it'll reload it, but if it hasn't then it'll skip it (and if you've got 
debugging up high then you'll see that message).


So if you see that message then the lua has been read in to slurmctld 
and should get called.  You might want to check the log for when it last 
read it in, just in case there was some error detected at that point.


You can also use luac to run a check over the script you've got like this:

luac -p /etc/opt/slurm/job_submit.lua

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



[slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-16 Thread Kevin Buckley

Probably not specific to 20.11.1, nor to a Cray, but has anyone out there seen
anything like this?

As the slurmctld restarts, after upping the debug level, it all looks hunky dory,

[2020-12-17T09:23:46.204] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so
[2020-12-17T09:23:46.205] debug3: Success.
[2020-12-17T09:23:46.206] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_lua.so
[2020-12-17T09:23:46.207] debug3: slurm_lua_loadscript: job_submit/lua: loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:23:46.208] debug3: Success.
[2020-12-17T09:23:46.209] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/prep_script.so
[2020-12-17T09:23:46.210] debug3: Success.

but, at the point where a submitted job should pass through the job_submit
script,

[2020-12-17T09:26:06.806] debug3: job_submit/lua: slurm_lua_loadscript: skipping loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:26:06.807] debug3: assoc_mgr_fill_in_user: found correct user: someuser(12345)
[2020-12-17T09:26:06.808] debug5: assoc_mgr_fill_in_assoc: looking for assoc of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance
[2020-12-17T09:26:06.809] debug3: assoc_mgr_fill_in_assoc: found correct association of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance to assoc=67 acct=accnts0001


Reason I went looking is that the job_submit.lua should be telling
me, the job submitter, to "sling my hook" as I have, deliberately,
left something out.

FWIW, the debug level here goes all the way to 5, so I was hoping
for a little more info as to why it is skipping it.

The skip is occurring, in src/lua/slurm_lua.c, because of this trap

	if (st.st_mtime <= *load_time) {
		debug3("%s: %s: skipping loading Lua script: %s", plugin,
		       __func__, script_path);
		return SLURM_SUCCESS;
	}
	debug3("%s: %s: loading Lua script: %s", __func__, plugin, script_path);

where "st" is a stat struct, but I am currently none the wiser as to why
such a condition would be (maybe even, would need to be) triggered?

The job submit script is certainly "younger" than the time of the slurmctld
restart, and of the job submission, but then, why wouldn't it be?

Kevin
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre