[slurm-users] Re: Spread a multistep job across clusters
On 26/8/24 8:40 am, Di Bernardini, Fabio via slurm-users wrote:

> Hi everyone, for accounting reasons, I need to create only one job
> across two or more federated clusters with two or more srun steps.

The limitations for heterogeneous jobs say:
https://slurm.schedmd.com/heterogeneous_jobs.html#limitations

> In a federation of clusters, a heterogeneous job will execute
> entirely on the cluster from which the job is submitted. The
> heterogeneous job will not be eligible to migrate between clusters
> or to have different components of the job execute on different
> clusters in the federation.

However, from your script it's not clear to me that's what you mean, because you include multiple --cluster options. I'm not sure if that works; as you mention, the docs don't cover that case. They do say, however, that:

> If a heterogeneous job is submitted to run in multiple clusters not
> part of a federation (e.g. "sbatch --cluster=alpha,beta ...") then
> the entire job will be sent to the cluster expected to be able to
> start all components at the earliest time.

My gut instinct is that this isn't going to work; my feeling is that launching a heterogeneous job like this would require the slurmctld on each cluster to coordinate, and I'm not aware of that being possible currently.

All the best,
Chris

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
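For reference, the supported single-cluster case looks something like the sketch below (the `./controller` and `./worker` programs are made-up placeholders):

```bash
#!/bin/bash
#SBATCH --ntasks=1 --mem=4G    # component 0
#SBATCH hetjob
#SBATCH --ntasks=8 --mem=1G    # component 1
# Both components run on the cluster the job was submitted to;
# per the docs above, they cannot be split across a federation.
srun --het-group=0 ./controller &
srun --het-group=1 ./worker
wait
```

Both components share one job ID for accounting, which may be enough for the single-cluster case, but not across clusters.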
[slurm-users] Re: REST API - get_user_environment
On 27/8/24 10:26 am, jpuerto--- via slurm-users wrote:

> Is anyone in contact with the development team?

Folks with a support contract can submit bugs at https://support.schedmd.com/

> I feel that this is pretty basic functionality that was removed from
> the REST API without warning. Considering that this was a "patch"
> release (based on traditional semantic versioning guidelines), this
> type of modification shouldn't have happened and makes me worry about
> upgrading in the future.

Slurm hasn't used semantic versioning for a long time; it moved to a year.month.minor version scheme instead. Major releases are now every 6 months, so the most recent ones have been:

* 23.02.0
* 23.11.0 (old 9 month system)
* 24.05.0 (new 6 month system)

The next major release should be in November:

* 24.11.0

All the best,
Chris
[slurm-users] Re: REST API - get_user_environment
On 22/8/24 11:18 am, jpuerto--- via slurm-users wrote:

> Do you have a link to that code? Haven't had any luck finding that repo

It's here (on the 23.11 branch):

https://github.com/SchedMD/slurm/tree/slurm-23.11/src/slurmrestd/plugins/openapi/dbv0.0.38
[slurm-users] Re: REST API - get_user_environment
On 15/8/24 10:55 am, jpuerto--- via slurm-users wrote:

> Any ideas on whether there's a way to mirror this functionality in v0.0.40?

Sorry for not seeing this sooner; I don't, I'm afraid!

All the best,
Chris
[slurm-users] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?
On 26/2/24 12:27 am, Josef Dvoracek via slurm-users wrote:

> What is the recommended way to run longer interactive job at your systems?

We provide NX for our users and also access via JupyterHub. We also have high priority QOS's intended for interactive use for rapid response, but they are capped at 4 hours (or 6 hours for Jupyter users).

All the best,
Chris
[slurm-users] Re: slurm-23.11.3-1 with X11 and zram causing permission errors: error: _forkexec_slurmstepd: slurmstepd failed to send return code got 0: Resource temporarily unavailable; Requeue of Jo
On 24/2/24 06:14, Robert Kudyba via slurm-users wrote:

> For now I just set it to chmod 777 on /tmp and that fixed the errors.
> Is there a better option?

Traditionally /tmp and /var/tmp have been mode 1777. That leading "1" is the sticky bit, originally invented to indicate that the OS should attempt to keep a frequently used binary in memory, but later adopted to indicate special handling of a world-writable directory: users can only unlink objects they own, not those of others.

Hope that helps!

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
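A quick demonstration on a scratch directory (standing in for /tmp on a node):

```shell
# Create a throwaway directory and give it the conventional /tmp permissions.
d=$(mktemp -d)
chmod 1777 "$d"      # 1 = sticky bit, 777 = rwx for user/group/other
stat -c '%a' "$d"    # prints 1777
rmdir "$d"
```

On the real node it's simply `chmod 1777 /tmp /var/tmp`.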
Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account
On 12/9/23 9:22 am, Stephan Roth wrote:

> Thanks Noam, this looks promising!

I would suggest that as well as the "magnetic" flag you may want the "flex" flag on the reservation too, in order to let jobs that match it run on GPUs outside of the reservation.

All the best,
Chris
Re: [slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm
On 28/6/23 04:02, Rahmanpour Koushki, Maysam wrote:

> Upon reviewing the current FAQ, I found that it states node shrinking
> is only possible for pending jobs. Unfortunately, it does not provide
> additional information or examples to clarify if this functionality
> can be extended to running jobs.

You can definitely release nodes from a running job. What I believe the FAQ is saying is that you cannot do something like change the number of cores per node or the memory you requested once a job is running.

As for why you'd do that: we've had people who (before we set up a mechanism to automatically reboot nodes to address this) would request more nodes than they needed, look at how fragmented kernel hugepages were, and then exclude nodes where there were too many fragmented for their needs.

All the best,
Chris
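From my reading of the FAQ, the shrink sequence looks roughly like this when run from inside the job; check the resize-script name against your version's documentation before relying on it:

```
# Shrink the running job to 2 nodes (node count can only decrease):
scontrol update JobId=$SLURM_JOB_ID NumNodes=2
# Slurm then writes out a script with updated environment variables
# for the remaining steps of the batch script to source:
. ./slurm_job_${SLURM_JOB_ID}_resize.sh
```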
Re: [slurm-users] speed / efficiency of sacct vs. scontrol
On 27/2/23 03:34, David Laehnemann wrote:

> Hi Chris, hi Sean,

Hiya!

> thanks also (and thanks again) for chiming in.

No worries.

> Quick follow-up question: Would `squeue` be a better fall-back command
> than `scontrol` from the perspective of keeping `slurmctld` responsive?

Sadly not. Whilst a site can do some tricks to enforce rate limiting on squeue via the cli_filter, that doesn't mean other sites have that set up, so they are vulnerable to the same issue.

> Also, just as a quick heads-up: I am documenting your input by linking
> to the mailing list archives, I hope that's alright for you?
> https://github.com/snakemake/snakemake/pull/2136#issuecomment-1446170467

No problem, but I would say it's got to be sacct.

All the best,
Chris
Re: [slurm-users] speed / efficiency of sacct vs. scontrol
On 27/2/23 06:53, Brian Andrus wrote:

> Sorry, I had to share that this is very much like "Are we there yet?"
> on a road trip with kids 😄 Slurm is trying to drive.

Oh I love this analogy! Whereas sacct is like talking to the navigator. The navigator does talk to the driver to give directions, and the driver keeps them up to date with the current situation, but the kids can talk to the navigator without disrupting the driver's concentration.

All the best,
Chris
Re: [slurm-users] speed / efficiency of sacct vs. scontrol
On 23/2/23 2:55 am, David Laehnemann wrote:

> And consequently, would using `scontrol` thus be the better default
> option (as opposed to `sacct`) for repeated job status checks by a
> workflow management system?

Many others have commented on this, but use of scontrol in this way is really, really bad because of the impact it has on slurmctld. This is because responding to the RPC (IIRC) requires taking read locks on internal data structures, and on a large, busy system (like ours - we recently rolled slurm job IDs back over to 1 after ~6 years of operation and run at over 90% occupancy most of the time) this can really damage scheduling performance.

We've had numerous occasions where we've had to track down users abusing scontrol in this way and redirect them to use sacct instead. We already use the cli_filter abilities in Slurm to impose a form of rate limiting on RPCs from other commands, but unfortunately scontrol is not covered by that.

All the best,
Chris
Re: [slurm-users] Slurm - UnkillableStepProgram
On 20/1/23 3:51 am, Stefan Staeglich wrote:

> But someone who is actually using a UnkillableStepProgram stated the
> opposite (that it's executed on the controller nodes). Are you aware
> of any change between Slurm releases? Maybe one of the two parts is
> just a leftover. Are you using a UnkillableStepProgram?

Yes, we've been using it for years on 7 different systems in my time here. It runs on the compute nodes and collects troubleshooting info for us when a job fails to die in an allowed time.

Chris
Re: [slurm-users] slurmrestd service broken by 22.05.07 update
On 29/12/22 11:31 am, Timo Rothenpieler wrote:

> Having service files in top level dirs like /run or /var/lib is bound
> to cause issues like this.

You can use local systemd overrides for things like this. In this case I suspect you can create this directory:

/etc/systemd/system/slurmrestd.service.d/

and drop files into it via the Configuration Management System Of Your Choice to override/augment the vendor supplied configuration.

https://www.freedesktop.org/software/systemd/man/systemd.unit.html

> Along with a unit file foo.service, a "drop-in" directory
> foo.service.d/ may exist. All files with the suffix ".conf"
> from this directory will be merged in the alphanumeric order
> and parsed after the main unit file itself has been parsed.
> This is useful to alter or add configuration settings for a
> unit, without having to modify unit files. Each drop-in file
> must contain appropriate section headers. For instantiated
> units, this logic will first look for the instance ".d/"
> subdirectory (e.g. "foo@bar.service.d/") and read its ".conf"
> files, followed by the template ".d/" subdirectory
> (e.g. "foo@.service.d/") and the ".conf" files there.

Caveat: written whilst travelling and without testing, or even having access to a system where I can test, but we do use this method for other services already.

All the best,
Chris
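As an untested sketch, a drop-in for this case might look like the following (the RuntimeDirectory value is illustrative - use whatever directory your setup actually needs):

```ini
# /etc/systemd/system/slurmrestd.service.d/override.conf
[Service]
# Have systemd create /run/slurmrestd at service start with the right
# ownership, rather than relying on it pre-existing in a top-level dir.
RuntimeDirectory=slurmrestd
RuntimeDirectoryMode=0755
```

Remember to run `systemctl daemon-reload` after dropping the file in.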
Re: [slurm-users] Job cancelled into the future
On 20/12/22 6:01 pm, Brian Andrus wrote:

> You may want to dump the database, find what table/records need
> updated and try updating them. If anything went south, you could
> restore from the dump.

+lots to making sure you've got good backups first. Also stop slurmdbd before you start on the backups and don't restart it until you've made the changes, including setting the rollup times to be before the jobs started, so that the rollups include these changes! When you start slurmdbd after making the changes it should see that it needs to do rollups and kick those off.

All the best,
Chris
Re: [slurm-users] salloc problem
On 27/10/22 4:18 am, Gizo Nanava wrote:

> we run into another issue when using salloc interactively on a cluster
> where Slurm power saving is enabled. The problem seems to be caused by
> the job_container plugin and occurs when the job starts on a node
> which boots from a power down state. If I resubmit a job immediately
> after the failure to the same node, it always works. I can't find any
> other way to reproduce the issue other than booting a reserved node
> from a power down state.

Looking at this:

> slurmstepd: error: container_p_join: open failed for
> /scratch/job_containers/791670/.ns: No such file or directory

I'm wondering if /scratch is a separate filesystem and, if so, whether it could be getting mounted only _after_ slurmd has started on the node? If that's the case then it would explain the error and why it works immediately after. On our systems we always try to ensure that slurmd is the very last thing to start on a node, and it only starts if everything has succeeded up to that point.

All the best,
Chris
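If that turns out to be the cause, one way to express that ordering (assuming /scratch is a regular fstab/mount-unit mount and your nodes run systemd) is a drop-in for slurmd - an untested sketch:

```ini
# /etc/systemd/system/slurmd.service.d/wait-for-scratch.conf
[Unit]
# Don't start slurmd until /scratch is mounted, and stop it if the
# mount disappears, so job_container/tmpfs always has its base path.
RequiresMountsFor=/scratch
```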
Re: [slurm-users] Switch setting in slurm.conf breaks slurmctld if the switch type is not there in slurmcrld node
On 27/10/22 11:30 pm, Richard Chang wrote:

> Yes, the system is a HPE Cray EX, and I am trying to use
> switch/hpe_slingshot.

Which version of Slurm are you using, Richard?

All the best,
Chris
Re: [slurm-users] Prolog and job_submit
On 30/10/22 12:27 pm, Davide DelVento wrote:

> But if I understand correctly your Prolog vs TaskProlog distinction,
> the latter would have the environmental variable and run as user,
> whereas the former runs as root and doesn't get the environment,

That's correct. My personal view is that injecting arbitrary input from a user (such as these environment variables) would make life hazardous from a security point of view for a root privileged process such as a prolog.

> not even from the job_submit script.

That is correct; all the job_submit script will do is inject the environment variable into the job's environment, just as if a user had done so.

> The problem with a TaskProlog approach is that what I want to do
> (making a non-accessible file available) would work best as root. As a
> workaround I could make that just obscure but still user-possible. Not
> ideal, but better than nothing as it is now. Alternatively, I could
> use another way to let the job_submit lua script communicate with the
> Prolog, not sure exactly what (temp directory on the shared
> filesystem, writeable only by root??)

My only other thought is that you might be able to use node features & job constraints to communicate this without the user realising. For instance you could declare the nodes where the software is installed to have "Feature=mysoftware", and then your job_submit could spot users requesting the license and add the constraint "mysoftware" to their job. The (root privileged) Prolog can see that via the SLURM_JOB_CONSTRAINTS environment variable and so could react to it.

Then when 23.02 comes out you could use the new SLURM_JOB_LICENSES environment variable in addition, and retire the old way once jobs using the old method have completed.

> Thanks for pointing to that commit. I bit too down the road but good
> to know.

No worries, best of luck!

All the best,
Chris
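A sketch of the Prolog side of that idea (the feature name "mysoftware" and the echoed action are made up for illustration; in real use SLURM_JOB_CONSTRAINTS is set by slurmd, so it's forced here just to make the fragment self-contained):

```shell
# Hypothetical fragment of a root-run Prolog reacting to a job constraint.
SLURM_JOB_CONSTRAINTS="mysoftware"   # provided by slurmd in real use
case ",${SLURM_JOB_CONSTRAINTS}," in
  *mysoftware*) echo "exposing licensed software" ;;  # e.g. fix perms/bind-mount here
  *)            echo "nothing to do" ;;
esac
```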
Re: [slurm-users] Prolog and job_submit
On 30/10/22 10:23 am, Chris Samuel wrote:

> Unfortunately it looks like the license request information doesn't
> get propagated into any prologs from what I see from a scan of the
> documentation. 🙁

This _may_ be fixed in the next major Slurm release (February) if I'm reading this right:

https://github.com/SchedMD/slurm/commit/3c6c4c08d8deb89aa2c992a65964f53663097d26

All the best,
Chris
Re: [slurm-users] Prolog and job_submit
On 29/10/22 7:37 am, Davide DelVento wrote:

> So either I misinterpreted that "same environment as the user tasks"
> or there is something else that I am doing wrong.

Slurm has a number of different prologs that can run, which can cause confusion, and I suspect that's what's happening here. The "Prolog" in your configuration runs as root, but it's the "TaskProlog" that runs as the user and so has access to the job's environment (including the environment variable you are setting).

Unfortunately it looks like the license request information doesn't get propagated into any prologs, from what I see from a scan of the documentation. :-(

Best of luck,
Chris
Re: [slurm-users] job_time_limit: inactivity time limit reached ...
On 19/9/22 05:46, Paul Raines wrote:

> In slurm.conf I had InactiveLimit=60 which I guess is what is
> happening but my reading of the docs on this setting was it only
> affects the starting of a job with srun/salloc and not a job that has
> been running for days. Is it InactiveLimit that leads to the
> "inactivity time limit reached" message?

I believe so, but remember that this governs timeouts around communications between slurmctld and the srun/salloc commands, and not things like shell inactivity timeouts, which are quite different. See:

https://slurm.schedmd.com/faq.html#purge

# A job is considered inactive if it has no active job steps or
# if the srun command creating the job is not responding.

Hope this helps!

All the best,
Chris
Re: [slurm-users] admin users without a database
On 19/9/22 06:14, Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) wrote:

> Is it possible to make a user an admin without slurmdbd? The docs I've
> found indicate that I need to set the user's admin level with
> sacctmgr, but that command always says

I don't believe so; I believe that's all stored in slurmdbd (and sacctmgr is a command to communicate with slurmdbd).

All the best,
Chris
Re: [slurm-users] srun: error: io_init_msg_unpack: unpack error
On 6/8/22 10:43 am, David Magda wrote:

> It seems that the new srun(1) cannot talk to the old slurmd(8). Is
> this 'on purpose'? Does the backwards compatibility of the protocol
> not extend to srun(1)?

That's expected; what you're hoping for here is forward compatibility. Newer daemons know how to talk to older utilities, but it doesn't work the other way around.

What we do in this situation is upgrade slurmdbd, then slurmctld, change our images for compute nodes to be ones that have the new Slurm version, and then, before we bring partitions back up, issue an "scontrol reboot ASAP nextstate=resume" for all the compute nodes. This means existing jobs will keep going, but no new jobs will start on compute nodes with older versions of Slurm from that point on. As jobs on nodes finish, the nodes get rebooted into the new images and accept jobs again. (The "ASAP" flag drains the node; once it has successfully started its slurmd as the final thing on boot it undrains at that point - and slurmctld is smart about planning its scheduling for this situation.)

It's also safe to restart slurmd's with running jobs, though you may want to drain the nodes before that so slurmctld won't try and send them a job in the middle.

The one issue backwards compatibility in the Slurm protocol can't help with is if there are incompatible config file changes needed; then you need to bite the bullet and upgrade the slurmd's and commands at the same time everywhere the new config file goes (and for those of us running in configless mode that means everywhere).

Hope this helps!

All the best,
Chris
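In command form, the rolling part of that sequence is roughly the following (the node range is illustrative):

```
# after slurmdbd and slurmctld are upgraded and new node images staged:
scontrol reboot ASAP nextstate=resume nid[001000-001999]
# watch nodes drain, reboot into the new image and return to service:
sinfo -R
```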
Re: [slurm-users] Rolling reboot with at most N machines down simultaneously?
On 3/8/22 10:20 pm, Gerhard Strangar wrote:

> With a fake license called reboot?

It's a neat idea, but I think there is a catch:

* 3 jobs start, each taking 1 license
* Other reboot jobs are all blocked
* Running reboot jobs trigger node reboot
* Running reboot jobs end when either the script exits and slurmd cleans it up before the reboot kills it, or it gets killed as NODE_FAIL when the node has been unresponsive for too long and is marked as down
* Licenses for those jobs are released
* 3 more reboot jobs start whilst the original 3 are rebooting
* 6 nodes are now rebooting
* Filesystem fall down go boom
* Also your rebooted nodes are now drained as "Node unexpectedly rebooted"

I guess you could change your Slurm config to not mark nodes as down if they stop responding, and make sure the job that's launched holds its license until the reboot completes, but that feels wrong to me.

All the best,
Chris
Re: [slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm
On 1/7/22 07:51, Jean-Christophe HAESSIG wrote:

> The libraries were incompatible but that wasn't reflected in the
> packaging and due to the similar and long version string, I didn't
> spot it before.

Oh, good spot!

All the best,
Chris
Re: [slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm
On 29/6/22 09:01, Jean-Christophe HAESSIG wrote:

> No, the job is placed through the DRMAA API which enables programs to
> place jobs in a cluster-agnostic way. The program doesn't know it is
> talking to Slurm. The DRMAA library makes the translation and loads
> libslurm36, where the messages come from. That's why I don't know how
> to tell libslurm to log more, since its use is hidden behind DRMAA.

My gut instinct with this is that it will be reading your slurm.conf file to find its configuration, and so you can adjust that to increase the log level (realising that everything that reads it at that point will pick those changes up). Academic now though, as you've solved it I guess!

All the best,
Chris
Re: [slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm
On 28/6/22 12:19 pm, Jean-Christophe HAESSIG wrote:

> Hi, I'm facing a weird issue where launching a job through drmaa
> (https://github.com/natefoo/slurm-drmaa) aborts with the message
> "Plugin is corrupted", but only when that job is placed from one of my
> compute nodes. Running the command from the login node seems to work.

I suspect this is where your error is happening:

https://github.com/SchedMD/slurm/blob/1ce55318222f89fbc862ce559edfd17e911fee38/src/common/plugin.c#L284

It's when it's checking that it can load the plugin without hitting any unresolved library symbols. The fact you are hitting this sounds like you're missing libraries on the compute nodes that are present on the login node (or there's some reason they're not getting found if present).

[...]

> Anyway, the message seems to originate from libslurm36 and I would
> like to activate the debug messages (debug3, debug4). Is there a way
> to do this with an environment variable or any other convenient
> method?

This depends on what part of Slurm is generating these errors - is it something like sbatch or srun? If so, using multiple -v's will increase the debug level so you can pick those up. If it's from slurmd then you'll want to set SlurmdDebug to "debug3" in your slurm.conf. Once that's done you should get information on which symbols are not being found, and that should give you some insight into what's going on.

Best of luck,
Chris
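Two quick checks along those lines (the plugin path below is an assumption - adjust it to wherever your Slurm plugins are installed):

```
# run a client command with extra verbosity:
srun -vvv hostname
# look for unresolved symbols in a suspect plugin on the compute node:
ldd -r /usr/lib64/slurm/auth_munge.so | grep -i undefined
```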
Re: [slurm-users] Rolling upgrade of compute nodes
On 30/5/22 10:06 am, Chris Samuel wrote:

> If you switch that symlink those jobs will pick up the 20.11 srun
> binary and that's where you may come unstuck.

Just to quickly fix that: srun talks to slurmctld (which would also be 20.11 for you), slurmctld talks to the slurmd's running the job (which would be 19.05, so OK), but then the slurmd would try to launch a 20.11 slurmstepd, and that is where I suspect things could come undone.

Sorry - hadn't had coffee when I was writing earlier. :-)

Chris
Re: [slurm-users] Rolling upgrade of compute nodes
On 30/5/22 3:01 am, byron wrote:

> The one thing I'm unsure about is as much a Linux / NFS issue as a
> Slurm one. When I change the soft link for "default" to point to the
> new 20.11 slurm install, but all the compute nodes are still running
> the old 19.05 version because they haven't been restarted yet, will
> that not cause any problems? Or will they still just see the same old
> 19.05 version of slurm that they are running until they are restarted?

That may cause issues. Whilst the ASAP flag to scontrol reboot guarantees no new jobs will start on the selected nodes until after they've rebooted, it doesn't (and shouldn't) stop new job steps from srun starting on them. If you switch that symlink those jobs will pick up the 20.11 srun binary, and that's where you may come unstuck.

This is one of the reasons why we do everything with Slurm installed via RPM inside an image: you have a pretty straightforward A -> B transition. If your symlink were node-local in some way (say, created at boot time via some config management system before slurmd starts) then that could work around this, as the nodes would still see the appropriate slurm binaries for the running slurmd.

Best of luck!
Chris
Re: [slurm-users] Limit partition to 1 job at a time
On 22/3/22 11:40 am, Russell Jones wrote:

> I am struggling to figure out how to do this. Any tips?

My only thought to achieve this would be to define a license for the partition with a count of 1, and to use the job submit filter to ensure that any job that is submitted to (or ends up being directed to) that partition requests that one license.

Best of luck!
Chris
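The license half of that in slurm.conf might look like this (all names invented; the job submit filter then has to add "serialonly" to every job bound for that partition):

```
# slurm.conf: a site-local license with a single token
Licenses=serialonly:1
PartitionName=serial Nodes=node[01-04] State=UP
```

With only one token, at most one job holding the license can run at a time; everything else pends on "Licenses".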
Re: [slurm-users] initscript poll timeout @ 10000 msec :: what slurm conf var?
On 4/12/21 9:34 am, Adrian Sevcenco wrote:

> actually is not ... so, once again, does anyone have an idea about
> customization of the timeout of init script defined in
> job_container.conf?

Looking at the source, it's hard-coded in Slurm 21.08, so you'd need to patch and rebuild at present.

https://github.com/SchedMD/slurm/blob/934f3b543b6bc9f3335d1cc6813b8d95cb2c49b4/src/plugins/job_container/tmpfs/job_container_tmpfs.c#L473

All the best,
Chris
Re: [slurm-users] Wrong hwloc detected?
On 5/11/21 4:47 am, Diego Zuccato wrote:

> How can Slurm detect such an old HWLOC version?

Looking at the code, it's not actually checking the hwloc version; it's finding an error condition and suggesting that may be the cause, but it sounds like that's not it for you.

src/plugins/task/cgroup/task_cgroup_cpuset.c:

    /* should never happen in normal scenario */
    if ((sock_loop > npdist) && !hwloc_success) {
        /* hwloc_get_obj_below_by_type() fails if no CPU set
         * configured, see hwloc documentation for details */
        error("hwloc_get_obj_below_by_type() failing, "
              "task/affinity plugin may be required to address bug "
              "fixed in HWLOC version 1.11.5");
        return XCGROUP_ERROR;
    }

[...]

If you've got support from SchedMD, open a bug with them; if not, and you're using the Debian packages, I'd suggest opening a bug with Debian about it.

Best of luck!
Chris
Re: [slurm-users] How to get an estimate of job completion for planned maintenance?
On 9/11/21 5:42 am, Loris Bennett wrote:

> We just set up a reservation at a point in time which is further in
> the future than our maximum run-time. There is then no need to drain
> anything. Short running jobs can still run right up to the
> reservation.

This is the same technique we use too - works well!

All the best,
Chris
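As a concrete sketch (dates, duration and the reservation name are illustrative):

```
# With a 7-day MaxTime, create the maintenance reservation at least
# 7 days out so every job that can start will finish before it begins:
scontrol create reservation reservationname=maint_outage \
    starttime=2021-11-20T08:00:00 duration=08:00:00 \
    flags=maint nodes=ALL users=root
```

Jobs whose time limit would overrun into the reservation simply pend until after it ends.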
Re: [slurm-users] is there a way to temporarily freeze an account?
On 6/10/21 6:21 am, byron wrote:

> We have some accounts that we would like to suspend / freeze for the
> time being that have unused hours associated with them. Is there any
> way of doing this without removing the users associated with the
> accounts or zeroing their hours?

We have a QOS called "batchdisable" which has MaxJobs=0 and MaxSubmitJobs=0, and then we just set the user's list of QOS's to that:

sacctmgr update user where name=bar set qos=batchdisable

All the best,
Chris
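Spelled out end to end (user name "bar" as in the example above; "normal" is an assumed name for your usual default QOS):

```
# one-off: create the blocking QOS
sacctmgr add qos batchdisable set MaxJobs=0 MaxSubmitJobs=0
# freeze a user without touching their account or usage:
sacctmgr update user where name=bar set qos=batchdisable
# later, to re-enable:
sacctmgr update user where name=bar set qos=normal
```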
Re: [slurm-users] draining nodes due to failed killing of task?
On Friday, 6 August 2021 12:02:45 AM PDT Adrian Sevcenco wrote:

> i was wondering why a node is drained when killing of task fails and
> how can i disable it? (i use cgroups) moreover, how can the killing of
> task fails? (this is on slurm 19.05)

Slurm has tried to kill processes, but they refuse to go away. Usually this means they're stuck in a device or I/O wait for some reason, so look for processes that are in a "D" state on the node. As others have said, they can be stuck writing out large files and waiting for the kernel to complete that before they exit. This can also happen if you're using GPUs and something has gone wrong in the driver and the process is stuck in the kernel somewhere.

You can try doing "echo w > /proc/sysrq-trigger" on the node to see if the kernel reports tasks stuck and where they are stuck. If there are tasks stuck in that state then often the only recourse is to reboot the node back into health.

You can tell Slurm to run a program on the node should it find itself in this state, see:

https://slurm.schedmd.com/slurm.conf.html#OPT_UnkillableStepProgram

Best of luck,
Chris
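A quick way to spot those stuck processes (WCHAN shows where in the kernel each task is waiting):

```shell
# List processes in uninterruptible sleep ("D" state); usually empty on
# a healthy node, so no output here is a good sign.
ps -eo pid,stat,wchan:20,comm --no-headers | awk '$2 ~ /^D/'
```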
Re: [slurm-users] (no subject)
On Friday, 30 July 2021 11:21:19 AM PDT Soichi Hayashi wrote:

> I am running slurm-wlm 17.11.2

You are on a truly ancient version of Slurm there, I'm afraid (there have been 4 major releases and over 13,000 commits since that was tagged in January 2018). I would strongly recommend you try to get to a more recent release to pick up those bug fixes and improvements. A quick scan of the NEWS file shows a number that are cloud related.

https://github.com/SchedMD/slurm/blob/slurm-20.11/NEWS

All the best,
Chris
Re: [slurm-users] OpenMPI interactive change in behavior?
On Monday, 26 April 2021 2:12:41 PM PDT John DeSantis wrote:

> Furthermore, searching the mailing list suggests that the appropriate
> method is to use `salloc` first, despite version 17.11.9 not needing
> `salloc` for an "interactive" session.

Before 20.11, with salloc you needed to set a SallocDefaultCommand to use srun to push the session over on to a compute node, and then you needed to set a bunch of things to prevent that srun from consuming resources that the subsequent srun's would need. That was especially annoying when you were dealing with GPUs, as you would need to "srun" anything that needed to access them (when you used cgroups to control access).

With 20.11 there's a new "use_interactive_step" option that uses similar trickery, except Slurm handles not consuming those resources for you and handles GPUs correctly.

So for your 20.11 system I would recommend giving salloc and the "use_interactive_step" option a go and seeing if it helps.

All the best,
Chris
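From my reading of the 20.11 docs, that's a single slurm.conf setting:

```
# slurm.conf
LaunchParameters=use_interactive_step
```

After which a plain `salloc -N1` should drop the user straight into a shell on the allocated compute node, with no resources eaten by the interactive step itself.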
Re: [slurm-users] unable to Hold and release the job using scontrol
On Saturday, 22 May 2021 11:05:54 PM PDT Zainul Abiddin wrote:

> i am trying to hold the job from scontrol but not able to hold the job.

It looks like you're trying to hold a running job, which isn't possible. I see from the Slurm FAQ that you should be able to use "scontrol requeuehold" for what you are trying to achieve.

https://slurm.schedmd.com/faq.html#req

# Slurm supports requeuing jobs in a hold state with the command:
#
# scontrol requeuehold job_id
#
# The job can be in state RUNNING, SUSPENDED, COMPLETED or FAILED before
# being requeued.

Best of luck,
Chris
Re: [slurm-users] Slurm - UnkillableStepProgram
Hi Mike,

On 22/3/21 7:12 pm, Yap, Mike wrote:

> I presume UnkillableStepTimeout is set in slurm.conf and it acts as a
> timer to trigger UnkillableStepProgram

That is correct.

> UnkillableStepProgram can be used to send email or reboot a compute
> node - question is how do we configure it?

Also - or to automate collecting debug info (which is what we do), after which we manually intervene to reboot the node once we've determined there's no more useful info to collect.

It's just configured in your slurm.conf:

UnkillableStepProgram=/path/to/the/unkillable/step/script.sh

Of course this script has to be present on every compute node.

All the best,
Chris
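A minimal sketch of what such a script could do (the path and log location are examples; SLURM_JOB_ID is set by slurmd when it invokes the program, hence the "unknown" fallback when run by hand):

```shell
#!/bin/bash
# Hypothetical UnkillableStepProgram: snapshot node state for later triage.
log="/var/tmp/unkillable-${SLURM_JOB_ID:-unknown}.log"
{
  date
  echo "=== D-state processes ==="
  # processes in uninterruptible sleep are the usual unkillable culprits
  ps -eo pid,stat,wchan:20,cmd --no-headers | awk '$2 ~ /^D/'
} > "$log" 2>&1
```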
Re: [slurm-users] Job not running with Resource Reason even though resources appear to be available
On Saturday, 23 January 2021 9:54:11 AM PST Paul Raines wrote:

> Now rtx-08 which has only 4 GPUs seems to always get all 4 uses.
> But the others seem to always only get half used (except rtx-07
> which somehow gets 6 used so another wierd thing).
>
> Again if I submit non-GPU jobs, they end up allocating all hte
> cores/cpus on the nodes just fine.

What does your gres.conf look like for these nodes? One thing I've seen in the past is where the core specifications for the GPUs are out of step with the hardware, so Slurm thinks they're on the wrong socket. Then when all the cores in that socket are used up, Slurm won't put more GPU jobs on the node without the jobs explicitly asking to not do locality.

One thing I've noticed is that prior to Slurm 20.02 the documentation for gres.conf used to say:

# If your cores contain multiple threads only the first thread
# (processing unit) of each core needs to be listed.

but that language is gone from 20.02 and later, and the change isn't mentioned in the release notes for 20.02, so I'm not sure what happened there. The only clue is this commit:

https://github.com/SchedMD/slurm/commit/7461b6ba95bb8ae70b36425f2c7e4961ac35799e#diff-cac030b65a8fc86123176971a94062fafb262cb2b11b3e90d6cc69e353e3bb89

which says "xcpuinfo_abs_to_mac() expects a core list, not a CPU list."

Best of luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
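For comparison, a gres.conf sketch for hypothetical 40-core, 8-GPU nodes (node names, device paths and core ranges are all made up; the point is that each Cores= range must match the socket the GPUs actually sit on, as reported by lstopo):

```
# gres.conf (illustrative): GPUs 0-3 hang off socket 0 (cores 0-19),
# GPUs 4-7 off socket 1 (cores 20-39).
NodeName=rtx-[01-02] Name=gpu File=/dev/nvidia[0-3] Cores=0-19
NodeName=rtx-[01-02] Name=gpu File=/dev/nvidia[4-7] Cores=20-39
```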
Re: [slurm-users] trying to add gres
On 24/12/20 4:42 pm, Erik Bryer wrote: I made sure my slurm.conf is synchronized across machines. My intention is to add some arbitrary gres for testing purposes. Did you update your gres.conf on all the nodes to match? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm Upgrade Philosophy?
On 24/12/20 6:24 am, Paul Edmon wrote: We then have a test cluster that we install the release on a run a few test jobs to make sure things are working, usually MPI jobs as they tend to hit most of the features of the scheduler. One thing I meant to mention last night was that we use Reframe from CSCS as the test framework for our systems, our user support folks maintain our local tests as they're best placed to understand the user requirements that need coverage and we feed in our system facing requirements to them so they can add tests for that side too. https://reframe-hpc.readthedocs.io/ All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm Upgrade Philosophy?
On Friday, 18 December 2020 10:10:19 AM PST Jason Simms wrote:

> Thanks to several helpful members on this list, I think I have a much better
> handle on how to upgrade Slurm. Now my question is, do most of you upgrade
> with each major release?

We do, though not immediately and not without a degree of testing on our test systems. One of the big reasons for us upgrading is that we've usually paid for features in Slurm for our needs (for example in 20.11 that includes scrontab so users won't be tied to favourite login nodes, as well as the experimental RPC queue code due to the large numbers of RPCs our systems need to cope with).

I also keep an eye out for discussions of what other sites find with new releases too, so I'm following the current concerns about 20.11 and the change in behaviour for job steps that do (expanding NVIDIA's example slightly):

#SBATCH --exclusive
#SBATCH -N2

srun --ntasks-per-node=1 python multi_node_launch.py

which (if I'm reading the bugs correctly) fails in 20.11 as that srun no longer gets all the allocated resources, just the default of --cpus-per-task=1. This also affects things like mpirun in OpenMPI built with Slurm support (as it effectively calls "srun orted" and that "orted" launches the MPI ranks, so in 20.11 it only has access to a single core for them all to fight over). Again - if I'm interpreting the bugs correctly! I don't currently have a test system that's free to try 20.11 on, but hopefully early in the new year I'll be able to test this out to see how much of an impact this is going to have and how we will manage it.

https://bugs.schedmd.com/show_bug.cgi?id=10383
https://bugs.schedmd.com/show_bug.cgi?id=10489

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
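One hedged workaround, assuming the behaviour is as described in those bugs, is to stop relying on the step inheriting the allocation and spell the step's resources out explicitly:

```
#!/bin/bash
#SBATCH --exclusive
#SBATCH -N2
# Sketch only: hand the step its CPU count explicitly rather than
# relying on pre-20.11 inheritance of the job's full allocation.
srun --ntasks-per-node=1 --cpus-per-task="$SLURM_CPUS_ON_NODE" python multi_node_launch.py
```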
Re: [slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted
On 16/12/20 6:21 pm, Kevin Buckley wrote:

> The skip is occuring, in src/lua/slurm_lua.c, because of this trap

That looks right to me, that's Doug's code which is checking whether the file has been updated since slurmctld last read it in. If it has then it'll reload it, but if it hasn't then it'll skip it (and if you've got debugging up high then you'll see that message). So if you see that message then the lua has been read in to slurmctld and should get called.

You might want to check the log for when it last read it in, just in case there was some error detected at that point. You can also use luac to run a check over the script you've got, like this:

luac -p /etc/opt/slurm/job_submit.lua

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Backfill pushing jobs back
Hi David, On 9/12/20 3:35 am, David Baker wrote: We see the following issue with smaller jobs pushing back large jobs. We are using slurm 19.05.8 so not sure if this is patched in newer releases. This sounds like a problem that we had at NERSC (small jobs pushing back multi-thousand node jobs), and we carried a local patch for which Doug managed to get upstreamed in 20.02.x (I think it landed in 20.02.3, but 20.02.6 is the current version). Hope this helps! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes
On 26/11/20 9:21 am, Steve Bland wrote:

> Sinfo always returns nodes not responding

One thing - do the nodes return to this state when you resume them with "scontrol update node=srvgridslurm[01-03] state=resume"?

If they do then what do your slurmctld logs say for the reason for this? You can bump up the log level on your slurmctld with, for instance, "scontrol setdebug debug" for more info (we run ours at debug all the time anyway).

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm Upgrade
On 11/2/20 7:31 am, Paul Edmon wrote:

> e. Run slurmdbd -Dv to do the database upgrade. Depending on the upgrade
> this can take a while because of database schema changes.

I'd like to emphasise the importance of doing the DB upgrade in this way. Do not use systemctl for this: if systemd runs out of patience waiting for slurmdbd to finish the migration and start up, it can kill slurmdbd part way through the migration. Fortunately that's not something I've run into myself, but as our mysqldump of our production DB is approaching 100GB now it's not something we want to run into!

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Jobs stuck in "completing" (CG) state
On 10/24/20 9:22 am, Kimera Rodgers wrote:

> [root@kla-ac-ohpc-01 critical]# srun -c 8 --pty bash -i
> srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
> srun: error: Task launch for 37.0 failed on node c-node3: Socket timed out on send/recv operation
> srun: error: Application launch failed: Socket timed out on send/recv operation
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

To me this looks like networking issues, perhaps firewall/iptables rules blocking connections.

Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] SLES 15 rpmbuild from 20.02.5 tarball wants munge-libs: system munge RPMs don't provide it
On Thursday, 15 October 2020 8:50:33 PM PDT Kevin Buckley wrote: > Maybe the SLES 15 SRPM will shed some light althought it seems odd > that the SPEC file inside the Slurm tarball can't recognise that's > on a SLES 15 OS. I've not had problems building Slurm 20.02.x on SLES15 SP0 (CLE7.0 UP01), so I'm wondering if something big happened with munge in SP1? I'd suggest opening a bug with SchedMD on this to check into what's happening, they'll likely be able to help with this! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Segfault with 32 processes, OK with 30 ???
On Monday, 12 October 2020 2:43:36 AM PDT Diego Zuccato wrote: > Seems so: > "The application appears to have been direct launched using "srun", > but OMPI was not built with SLURM's PMI support and therefore cannot > execute." > > So it seems I can't use srun to launch OpenMPI jobs. OK, I suspect this rules Slurm out of the running as the cause, I'd suggest either rebuilding OpenMPI with Slurm support or if it's a distro related package filing a bug with the distro, or alternatively trying for help with the OpenMPI users list: https://lists.open-mpi.org/mailman/listinfo/users Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] unable to run on all the logical cores
On 10/7/20 10:13 pm, David Bellot wrote:

> NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=257243 State=UNKNOWN

With this configuration Slurm is allocating a single physical core (with 2 thread units) per task, so you are using all (physical) cores. However, if what you want is to have 1 process per thread unit (not necessarily a good idea, depending on how your code works) then I think you'd need to adjust your config to lie to Slurm and tell it it's got 40 cores per socket and 1 thread per core instead.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
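That "lie" would look something like the following for the same node - one hardware thread presented per "core", keeping CPUs equal to Sockets × Cores × Threads:

```
# Illustrative only: schedule one task per hardware thread.
NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=40 ThreadsPerCore=1 RealMemory=257243 State=UNKNOWN
```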
Re: [slurm-users] Simple free for all cluster
On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:

> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but at
> least it's not infinite. I'm curious what others use as that value, and how
> you arrived at it

My journey over the last 16 years in HPC has been one of decreasing time limits. Back in 2003 with VPAC's first Linux cluster we had no time limits; we then introduced a 90 day limit so we could plan quarterly maintenances (and yes, we had users who had jobs which legitimately ran longer than that, so they had to learn to checkpoint). At VLSCI we had 30 day limits (life sciences, so many long running poorly scaling jobs), then when I was at Swinburne it was a 7 day limit, and now here at NERSC we've got 2 day limits.

It really is down to what your use cases are and how much influence you have over your users. It's often the HPC sysadmin's responsibility to try and find that balance between good utilisation, effective use of the system and reaching the desired science/research/development outcomes.

Best of luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
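Whichever value you land on, it's just a partition parameter. An illustrative slurm.conf fragment (the partition name, node list and limits are examples, not recommendations):

```
# A hard 7-day ceiling, with a shorter default for jobs that don't
# request a time limit of their own:
PartitionName=batch Nodes=node[001-100] MaxTime=7-00:00:00 DefaultTime=0-04:00:00 State=UP
```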
Re: [slurm-users] Segfault with 32 processes, OK with 30 ???
On Tuesday, 6 October 2020 12:12:41 AM PDT Diego Zuccato wrote: > At least I couldn't replicate launching manually (it always says "no > slots available" unless I use mpirun -np 16 ...). I'm no MPI expert > (actually less than a noob!) so I can't rule out it's unrelated to > Slurm. I mostly hope that on this list I can find someone with enough > experience with both Slurm and MPI. Launch it with "srun" rather than "mpirun", that way it'll be managed by Slurm. If your test program then says every rank is rank 0 that will tell you OpenMPI is not built with Slurm support. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] How to contact slurm developers
On 9/30/20 8:29 am, Relu Patrascu wrote: We have actually modified the code on both v 19 and 20 to do what we would like, preemption within the same QOS, but we think that the community would benefit from this feature, hence our request to have it in the release version. There's a special severity level for contributions of code in the SchedMD bugzilla "C - Contributions". All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Core reserved/bound to a GPU
On Monday, 31 August 2020 7:41:13 AM PDT Manuel BERTRAND wrote: > Every thing works great so far but now I would like to bound a specific > core to each GPUs on each node. By "bound" I mean to make a particular > core not assignable to a CPU job alone so that the GPU is available > whatever the CPU workload on the node. What I've done in the past (waves to Swinburne folks on the list) was to have overlapping partitions on GPU nodes where the GPU job partition had access to all the cores and the CPU only job partition had access to only a subset (limited by the MaxCPUsPerNode parameter on the partition). The problem you run into there though is that there's no way to reserve cores on a particular socket, which means problems for folks who care about locality for GPU codes as they can wait in the queue with GPUs free and cores free but not the right cores on the right socket to be able to use the GPUs. :-( Here's my bug from when I was in Australia for this issue where I suggested a MaxCPUsPerSocket parameter for partitions: https://bugs.schedmd.com/show_bug.cgi?id=4717 All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
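The overlapping-partition arrangement looks roughly like this for hypothetical 40-core GPU nodes (names and numbers are illustrative):

```
# GPU jobs may use all 40 cores; CPU-only jobs are capped at 32 per node,
# so 8 cores stay free for GPU work regardless of CPU-only load.
PartitionName=gpu     Nodes=gpunode[01-04] State=UP
PartitionName=cpuonly Nodes=gpunode[01-04] MaxCPUsPerNode=32 State=UP
```

Note this caps cores per node, not per socket, which is exactly the locality gap the bug below describes.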
Re: [slurm-users] Alternatives for MailProg
On 8/27/20 3:42 pm, Brian Andrus wrote: Actually, you can add headers of all kinds: Quick search of "sendmail add headers" discovers: Problem is that Slurm doesn't directly call sendmail, it calls "mail" (or MailProg in your slurm.conf) instead, hence not being able to add headers. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] cgroup limits not created for jobs
On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:

> But when I run a job on the node it runs I can find no
> evidence in cgroups of any limits being set
>
> Example job:
>
> mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
> salloc: Granted job allocation 17
> mlscgpu1[0]:~$ echo $$
> 137112
> mlscgpu1[0]:~$

You're not actually running inside a job at that point unless you've defined "SallocDefaultCommand" in your slurm.conf, and I'm guessing that's not the case there. You can make salloc fire up an srun for you in the allocation using that option, see the docs here:

https://slurm.schedmd.com/slurm.conf.html#OPT_SallocDefaultCommand

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available
On Friday, 10 July 2020 3:34:44 PM PDT Janna Ore Nugent wrote: > I’ve got an intermittent situation with gpu nodes that sinfo says are > available and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled > the nodes to restart services but it hasn’t helped. Any suggestions for > resolving this or digging into it more deeply? What does "scontrol show job $JOB" say for an affected job, and what does "scontrol show node $NODE" look like for one of these nodes? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Jobs killed by OOM-killer only on certain nodes.
On Thursday, 2 July 2020 6:52:15 AM PDT Prentice Bisbal wrote: > [2020-07-01T16:19:19.463] [801777.extern] _oom_event_monitor: oom-kill > event count: 1 We get that line for pretty much every job, I don't think it reflects the OOM killer being invoked on something in the extern step. OOM killer invocations should be recorded in the kernel logs on the node, check with "dmesg -T" to see if it's being invoked (or whether they are getting logged to via syslog if they've got dropped from the ring buffer due to later messages). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
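A quick way to do that check on the node itself (the grep patterns cover the usual kernel messages, though exact wording varies by kernel version):

```shell
# Report genuine OOM-killer activity from the kernel ring buffer; the
# per-job _oom_event_monitor line alone doesn't imply any of this fired.
dmesg -T 2>/dev/null | grep -iE "out of memory|oom-killer" \
    || echo "no OOM events in ring buffer"
```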
Re: [slurm-users] Nodes do not return to service after scontrol reboot
On 17/6/20 11:32 pm, David Baker wrote: Thank you for your comments. The scontrol reboot command is now working as expected. Fantastic! For those who don't know, using scontrol reboot in this way also allows Slurm to take these rebooting nodes into account for scheduling; so if you have a large job needing a lot of nodes waiting to begin with high priority and you need to reboot some nodes then Slurm won't give up on them and put smaller jobs on the system on all the other nodes, delaying the larger job for no good reason. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Fw: slurm-users Digest, Vol 31, Issue 50
On Wednesday, 13 May 2020 6:15:53 PM PDT Abhinandan Patil wrote:

> However still:
> sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> debug*       up   infinite     1  down* abhi-Lenovo-ideapad-330-15IKB

What does "sinfo -R" say? If the node was down at some point you may need to resume it.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Job Step Resource Requests are Ignored
On Tuesday, 5 May 2020 11:00:27 PM PDT Maria Semple wrote:

> Is there no way to achieve what I want then? I'd like the first and last job
> steps to always be able to run, even if the second step needs too many
> resources (based on the cluster).

That should just work.

#!/bin/bash
#SBATCH -c 2
#SBATCH -n 1

srun -c 1 echo hello
srun -c 4 echo big wide
srun -c 1 echo world

gives:

hello
srun: Job step's --cpus-per-task value exceeds that of job (4 > 2). Job step may never run.
srun: error: Unable to create step for job 604659: More processors requested than permitted
world

> As a side note, do you know why it's not even possible to restrict the
> number of resources a single step uses (i.e. set less CPUs than are
> available to the full job)?

My suspicion is that you've not set up Slurm to use cgroups to restrict the resources a job can use to just those requested.

https://slurm.schedmd.com/cgroups.html

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
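For reference, constraining steps to their requested resources means enabling the cgroup plugins; a minimal sketch (parameter names are from the cgroup.conf docs, and the choice of constraints is illustrative):

```
# slurm.conf:
#   ProctrackType=proctrack/cgroup
#   TaskPlugin=task/cgroup
#
# cgroup.conf:
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
```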
Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd
On Tuesday, 5 May 2020 3:21:45 PM PDT Dustin Lang wrote: > Since this happens on a fresh new database, I just don't understand how I > can get back to a basic functional state. This is exceedingly frustrating. I have to say that if you're seeing this with 17.11, 18.08 and 19.05 and this only started when your colleague upgraded MySQL then this sounds like MySQL is triggering this problem. We're running with MariaDB 10.x (from SLES15) without issues (our database is huge). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Job Step Resource Requests are Ignored
On Tuesday, 5 May 2020 4:47:12 PM PDT Maria Semple wrote:

> I'd like to set different resource limits for different steps of my job. A
> sample script might look like this (e.g. job.sh):
>
> #!/bin/bash
> srun --cpus-per-task=1 --mem=1 echo "Starting..."
> srun --cpus-per-task=4 --mem=250 --exclusive
> srun --cpus-per-task=1 --mem=1 echo "Finished."
>
> Then I would run the script from the command line using the following
> command: sbatch --ntasks=1 job.sh.

You shouldn't ask for more resources with "srun" than have been allocated with "sbatch" - so if you want the job to be able to use up to 4 cores at once & that amount of memory you'll need to use:

sbatch -c 4 --mem=250 --ntasks=1 job.sh

I'd also suggest using suffixes for memory to disambiguate the values.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
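Putting that together, the corrected job.sh might look like this sketch (memory suffixes added as suggested; the middle step's command is a placeholder, since the original elided it):

```
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=250M
# Steps can use up to, but never more than, what sbatch allocated.
srun --cpus-per-task=1 --mem=1M echo "Starting..."
srun --cpus-per-task=4 --mem=250M ./heavy_step   # placeholder command
srun --cpus-per-task=1 --mem=1M echo "Finished."
```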
Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition
On Tuesday, 5 May 2020 3:48:22 PM PDT Sean Crosby wrote:

> sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4

Also don't forget you need to tell Slurm to enforce QOS limits with:

AccountingStorageEnforce=safe,qos

in your Slurm configuration ("safe" is good to set, and turns on enforcement of other restrictions around associations too). See:

https://slurm.schedmd.com/resource_limits.html

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] IPv6 for slurmd and slurmctld
On Friday, 1 May 2020 8:31:47 AM PDT Thomas Schäfer wrote: > is there an switch, option, environment variable, configurable key word to > enable IP6 for the slurmd and slurmctld daemons? I don't believe those Slurm daemons support IPv6, my understanding is the only one that does is slurmrestd, see slide 22 of the presentation here: https://slurm.schedmd.com/SLUG19/REST_API.pdf All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Munge decode failing on new node
On Friday, 17 April 2020 2:22:00 PM PDT Dean Schulze wrote: > Both work. The only discrepancy is that the slurm controller output had > these two lines: > > UID: ??? (1000) > GID: ??? (1000) > > Like the controller doesn't know the username for UID 1000. What does this say on the controller and the compute node? getent passwd 1000 Are you using LDAP or the like to ensure that all nodes have the same user database? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Munge decode failing on new node
On 4/15/20 10:57 am, Dean Schulze wrote:

> error: Munge decode failed: Invalid credential
> ENCODED: Wed Dec 31 17:00:00 1969
> DECODED: Wed Dec 31 17:00:00 1969
> error: authentication: Invalid authentication credential

That's really interesting. I had one of these last week when on call; for us at least it seemed to be a hardware error, as when attempting to reboot it the node failed completely and would no longer boot. Worth checking whatever hardware logging capabilities your system has to see if MCEs are being reported.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS
On 8/4/20 7:20 am, Eric Berquist wrote: Once you’ve built SLURM, it’s enough to just have the GPU drivers on the nodes where SLURM will be installed. Yeah I checked that at the Slurm User Group - slurmd will try and dlopen() the required libraries and should gracefully deal with them not being present. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Running an MPI job across two partitions
On 23/3/20 8:32 am, CB wrote:

> I've looked at the heterogeneous job support but it creates two-separate jobs.

Yes, but the web page does say:

# By default, the applications launched by a single execution of
# the srun command (even for different components of the
# heterogeneous job) are combined into one MPI_COMM_WORLD with
# non-overlapping task IDs.

So it _should_ work. I know there are issues with Cray systems & hetjobs at the moment, but I suspect that's not likely to concern you.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] slurmd -C showing incorrect core count
On 10/3/20 1:40 pm, mike tie wrote:

> Here is the output of lstopo

Hmm, well I believe Slurm should be using hwloc (which provides lstopo) to get its information (at least it calls the xcpuinfo_hwloc_topo_get() function for that), so if lstopo works then slurmd should too.

Ah, looking a bit deeper I see in src/slurmd/common/xcpuinfo.c:

if (!hwloc_xml_whole)
        hwloc_xml_whole = xstrdup_printf("%s/hwloc_topo_whole.xml",
                                         conf->spooldir);

Do you happen to have a file called "hwloc_topo_whole.xml" in your spool directory on that node? I'm wondering if it's cached old config there. If so, move it out of the way somewhere safe (just in case) and try again.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] slurmd -C showing incorrect core count
On 9/3/20 7:44 am, mike tie wrote: Specifically, how is slurmd -C getting that info? Maybe this is a kernel issue, but other than lscpu and /proc/cpuinfo, I don't know where to look. Maybe I should be looking at the slurmd source? It would be worth looking at what something like "lstopo" from the hwloc package says about your VM. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Question about determining pre-empted jobs
On 28/2/20 9:53 am, Jeffrey R. Lang wrote: We have had a request to generate a report showing the number of jobs by date showing pre-empted jobs. We used sacct to try to gather the data but we only found a few jobs with the state “PREEMPTED”. It might be that if jobs are being set to be requeued then you'll need to use the --duplicates option to sacct to see previous iterations of the job when it was preempted. Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
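A hedged sacct sketch for that kind of report (the date range and format fields are examples to adapt):

```
# Include requeued iterations (--duplicates) so preempted runs show up:
sacct --allusers --duplicates --state=PREEMPTED \
      --starttime=2020-02-01 --endtime=2020-02-29 \
      --format=JobID,User,Partition,State,Start,End
```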
Re: [slurm-users] Setup for backup slurmctld
On Wednesday, 26 February 2020 12:48:26 PM PST Joshua Baker-LePain wrote:

> We're planning the migration of our moderately sized cluster (~400 nodes,
> 40K jobs/day) from SGE to slurm. We'd very much like to have a backup
> slurmctld, and it'd be even better if our backup slurmctld could be in a
> separate data center from the primary (though they'd still be on the same
> private network). So, how are folks sharing the StateSaveLocation in such
> a setup? Any and all recommendations (including those with the 2
> slurmctld servers in the same rack) welcome. Thanks!

We use GPFS for our shared state directory (Cori is 12K nodes and we put 5K-30K jobs a day through it, very variable job mix). The important thing is the IOPS rate for the filesystem; if it can't keep up with Slurm then you're going to see performance issues.

Tim from SchedMD had some notes on HA (and other things) from the Slurm 2017 user group:

https://slurm.schedmd.com/SLUG17/FieldNotes.pdf

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] RHEL8 support - Missing Symbols in SelectType libraries
On 21/2/20 9:02 am, Tina Friedrich wrote:

> In case that's of interest - this is actually SLURM 18.08.3 that I've now
> gotten to run (I haven't quite managed to upgrade to 19 yet). I've made
> minor modifications to the spec file - the unhardening of the flags and
> the the python dependency.

From what I can see there's a fix in for 20.02 (the same change you've made), but it's not (yet) backported to earlier releases.

commit d3b308aae6d63a9acecd50c0d63a5c8e3ff0086f
Author: Tim McMullan
Date:   Fri Feb 14 08:25:06 2020 -0500

    slurm.spec - disable "hardening" flags

    Disable the "hardening" flags - '-z,relro' or '-z,now' that RHEL8/Fedora
    inject by default which break Slurm's plugin stack.

    Bug 8499.

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Is it safe to convert cons_res to cons_tres on a running system?
On 20/2/20 2:16 pm, Nathan R Crawford wrote: I interpret this as, in general, changing SelectType will nuke existing jobs, but that since cons_tres uses the same state format as cons_res, it should work. We got caught with just this on our GPU nodes (though it was fixed before I got to see what was going on) - it seems that the format of the RPCs changes when you go from cons_res to cons_tres and we were having issues until we restarted slurmd on the compute nodes as well. My memory is that this was causing issues for starting new jobs (in a failing completely type of manner), I'm not sure what the consequences were for running jobs (though I suspect it would not have been great for them). If Doug sees this he may remember this (he caught and fixed it). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm Upgrade from 17.02
On 19/2/20 6:10 am, Ricardo Gregorio wrote: I am putting together an upgrade plan for slurm on our HPC. We are currently running old version 17.02.11. Would you guys advise us upgrading to 18.08 or 19.05? Slurm versions only support upgrading from 2 major versions back, so you could only upgrade from 17.02 to 17.11 or 18.08. I'd suggest going straight to 18.08. Remember you have to upgrade slurmdbd first, then upgrade slurmctld and then finally the slurmd's. Also, as Ole points out, 20.02 is due out soon at which point 18.08 gets retired from support, so you'd probably want to jump to 19.05 from 18.08. Don't forget to take backups first! We do a mysqldump of the whole accounting DB and rsync backups of our state directories before an upgrade. Best of luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
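The backup step might look like this sketch (the database name and paths are assumptions - check StorageLoc in your slurmdbd.conf and StateSaveLocation in slurm.conf):

```
# Dump the accounting DB and snapshot the slurmctld state directory:
mysqldump --single-transaction slurm_acct_db > slurm_acct_db-$(date +%F).sql
rsync -a /var/spool/slurmctld/ /backup/slurmctld-$(date +%F)/
```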
Re: [slurm-users] Inconsistent cpu bindings with cpu-bind=none
On 17/2/20 12:48 am, Marcus Boden wrote: I am facing a bit of a weird issue with CPU bindings and mpirun: I think if you want Slurm to have any control over bindings you'll be wanting to use srun to launch your MPI program, not mpirun. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Cluster usage with Slurm
On 17/2/20 4:19 am, Parag Khuraswar wrote: Does Slurm provide cluster usage reports like mentioned below ? For the detailed info you're being asked for I'd probably suggest looking at the OpenXDMoD project. https://open.xdmod.org/ Its "shredder" data importer can import data from a bunch of different batch systems, including Slurm. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Longer queuing times for larger jobs
On 5/2/20 1:44 pm, Antony Cleave wrote: Hi, from what you are describing it sounds like jobs are backfilling in front and stopping the large jobs from starting We use a feature that SchedMD implemented for us called "bf_min_prio_reserve" which lets you set a priority threshold below which Slurm won't make a forward reservation for a job (and so can only start if it can start right now without delaying other jobs). https://slurm.schedmd.com/slurm.conf.html#OPT_bf_min_prio_reserve So if you can arrange your local priority system so that large jobs are over that threshold and smaller jobs are below it (or whatever suits your use case) then you should have a way to let these large jobs get a reliable start time without smaller jobs pushing them back in time. There's some useful background from the bug where this was implemented: https://bugs.schedmd.com/show_bug.cgi?id=2565 All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] How should I configure a node with Autodetect=nvml?
On Tuesday, 11 February 2020 7:27:56 AM PST Dean Schulze wrote: > No other errors in the logs. Identical slurm.conf on all nodes and > controller. Only the node with gpus has the gres.conf (with the single > line Autodetect=nvml). It might be useful to post the output of "slurmd -C" and your slurm.conf for us to see (sorry if you've done that already and I've not seen it). You can also increase the debug level for slurmctld and slurm in slurm.conf (we typically run with SlurmctldDebug=debug, you may want to try SlurmdDebug=debug whilst experimenting). Best of luck, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] How should I configure a node with Autodetect=nvml?
On Monday, 10 February 2020 12:11:30 PM PST Dean Schulze wrote: > With this configuration I get this message every second in my slurmctld.log > file: > > error: _slurm_rpc_node_registration node=slurmnode1: Invalid argument What other errors are in the logs? Could you check that you've got identical slurm.conf and gres.conf files everywhere? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt
On 30/1/20 10:20 am, Dr. Thomas Orgis wrote: Matching for user (-u) and Job ID (-j) works, but not -N/-S/-E. So is this just the current state and it's up to me to provide a patch to enable it if I want that behaviour? You're using a very very very old version of Slurm there (15.08); you should upgrade to a recent one (I'd suggest 19.05.5) to check whether it's been fixed in the intervening years. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Question about slurm source code and libraries
On 25/1/20 8:08 am, dean.w.schu...@gmail.com wrote: I'm working on the 19.05.4 source code since it is stable, but I would prefer to use the same C REST library that will be used in 20.02. Does anyone know what C library that is? They're using OpenAPI (formerly Swagger) for this (see slide 5), and it seems that includes a code generator for various languages. https://swagger.io/tools/swagger-codegen/ Their source code is on Github here: https://github.com/swagger-api/swagger-codegen All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Multinode blast run
On 24/1/20 3:46 am, Mahmood Naderan wrote: Has anyone run blast on multiple nodes via slurm? I don't think blast is something that can run across nodes (or at least it didn't used to be). There is/was something called "mpiblast" that could do that. If you'll excuse the plug this sounds like a good question for the Beowulf list https://www.beowulf.org/ which is a more general purpose cluster computing list (disclaimer: I'm the caretaker of it these days). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Can't get node out of drain state
On 23/1/20 7:09 pm, Dean Schulze wrote: Pretty strange that having a Gres= property on a node that doesn't have a gpu would get it stuck in the drain state. Slurm verifies that nodes have the capabilities you say they have, so that if a node boots with less RAM than it should have, with a socket hidden, or with a GPU that has failed, you'll know about it and won't blindly send jobs to it only for them to fail because the node no longer meets their requirements. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Node can't run simple job when STATUS is up and STATE is idle
On 20/1/20 3:00 pm, Dean Schulze wrote: There's either a problem with the source code I cloned from github, or there is a problem when the controller runs on Ubuntu 19 and the node runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to see if that solves the problem. I've run the master branch on a Cray XC without issues, and I concur with what the others have said and suggest it's worth checking the slurmd and slurmctld logs to find out why communication between them isn't working. Good luck, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Job completed but child process still running
On 1/13/20 5:55 am, Youssef Eldakar wrote: In an sbatch script, a user calls a shell script that starts a Java background process. The job completes immediately, but the child Java process is still running on the compute node. Is there a way to prevent this from happening? What I would recommend is to use Slurm's cgroups support so that processes that put themselves into the background this way are tracked as part of the job and cleaned up when the job exits. https://slurm.schedmd.com/cgroups.html Depending on how the Java process puts itself into the background you could try adding a "wait" command at the end of the shell script so that it doesn't exit immediately (it's not guaranteed though). With cgroups the batch script could also check the processes in your cgroup to monitor the existence of the Java process, sleeping for a while between checks, and exit when it's no longer found. For instance once you've got the PID of the Java process you can use "kill -0 $PID" to check if it's still there (rather than using ps). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
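A minimal sketch of that polling idea, with "sleep" standing in for the Java daemon (in the real sbatch script you'd capture the Java process's PID instead):

```shell
#!/bin/sh
# Stand-in for the Java background process
sleep 2 &
pid=$!

# kill -0 sends no signal; it only tests whether the PID still exists,
# so this loop blocks the job script until the daemon has gone away
while kill -0 "$pid" 2>/dev/null; do
    sleep 1
done
echo "background process $pid has exited"
```

With cgroups enabled the job's cgroup is then torn down when this script finishes, taking any stragglers with it.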
Re: [slurm-users] Submission without Scheduling
Hey Lev! :-) On Monday, 2 December 2019 2:29:06 PM PST Lev Lafayette wrote: > An idea that bouncing around our site at the moment is the possibility of > jobs being submitted without being scheduled, given that these are two > separate functions. Could you expand on that - do you mean some way to submit jobs whilst slurmctld is down, or just whilst nodes are down? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Maxjobs to accrue age priority points
On Friday, 13 December 2019 7:01:48 AM PST Christopher Benjamin Coffey wrote: > Maybe because that setting is just not included in the default list of > settings shown? That is counterintuitive to this in the man page for > sacctmgr: > > show [] > Display information about the specified entity. By default, > all entries are displayed, you can narrow results by specifying SPECS in > your query. Identical to the list command. > > Thoughts? Thanks! I _suspect_ what that's saying is that it has a default list that you can narrow, not that specifying something there will show it if it's not part of the default list. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] error: persistent connection experienced an error
On 13/12/19 12:19 pm, Christopher Benjamin Coffey wrote: error: persistent connection experienced an error Looking at the source code that comes from here:

if (ufds.revents & POLLERR) {
    error("persistent connection experienced an error");
    return false;
}

So your TCP/IP stack reported a problem with an existing connection. That's very odd if you're on the same box. If you are on a large system or putting a lot of small jobs through quickly then it's worth checking out the Slurm HTC guide for networking: https://slurm.schedmd.com/high_throughput.html Good luck. Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Multi-node job failure
On 11/12/19 8:05 am, Chris Woelkers - NOAA Federal wrote: Partial progress. The scientist that developed the model took a look at the output and found that instead of one model run being run in parallel, srun had run multiple instances of the model, one per thread, which for this test was 110 threads. This sounds like MVAPICH isn't built to support Slurm; per the Slurm MPI guide you need to build it with this to enable Slurm support (and of course add any other options you were using): ./configure --with-pmi=pmi2 --with-pm=slurm All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
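Spelled out as a rebuild sketch (add whatever configure options your original build used):

```shell
# rebuild MVAPICH2 against Slurm's PMI2 so srun can launch the ranks
./configure --with-pmi=pmi2 --with-pm=slurm
make -j
make install
```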
Re: [slurm-users] Is that possible to submit jobs to a Slurm cluster right from a developer's PC
On 12/12/19 7:38 am, Ryan Cox wrote: Be careful with this approach. You also need the same munge key installed everywhere. If the developers have root on their own system, they can submit jobs and run Slurm commands as any user. I would echo Ryan's caution on this and add that as root they will be able to run admin commands on the box too, create reservations, shut Slurm down, cancel other users jobs, etc. At the Slurm User Group this year Tim Wickberg foreshadowed (and demo'd with a very neat "pay-for-priority" box) a REST API planned for the Slurm 20.02 release. It has its own auth system separate to munge and would make this a lot safer. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Need help with controller issues
On 12/12/19 8:14 am, Dean Schulze wrote: configure:5021: gcc -o conftest -I/usr/include/mysql -g -O2 conftest.c -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread -lz -lm -lrt -latomic -lssl -lcrypto -ldl >&5 /usr/bin/ld: cannot find -lssl /usr/bin/ld: cannot find -lcrypto collect2: error: ld returned 1 exit status That looks like your failure, you're missing the package that provides those libraries it's trying to use - in this case for Debian/Ubuntu I suspect it's libssl-dev. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
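On Debian/Ubuntu the usual fix would be (assuming libssl-dev is indeed the missing package):

```shell
# provides the libssl/libcrypto development files the linker wants
sudo apt-get install libssl-dev
```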
Re: [slurm-users] Maxjobs to accrue age priority points
Hi Chris, On 12/12/19 3:16 pm, Christopher Benjamin Coffey wrote: What am I missing? It's just a setting on the QOS, not the user:

csamuel@cori01:~> sacctmgr show qos where name=regular_1 format=MaxJobsAccruePerUser
MaxJobsAccruePU
---------------
              2

So any user in that QOS can only have 2 jobs ageing at any one time. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
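For completeness, setting that limit on a QOS would look something like this (the QOS name is illustrative):

```shell
# cap the number of jobs per user that can accrue age priority in this QOS
sacctmgr modify qos regular_1 set MaxJobsAccruePerUser=2
```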
Re: [slurm-users] Need help with controller issues
On 11/12/19 11:31 am, Eli V wrote: Look for libmariadb-client. That's needed for slurmdbd on debian. Looking at the output from building some Slurm 19.05.4 RPMs earlier tonight, this is what I see in the output of configure: [...] checking for mysql_config... /usr/bin/mysql_config MySQL 10.4.3 test program built properly. [...] You should look at the config.log for the gory details of what it's trying to discover and what it found (or didn't). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Need help with controller issues
On Tuesday, 10 December 2019 1:57:59 PM PST Dean Schulze wrote: > This bug report from a couple of years ago indicates a source code issue: > > https://bugs.schedmd.com/show_bug.cgi?id=3278 > > This must have been fixed by now, though. > > I built using slurm-19.05.2. Does anyone know if this has been fixed in > 19.05.4? I don't think this is a Slurm issue - have you checked that you have the MariaDB development package for your distro installed before trying to build Slurm? It will skip things it doesn't find and that could explain what you're seeing. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Multi-node job failure
Hi Chris, On Tuesday, 10 December 2019 11:49:44 AM PST Chris Woelkers - NOAA Federal wrote: > Test jobs, submitted via sbatch, are able to run on one node with no problem > but will not run on multiple nodes. The jobs are using mpirun and mvapich2 > is installed. Is there a reason why you aren't using srun for launching these? https://slurm.schedmd.com/mpi_guide.html If you're using mpirun (unless you've built mvapich2 with Slurm support) then you'll be relying on ssh to launch tasks, and that could be what's broken for you. Running with srun will avoid that and allow Slurm to track your processes correctly. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
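A minimal batch script using srun might look like this (node/task counts and the binary name are illustrative, and --mpi=pmi2 assumes MVAPICH2 was built with PMI2 support):

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=28
#SBATCH --time=01:00:00

# let Slurm launch and track the MPI ranks directly, no mpirun/ssh
srun --mpi=pmi2 ./model
```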
Re: [slurm-users] Slurm configuration, Weight Parameter
On 23/11/19 9:14 am, Chris Samuel wrote: My gut instinct (and I've never tried this) is to make the 3GB nodes be in a separate partition that is guarded by AllowQos=3GB and have a QOS called "3GB" that uses MinTRESPerJob to require jobs to ask for more than 2GB of RAM to be allowed into the QOS. Of course there's nothing to stop a user requesting more memory than they need to get access to these nodes, but that's a social issue not a technical one. :-) -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm configuration, Weight Parameter
On 21/11/19 7:25 am, Sistemas NLHPC wrote: Currently we have two types of nodes, one with 3GB and another with 2GB of RAM; on the 3GB nodes we need to disallow jobs requesting less than 2GB, to avoid underutilization of resources. My gut instinct (and I've never tried this) is to make the 3GB nodes be in a separate partition that is guarded by AllowQos=3GB and have a QOS called "3GB" that uses MinTRESPerJob to require jobs to ask for more than 2GB of RAM to be allowed into the QOS. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
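As an untested sketch of that idea (names and node list illustrative; memory TRES values are in MB, so 2049 means "more than 2GB"):

```shell
# create a QOS that demands jobs request more than 2GB of memory
sacctmgr add qos 3GB
sacctmgr modify qos 3GB set MinTRESPerJob=mem=2049

# then in slurm.conf, guard the 3GB nodes behind that QOS:
#   PartitionName=bigmem Nodes=node[01-04] AllowQos=3GB
```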
Re: [slurm-users] Force a use job to a node with state=drain/maint
On 23/11/19 8:54 am, René Neumaier wrote: In general, is it possible to move a pending job (means forcing as root) to a specific node which is marked as DRAIN for troubleshooting? I don't believe so. Instead: put a reservation on the node just for this user, add the reservation to the pending job, then resume the node. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
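Roughly, those steps would look like this (reservation name, user, node and job ID are all illustrative):

```shell
# reserve the drained node for just that user
scontrol create reservation reservationname=debug users=alice \
    nodes=node042 starttime=now duration=infinite

# attach the pending job to the reservation, then resume the node
scontrol update jobid=12345 reservationname=debug
scontrol update nodename=node042 state=resume
```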