Re: [slurm-users] Slurm Upgrade Philosophy?

2020-12-23 Thread Chris Samuel
On Friday, 18 December 2020 10:10:19 AM PST Jason Simms wrote:

> Thanks to several helpful members on this list, I think I have a much better
> handle on how to upgrade Slurm. Now my question is, do most of you upgrade
> with each major release?

We do, though not immediately and not without a degree of testing on our test 
systems.  One of the big reasons we upgrade is that we've usually paid for 
features in Slurm that we need (for example, in 20.11 that includes scrontab, 
so users won't be tied to favourite login nodes, as well as the experimental 
RPC queue code, because of the large number of RPCs our systems need to cope 
with).

I also keep an eye out for what other sites find with new releases, so I'm 
following the current concerns about 20.11 and the change in behaviour for job 
steps that do this (expanding NVIDIA's example slightly):

#!/bin/bash
#SBATCH --exclusive
#SBATCH -N2
srun --ntasks-per-node=1 python multi_node_launch.py

which (if I'm reading the bugs correctly) fails in 20.11, as that srun no 
longer gets all the allocated resources and instead just gets the default of 
--cpus-per-task=1. This also affects things like mpirun in OpenMPI built with 
Slurm support (as it effectively calls "srun orted" and that "orted" launches 
the MPI ranks, so in 20.11 it only has access to a single core for them all to 
fight over).  Again - that's if I'm interpreting the bugs correctly!

I don't currently have a test system that's free to try 20.11 on, but 
hopefully early in the new year I'll be able to test this out to see how much 
of an impact this is going to have and how we will manage it.

https://bugs.schedmd.com/show_bug.cgi?id=10383
https://bugs.schedmd.com/show_bug.cgi?id=10489

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA






Re: [slurm-users] [External] Slurm Upgrade Philosophy?

2020-12-23 Thread Prentice Bisbal
We generally upgrade within 1-2 maintenance windows of a new release 
coming out, so within a couple of months of the release being available. 
For minor updates, we update at the next maintenance window. At one 
point, we were stuck several releases behind. Getting all caught up 
wasn't that bad; I think we were on 15.something and upgrading to 
17.11, or something like that.


For that upgrade, most of the work happened on the DB server: I upgraded 
one release at a time until we were current on the Slurm DB, and then 
made a single upgrade to the latest version on the Slurm controller and 
compute nodes.
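
For anyone facing a similar jump, the overall shape of that dbd-first procedure 
is roughly the following (a sketch only: the version list, DB name and install 
step are placeholders for your setup, and SchedMD's upgrade guide is the 
authority):

systemctl stop slurmdbd
mysqldump slurm_acct_db > slurm_acct_db.backup.sql   # back up the accounting DB first
                                                     # (slurm_acct_db is the default name)
for ver in 16.05 17.02 17.11; do
    install_slurm_packages "$ver"    # placeholder: however you install that release
    slurmdbd -D -vvv                 # run in the foreground once so it converts the
                                     # database schema; stop it when conversion is done
done
systemctl start slurmdbd             # back under systemd on the final release
# slurmctld and the slurmds on the compute nodes can then jump straight to the
# final release in one step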


Our Slurm DB is small, so the upgrade changes to the DB only took a few 
minutes per upgrade. For larger DBs, it can take hours per upgrade.


This is one reason people like to keep current with Slurm - it makes 
future upgrades that much easier. One maintenance window of only a couple 
of hours is more palatable than a couple of days of downtime. Also, when 
you jump several updates, it's hard to tell when a new "feature" or 
"bug" was introduced, which makes identifying the source and 
fixing/understanding the new behavior that much harder.


Prentice

On 12/18/20 1:10 PM, Jason Simms wrote:

Hello all,

Thanks to several helpful members on this list, I think I have a much 
better handle on how to upgrade Slurm. Now my question is, do most of 
you upgrade with each major release?


I recognize that, normally, if something is working well, then don't 
upgrade it! In our case, we're running 20.02, and it seems to be 
working well for us. The notes for 20.11 don't indicate any "must 
have" features for our use cases, but I'm still new to Slurm, so maybe 
there is a hidden benefit I can't immediately see.


Given that, I would normally not consider upgrading. But as I 
understand it, you cannot upgrade from more than two major releases back, 
so if I skip this one, I'd have to upgrade to (presumably) 21.08, or 
else I'd have to "double upgrade" if, e.g., I wanted to go from 20.02 
to 22.05.


To prevent that, do most people try to stay within the most recent two 
versions? Or do you go as long as you possibly can with your existing 
version, upgrading only if you absolutely must?


Warmest regards,
Jason

--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632




[slurm-users] Power Saving Issue - Job B is executed before Job A - node not ready?

2020-12-23 Thread Eg. Bo.
Hello,
Slurm power saving (19.05) was configured successfully within our cloud 
environment. Jobs can be submitted and nodes get provisioned and deprovisioned 
as expected. Unfortunately there seems to be an edge case (or a config issue 
:-D).

After a job (jobA) is submitted to partition A, node provisioning starts. During 
that phase another job (jobB) is submitted to the same partition, requesting the 
same node (-w) - not sure if this is really a must-have right now; the edge case 
comes from application-level job scheduling.
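In sbatch terms the pattern is roughly the following (illustrative only - 
partition, node and script names are made up):

sbatch -p partA -w mynodename jobA.sh    # triggers power-up of the cloud node
# ...while mynodename is still being provisioned / powering up:
sbatch -p partA -w mynodename jobB.sh    # targets the same, not-yet-ready node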
Unfortunately jobB runs before jobA and fails, while jobA finishes successfully 
a few seconds later - so the configuration should be OK overall. jobB fails with:
srun: error: Unable to resolve "mynodename": Host name lookup failure
srun: error: fwd_tree_thread: can't find address for host mynodename check slurm.conf
srun: error: Task launch for 123456.0 failed on node mynodename: Can't find an address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted: Waiting up to 188 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
It looks like slurmctld applies some magic to jobA ("Resetting JobId=jobidA 
start time for node power up") but not to jobB:
update_node: node mynodename state set to ALLOCATED
Node mynodename2 now responding
Node mynodename now responding
update_node: node mynodename state set to ALLOCATED
_pick_step_nodes: Configuration for JobId=jobidB is complete
job_step_signal: JobId=jobidB StepId=0 not found
_pick_step_nodes: Configuration for JobId=jobidA is complete
Resetting JobId=jobidA start time for node power up
_job_complete: JobId=jobidA WEXITSTATUS 0
_job_complete: JobId=jobidA done
job_step_signal: JobId=jobidB StepId=0 not found
_job_complete: JobId=jobidB WTERMSIG 116
_job_complete: JobId=jobidB done

Has anyone seen this before or any idea how to fix it?


Thanks & Best
Eg. Bo.