[slurm-users] New member , introduction

2023-09-29 Thread John Joseph
Dear All, 
Thanks for the mailing list; I have just joined. 
I'd like to introduce myself: my name is Joseph John and I work as a system administrator. 
I have been working on Linux, but I am a novice to HPC and Slurm and am trying to learn. 
Thanks 
Joseph John 



Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Groner, Rob
Well again, I don't want to tweak things just to get the test to happen 
quicker.  I DO have to keep in mind the scheduler and backfill settings, 
though.  For instance, I think the default scheduler and backfill intervals are 
60 and 30 seconds...or vice versa.  So, before I check the Scheduler value for 
the high priority job via scontrol, I wait 90 seconds and then some.  In a 
perfect world, that SHOULD have given the scheduler and backfill scheduler time 
to get to it.  I THINK, however, that in a sufficiently busy system, there's no 
guarantee even after that amount of time that the new high priority job has 
been evaluated.

I'll take a look at sdiag and see if it can tell me where the job is at, thanks 
for the suggestion.
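
For reference, one way to check both of those from the command line; the job ID 
is a placeholder and sdiag's exact field labels vary a bit between Slurm versions:

  # Have the main and backfill schedulers completed a cycle since the
  # high-priority job was submitted?
  sdiag | grep -E 'Last cycle|Total backfilled jobs'

  # The Scheduler field mentioned above (Main vs. Backfill) for the test job:
  scontrol show job 12345 | grep -o 'Scheduler=[^ ]*'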

Rob




Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Ryan Novosielski
You can get some information on that from sdiag, and there are tweaks you can 
make to backfill scheduling that affect how quickly it will get to a job.

That doesn’t really answer your real question, but might help you when you are 
looking into this.

Sent from my iPhone



Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Groner, Rob
I'm not looking for a one-time answer.  We run these tests anytime we change 
anything related to Slurm: version, configuration, etc.  We certainly run 
the test after the system comes back up after an outage, and an hour would be a 
long time to wait for that.  That's certainly the brute-force approach, but I'm 
hoping there's a definitive way to show, through scontrol job output, that the 
job won't preempt.

I could set the PreemptExemptTime to a smaller value, say 5 minutes instead of 
1 hour, that is true, but there are a few issues with that.


  1.  I would then no longer be testing the system as it actually is.  I want 
to test the system in its actual production configuration.
  2.  If I did lower its value, what would be a safe value?  5 minutes?  Does 
running for 5 minutes guarantee that the higher priority job had a chance to 
preempt it but didn't?  Or did the scheduler even ever get to it?  On a test 
cluster with few jobs, you could be reasonably assured it did, but running 
tests on the production cluster...isn't it possible the scheduler hasn't yet 
had a chance to process it, even after 5 minutes?  It depends on the Slurm 
scheduler settings, I suppose.

rob




Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
On Sep 29, 2023, at 2:51 PM, Davide DelVento <davide.quan...@gmail.com> wrote:

I don't really have an answer for you other than a "hallway comment", that it 
sounds like a good thing which I would test with a simulator, if I had one. 
I've been intrigued by (but really not looked much into) 
https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf

On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob <rug...@psu.edu> wrote:

I could obviously let the test run for an hour to verify the lower priority job 
was never preempted...but that's not really feasible.

Why not? Isn't it going to take longer than an hour to wait for responses to 
this post? Also, you could set up the minimum time to a much smaller value, so 
it won't take as long to test.


Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Davide DelVento
I don't really have an answer for you other than a "hallway comment", that
it sounds like a good thing which I would test with a simulator, if I had
one. I've been intrigued by (but really not looked much into)
https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf

On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob  wrote:

> On our system, for some partitions, we guarantee that a job can run at
> least an hour before being preempted by a higher priority job.  We use the
> QOS preempt exempt time for this, and it appears to be working.  But of
> course, I want to TEST that it works.
>
> So on a test system, I start a lower priority job on a specific node, wait
> until it starts running, and then I start a higher priority job for the
> same node.  The test should only pass if the higher priority job has an
> OPPORTUNITY to preempt the lower priority job, and doesn't.
>
> Now, I know I can get a preempt eligible time out of scontrol for the
> lower priority job and verify that it's set for an hour (I do check that
> already), but that's not good enough for me.  I could obviously let the
> test run for an hour to verify the lower priority job was never
> preempted...but that's not really feasible.  So instead, I want to verify
> that the higher priority job has had a chance to preempt the lower priority
> job, and it did not.
>
> So far, the way I've been doing that is to check the reported Scheduler in
> the scontrol job output for the higher priority job.  I figure that when
> the scheduler changes to Backfill instead of Main, then the higher priority
> job has been seen by the main scheduler and it passed on the chance to
> preempt the lower priority job.
>
> Is that a good assumption?  Is there any other, or potentially quicker,
> way to verify that the higher priority job will NOT preempt the lower
> priority job?
>
> Rob
>


[slurm-users] FAQ Errata: Can the make command utilize the resources allocated to a Slurm job? answer is out of date

2023-09-29 Thread Cook, Malcolm
FWIW:  

The answer to the question "Can the make command utilize the resources 
allocated to a Slurm job?" [1] is out of date.  The patch mentioned is no 
longer in the distribution, and personal inspection finds that it no longer 
applies to newer versions of GNU Make.

[1] https://slurm.schedmd.com/faq.html#parallel_make
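
A commonly used replacement for that old patch (a sketch, not official SchedMD 
guidance) is to skip patching make entirely and pass the job's CPU allocation to 
make's own -j flag inside the batch script:

  #!/bin/bash
  #SBATCH --cpus-per-task=8
  # SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested,
  # hence the fallback to SLURM_CPUS_ON_NODE.
  make -j "${SLURM_CPUS_PER_TASK:-${SLURM_CPUS_ON_NODE:-1}}"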






Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ole Holm Nielsen

On 29-09-2023 17:33, Ryan Novosielski wrote:
I’ll just say, we haven’t done an online/jobs running upgrade recently 
(in part because we know our database upgrade will take a long time, and 
we have some processes that rely on -M), but we have done it and it does 
work fine. So the paranoia isn’t necessary unless you know that, like 
us, the DB upgrade time is not tenable (Ole’s wiki has some great 
suggestions for how to test that, but they aren’t especially Slurm 
specific, it’s just a dry-run).


Slurm upgrades are clearly documented by SchedMD, and there's no reason 
to worry if you follow the official procedures.  At least, it has always 
worked for us :-)


Just my 2 cents: The detailed list of upgrade steps/commands (first dbd, 
then ctld, then slurmds, finally login nodes) are documented in my Wiki 
page 
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slurm


The Slurm dbd upgrade instructions in 
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#make-a-dry-run-database-upgrade 
are totally Slurm specific, since that's the only database upgrade I've 
ever made :-)  I highly recommend doing the database dry-run upgrade on 
a test node before doing the real dbd upgrade!
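
Roughly, such a dry run looks like the sketch below; the paths and database names 
are placeholders, and the Wiki page above has the full, tested recipe:

  # On the production dbd host: dump the accounting database.
  mysqldump --single-transaction -u slurm -p slurm_acct_db > slurm_acct_db.sql

  # On a throwaway test node with the NEW slurmdbd and a local MariaDB/MySQL:
  # load the dump, then run the new slurmdbd in the foreground and time how
  # long the schema conversion takes.
  mysql -u slurm -p slurm_acct_db < slurm_acct_db.sql
  /opt/slurm/new-version/sbin/slurmdbd -D -vvv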


/Ole



[slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Groner, Rob
On our system, for some partitions, we guarantee that a job can run at least an 
hour before being preempted by a higher priority job.  We use the QOS preempt 
exempt time for this, and it appears to be working.  But of course, I want to 
TEST that it works.

So on a test system, I start a lower priority job on a specific node, wait 
until it starts running, and then I start a higher priority job for the same 
node.  The test should only pass if the higher priority job has an OPPORTUNITY 
to preempt the lower priority job, and doesn't.

Now, I know I can get a preempt eligible time out of scontrol for the lower 
priority job and verify that it's set for an hour (I do check that already), 
but that's not good enough for me.  I could obviously let the test run for an 
hour to verify the lower priority job was never preempted...but that's not 
really feasible.  So instead, I want to verify that the higher priority job has 
had a chance to preempt the lower priority job, and it did not.

So far, the way I've been doing that is to check the reported Scheduler in the 
scontrol job output for the higher priority job.  I figure that when the 
scheduler changes to Backfill instead of Main, then the higher priority job has 
been seen by the main scheduler and it passed on the chance to preempt the 
lower priority job.

Is that a good assumption?  Is there any other, or potentially quicker, way to 
verify that the higher priority job will NOT preempt the lower priority job?
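
For concreteness, the two pieces involved look roughly like this (the QOS name 
and job ID are placeholders):

  # Give jobs under this QOS a one-hour grace period before they may be preempted.
  sacctmgr modify qos normal set PreemptExemptTime=01:00:00

  # Fields reported by the running lower-priority job; PreemptEligibleTime is
  # the value checked above.
  scontrol show job 12345 | grep -o 'Preempt[A-Za-z]*Time=[^ ]*'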

Rob


Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Groner, Rob
My team lead brought that up also, that we could go ahead and change the 
symlink that EVERYTHING uses, and nothing would happen...until the service is 
restarted.  It's good that it's not a timing-sensitive change.  Of course, we 
do run the risk that a node could reboot on its own, and thus pick up 
the change before we're expecting it to.  For patch level changes, that really 
wouldn't be a problem, but if we consider doing this for a major version 
change, then it probably matters more.
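
In case it helps anyone following along, the symlink flip being described is 
essentially the following (paths invented for illustration); nothing takes 
effect on a node until its slurmd restarts or the node reboots:

  ln -sfn /shared/slurm/23.02.5 /shared/slurm/current   # flip the shared symlink
  systemctl restart slurmd                              # per node, when ready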



Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ryan Novosielski
I’ll just say, we haven’t done an online/jobs running upgrade recently (in part 
because we know our database upgrade will take a long time, and we have some 
processes that rely on -M), but we have done it and it does work fine. So the 
paranoia isn’t necessary unless you know that, like us, the DB upgrade time is 
not tenable (Ole’s wiki has some great suggestions for how to test that, but 
they aren’t especially Slurm specific, it’s just a dry-run).

As far as the shared symlink thing goes, I think you’d be fine, dependent on 
whether or not you have anything else stored in the shared software tree, 
changing the symlink and just not restarting compute nodes’ slurmd until you’re 
ready — though again, you can do this while jobs are running, so there’s not 
really a reason to wait, except in cases like ours where it’s just easier to 
reboot the node than one process for running nodes, and then rebooting, and 
wanting to be sure that the rebooted compute node and the running upgraded node 
will operate exactly the same.


[slurm-users] docker containers and slurm

2023-09-29 Thread Jake Jellinek
Hi list

I have built a small cluster and have attached a few clients to it.
My clients can submit jobs, so I am confident that the service is set up 
correctly.

What I would like to do is to deploy the slurm client into a docker container. 
From within the docker container, I have set up munge and can successfully run 
'sinfo'.
scontrol ping reports that the master node is down, and any attempt to srun a 
bash shell (srun --pty bash -i) eventually fails.
When I run srun, the master node registers the job in the queue and even 
allocates (and launches) a new machine for it to run on.

Has anyone had any success running slurm clients within a dockerised 
environment?

I ran some tests and I think the problem I have is with the docker firewall. Do 
I need to configure docker to forward certain ports?
My plan is to deploy a graphical environment within each container and allow 
each user to have their own desktop. From there, they should be able to 
schedule jobs etc.
If I had to forward certain ports, I'm not clear how I could achieve this with 
multiple users
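
For what it's worth, a sketch of the usual options; 6817/6818 are Slurm's 
default slurmctld/slurmd ports, the SrunPortRange value is made up, and the 
image name is a placeholder. The container's hostname also has to be resolvable 
by the controller and the compute nodes:

  # Simplest option: host networking, so slurmctld and the slurmd daemons can
  # connect back to srun's callback ports inside the container.
  docker run --network host my-slurm-client-image

  # With bridge networking you would additionally need to pin and publish
  # srun's listening ports, e.g. SrunPortRange=60001-60099 in slurm.conf plus
  # "-p 60001-60099:60001-60099", and make sure the address srun advertises is
  # one the compute nodes can actually reach, which is why host networking is
  # usually the path of least resistance.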

Thanks in advance
Jake



Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Paul Edmon
This is one of the reasons we stick with using RPM's rather than the 
symlink process. It's just cleaner and avoids the issue of having the 
install on shared storage that may get overwhelmed with traffic or 
suffer outages. Also the package manager automatically removes the 
previous versions and installs things locally. I've never been a fan of the 
symlink method as it runs counter to the entire point and design of 
Linux and package managers which are supposed to do this heavy lifting 
for you.



Rant aside :). Generally for minor upgrades the process is less touchy. 
For our setup we follow the following process that works well for us, 
but does create an outage for the period of the upgrade.



1. Set all partitions to down: This makes sure no new jobs are scheduled.

2. Suspend all jobs: This makes sure jobs aren't running while we upgrade.

3. Stop slurmctld and slurmdbd.

4. Upgrade the slurmdbd. Restart slurmdbd

5. Upgrade the slurmd and slurmctld across the cluster.

6. Restart slurmd and slurmctld simultaneously using choria.

7. Unsuspend all jobs

8. Reopen all partitions.


For major upgrades we always take a mysqldump and back up the spool for 
the slurmctld before upgrading just in case something goes wrong. We've 
had this happen before when the slurmdbd upgrade cut out early (note, 
always run the slurmdbd and slurmctld upgrades in -D mode and not via 
systemctl as systemctl can timeout and kill the upgrade midway for large 
upgrades).



That said I've also skipped steps 1, 2, 7, and 8 before for minor 
upgrades and it works fine. The slurmd, slurmctld, and slurmdbd can all 
run on different versions so long as the slurmdbd > slurmctld > slurmd.  
So if you want to do a live upgrade you can do it. However, out of paranoia 
we generally stop everything. The entire process takes about an hour start 
to finish, with the longest part being the pausing of all the jobs.
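
In case the concrete commands help, steps 1-3 and 7-8 above are roughly the 
following (choria aside; sinfo here just enumerates the partition names):

  # 1-2: stop new scheduling and pause running work
  for p in $(sinfo -h -o '%R' | sort -u); do scontrol update PartitionName="$p" State=DOWN; done
  squeue -h -t RUNNING -o '%A' | xargs -r -n1 scontrol suspend

  # 3: stop the daemons, then upgrade and restart them (steps 4-6)
  systemctl stop slurmctld slurmdbd

  # 7-8: resume the suspended jobs and reopen the partitions
  squeue -h -t SUSPENDED -o '%A' | xargs -r -n1 scontrol resume
  for p in $(sinfo -h -o '%R' | sort -u); do scontrol update PartitionName="$p" State=UP; done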



-Paul Edmon-



Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Groner, Rob
I did already see the upgrade section of Jason's talk, but it wasn't much about 
the mechanics of the actual upgrade process; it seemed more of a big-picture overview.  
It dealt a lot with different parts of slurm at different versions, which is 
something we don't have.

One little wrinkle here is that while, yes, we're using a symlink to point to 
what version of slurm is the current one...it's all on a shared filesystem.  
So, ALL nodes, slurmdb, slurmctld are using that same symlink.  There is no 
means to upgrade one component at a time.  That means to upgrade, EVERYTHING 
has to come down before it could come back up.  Jason's slides seemed to 
indicate that, if there were separate symlinks, then I could focus on just the 
slurmdb first and upgrade it...then focus on slurmctld and upgrade it, and then 
finally the nodes (take down their slurmd, upgrade the link, bring up slurmd).  
So maybe that's what I'm missing.

Otherwise, I think what I'm saying is that I see references to a "rolling 
upgrade", but I don't see any guide to a rolling upgrade.  I just see the 14 
steps  in https://slurm.schedmd.com/quickstart_admin.html#upgrade, and I guess 
I'd always thought of that as the full octane, high fat upgrade.  I've only 
ever done upgrades during one of our many scheduled downtimes, because the 
upgrades were always to a new major version, and because I'm a scared little 
chicken, so I figured there were maybe some smaller subset of steps if only 
upgrading a patchlevel change.  Smaller change, less risk, fewer precautionary 
steps...?  I'm seeing now that's not the case.

Thank you all for the suggestions!

Rob



From: slurm-users  on behalf of Ryan 
Novosielski 
Sent: Friday, September 29, 2023 2:48 AM
To: Slurm User Community List 
Subject: Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

I started off writing there’s really no particular process for these/just do 
your changes and start the new software (be mindful of any PATH that might 
contain data that’s under your software tree, if you have that setup), and that 
you might need to watch the timeouts, but I figured I’d have a look at the 
upgrade guide to be sure.

There’s really nothing onerous in there. I’d personally back up my database and 
state save directories just because I'd rather be safe than sorry, or in case I 
have to go backwards and want to be sure. You can run SlurmCtld for a good 
while with no database (note that -M on the command line will be broken during 
that time), just being mindful of the RAM on the SlurmCtld machine/don’t 
restart it before the DB is back up, and backing up our fairly large database 
doesn’t take all that long. Whether or not 5 is required mostly depends on how 
long you think it will take you to do 6-11 (which could really take you seconds 
if your process is really as simple as stop, change symlink, start), 12 you’re 
going to do no matter what, 13 you don’t need if you skipped 5, and 14 is up to 
you. So practically, that’s what you’re going to do anyway.

We just did an upgrade last week, and the only difference is that our compute 
nodes are stateless, so the compute node upgrades were a reboot (we could 
upgrade them running, but we did it during a maintenance period anyway, so 
why?).

If you want to do this with running jobs, I’d definitely back up the state save 
directory, but as long as you watch the timeouts, it’s pretty uneventful. You 
won’t have that long database upgrade period, since no database modifications 
will be required, so it’s pretty much like upgrading anything else.
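
Concretely, "watch the timeouts" usually means making sure SlurmctldTimeout and 
SlurmdTimeout comfortably exceed the time the daemons will be down, and 
reverting afterwards; the values below are placeholders, not recommendations:

  # slurm.conf, pushed out before stopping anything:
  SlurmctldTimeout=3600
  SlurmdTimeout=3600

  # then propagate the change:
  scontrol reconfigure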

--
#BlackLivesMatter

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
 `'

On Sep 28, 2023, at 11:58, Groner, Rob  wrote:


There are 14 steps to upgrading slurm listed on their website, including shutting 
down and backing up the database.  So far we've only updated slurm during a 
downtime, and it's been a major version change, so we've taken all the steps 
indicated.

We now want to upgrade from 23.02.4 to 23.02.5.

Our slurm builds end up in version named directories, and we tell production 
which one to use via symlink.  Changing the symlink will automatically change 
it on our slurm controller node and all slurmd nodes.

Is there an expedited, simple, slimmed-down upgrade path to follow if we're 
looking at just a patch-level ('.') upgrade?

Rob



Re: [slurm-users] enabling job script archival

2023-09-29 Thread Davide DelVento
Fantastic, this is really helpful, thanks!

On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon  wrote:

> Yes it was later than that. If you are 23.02 you are good.  We've been
> running with storing job_scripts on for years at this point and that part
> of the database only uses up 8.4G.  Our entire database takes up 29G on
> disk. So it's about 1/3 of the database.  We also have database compression
> which helps with the on disk size. Raw uncompressed our database is about
> 90G.  We keep 6 months of data in our active database.
>
> -Paul Edmon-
> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>
> Sorry for the duplicate e-mail in a short time: do you know (or anyone)
> when the hashing was added? Was planning to enable this on 21.08, but we
> then had to delay our upgrade to it. I’m assuming later than that, as I
> believe that’s when the feature was added.
>
> On Sep 28, 2023, at 13:55, Ryan Novosielski 
>  wrote:
>
> Thank you; we’ll put in a feature request for improvements in that area,
> and also thanks for the warning? I thought of that in passing, but the real
> world experience is really useful. I could easily see wanting that stuff to
> be retained less often than the main records, which is what I’d ask for.
>
> I assume that archiving, in general, would also remove this stuff, since
> old jobs themselves will be removed?
>
> --
> #BlackLivesMatter
> 
> || \\UTGERS, |---*O*---
> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB
> A555B, Newark
>  `'
>
> On Sep 28, 2023, at 13:48, Paul Edmon 
>  wrote:
>
> Slurm should take care of it when you add it.
>
> So far as horror stories, under previous versions our database size
> ballooned to be so massive that it actually prevented us from upgrading and
> we had to drop the columns containing the job_script and job_env.  This was
> back before slurm started hashing the scripts so that it would only store
> one copy of duplicate scripts.  After this point we found that the
> job_script database stayed at a fairly reasonable size as most users use
> functionally the same script each time. However the job_env continued to
> grow like crazy as there are variables in our environment that change
> fairly consistently depending on where the user is. Thus job_envs ended up
> being too massive to keep around and so we had to drop them. Frankly we
> never really used them for debugging. The job_scripts though are super
> useful and not that much overhead.
>
> In summary my recommendation is to only store job_scripts. job_envs add
> too much storage for little gain, unless your job_envs are basically the
> same for each user in each location.
>
> Also it should be noted that there is no way to prune out job_scripts or
> job_envs right now. So the only way to get rid of them if they get large is
> to 0 out the column in the table. You can ask SchedMD for the mysql command
> to do this as we had to do it here to our job_envs.
>
> -Paul Edmon-
>
> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>
> In my current slurm installation (recently upgraded to slurm v23.02.3), I
> only have
>
> AccountingStoreFlags=job_comment
>
> I now intend to add both
>
> AccountingStoreFlags=job_script
> AccountingStoreFlags=job_env
>
> leaving the default 4MB value for max_script_size
>
> Do I need to do anything on the DB myself, or will slurm take care of the
> additional tables if needed?
>
> Any comments/suggestions/gotcha/pitfalls/horror_stories to share? I know
> about the additional diskspace and potentially load needed, and with our
> resources and typical workload I should be okay with that.
>
> Thanks!
>
>
>
>
>
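
For reference, the resulting slurm.conf line from this thread would look roughly 
like the one below. The slurm.conf documentation describes AccountingStoreFlags 
as a single comma-separated list, so it is safest to combine the flags rather 
than repeat the parameter:

  # slurm.conf (job_comment kept from the existing setup):
  AccountingStoreFlags=job_comment,job_script,job_env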