Re: [slurm-users] [External] Re: actual time of start (or finish) of a job

2023-02-20 Thread Florian Zillner
Hi,

note that times reported by sacct may differ from the net times. For example, 
imagine a test job like this:
date
sleep 1m
date

sacct reports:
$ sacct -j 225145 -X -o jobid,start,end
JobID                      Start                 End
------------ ------------------- -------------------
225145       2023-02-20T17:31:12 2023-02-20T17:32:30

Whereas:
cat *out.225145
Mon Feb 20 17:31:29 CET 2023
Mon Feb 20 17:32:29 CET 2023

Sometimes these extra few seconds matter. So if you're looking for net 
runtimes, I'd suggest asking the user to include a date command here and there 
in the submit script.
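
If the user needs the number programmatically rather than two timestamps, a 
small sketch like this captures the net runtime directly in the job output 
(the workload name is just a placeholder):

#!/bin/bash
#SBATCH --job-name=net-runtime-demo
start=$(date +%s)              # wall-clock start of the actual work
srun ./my_workload             # placeholder for the real application
end=$(date +%s)
echo "net runtime: $((end - start)) seconds"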

Cheers,
Florian

From: slurm-users  on behalf of Davide 
DelVento 
Sent: Thursday, 16 February 2023 01:40
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] actual time of start (or finish) of a job

Thanks, that's exactly it.
I naively assumed that the '-l' option in sacct provided "everything" (given
how long and unwieldy its output is), but I notice now that it doesn't.
Sorry for the noise!

On Wed, Feb 15, 2023 at 5:32 PM Joseph Francisco Guzman
 wrote:
>
> Hi Davide,
>
> I would use the Start and End fields with the sacct command. Something like 
> this: "sacct -j jobid1,jobid2 -X -P -o jobid,start,end".
>
> Were you able to take a look at the sacct manual page? It outlines what all of 
> the different fields mean. Here's a link to the web version: 
> https://slurm.schedmd.com/sacct.html.
>
> Best,
>
> Joseph
>
> --
> Joseph F. Guzman - ITS (Advanced Research Computing)
>
> Northern Arizona University
>
> joseph.f.guz...@nau.edu
>
> 
> From: slurm-users  on behalf of Davide 
> DelVento 
> Sent: Wednesday, February 15, 2023 5:18 PM
> To: slurm-us...@schedmd.com 
> Subject: [slurm-users] actual time of start (or finish) of a job
>
> I have a user who needs to find the actual start (or finish) time of a
> number of jobs.
> With the elapsed field of sacct, start or finish become equivalent for
> his search.
>
> I see that information in /var/log/slurm/slurmctld.log so Slurm should
> have it, however in sacct itself that information does not seem to
> exist, and with all the queries we tried Google always thinks we are
> looking for something else and never returns an actual answer.
>
> If this was a one-off I could do it for him, but he needs to script it
> for his reasons and I don't want to run his script as root nor give
> him access to the log files forever.
>
> Is there a way to find this information?
>
> Thanks
>



[slurm-users] xcpuinfo_abs_to_mac: failed // cgroups v1 problem

2023-02-09 Thread Florian Zillner
Hi,

I'm experiencing a strange issue related to a CPU swap (8352Y -> 6326) on two 
of our nodes. I adapted the slurm.conf to accommodate the new CPU:
slurm.conf: NodeName=ice27[57-58] CPUs=64 Sockets=2 CoresPerSocket=16 
ThreadsPerCore=2 RealMemory=257550 MemSpecLimit=12000
which is also what slurmd -C autodetects: NodeName=ice2758 CPUs=64 Boards=1 
SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257578

Slurm 22.05.7 (compiled from source)
Kernel: 4.18.0-372.32.1.el8_6.x86_64
OS: Rocky Linux release 8.6 (Green Obsidian)

All nodes boot the same OS image (PXE) and therefore have the same SW.

When I try to run a simple single-node job (exclusive) on ice2758, the job 
immediately fails and the node is drained with "batch job complete failure". 
From the node's slurmd.log:

# grep 224313 slurmd.ice2758.log | grep -v debug
[2023-02-08T18:09:35.026] Launching batch job 224313 for UID 1234502026
[2023-02-08T18:09:35.037] [224313.batch] task/affinity: init: task affinity 
plugin loaded with CPU mask 0x
[2023-02-08T18:09:35.037] [224313.batch] cred/munge: init: Munge 
credential signature plugin loaded
[2023-02-08T18:09:35.049] [224313.batch] error: xcpuinfo_abs_to_mac: failed
[2023-02-08T18:09:35.049] [224313.batch] error: unable to build job physical 
cores
[2023-02-08T18:09:35.050] [224313.batch] task/cgroup: _memcg_initialize: job: 
alloc=245571MB mem.limit=245571MB memsw.limit=unlimited
[2023-02-08T18:09:35.050] [224313.batch] task/cgroup: _memcg_initialize: step: 
alloc=245571MB mem.limit=245571MB memsw.limit=unlimited
[2023-02-08T18:09:35.061] [224313.batch] starting 1 tasks
[2023-02-08T18:09:35.061] [224313.batch] task 0 (20552) started 
2023-02-08T18:09:35
[2023-02-08T18:09:35.062] [224313.batch] error: common_file_write_uint32s: 
write pid 20552 to 
/sys/fs/cgroup/cpuset/slurm/uid_1234502026/job_224313/step_batch/cgroup.procs 
failed: No space left on device
[2023-02-08T18:09:35.062] [224313.batch] error: unable to add pids to 
'/sys/fs/cgroup/cpuset/slurm/uid_1234502026/job_224313/step_batch'
[2023-02-08T18:09:35.062] [224313.batch] error: task_g_pre_set_affinity: No 
space left on device
[2023-02-08T18:09:35.062] [224313.batch] error: 
_exec_wait_child_wait_for_parent: failed: Resource temporarily unavailable
[2023-02-08T18:09:36.065] [224313.batch] error: job_manager: exiting 
abnormally: Slurmd could not execve job
[2023-02-08T18:09:36.065] [224313.batch] job 224313 completed with slurm_rc = 
4020, job_rc = 0
[2023-02-08T18:09:36.068] [224313.batch] done with job

There is plenty of space (= memory, because of the PXE boot) available. 
lscgroup and cat /proc/cgroups show far fewer than 1000 cgroups.

I then compared this to other nodes and what they are reporting when it comes 
to cgroups during job launch:
# grep -i "job abstract" slurmd*log | grep 2023-02-08
slurmd.banner2401.log:[2023-02-08T18:20:34.102] [224315.batch] debug:  
task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-31'
slurmd.ice2701.log:[2023-02-08T18:27:05.391] [224319.batch] debug:  
task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-71'
slurmd.ice2758.log:[2023-02-08T18:09:35.049] [224313.batch] debug:  
task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-63'

# psh banner2401,ice2701,ice2758 slurmd -C | grep -vi uptime
banner2401: NodeName=banner2401 CPUs=64 Boards=1 SocketsPerBoard=2 
CoresPerSocket=16 ThreadsPerCore=2 RealMemory=193090
ice2701: NodeName=ice2701 CPUs=144 Boards=1 SocketsPerBoard=2 
CoresPerSocket=36 ThreadsPerCore=2 RealMemory=257552
ice2758: NodeName=ice2758 CPUs=64 Boards=1 SocketsPerBoard=2 
CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257578

To me, it looks like slurmd -C is correctly detecting the CPUs, but when it 
comes to cgroups, the plugin somehow addresses all cores, including the HT 
ones, whereas on the other two nodes shown, the cgroups plugin only addresses 
half of them, i.e. the physical cores. A reboot does not fix this problem. 
We're happy with how Slurm works on all the other nodes; just the two that had 
their CPUs changed behave differently. What am I missing here?
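
In case it helps anyone debugging the same symptom: with cgroup v1, writing a 
PID into a cpuset cgroup's cgroup.procs returns "No space left on device" when 
that cgroup's cpuset.cpus or cpuset.mems is empty, so one thing worth checking 
on the failing node is what Slurm's cpuset hierarchy actually contains. The 
paths below are assumptions based on the log above, not verified on this system:

# on ice2758, after a failed launch (cgroup v1 paths assumed from the log)
cat /sys/fs/cgroup/cpuset/cpuset.cpus          # root cpuset, for reference
cat /sys/fs/cgroup/cpuset/slurm/cpuset.cpus    # what the slurm cgroup was given
cat /sys/fs/cgroup/cpuset/slurm/cpuset.mems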

Cheers,
Florian



Re: [slurm-users] [External] Hibernating a whole cluster

2023-02-06 Thread Florian Zillner
Hi,

follow this guide: https://slurm.schedmd.com/power_save.html

Create poweroff / poweron scripts and configure Slurm to power nodes off after 
X idle minutes. Works well for us. Make sure to set an appropriate time 
(ResumeTimeout) to allow the node to come back into service.
Note that we did not achieve good power savings by merely suspending the nodes; 
powering them off and on saves far more power. The downside is that it takes 
~5 minutes to resume (= power on) the nodes when needed.
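
For reference, the relevant slurm.conf pieces look roughly like this (paths and 
timings are illustrative, not our exact values):

SuspendProgram=/etc/slurm/poweroff.sh   # e.g. checks node state, then "ssh $host poweroff"
ResumeProgram=/etc/slurm/poweron.sh     # powers the node back on, e.g. via the BMC
SuspendTime=1800                        # seconds a node must be idle before power-off
SuspendTimeout=120
ResumeTimeout=600                       # allow enough time for the ~5 min boot back to service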

Cheers,
Florian

From: slurm-users  on behalf of Analabha 
Roy 
Sent: Monday, 6 February 2023 18:21
To: slurm-users@lists.schedmd.com 
Subject: [External] [slurm-users] Hibernating a whole cluster

Hi,

I've just finished setting up a single-node "cluster" with Slurm on Ubuntu 
20.04. Infrastructural limitations prevent me from running it 24/7, and it's 
only powered on during business hours.

Currently, I have a cron job running that hibernates that sole node before 
closing time.

The hibernation is done with standard systemd and hibernates to the swap 
partition.

I have not run any lengthy Slurm jobs on it yet. Before I do, can I get some 
thoughts on a couple of things?

If it hibernated while Slurm still had jobs running/queued, would they resume 
properly when the machine powers back on?

Note that my swap space is bigger than my RAM.

Is it necessary to perhaps set up a pre-hibernate script for systemd that 
iterates over scontrol to suspend all the jobs before hibernating and resumes 
them post-resume?

What about the wall times? I'm guessing that Slurm will count the downtime as 
elapsed time for each job. Is there a way to configure this, or is the only 
alternative a post-hibernate script that iteratively updates the wall times of 
the running jobs using scontrol again?

Thanks for your attention.
Regards
AR


Re: [slurm-users] [External] Re: Request nodes with a custom resource?

2023-02-06 Thread Florian Zillner
+1 for features.
Features can also be added / changed at runtime, like this: "scontrol update 
Node=$(hostname -s) AvailableFeatures=$FEAT ActiveFeatures=$FEAT"

Cheers,
Florian


From: slurm-users on behalf of Ward Poelmans
Sent: Monday, February 6, 2023 09:03
To: slurm-users@lists.schedmd.com
Subject: [External] Re: [slurm-users] Request nodes with a custom resource?

Hi Xaver,

On 6/02/2023 08:39, Xaver Stiensmeier wrote:
>
> How would you schedule a job (let's say using srun) to work on these
> nodes? Of course this would be interesting in a dynamic case, too
> (assuming that the database is downloaded to nodes during job
> execution), but for now I would be happy with a static solution (where
> it is okay to store a value beforehand that says something like
> "hasDatabase=1" on nodes).

You could use features for that. A feature can be assigned to nodes and you can 
request nodes with a specific feature for a job:

In slurm.conf:
NodeName=node001 CPUs=1 Feature=hasdatabase ...

And for your jobs:
sbatch --constraint=hasdatabase ...


Ward


Re: [slurm-users] [External] What can I do to prevent a specific job from being preempted?

2021-09-14 Thread Florian Zillner
See the no-requeue option for SBATCH:

--no-requeue
Specifies that the batch job should never be requeued under any circumstances. 
Setting this option will prevent system administrators from being able to 
restart the job (for example, after a scheduled downtime), recover from a node 
failure, or be requeued upon preemption by a higher priority job. When a job is 
requeued, the batch script is initiated from its beginning. Also see the 
--requeue option. The JobRequeue configuration parameter controls the default 
behavior on the cluster.

https://slurm.schedmd.com/sbatch.html
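
A minimal batch script using it would look like this (the application name is a 
placeholder); note that whether the job is then left untouched at all also 
depends on the cluster's preemption configuration (PreemptMode, partition/QOS 
settings):

#!/bin/bash
#SBATCH --job-name=no-requeue-demo
#SBATCH --no-requeue          # never requeue this job, e.g. after preemption or node failure
srun ./my_long_running_app    # placeholder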

Get Outlook for Android

From: slurm-users  on behalf of 顏文 

Sent: Tuesday, September 14, 2021 7:02:13 AM
To: slurm-users@lists.schedmd.com 
Subject: [External] [slurm-users] What can I do to prevent a specific job from 
being preempted?

Dear slurm users,

I have some specific jobs that can't be terminated; otherwise they need to be 
rerun from the beginning. Can we simply apply some settings (either as user or 
administrator) so that these jobs will not be preempted? Thanks.

with regards,
Peter




Re: [slurm-users] [External] Node utilization for 24 hours

2021-09-07 Thread Florian Zillner
Hi,

you can run sreport like this:
sreport cluster AccountUtilizationByUser Start=$(date -d "last month" +%D) 
End=$(date -d "this month" +%D)
or
sreport cluster Utilization Start=$(date -d "last month" +%D) End=$(date -d 
"this month" +%D)

and script something around it to show what you're looking for.
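
As a rough sketch of such a script (illustrative only: the node name and core 
count are made up, and multi-node jobs are over-counted because AllocCPUS 
covers all of a job's nodes, not just this one):

#!/bin/bash
NODE=node001                                     # hypothetical node name
CORES=64                                         # allocatable cores on that node
START=$(date -d "7 days ago" +%Y-%m-%dT%H:%M:%S)
END=$(date +%Y-%m-%dT%H:%M:%S)
# sum AllocCPUS * ElapsedRaw (core-seconds) over all jobs that touched the node
used=$(sacct -N "$NODE" -S "$START" -E "$END" -X -n -P -o AllocCPUS,ElapsedRaw \
       | awk -F'|' '{s += $1 * $2} END {print int(s / 60)}')
avail=$((CORES * 7 * 24 * 60))
echo "$NODE: ${used:-0} of $avail CPU core-minutes allocated in the last week"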

Cheers,
Florian

From: slurm-users  on behalf of Hemanta 
Sahu 
Sent: Tuesday, 7 September 2021 09:57
To: slurm-users@lists.schedmd.com 
Subject: [External] [slurm-users] Node utilization for 24 hours

Hello ,

Is there any command or script available to see a particular node's utilization 
percentage in terms of (allocated CPU core-minutes / available CPU core-minutes) 
for a certain time period (let's say the last week)?

Appreciate any help in this regard.



Thanks
Hemanta


Re: [slurm-users] [External] Sinfo or squeue stuck for some seconds

2021-08-30 Thread Florian Zillner
Hi Navin,

could it be that you're using LDAP/AD/NIS for user management? If so, check 
whether the LDAP server's response is slow or gets slowed down when retrieving 
hundreds or thousands of users.
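
A quick way to see whether user/group lookups are the bottleneck is to time a 
full enumeration through NSS on the slurmctld host:

time getent passwd > /dev/null   # if these take many seconds, the directory service
time getent group  > /dev/null   # is the likely culprit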

Also CacheGroups=1 was last supported in V15.08.

Best,
Florian

From: slurm-users  on behalf of navin 
srivastava 
Sent: Sunday, 29 August 2021 16:53
To: Slurm User Community List 
Subject: [External] [slurm-users] Sinfo or squeue stuck for some seconds


Dear slurm community users,

We are using Slurm version 20.02.x.

We see the message below appearing a lot of times in the slurmctld log, and we 
found that whenever this message appears, the sinfo/squeue output gets slow. 
There is no timeout, as I kept the value at 100.

Warning: Note very large processing time from load_part_uid_allow_list: 
usec=10800885 began=16:27:55.952
[2021-08-29T16:28:06.753] Warning: Note very large processing time from 
_slurmctld_background: usec=10801120 began=16:27:55.952

Is this a bug or some config issue? If anybody has faced a similar issue, could 
you throw some light on this?

please find the attached slurm.conf.

Regards
Navin.




Re: [slurm-users] [External] jobs stuck in "CG" state

2021-08-20 Thread Florian Zillner
Hi,

scancel the job, then set the nodes to a "down" state like so: "scontrol update 
nodename=<nodename> state=down reason=cg" and resume them afterwards.
However, if there are tasks stuck, then in most cases a reboot is needed to 
bring the node back in a clean state.
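
In commands, roughly (job ID and node name are placeholders):

scancel <jobid>
scontrol update nodename=<nodename> state=down reason=cg
scontrol update nodename=<nodename> state=resume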

Best,
Florian

From: slurm-users  on behalf of Durai 
Arasan 
Sent: Friday, 20 August 2021 10:31
To: Slurm User Community List 
Subject: [External][slurm-users] jobs stuck in "CG" state

Hello!

We have a huge number of jobs stuck in the CG state from a user who probably 
wrote code with bad I/O. "scancel" does not make them go away. Is there a way 
for admins to get rid of these jobs without draining and rebooting the nodes? I 
read somewhere that killing the respective slurmstepd process will do the job. 
Is this possible? Any other solutions? Also, are there any parameters in 
slurm.conf one can set to manage such situations better?

Best,
Durai
MPI Tübingen


Re: [slurm-users] [External] Re: Testing Lua job submit plugins

2021-05-06 Thread Florian Zillner
I've used that approach too.
If the submitting user ID is mine, then do this or that; all others take the 
else clause. That way, you can actually run on the production system without 
having to replicate the whole environment in a sandbox. Certainly not the 
cleanest approach, but it doesn't hurt others.

And to answer your question, Mike: the job_submit.lua file is read every time 
it's executed, so you can edit it on the fly.

Best,
Florian


From: slurm-users  on behalf of Renfro, 
Michael 
Sent: Thursday, 6 May 2021 19:23
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] Testing Lua job submit plugins

I’ve used the structure at 
https://gist.github.com/mikerenfro/92d70562f9bb3f721ad1b221a1356de5 to handle 
basic test/production branching. I can isolate the new behavior down to just a 
specific set of UIDs that way.

Factoring out code into separate functions helps, too.

I’ve seen others go so far as to put the functions into separate files, but I 
haven’t needed that yet.

On May 6, 2021, at 12:11 PM, Michael Robbert  wrote:



I’m wondering if others in the Slurm community have any tips or best practices 
for the development and testing of Lua job submit plugins. Is there anything 
that can be done prior to deployment on a production cluster that will help to 
ensure the code is going to do what you think it does or at the very least not 
prevent any jobs from being submitted? I realize that any configuration change 
in slurm.conf could break everything, but I feel like adding Lua code adds 
enough complexity that I’m a little more hesitant to just throw it in. Any way 
to run some kind of linting or sanity tests on the Lua script? Additionally, 
does the script get read in one time at startup or reconfig or can it be 
changed on the fly just by editing the file?

Maybe a separate issue, but does anybody have any recipes to build a local test 
cluster in Docker that could be used to test this? I was working on one, but 
broke my local Docker install and thought I'd send this note out while I was 
working on rebuilding it.



Thanks in advance,

Mike Robbert


Re: [slurm-users] [External] Slurm Configuration assistance: Unable to use srun after installation (slurm on fedora 33)

2021-04-19 Thread Florian Zillner
Hi Johnsy,

  1.  Do you have an active support contract with SchedMD? AFAIK they only 
offer paid support.
  2.  The error message is pretty straightforward: slurmctld is not running. 
Did you start it (systemctl start slurmctld)?
  3.  slurmd needs to run on the node(s) you want to run jobs on as well. I'm 
guessing you are using localhost for both the controller and the compute node, 
so slurmctld and slurmd both need to be running on localhost.
  4.  Is munge running?
  5.  May I ask why you're chown-ing the pid and log files? The slurm user 
(typically "slurm") needs to have access to those files. Munge, for instance, 
checks ownership and complains if something is not correct.
  6.  "srun /proc/cpuinfo" will fail even if slurmctld and slurmd are running, 
because /proc/cpuinfo is not an executable file. You may want to insert "cat" 
after srun. Another simple test would be "srun hostname" (see the quick check 
sequence below).
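
A quick check sequence on a single-node setup could look like this (just a 
sketch of the order I would verify things in):

systemctl status munge slurmctld slurmd   # all three need to be active on this host
scontrol ping                             # should report the primary controller as UP
sinfo                                     # the node should appear, e.g. in state idle
srun hostname                             # simplest end-to-end test
srun cat /proc/cpuinfo                    # the corrected version of your original test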

And, just my personal opinion, if this is your first experiment with Slurm, I 
wouldn't change too much right from the beginning but instead get it working 
first and then change things to your needs. Slurm is also available in the EPEL 
repos, so you could install it using dnf and experiment with the packaged 
version.

Hope this helps,
Florian



From: slurm-users  on behalf of Johnsy 
K. John 
Sent: Monday, 19 April 2021 01:43
To: sa...@schedmd.com ; johnsy john ; 
slurm-us...@schedmd.com 
Subject: [External] [slurm-users] Slurm Configuration assistance: Unable to use 
srun after installation (slurm on fedora 33)

Hello SchedMD team,

I would like to use your slurm workload manager for learning purposes.
And I tried installing the software (downloaded from 
https://www.schedmd.com/downloads.php) and followed the steps as mentioned in:

https://slurm.schedmd.com/download.html
https://slurm.schedmd.com/quickstart_admin.html

My Linux OS is Fedora 33, and I tried installing it as root.
After installation and configuration as mentioned on 
https://slurm.schedmd.com/quickstart_admin.html
I got some errors when I tried to run srun.
Details about the installation and use are as follows:

Using root permissions, copied to: /root/installations/

cd /root/installations/

tar --bzip -x -f slurm-20.11.5.tar.bz2

cd slurm-20.11.5/

./configure --enable-debug --prefix=/usr/local --sysconfdir=/usr/local/etc

make
make install

Following steps are based on 
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration

mkdir /var/spool/slurmctld /var/log/slurm
chown johnsy /var/spool/slurmctld
chown johnsy /var/log/slurm
chmod 755 /var/spool/slurmctld /var/log/slurm

 cp /var/run/slurmctld.pid /var/run/slurmd.pid

touch /var/log/slurm/slurmctld.log
chown johnsy /var/log/slurm/slurmctld.log

touch /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
chown johnsy /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log

ldconfig -n /usr/lib64

Now when I tried an example command for trial:

srun /proc/cpuinfo

I get the following error:

srun: error: Unable to allocate resources: Unable to contact slurm controller 
(connect failure)


My configuration file slurm.conf that I created is:
##
##
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=homepc
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=99
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=4
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=johnsy
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30

Re: [slurm-users] [External] Autoset job TimeLimit to fit in a reservation

2021-03-29 Thread Florian Zillner
Hi,

well, I think you're putting the cart before the horse, but anyway: you could 
write a script that extracts the next reservation and does some simple math to 
display the remaining time in hours (or whatever unit) to the user. It's the 
user's job to set the time their job needs to finish. Auto-squeezing a job that 
takes 2 days to complete into a remaining 2-hour window until the reservation 
starts doesn't make any sense to me.

# NEXTRES=$(scontrol show res | head -n1 | awk '{print $2}' | cut -f2 -d= | 
xargs -I {} date +%s --date "{}" )
# NOW=$(date +%s)
# echo "$(((NEXTRES - NOW) / 3600)) hours left until reservation begins"
178 hours left until reservation begins
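
If you really want to feed that value into a job, something along these lines 
would work (job.sh is a placeholder; sbatch --time accepts a plain number of 
minutes):

# MINUTES_LEFT=$(( (NEXTRES - NOW) / 60 ))
# sbatch --time=$MINUTES_LEFT job.sh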

Cheers,
Florian



From: slurm-users  on behalf of Jeremy 
Fix 
Sent: Monday, 29 March 2021 10:48
To: slurm-users@lists.schedmd.com 
Subject: [External] [slurm-users] Autoset job TimeLimit to fit in a reservation

Hi,

I'm wondering if there is any built-in option to autoset a job TimeLimit
to fit within a defined reservation.

For now, it seems to me that the time limit must be explicitly provided,
in agreement with the deadline of the reservation, by the user when
invoking the srun or sbatch command, while I would find it comfortable to
let Slurm calculate the remaining time of the reservation on which
srun/sbatch is executed and fill in the TimeLimit accordingly.

Best;

Jeremy.




Re: [slurm-users] [External] Re: Cluster nodes on multiple cluster networks

2021-01-23 Thread Florian Zillner
Chiming in on Michael's suggestion.

You can specify the same hostname in the slurm.conf but for the on-premise 
nodes you either set the DNS or the /etc/hosts entry to the local (=private) IP 
address.
For the cloud nodes you set DNS or the hosts entry to the publicly reachable IP.

example /etc/hosts on-prem node:
10.10.10.10   myslurmcontroller

example /etc/hosts cloud node:
50.50.50.50   myslurmcontroller

example slurm.conf for both locations
ControlMachine=myslurmcontroller

From: slurm-users  on behalf of Sajesh 
Singh 
Sent: Friday, 22 January 2021 22:17
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] Cluster nodes on multiple cluster networks


Thank you for the recommendation. Will try that out. Unfortunately the on-prem 
nodes cannot reach the head node via the public IP



-Sajesh-



From: slurm-users  On Behalf Of Michael 
Gutteridge
Sent: Friday, January 22, 2021 3:18 PM
To: Slurm User Community List 
Subject: Re: [slurm-users] Cluster nodes on multiple cluster networks



I don't believe the IP address is required - if you can configure a DNS/hosts 
entry differently for cloud nodes you can set:

   SlurmctldHost=controllername



Then have "controllername" resolve to the private IP for the controller for the 
on-prem cluster, the public IP for the nodes in the cloud.  Theoretically 
anyway- I haven't run a config like that and I'm not sure how the controller 
will react to such a configuration (i.e. getting slurm traffic on both 
interfaces).



If the on-prem nodes can reach the public IP address of the controller it may 
be simpler to use only the public IP for the controller, but I don't know how 
your routing is set up.



HTH



 - Michael







On Fri, Jan 22, 2021 at 11:26 AM Sajesh Singh <ssi...@amnh.org> wrote:

How would I deal with the address of the head node defined in the slurm.conf as 
I have it defined as



SlurmctldHost=private-hostname(private.ip.addr)



The private.ip.addr address is not reachable from the cloud nodes



-Sajesh-



From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Brian 
Andrus
Sent: Friday, January 22, 2021 1:45 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Cluster nodes on multiple cluster networks



You would need to have a direct connect/vpn so the cloud nodes can connect to 
your head node.

Brian Andrus

On 1/22/2021 10:37 AM, Sajesh Singh wrote:

We are looking at rolling out cloud bursting to our on-prem Slurm cluster and I 
am wondering how to deal with the slurm.conf variable SlurmctldHost. It is 
currently configured with the private cluster network address that the on-prem 
nodes use to contact it. The nodes in the cloud would contact the head node via 
its public IP address. How can I configure Slurm so that both IPs are 
recognized as the head node?





-Sajesh-




Re: [slurm-users] [External] [slurm 20.02.3] don't suspend nodes in down state

2020-08-26 Thread Florian Zillner
Hi Herbert,

just like Angelos described, we also have logic in our poweroff script that 
checks if the node is really IDLE and only sends the poweroff command if that's 
the case.

Excerpt:
hosts=$(scontrol show hostnames $1)
for host in $hosts; do
scontrol show node $host | tr ' ' '\n' | grep -q 'State=IDLE+POWER$'
if [[ $? == 1 ]]; then
echo "node $host NOT IDLE" >>$OUTFILE
continue
else
echo "node $host IDLE" >>$OUTFILE
fi
ssh $host poweroff
...
sleep 1
...
done

Best,
Florian


From: slurm-users  on behalf of 
Steininger, Herbert 
Sent: Monday, 24 August 2020 10:52
To: Slurm User Community List 
Subject: [External] [slurm-users] [slurm 20.02.3] don't suspend nodes in down 
state

Hi,

how can I prevent Slurm from suspending nodes which I have set to the down 
state for maintenance?
I know about "SuspendExcNodes", but rolling out slurm.conf every time this 
changes doesn't seem like the right way.
Is there a state that I can set so that the nodes don't get suspended?

It happened a few times that I was doing some work on a server, and after our 
idle time (1h) Slurm decided to suspend the node.

TIA,
Herbert

--
Herbert Steininger
Leiter EDV & HPC
Administrator
Max-Planck-Institut für Psychiatrie
Kraepelinstr.  2-10
80804 München
Tel  +49 (0)89 / 30622-368
Mail   herbert_steinin...@psych.mpg.de
Web  https://www.psych.mpg.de





Re: [slurm-users] [External] How to exclude nodes in sbatch/srun?

2020-06-22 Thread Florian Zillner
Durai,

To overcome this, we use noXXX features like below. Users can then request 
“8268” to select nodes with 8268s on EDR without GPUs for example.

# scontrol show node node5000 | grep AvailableFeatures
   AvailableFeatures=192GB,2933MHz,SD530,Platinum,8268,rack25,EDR,sb7890_0416,enc2514,24C,SNCoff,noGPU,DISK,SSD
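
A request can then combine such features with the AND operator, for example 
(the job script name is just a placeholder):

sbatch --constraint="8268&EDR&noGPU" job.sh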

Cheers,
Florian


From: slurm-users  On Behalf Of Durai 
Arasan
Sent: Montag, 22. Juni 2020 11:02
To: Slurm User Community List 
Subject: [External] [slurm-users] How to exclude nodes in sbatch/srun?

Hi,

The sbatch/srun commands have the "--constraint" option to select nodes with 
certain features. With this you can specify AND, OR, matching OR operators. But 
there is no NOT operator. How do you exclude nodes with a certain feature in 
the "--constraint" option? Or is there another option that can do it?

Thanks,
Durai Arasan
Zentrum für Datenverarbeitung
Tübingen



Re: [slurm-users] [External] How to detect Job submission by srun / interactive jobs

2020-05-18 Thread Florian Zillner
Hi Stephan,

From the slurm.conf docs:
---
BatchFlag
Jobs submitted using the sbatch command have BatchFlag set to 1. Jobs submitted 
using other commands have BatchFlag set to 0.
---
You can look that up e.g. with scontrol show job . I haven't checked, though, 
how to access that via Lua. If you know, let me know; I'd be interested as 
well.

Example:
# scontrol show job 128922
JobId=128922 JobName=sleep
   ...
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:54 TimeLimit=00:30:00 TimeMin=N/A

Cheers,
Florian

-Original Message-
From: slurm-users  On Behalf Of Stephan 
Roth
Sent: Montag, 18. Mai 2020 16:04
To: slurm-users@lists.schedmd.com
Subject: [External] [slurm-users] How to detect Job submission by srun / 
interactive jobs

Dear all,

Does anybody know of a way to detect whether a job is submitted with 
srun, preferably in job_submit.lua?

The goal is to allow interactive jobs only on specific partitions.

Any recommendation or best practice on how to handle interactive jobs is 
welcome.

Thank you,
Stephan



Re: [slurm-users] [External] Re: Node suspend / Power saving - for *idle* nodes only?

2020-05-15 Thread Florian Zillner
FWIW this is a known bug: https://bugs.schedmd.com/show_bug.cgi?id=5348 
(bug 5348, "Suspending Nodes which are not in IDLE mode").



From: slurm-users  on behalf of Florian 
Zillner 
Sent: Thursday, 14 May 2020 15:43
To: Slurm User Community List 
Subject: Re: [slurm-users] [External] Re: Node suspend / Power saving - for 
*idle* nodes only?

Well, the documentation is rather clear on this: "SuspendTime: Nodes becomes 
eligible for power saving mode after being idle or down for this number of 
seconds."
A drained node is neither idle nor down in my mind.

Thanks,
Florian


From: slurm-users  on behalf of Steffen 
Grunewald 
Sent: Thursday, 14 May 2020 15:34
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] Node suspend / Power saving - for *idle* 
nodes only?

On Thu, 2020-05-14 at 13:10:04 +, Florian Zillner wrote:
> Hi,
>
> I'm experimenting with slurm's power saving feature and shutdown of "idle" 
> nodes works in general, also the power up works when "idle~" nodes are 
> requested.
> So far so good, but slurm is also shutting down nodes that are not explicitly 
> "idle". Previously I drained a node to debug something on it and slurm shut 
> it down when the SuspendTimeout was reached.

Perhaps you should have put that node in maint mode?

- S



Re: [slurm-users] [External] Re: Node suspend / Power saving - for *idle* nodes only?

2020-05-14 Thread Florian Zillner
Well, the documentation is rather clear on this: "SuspendTime: Nodes becomes 
eligible for power saving mode after being idle or down for this number of 
seconds."
A drained node is neither idle nor down in my mind.

Thanks,
Florian


From: slurm-users  on behalf of Steffen 
Grunewald 
Sent: Thursday, 14 May 2020 15:34
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] Node suspend / Power saving - for *idle* 
nodes only?

On Thu, 2020-05-14 at 13:10:04 +0000, Florian Zillner wrote:
> Hi,
>
> I'm experimenting with slurm's power saving feature and shutdown of "idle" 
> nodes works in general, also the power up works when "idle~" nodes are 
> requested.
> So far so good, but slurm is also shutting down nodes that are not explicitly 
> "idle". Previously I drained a node to debug something on it and slurm shut 
> it down when the SuspendTimeout was reached.

Perhaps you should have put that node in maint mode?

- S



[slurm-users] Node suspend / Power saving - for *idle* nodes only?

2020-05-14 Thread Florian Zillner
Hi,

I'm experimenting with slurm's power saving feature and shutdown of "idle" 
nodes works in general, also the power up works when "idle~" nodes are 
requested.
So far so good, but slurm is also shutting down nodes that are not explicitly 
"idle". Previously I drained a node to debug something on it and slurm shut it 
down when the SuspendTimeout was reached.

Is this something I can configure, or should the SuspendProgram deal with this 
and ignore poweroff requests for non-idle nodes? I haven't found a setting for 
this; if there is one, please point me to it. Btw, we're on 18.08.8.

Thanks,
Florian


Re: [slurm-users] [External] Re: Filter slurm e-mail notification

2019-11-26 Thread Florian Zillner
Hi,

I guess you could use a Lua job_submit script to filter out the flags you don't 
want. I haven't tried it with mail flags, but I'm using a script like the one 
referenced below to enforce accounts/time limits, etc.

https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/

Cheers,
Florian


-Original Message-
From: slurm-users  On Behalf Of 
ichebo...@univ.haifa.ac.il
Sent: Dienstag, 26. November 2019 07:56
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] Filter slurm e-mail notification

I meant on the admin level, to prevent users from spamming and overloading the 
mail server with wrong use of the --mail flags.


Best Wishes,
Igor

> On 25 Nov 2019, at 19:13, Brian Andrus  wrote:
> 
> Set --mail-type appropriately.
> 
> If you want only 1 email for the array, remove the "ARRAY_TASKS" from that.
> If you don't want emails at all, set it to "NONE"
> 
> 
> --mail-type=
> Notify user by email when certain event types occur. Valid type values are 
> NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, 
> REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage out and teardown 
> completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), 
> TIME_LIMIT_80 (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50 
> percent of time limit) and ARRAY_TASKS (send emails for each array task). 
> Multiple type values may be specified in a comma separated list. The user to 
> be notified is indicated with --mail-user. Unless the ARRAY_TASKS option is 
> specified, mail notifications on job BEGIN, END and FAIL apply to a job array 
> as a whole rather than generating individual email messages for each task in 
> the job array.
> 
> Brian Andrus
> 
> 
> 
> On 11/25/2019 1:48 AM, ichebo...@univ.haifa.ac.il wrote:
>> Hi,
>> 
>> I would like to ask if there is some options to configure the e-mail 
>> notification of slurm jobs? 
>> 
>> For example, how can i filter or not allow sending notifications of an array 
>> jobs? Our relay servers is getting overloaded because of thousands of 
>> notifications regarding array jobs. 
>> 
>> I would like to allow this flag only on regular jobs.
>> 
>> 
>> Best Wishes,
>> Igor
>> 
>> 
>> 



Re: [slurm-users] [External] Re: Upgrading SLURM from 17.02.7 to 18.08.8 - Job ID gets reset

2019-10-18 Thread Florian Zillner
Hi Lech,

Thanks for the hint. I didn't know about that option.

Another way would be to just retain the StateSaveLocation files and move those 
over to the sandbox in which I've tested the upgrade. Once I copied the files 
and re-did the upgrade from scratch, the IDs were consecutive as expected. :)

Thanks,
Florian



-Original Message-
From: slurm-users  On Behalf Of Lech 
Nieroda
Sent: Freitag, 18. Oktober 2019 12:18
To: Slurm User Community List 
Subject: [External] Re: [slurm-users] Upgrading SLURM from 17.02.7 to 18.08.8 - 
Job ID gets reset

Hi Florian,

You can use the FirstJobId option from slurm.conf to continue the JobIds 
seamlessly.
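
For example, given that the last pre-upgrade job in the table below was 96164, 
something like this in slurm.conf (illustrative value) would keep the sequence 
going:

FirstJobId=96165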

Kind Regards,
Lech

> Am 18.10.2019 um 11:47 schrieb Florian Zillner :
> 
> Hi all,
>  
> we’re using OpenHPC packages to run SLURM. Current OpenHPC Version is 1.3.8 
> (SLURM 18.08.8), though we’re still at 1.3.3 (SLURM 17.02.7), for now.
>  
> I’ve successfully attempted an upgrade in a separate testing environment, 
> which works fine once you adhere to the upgrading notes… So the upgrade 
> itself is not the issue here.
>  
> However, I do see that the SLURM Job ID gets reset to 1, instead of 
> continuing as sequential number, whereas the job_db_inx is incremented as 
> before. This is visible for example when looking at the job queue. From a 
> database perspective this looks like this:
> MariaDB [slurm_acct_db]> select job_db_inx,id_job,pack_job_id,job_name from 
> clustername_job_table limit 96070,96100;
> +------------+--------+-------------+----------+
> | job_db_inx | id_job | pack_job_id | job_name |
> +------------+--------+-------------+----------+
> |     107116 |  96155 |           0 | bt       |
> |     107118 |  96156 |           0 | bt       |
> |     107119 |  96157 |           0 | bt       |
> |     107120 |  96158 |           0 | cs_01    |
> |     107121 |  96159 |           0 | cs_01    |
> |     107123 |  96160 |           0 | cs_01    |
> |     107124 |  96161 |           0 | cs_01    |
> |     107125 |  96162 |           0 | cs_01    |
> |     107126 |  96163 |           0 | cs_01    |
> |     107127 |  96164 |           0 | cs_01    | <--- Last Job old version
> |     107128 |      2 |           0 | hostname | <--- Jobs after upgrade
> |     107130 |      3 |           0 | hostname |
> |     107131 |      4 |           0 | hostname |
> |     107133 |      5 |           0 | hostname |
> |     107135 |      6 |           0 | hostname |
> |     107137 |      7 |           0 | hostname |
> |     107138 |      8 |           0 | hostname |
> |     107140 |      9 |           0 | hostname |
> |     107142 |     10 |           0 | hostname |
> |     107144 |     11 |           0 | test     |
> |     107145 |     12 |           0 | test     |
> |     107146 |     13 |           0 | test     |
> |     107147 |     14 |           0 | test     |
> |     107148 |     15 |           0 | test     |
> |     107149 |     16 |           0 | testzill |
> |     107150 |     17 |           0 | testzill |
> |     107151 |     18 |           0 | testzill |
> |     107152 |     19 |           0 | testzill |
> |     107153 |     20 |           0 | testzill |
> |     107154 |     21 |           0 | testzill |
> +------------+--------+-------------+----------+
> 30 rows in set (0.134 sec)
>  
> Question: is there a way to a) either let SLURM continue the job IDs as 
> usual, or b) set any arbitrary number? If this is a known thing I failed to 
> find it.
>  
> Thx!
> Florian




[slurm-users] Upgrading SLURM from 17.02.7 to 18.08.8 - Job ID gets reset

2019-10-18 Thread Florian Zillner
Hi all,

we're using OpenHPC packages to run SLURM. Current OpenHPC Version is 1.3.8 
(SLURM 18.08.8), though we're still at 1.3.3 (SLURM 17.02.7), for now.

I've successfully attempted an upgrade in a separate testing environment, which 
works fine once you adhere to the upgrading notes... So the upgrade itself is 
not the issue here.

However, I do see that the SLURM Job ID gets reset to 1 instead of continuing 
as a sequential number, whereas the job_db_inx is incremented as before. This 
is visible, for example, when looking at the job queue. From a database 
perspective this looks like this:
MariaDB [slurm_acct_db]> select job_db_inx,id_job,pack_job_id,job_name from 
clustername_job_table limit 96070,96100;
+------------+--------+-------------+----------+
| job_db_inx | id_job | pack_job_id | job_name |
+------------+--------+-------------+----------+
|     107116 |  96155 |           0 | bt       |
|     107118 |  96156 |           0 | bt       |
|     107119 |  96157 |           0 | bt       |
|     107120 |  96158 |           0 | cs_01    |
|     107121 |  96159 |           0 | cs_01    |
|     107123 |  96160 |           0 | cs_01    |
|     107124 |  96161 |           0 | cs_01    |
|     107125 |  96162 |           0 | cs_01    |
|     107126 |  96163 |           0 | cs_01    |
|     107127 |  96164 |           0 | cs_01    | <--- Last Job old version
|     107128 |      2 |           0 | hostname | <--- Jobs after upgrade
|     107130 |      3 |           0 | hostname |
|     107131 |      4 |           0 | hostname |
|     107133 |      5 |           0 | hostname |
|     107135 |      6 |           0 | hostname |
|     107137 |      7 |           0 | hostname |
|     107138 |      8 |           0 | hostname |
|     107140 |      9 |           0 | hostname |
|     107142 |     10 |           0 | hostname |
|     107144 |     11 |           0 | test     |
|     107145 |     12 |           0 | test     |
|     107146 |     13 |           0 | test     |
|     107147 |     14 |           0 | test     |
|     107148 |     15 |           0 | test     |
|     107149 |     16 |           0 | testzill |
|     107150 |     17 |           0 | testzill |
|     107151 |     18 |           0 | testzill |
|     107152 |     19 |           0 | testzill |
|     107153 |     20 |           0 | testzill |
|     107154 |     21 |           0 | testzill |
+------------+--------+-------------+----------+
30 rows in set (0.134 sec)

Question: is there a way to either a) let SLURM continue the job IDs as usual, 
or b) set an arbitrary number? If this is a known thing, I failed to find it.

Thx!
Florian