[slurm-users] Re: Reserving resources for use by non-slurm stuff

2024-04-17 Thread Paul Raines via slurm-users



On a single Rocky 8 workstation with one GPU, where we wanted interactive ssh 
logins to have a small portion of the resources (shell, compiling, simple data 
manipulation, console desktop, etc.) and the rest to go to SLURM, we did this:


- Set it to use cgroupv2
  * modify /etc/default/grub to add  systemd.unified_cgroup_hierarchy=1
to GRUB_CMDLINE_LINUX.  Remake grub with grub2-mkconfig (example below)
  * create file /usr/etc/cgroup_cpuset_init with the lines

#!/bin/bash
# enable the cpuset controller for child cgroups at the root and under
# system.slice so slurmd's task/cgroup plugin can manage CPU sets
echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
echo "+cpuset" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control

  * Modify/create /etc/systemd/system/slurmd.service.d/override.conf
so it has:

[Service]
ExecStartPre=-/usr/etc/cgroup_cpuset_init
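
  A quick way to apply and sanity-check the above (the grub.cfg path assumes
  a BIOS install; EFI installs keep grub.cfg under /boot/efi):

  # regenerate grub after editing /etc/default/grub
  grub2-mkconfig -o /boot/grub2/grub.cfg

  # after a reboot, cpuset should be listed as an available controller
  cat /sys/fs/cgroup/cgroup.controllers

  # pick up the slurmd override and check that the init script ran
  systemctl daemon-reload
  systemctl restart slurmd
  cat /sys/fs/cgroup/system.slice/cgroup.subtree_control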

- Figure out exactly which cores to set aside for "free user" use and which
  for SLURM.  Also use GPU sharding in SLURM so the GPU can be shared.

  * install hwloc (provides the hwloc-ls command)
  * run 'hwloc-ls' to translate physical cores 0-9 to logical cores.
For me, physical cores 0-9 were logical CPUs 0,2,4,6,8,10,12,14,16,18
  * in /etc/slurm.conf the NodeName definition has

CPUs=128 Boards=1 SocketsPerBoard=1 CoresPerSocket=64 ThreadsPerCore=2 \
RealMemory=257267 MemSpecLimit=20480 \
CpuSpecList=0,2,4,6,8,10,12,14,16,18 \
TmpDisk=600 Gres=gpu:nvidia_a2:1,shard:nvidia_a2:32

This reserves those 10 cores and 20 GB of RAM for "free user" use.

  * gres.conf has the lines:

AutoDetect=nvml
Name=shard Count=32

  * You also need to add gres/shard to GresTypes=.  Job submissions then use
the option --gres=shard:N, where N is less than 32.
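
For example (illustrative lines; merge the GresTypes entry with whatever you
already list, and myjob.sh is a placeholder):

  # slurm.conf
  GresTypes=gpu,shard

  # request 8 of the 32 GPU shards plus 4 of the SLURM-owned CPUs
  sbatch --gres=shard:8 -c 4 myjob.sh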

- Set up systemd to restrict "free users" to cores 0-9 and the 20GB

  * Run: systemctl set-property user.slice MemoryHigh=20480M
  * Run for every individual user on the system

systemctl set-property user-$uid.slice AllowedCPUs=0-9

where $uid is that user's numeric user ID.  We do this in a script
that also runs sacctmgr to add them to the SLURM system (a sketch is below).

I could not just set AllowedCPUs once on user.slice itself, which is what I
first tried, because that restricted the root user too, and that caused
weird behavior with a lot of system tools.  So far the root/daemon processes
work fine within the 20GB limit, so the MemoryHigh=20480M setting is one and
done.
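
A minimal sketch of such a per-user script (the account name is a placeholder;
adjust to your site):

  #!/bin/bash
  # usage: add-slurm-user.sh <username>   (hypothetical helper)
  user="$1"
  uid=$(id -u "$user") || exit 1

  # confine this user's login sessions to the reserved cores 0-9
  systemctl set-property "user-${uid}.slice" AllowedCPUs=0-9

  # register the user in SLURM accounting (account name is an assumption)
  sacctmgr -i add user "$user" account=workstation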

Then reboot.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)








[slurm-users] Re: Reserving resources for use by non-slurm stuff

2024-04-17 Thread Sean Maxwell via slurm-users
Hi Shooktija,

On Wed, Apr 17, 2024 at 7:45 AM Shooktija S N via slurm-users
<slurm-users@lists.schedmd.com> wrote:

> NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64
> ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
> PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> I want to reserve a few cores and a few gigs of RAM for use only by the OS
> which cannot be accessed by jobs being managed by Slurm. What configuration
> do I need to do to achieve this?
>

You want to look at these parameters for the Node section of slurm.conf
https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount
https://slurm.schedmd.com/slurm.conf.html#OPT_MemSpecLimit
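
For example, a sketch based on your current node line (the CoreSpecCount and
MemSpecLimit values are placeholders to tune):

  NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 State=UNKNOWN Gres=gpu:1

That would hold 4 cores and 8 GB per node back from Slurm jobs; enforcement
details depend on your cgroup setup, so the man page sections linked above
are worth reading in full.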



> Is it possible to reserve in a similar fashion a 'percent' of the GPU
> which Slurm cannot exceed so that the OS has some GPU resources?
>

Not that I know of.


> Is it possible to have these configs be different for each of the 3 nodes?
>

Yes. You will need to define the nodes using 3 separate node definitions
instead of one definition covering all 3.
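
For example (a sketch; the per-node values are made up):

  NodeName=server1 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=2 MemSpecLimit=4096 State=UNKNOWN Gres=gpu:1
  NodeName=server2 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 State=UNKNOWN Gres=gpu:1
  NodeName=server3 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=8 MemSpecLimit=16384 State=UNKNOWN Gres=gpu:1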

Best,

-Sean



[slurm-users] Inconsistencies in CPU time Reporting by sreport and sacct Tools

2024-04-17 Thread KK via slurm-users
I wish to determine the CPU core time used by users dj and dj1. I have tested
sreport cluster UserUtilizationByAccount, sreport job SizesByAccount, and
sacct. It appears that sreport cluster UserUtilizationByAccount displays the
total core hours used by the entire account, rather than the individual
user's CPU time. Here are the specifics:

Users dj and dj1 are both under the account mehpc.

Between 2024-04-12 and 2024-04-15, dj1 used approximately 10 minutes of core
time, while dj used about 4 minutes. However, "sreport Cluster
UserUtilizationByAccount user=dj1 start=2024-04-12 end=2024-04-15" shows 14
minutes of usage. Similarly, "sreport job SizesByAccount Users=dj
start=2024-04-12 end=2024-04-15" shows about 14 minutes.
Using "sreport job SizesByAccount Users=dj1 start=2024-04-12
end=2024-04-15" or "sacct -u dj1 -S 2024-04-12 -E 2024-04-15 -o
"jobid,partition,account,user,alloccpus,cputimeraw,state,workdir%60" -X
|awk 'BEGIN{total=0}{total+=$6}END{print total}'" yields the accurate
values, which are around 10 minutes for dj1.
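
For readability, that sacct pipeline spelled out across lines (fields exactly
as in the original command; CPUTimeRAW is reported in CPU-seconds):

  sacct -u dj1 -S 2024-04-12 -E 2024-04-15 -X \
        -o jobid,partition,account,user,alloccpus,cputimeraw,state,workdir%60 \
    | awk 'BEGIN{total=0}{total+=$6}END{print total}'   # sum of CPUTimeRAW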

The details are in the attachment.


detail_results
Description: Binary data



[slurm-users] Reserving resources for use by non-slurm stuff

2024-04-17 Thread Shooktija S N via slurm-users
Hi, I am running Slurm (v22.05.8) on 3 nodes each with the following specs:
OS: Proxmox VE 8.1.4 x86_64 (based on Debian 12)
CPU: AMD EPYC 7662 (128)
GPU: NVIDIA GeForce RTX 4070 Ti
Memory: 128 GB

This is /etc/slurm/slurm.conf on all 3 computers without the comment lines:
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=debug3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP

I want to reserve a few cores and a few gigabytes of RAM for use only by the
OS, which cannot be accessed by jobs managed by Slurm. What configuration do I
need to achieve this?

Is it possible to reserve in a similar fashion a 'percent' of the GPU which
Slurm cannot exceed so that the OS has some GPU resources?

Is it possible to have these configs be different for each of the 3 nodes?

Thanks!



[slurm-users] Association limit problem

2024-04-17 Thread Gestió Servidors via slurm-users
Hello,

I'm doing some tests with "associations" in "sacctmgr". I have created three
users (user_1, user_2 and user_3), and for each of these users I have created
an association:

[root@myserver log]# sacctmgr show user user_1 --associations
  user_1 (Def Acct test, Admin None) has two associations in account test on
  cluster q50004, one for partition aolin.q and one for partition cuda-staf+,
  each with Share=1, QOS normal, and limit values 4, 2 and 10 (the limit
  columns ran together in the original output).

[root@myserver log]# sacctmgr show user user_2 --associations
  user_2 has one association in account test on cluster q50004, for partition
  cuda-int.q, with Share=1, QOS normal, and a limit value of 4.

[root@myserver log]# sacctmgr show user user_3 --associations
  user_3 has two associations in account test on cluster q50004, one for
  partition research.q and one for partition xeon.q, each with Share=1,
  QOS normal, and limit values 2 and 1.

All users belong to the "test" account:

[root@myserver log]# sacctmgr show account test --association
  The account has a parent association under root (cluster q50004, Share=1,
  QOS normal), plus the five per-user associations listed above with the same
  Share, limit values and QOS.


When I submit as "user_1", all tests run fine: some jobs are queued and
executed, and some are rejected because of the limits. However, as "user_2"
and "user_3" I can't run any job. All their jobs end up with these messages:
 11168  research.  test  user_3  PENDING  0:00  2024-04-17T12:53:21  N/A11 OK  N/A  (AssocMaxCpuPerJo (null)
 11173  research.  test  user_3  PENDING  0:00  2024-04-17T13:06:02
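
To line each pending reason up against the association limits, something like
this may help (a sketch; the format fields mirror the columns in the sacctmgr
output above):

  # full pending reason per job
  squeue -u user_2,user_3 -o "%.10i %.12P %.10u %.8T %r"

  # the limits on just those users' associations
  sacctmgr show assoc where user=user_2,user_3 \
      format=account,user,partition,maxjobs,maxnodes,maxcpus,maxsubmit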

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-17 Thread Bjørn-Helge Mevik via slurm-users
Jeffrey T Frey via slurm-users  writes:

>> AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n"
>> is per user.
>
> The ulimit is a frontend to rusage limits, which are per-process restrictions 
> (not per-user).

You are right; I sit corrected. :)

(Except for number of procs and number of pending signals, according to
"man setrlimit".)

Then 1024 might not be so low for ulimit -n after all.
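
For anyone comparing the two limits on their own nodes, a quick sketch:

  # per-process soft and hard limits for open files (RLIMIT_NOFILE)
  ulimit -Sn
  ulimit -Hn

  # node-wide cap and current usage (allocated, unused, max)
  cat /proc/sys/fs/file-max
  cat /proc/sys/fs/file-nr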

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo



