[slurm-users] job_container/tmpfs and autofs
Hi there,

we excitedly found the job_container/tmpfs plugin, which neatly allows us to provide local scratch space and ensures that /dev/shm gets cleaned up after a job finishes. Unfortunately, we found that it does not play nicely with autofs, which we use to provide networked project and scratch directories. This turns out to be a known issue [1]. I was wondering whether it has been solved? I think it would be really useful to have a warning about this issue in the documentation for the job_container/tmpfs plugin.

Regards
magnus

[1] https://cernvm-forum.cern.ch/t/intermittent-client-failures-too-many-levels-of-symbolic-links/156/4

--
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
Campus Charité Virchow Klinikum
Forum 4 | Ebene 02 | Raum 2.020
Augustenburger Platz 1
13353 Berlin
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de
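[For readers searching the archives: a minimal job_container.conf of the kind described above might look like the sketch below. This is illustrative only; the BasePath location is an assumed path and must be a filesystem local to each compute node.]

    # /etc/slurm/job_container.conf -- minimal sketch, paths are illustrative
    # Create the per-job base directory automatically if it does not exist.
    AutoBasePath=true
    # Node-local directory under which each job's private /tmp is mounted
    # (assumed path; use whatever local scratch filesystem your nodes have).
    BasePath=/local/scratch/slurm

[The autofs problem discussed above arises inside the private mount namespace this plugin creates for each job, which is why it affects network directories mounted on demand.]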
Re: [slurm-users] Jobs can grow in RAM usage surpassing MaxMemPerNode
Hi Cristóbal,

I would guess you need to set up a cgroup.conf file:

###
# Slurm cgroup support configuration file
###
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=0
#ConstrainDevices=yes
MemorySwappiness=0
TaskAffinity=no
CgroupAutomount=yes
ConstrainCores=yes

Best,
Rodrigo

On Wed, Jan 11, 2023 at 10:50 PM Cristóbal Navarro <cristobal.navarr...@gmail.com> wrote:

> Hi Slurm community,
> Recently we found a small problem triggered by one of our jobs. We have a
> MaxMemPerNode=532000 setting for our compute node in slurm.conf; however,
> we found that a job that started with mem=65536 was able to grow its
> memory usage over hours of execution up to ~650GB. We expected
> MaxMemPerNode to stop any job exceeding the limit of 532000. Did we miss
> something in slurm.conf? We were trying to avoid setting up a QOS for
> each group of users. Any help is welcome.
> [...]
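[A quick sanity check, as a sketch: assuming the cgroup.conf above is in place alongside the TaskPlugin=task/cgroup already in the slurm.conf, and the Slurm daemons have been restarted, a step that requests 1 GB and then deliberately tries to allocate ~4 GiB should be OOM-killed by the cgroup rather than growing unchecked. The python3 one-liner is just an easy memory hog; any equivalent works.]

    # Ask Slurm for 1 GB, then try to allocate ~4 GiB inside the step.
    # With ConstrainRAMSpace=yes this should be killed by the cgroup OOM
    # handler; without enforcement it may happily use the full 4 GiB.
    srun --mem=1G python3 -c "x = bytearray(4 * 1024**3); print('not constrained')"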
[slurm-users] Jobs can grow in RAM usage surpassing MaxMemPerNode
Hi Slurm community,

Recently we found a small problem triggered by one of our jobs. We have a MaxMemPerNode=532000 setting for our compute node in slurm.conf; however, we found that a job that started with mem=65536 was able to grow its memory usage over hours of execution up to ~650GB. We expected MaxMemPerNode to stop any job exceeding the limit of 532000. Did we miss something in slurm.conf? We were trying to avoid setting up a QOS for each group of users. Any help is welcome.

Here is the node definition in the conf file:

## Nodes list
## use native GPUs
NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8 Feature=gpu

And here is the full slurm.conf file:

# node health check
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300

## Timeouts
SlurmctldTimeout=600
SlurmdTimeout=600

GresTypes=gpu
AccountingStorageTRES=gres/gpu
DebugFlags=CPU_Bind,gres

## We don't want a node to go back in pool without sys admin acknowledgement
ReturnToService=0

## Basic scheduling
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
SchedulerType=sched/backfill

## Accounting
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
AccountingStorageHost=10.10.0.1
AccountingStorageEnforce=limits

JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux

TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup

## scripts
Epilog=/etc/slurm/epilog
Prolog=/etc/slurm/prolog
PrologFlags=Alloc

## MPI
MpiDefault=pmi2

## Nodes list
## use native GPUs
NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1024000 MemSpecLimit=65556 State=UNKNOWN Gres=gpu:A100:8 Feature=gpu

## Partitions list
PartitionName=gpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=65556 DefCpuPerGPU=8 DefMemPerGPU=65556 MaxMemPerNode=532000 MaxTime=3-12:00:00 State=UP Nodes=nodeGPU01 Default=YES
PartitionName=cpu OverSubscribe=No MaxCPUsPerNode=64 DefMemPerNode=16384 MaxMemPerNode=42 MaxTime=3-12:00:00 State=UP Nodes=nodeGPU01

--
Cristóbal A. Navarro
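[For reference when debugging this kind of overrun: with jobacct_gather/linux sampling every 30 s as configured above, the memory a finished job actually used can be compared against what it requested via sacct. The job ID below is illustrative.]

    # Compare requested vs. peak observed memory for a job.
    sacct -j 12345 --format=JobID,ReqMem,MaxRSS,State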