[slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread PULIDO, Alexandre
Hi, in the cluster where I'm deploying Slurm, job allocation has to be based on the actual free memory available on the node, not just the memory allocated by Slurm. This is non-negotiable, and I understand that it's not how Slurm is designed to work, but I'm trying anyway. Among the solutions that
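To make the requirement concrete, here is a minimal Python sketch of the selection rule Alexandre describes: prefer the node whose kernel reports the most MemAvailable in /proc/meminfo, rather than relying on Slurm's own allocation bookkeeping. It assumes per-node meminfo snapshots are gathered by some external means (the thread does not say how). `mem_available_kib`, `pick_node` and the node names are hypothetical and are not part of any Slurm API.

```python
from typing import Optional


def mem_available_kib(meminfo_text: str) -> int:
    """Return MemAvailable (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == "MemAvailable":
            # Lines look like "MemAvailable:   12345678 kB"
            return int(rest.split()[0])
    raise ValueError("no MemAvailable line found")


def pick_node(snapshots: dict, needed_kib: int) -> Optional[str]:
    """Return the node with the most available memory that can hold the job.

    `snapshots` maps a (hypothetical) node name to the text of that node's
    /proc/meminfo. Returns None when no node has enough memory available.
    """
    avail = {node: mem_available_kib(text) for node, text in snapshots.items()}
    fits = {node: kib for node, kib in avail.items() if kib >= needed_kib}
    if not fits:
        return None
    return max(fits, key=fits.get)


if __name__ == "__main__":
    snapshots = {
        "node01": "MemTotal: 131072000 kB\nMemAvailable: 20000000 kB\n",
        "node02": "MemTotal: 131072000 kB\nMemAvailable: 90000000 kB\n",
    }
    print(pick_node(snapshots, 64 * 1024 * 1024))  # node02
```

This is a point-in-time check only: it says nothing about how much memory the jobs already on a node will grow into.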

[slurm-users] Job not in squeue and no log file exists

2018-05-29 Thread Mahmood Naderan
Hi, when I submit the following script, I receive a job ID. However, the job doesn't show up in squeue, and the log file I specified in the script is never created.
hamid@rocks7:scripts$ cat slurm_script.sh
#!/bin/bash
#SBATCH --job-name=hvacSteadyFoam
#SBATCH --output=hvacSteadyFoam.log
#SBATCH --nta

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread John Hearns
Alexandre, it would be helpful if you could say why this behaviour is desirable. For instance, do you have codes that need a large amount of memory, and are your users seeing these codes crash because other codes running on the same nodes are using memory? I have two thoughts: A) en

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread John Hearns
Also regarding memory, there are system tunings you can set for the behaviour of the Out-Of-Memory (OOM) Killer and for VM overcommit. I have seen the VM overcommit parameters discussed elsewhere, and for HPC people generally advise disabling overcommit https://www.suse.com/support/kb/doc/?id=
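For reference, the overcommit knob John mentions is vm.overcommit_memory. With strict accounting (mode 2) the kernel caps committed memory at a CommitLimit derived from RAM, swap and vm.overcommit_ratio. The Python sketch below computes that limit using the ratio form described in the kernel's overcommit-accounting documentation. `commit_limit_kib` is a hypothetical helper, and it uses KiB integer arithmetic where the kernel works in pages, so treat the result as approximate.

```python
def commit_limit_kib(ram_kib: int, swap_kib: int,
                     overcommit_ratio: int = 50, hugetlb_kib: int = 0) -> int:
    """Approximate CommitLimit (KiB) under vm.overcommit_memory=2.

    vm.overcommit_memory modes: 0 = heuristic overcommit (default),
    1 = always overcommit, 2 = strict accounting ("don't overcommit").
    In mode 2 the kernel refuses allocations once committed memory would
    exceed swap + (RAM - huge pages) * vm.overcommit_ratio / 100.
    This sketch ignores vm.overcommit_kbytes, which replaces the ratio
    when it is set.
    """
    if overcommit_ratio < 0:
        raise ValueError("overcommit_ratio must be non-negative")
    return (ram_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_kib


if __name__ == "__main__":
    ram = 128 * 1024 * 1024  # a 128 GiB node, expressed in KiB
    for ratio in (50, 80, 100):
        limit = commit_limit_kib(ram, swap_kib=0, overcommit_ratio=ratio)
        print(f"ratio={ratio}: CommitLimit = {limit // (1024 * 1024)} GiB")
```

With the default ratio of 50, a swapless 128 GiB node gets a CommitLimit of only 64 GiB. On swapless compute nodes, the ratio therefore needs attention whenever overcommit is disabled.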

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread PULIDO, Alexandre
Hello John, this behavior is needed because the memory usage of the codes executed on the nodes is particularly hard to predict. Usually, when a job exceeds its estimate, it uses 1.1 to 1.3 times the expected amount, sometimes much more. A) Indeed there is a partition running only exclusive jobs, but

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread John Hearns
Alexandre, you have made a very good point here: "Oftentimes users only input 1G as they really have no idea of the memory requirements." At my last job we introduced cgroups (this was in PBSPro). We had to enforce a minimum memory request. Users then asked us how much memory their jobs use
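A minimal Python sketch of the minimum-request policy John describes. The 2048 MB floor is a hypothetical value and `effective_request_mb` is an illustrative name. The optional padding reflects the 1.1 to 1.3 overshoot Alexandre reports. John's site did this under PBSPro. In Slurm, a job_submit plugin such as job_submit/lua is one place a rule like this could live, though the thread does not show such a plugin.

```python
from typing import Optional

SITE_MIN_MEM_MB = 2048  # hypothetical site-wide minimum request


def effective_request_mb(requested_mb: Optional[int],
                         min_mb: int = SITE_MIN_MEM_MB,
                         padding_pct: int = 100) -> int:
    """Return the memory request (MB) a job would actually be given.

    Requests below the site minimum (or missing entirely) are raised to the
    minimum; `padding_pct` optionally inflates the result, e.g. 130 for a
    1.3x overshoot. Integer arithmetic, rounded up.
    """
    base = max(requested_mb or 0, min_mb)
    # Ceiling division: -(-a // b) == ceil(a / b) for positive b.
    return -(-base * padding_pct // 100)


if __name__ == "__main__":
    print(effective_request_mb(1024))                   # 2048
    print(effective_request_mb(2048, padding_pct=130))  # 2663
```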

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread Loris Bennett
John Hearns writes: > Alexandre, you have made a very good point here. "Oftentimes users only input > 1G as they really have no idea of the memory requirements," > At my last job we introduced cgroups. (this was in PBSPro). We had to enforce > a minimum request for memory. > Users then asked us

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread PULIDO, Alexandre
Thanks for your input. Automatic reporting is definitely a great idea and seems easy to implement in Slurm. At our site we have an internally developed web portal where users can see in real time everything that is happening on the cluster, and every metric of their own jobs. There is especia
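The automatic reporting idea can be prototyped on top of Slurm accounting. This Python sketch parses the output of `sacct --parsable2 --noheader --format=JobID,ReqMem,MaxRSS,AllocCPUS` and compares each job's request with its peak MaxRSS. The parsing assumptions are labelled in the code: ReqMem formatting differs between Slurm releases (older ones append `n` or `c`), bare numbers are assumed to be KiB, and the per-node case is only exact for single-node jobs. All function names are hypothetical.

```python
UNIT_KIB = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def to_kib(size: str) -> int:
    """Convert a sacct-style size such as '812344K' or '4G' to KiB.

    Assumption: a bare number (no unit suffix) is already in KiB.
    """
    size = size.strip()
    if not size:
        return 0
    if size[-1] in UNIT_KIB:
        return int(float(size[:-1]) * UNIT_KIB[size[-1]])
    return int(float(size))


def reqmem_to_kib(reqmem: str, alloc_cpus: int) -> int:
    """Convert ReqMem to KiB for the whole job.

    Older Slurm releases append 'n' (per node) or 'c' (per CPU), e.g. '4000Mc'.
    Per-CPU values are multiplied by AllocCPUS; per-node values are used as-is,
    which is only exact for single-node jobs (a simplification).
    """
    reqmem = reqmem.strip()
    per_cpu = reqmem.endswith("c")
    if reqmem and reqmem[-1] in "nc":
        reqmem = reqmem[:-1]
    kib = to_kib(reqmem)
    return kib * alloc_cpus if per_cpu else kib


def memory_report(sacct_text: str) -> dict:
    """Map job ID -> (requested KiB, peak MaxRSS KiB across its steps).

    Expects lines from:
      sacct --parsable2 --noheader --format=JobID,ReqMem,MaxRSS,AllocCPUS
    """
    report = {}
    for line in sacct_text.strip().splitlines():
        jobid, reqmem, maxrss, cpus = line.split("|")
        job = jobid.split(".")[0]  # fold steps like 1001.batch into 1001
        req, peak = report.get(job, (0, 0))
        req = max(req, reqmem_to_kib(reqmem, int(cpus or 0)))
        peak = max(peak, to_kib(maxrss))
        report[job] = (req, peak)
    return report


if __name__ == "__main__":
    sample = (
        "1001|4000Mc||2\n"
        "1001.batch|4000Mc|812344K|2\n"
        "1002|2Gn||1\n"
        "1002.batch|2Gn|3145728K|1\n"
    )
    for job, (req, peak) in memory_report(sample).items():
        status = "EXCEEDED" if peak > req else "ok"
        print(f"{job}: requested {req // 1024} MiB, peak {peak // 1024} MiB ({status})")
```

A report like this could be mailed to users or fed into a portal such as the one Alexandre describes; the sample lines above are made up for illustration.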

Re: [slurm-users] Controller / backup controller q's

2018-05-29 Thread Patrick Goetz
On 05/25/2018 11:19 AM, Will Dennis wrote: Not yet time for us... There are problems with U18.04 that render it unusable for our environment. What problems have you run into with 18.04?

Re: [slurm-users] Using free memory available when allocating a node to a job

2018-05-29 Thread Brian Andrus
One thing that concerns me is that you may start a job on a node before a currently running job has 'expanded' as much as it will. If there is 128G on the node and the current job is using 64G but will eventually use 112G, your approach could start another similar job and they would both
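Brian's concern reduces to simple arithmetic: a check against memory in use right now says nothing about where running jobs will peak. This Python sketch uses his numbers (a 128G node, a running job at 64G heading for 112G), plus the assumption that the second, similar job is admitted because 64G is free at that moment. Both helper functions are hypothetical.

```python
def free_memory_admits(total_gb: int, in_use_now_gb: int, new_request_gb: int) -> bool:
    """Admission rule that looks only at memory in use at submit time."""
    return total_gb - in_use_now_gb >= new_request_gb


def peaks_fit(total_gb: int, peak_gbs: list) -> bool:
    """Would all co-located jobs still fit once each reaches its peak?"""
    return sum(peak_gbs) <= total_gb


if __name__ == "__main__":
    # Brian's example: the running job uses 64G now but will peak at 112G;
    # a similar second job is admitted against the 64G that is still free.
    admitted = free_memory_admits(128, in_use_now_gb=64, new_request_gb=64)
    safe = peaks_fit(128, [112, 112])
    print(admitted, safe, 112 + 112 - 128)  # True False 96
```

Admission succeeds, yet the two peaks need 224G, overshooting the node by 96G.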

[slurm-users] Call for Abstracts - Slurm User Group Meeting 2018

2018-05-29 Thread Jacob Jenson
Slurm User Group Meeting 2018, 25-26 September 2018, Madrid, Spain. You are invited to submit an abstract of a tutorial, technical presentation or site report to be given at the Slurm User Group Meeting 2018. This event is sponsored and organized by CIEMAT and SchedMD. This international event is ope

[slurm-users] Registration for 2018 Slurm User Group Meeting is Open

2018-05-29 Thread Jacob Jenson
Registration for the 2018 Slurm User Group Meeting is open. You can register at https://slug18.eventbrite.com. The meeting will be held on 25-26 September 2018 in Madrid, Spain, at CIEMAT. - *Early registration* - May 29 through July 2 - $300 USD - *Standard registration* - J