Re: [slurm-users] Environment modules
On 22/11/19 9:37 am, Mariano.Maluf wrote:

> The cluster is operational but I need to install and configure
> environment modules.

If you use EasyBuild to install your HPC software then it can take care of
the modules for you too. I'd also echo the recommendation from others to
use Lmod.

Website: https://easybuilders.github.io/easybuild/
Documentation: https://easybuild.readthedocs.io/

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
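As a rough sketch of how that can look in practice (the easyconfig name and
install prefix are illustrative only, and the options should be checked
against your EasyBuild version):

# illustrative only -- pick an easyconfig and install prefix for your site
eb --prefix=/shared/easybuild --modules-tool=Lmod --robot foss-2019b.eb

# then make the generated modules visible to users
module use /shared/easybuild/modules/all
module avail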
Re: [slurm-users] Array jobs vs. many jobs
Hi Ryan,

On 11/22/19 12:18 PM, Ryan Novosielski wrote:

> Quick question that I'm not sure how to find the answer to otherwise: do
> array jobs have less impact on the scheduler in any way than a whole long
> list of jobs run the more traditional way? Less startup overhead, anything
> like that?

Slurm will represent the whole job array as a single entity until it needs
to create elements for scheduling purposes (ageing, if you limit the number
of jobs that can accrue time, or just starting them up). So if you have a
10,000-element job array it uses the same amount of memory as 1 job until
things start to happen to it.

It's a big win if you've got a workload that can take advantage of it.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
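For reference, a sketch of submitting such a workload as a single array
(the script contents, input naming, and the %200 throttle are illustrative):

#!/bin/bash
# 10,000 elements, at most 200 running at once (the %200 throttle is optional)
#SBATCH --array=1-10000%200
#SBATCH --time=00:30:00
#SBATCH --mem=1G

# Each element sees its own index in SLURM_ARRAY_TASK_ID.
srun ./process_one_input input_${SLURM_ARRAY_TASK_ID}.dat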
Re: [slurm-users] Array jobs vs. many jobs
Jan-Albert van Ree | Linux System Administrator | Digital Services
MARIN | T +31 317 49 35 48 | mailto:j.a.v@marin.nl | http://www.marin.nl

It helps a lot indeed; we run arrays of 100k elements and more. If you
submit 100k separate jobs, the scheduler will definitely grind to a halt.

Regards,
--
Jan-Albert

From: slurm-users on behalf of Ryan Novosielski
Sent: Friday, November 22, 2019 21:18
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Array jobs vs. many jobs

Hi there,

Quick question that I'm not sure how to find the answer to otherwise: do
array jobs have less impact on the scheduler in any way than a whole long
list of jobs run the more traditional way? Less startup overhead, anything
like that?

Thanks!

(we run 17.11 on CentOS 7, but I'm not sure it makes any difference here)
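As a side note, arrays that large usually need the controller's limits
raised in slurm.conf; a sketch with assumed values (check your own site's
defaults and sizing):

# slurm.conf -- values are illustrative, not a recommendation
MaxArraySize=100001    # highest usable array index is MaxArraySize-1 (default 1001)
MaxJobCount=400000     # total job records slurmctld keeps in memory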
[slurm-users] Array jobs vs. many jobs
Hi there,

Quick question that I'm not sure how to find the answer to otherwise: do
array jobs have less impact on the scheduler in any way than a whole long
list of jobs run the more traditional way? Less startup overhead, anything
like that?

Thanks!

(we run 17.11 on CentOS 7, but I'm not sure it makes any difference here)

--
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
Re: [slurm-users] Environment modules
Jan-Albert van Ree | Linux System Administrator | Digital Services
MARIN | T +31 317 49 35 48 | mailto:j.a.v@marin.nl | http://www.marin.nl

Just install the default CentOS RPM package environment-modules and play
with it. If you're at home in bash you'll pick it up in minutes.

All default modules will be put in /usr/share/Modules/modulefiles or
/etc/modulefiles on CentOS, but you can add new locations (in a cluster
you'd put them on the shared filesystem, so all nodes have immediate
access after installing software there).

For the correct syntax for environment modules, just look at some of the
default modulefiles: install the CentOS openmpi package and look at the
file /etc/modulefiles/mpi/openmpi-x86_64 for some of the possibilities
with modulefiles, although a lot more is possible, such as automatic
loading of dependent modules.

Hope this helps,
--
Jan-Albert

From: slurm-users on behalf of Mariano.Maluf
Sent: Friday, November 22, 2019 18:37
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Environment modules

Hi all,

I am setting up a cluster with Slurm on CentOS 7 for the first time, with
1 head node and 12 nodes. The cluster is operational but I need to install
and configure environment modules.

Could you advise me on some documentation about it?

Thanks in advance.

Regards,
Mariano.
--
Lic. Mariano Maluf
Universidad Nacional de San Martín
2033-1400 int. 6046
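To illustrate, a minimal Tcl modulefile might look like the following (the
application name and install path are made up; adapt them to your shared
filesystem layout):

#%Module1.0
## Minimal modulefile for a hypothetical "myapp" 1.0

proc ModulesHelp { } {
    puts stderr "Adds myapp 1.0 to your environment"
}
module-whatis "myapp 1.0 (example application)"

set root /shared/apps/myapp/1.0
prepend-path PATH            $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
setenv       MYAPP_HOME      $root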
Re: [slurm-users] Environment modules
On Fri, Nov 22, 2019 at 6:37 PM Mariano.Maluf wrote:

> Hi all
>
> I am setting up for the first time a cluster with Slurm in Centos7 with
> 1 headnode and 12 nodes.
>
> The cluster is operational but I need to install and configure
> environment modules.
>
> Could you advise me some documentation about it?

Nothing to do with Slurm :-) But it's not that hard to set up. Just define
your path to the modulefiles, the same on the head node and the compute
nodes, and put your modulefiles in a folder shared by all nodes. That's
all :-)

> Thanks in advance.
>
> Regards,
> Mariano.
>
> --
> Lic. Mariano Maluf
> Universidad Nacional de San Martín
> 2033-1400 int. 6046
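For example, one minimal way to point every node at a shared modulefile
tree (the file name and paths below are assumptions, not a requirement):

# /etc/profile.d/zz-site-modules.sh -- deployed identically to the head
# node and all compute nodes; the shared path is illustrative
export MODULEPATH=/shared/modulefiles:$MODULEPATH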
Re: [slurm-users] Environment modules
We use TACC's Lmod system. It is pretty straightforward to set up and
reasonably well documented:

https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

Paul.

> On Nov 22, 2019, at 12:37 PM, Mariano.Maluf wrote:
>
> Hi all
>
> I am setting up for the first time a cluster with Slurm in Centos7 with
> 1 headnode and 12 nodes.
>
> The cluster is operational but I need to install and configure
> environment modules.
>
> Could you advise me some documentation about it?
>
> Thanks in advance.
>
> Regards,
> Mariano.
>
> --
> Lic. Mariano Maluf
> Universidad Nacional de San Martín
> 2033-1400 int. 6046
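Once Lmod is installed, wiring it into every user's shell can be as simple
as the sketch below; the install prefix and module tree are assumptions,
so check them against your own installation:

# /etc/profile.d/z00_lmod.sh -- paths are illustrative
source /opt/apps/lmod/lmod/init/bash
export MODULEPATH=/shared/modulefiles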
[slurm-users] Environment modules
Hi all,

I am setting up a cluster with Slurm on CentOS 7 for the first time, with
1 head node and 12 nodes. The cluster is operational but I need to install
and configure environment modules.

Could you advise me on some documentation about it?

Thanks in advance.

Regards,
Mariano.
--
Lic. Mariano Maluf
Universidad Nacional de San Martín
2033-1400 int. 6046
Re: [slurm-users] Slurm configuration, Weight Parameter
Can't you just set the usage priority to be higher for the 2GB machines?
That way, if the requested memory is less than 2GB those machines will be
used first, and larger jobs skip to the higher-memory machines.

On 11/21/19 9:44 AM, Jim Prewett wrote:
>
> Hi Sistemas,
>
> I could be mistaken, but I don't think there is a way to require jobs on
> the 3GB nodes to request more than 2GB!
>
> https://slurm.schedmd.com/slurm.conf.html states this: "Note that if a
> job allocation request can not be satisfied using the nodes with the
> lowest weight, the set of nodes with the next lowest weight is added to
> the set of nodes under consideration for use (repeat as needed for
> higher weight values)."
>
> I read that to mean "if there are only 3GB nodes available, jobs will be
> run there regardless of the memory needed." We had a similar request but
> were unable to find a solution (and ultimately the particular user is
> happier not to have idle machines when there's work to be done!).
>
> If I'm misunderstanding, I'd love to know!
>
> HTH,
> Jim
>
> On Thu, 21 Nov 2019, Sistemas NLHPC wrote:
>
>> Hi all,
>>
>> Currently we have two types of nodes, one with 3GB and another with 2GB
>> of RAM. We need jobs requesting less than 2GB not to run on the 3GB
>> nodes, to avoid underutilization of resources, since we have nodes that
>> can satisfy jobs of 2GB or less.
>>
>> I tried the "Weight" option in the node configuration. I submitted
>> multiple jobs, but Slurm does not assign nodes by "Weight"; nodes are
>> picked in an apparently arbitrary order as jobs are submitted. Some
>> configuration and logs:
>>
>> slurm.conf
>>
>> NodeName=DEFAULT RealMemory=3007 Features=3007MB Weight=500 State=idle Sockets=2 CoresPerSocket=1
>> NodeName=devcn050
>>
>> NodeName=DEFAULT RealMemory=3007 Features=3007MB Weight=100 State=idle Sockets=2 CoresPerSocket=1
>> NodeName=devcn002
>>
>> NodeName=DEFAULT RealMemory=2000 Features=2000MB Weight=1 State=idle Sockets=2 CoresPerSocket=1
>> NodeName=devcn001
>>
>> For what it's worth, I can see that Slurm has applied the Weight to the
>> nodes:
>>
>> # sinfo -N -l
>> NODELIST  NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
>> devcn001      1 slims*    idle     2 2:1:1   2000        0      1 2000MB   none
>> devcn002      1 slims*    idle     2 2:1:1   3007        0    100 3007MB   none
>> devcn050      1 slims*    idle     2 2:1:1   3007        0    500 3007MB   none
>>
>> I tested other settings, such as the TRESWeights parameter, with no
>> results, for example:
>>
>> NodeName=devcn001 TRESWeights="CPU=2.0,Mem=2000MB"
>>
>> The PriorityType=priority/multifactor plugin was also activated and
>> deactivated to test, but in all these cases it does not work.
>>
>> Thanks in advance.
>>
>> Regards.
>>
>
> James E. Prewett            j...@prewett.org       downl...@hpc.unm.edu
> Systems Team Leader         LoGS: http://www.hpc.unm.edu/~download/LoGS/
> Designated Security Officer OpenPGP key: pub 1024D/31816D93
> HPC Systems Engineer III    UNM HPC 505.277.8210
[slurm-users] nss_slurm not passing groups
Ok, so I wanted to test nss_slurm more after hitting the BoF yesterday.

I have it running, but it does not seem to pass groups.

From a simple interactive bash session:

[andrubr@gen-b2-03 ~]$ getent -s slurm passwd
andrubr:x:43871:11513:Andrus, Brian:/home/andrubr:/bin/bash

[andrubr@gen-b2-03 ~]$ scontrol getent gen-b2-03
JobId=236243.Extern:
User:
andrubr:x:43871:11513:Andrus, Brian:/home/andrubr:/bin/bash
Groups:

JobId=236243.0:
User:
andrubr:x:43871:11513:Andrus, Brian:/home/andrubr:/bin/bash
Groups:
---

but when I do a standard 'id', I get back the 41 groups I am in.

Bug?

Brian Andrus
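For anyone reproducing this, the usual wiring per the Slurm nss_slurm
documentation is roughly the following; the trailing sources in
nsswitch.conf are just an example and will differ per site:

# /etc/nsswitch.conf on the compute nodes -- consult nss_slurm first,
# then the usual sources
passwd: slurm files sss
group:  slurm files sss

# slurm.conf -- have slurmstepd ship the job owner's passwd/group info
# to the node at step launch
LaunchParameters=enable_nss_slurm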