[slurm-dev] Re: building slurm with rpmbuild and hwloc support
Hi!

You need two packages installed on the system where you build Slurm (in my case it is a RHEL-based distro): hwloc and hwloc-devel. You also don't need the .rpmmacros entry for hwloc if you have these packages installed; this option is enabled by default. ;) By the way, I have never used a custom hwloc installation, so I cannot help you if you go in that direction. Sorry for the delayed answer; most probably you have already solved the problem...

Best Regards,
Valantis

On 10/17/2014 05:19 PM, Pancorbo, Juan wrote:

Hello, we are running Slurm 2.6.9 and we are trying to use the task/cgroup plugin, but when we run a job we get the following error:

    slurmd[lxa178]: task/cgroup: plugin not compiled with hwloc support, skipping affinity.

Slurm was built with rpmbuild to be installed as an RPM, so we tried to rebuild Slurm with hwloc support so that the plugin gets recompiled. We first tried to use .rpmmacros:

    $ cat .rpmmacros
    %with_hwloc --with-hwloc=/usr/hwloc/1.10

without success:

    checking for hwloc installation... configure: WARNING: unable to locate hwloc installation

We ran the configure script:

    ./configure --enable-pam --enable-debug --enable-salloc-kill-cmd --with-pam_dir=/etc/pam.d --with-munge=/etc/munge --with-ssl=/etc/ssl --with-hwloc=/usr/hwloc/1.10 --prefix=/usr --sysconfdir=/etc/slurm

and from the output:

    checking for hwloc installation... /usr/hwloc/1.10

We then put the configure results inside the tar and built it again, without success. We also tried including this line in slurm.spec after %configure \:

    %{?with_hwloc:--with-hwloc=/usr/hwloc/1.10}

That also didn't work. Does anybody have an idea of what is needed to compile the plugin with hwloc support using rpmbuild? Thanks in advance.
Juan Pancorbo Armada
juan.panco...@lrz.de
http://www.lrz.de
Leibniz-Rechenzentrum
Abteilung: Hochleistungssysteme
Boltzmannstrasse 1, 85748 Garching
Telefon: +49 (0) 89 35831-8735
Fax: +49 (0) 89 35831-8535
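Valantis' answer above can be condensed into a short recipe (a sketch, assuming a RHEL-like build host and the Slurm 2.6.9 tarball from the thread): install the distribution's hwloc packages before rebuilding, and drop the custom .rpmmacros entry, since configure then finds the system hwloc on its own.

```
# Sketch of the suggested rebuild (run as root on the build host):
# with hwloc-devel present, hwloc support is detected automatically.
yum install hwloc hwloc-devel
rpmbuild -ta slurm-2.6.9.tar.bz2
```

The custom --with-hwloc=/usr/hwloc/1.10 path is only needed for a non-system hwloc installation, which the reply explicitly does not cover.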
[slurm-dev] Re: slurm cannot work with Infiniband after rebooting
Hi!

This is certainly not connected to Slurm; it is a problem with your InfiniBand + Intel MPI configuration. You should ask for help on other forums or mailing lists ;) First, I would suggest you configure the dat.conf file correctly (in my case it is /etc/dat.conf): comment out all the lines with unsupported IB modes. Then you should export some Intel MPI variables to set up the correct environment. Look for the documentation on Intel MPI variables such as I_MPI_DEVICE, I_MPI_FABRICS, I_MPI_FALLBACK, I_MPI_DAPL_PROVIDER_LIST and I_MPI_DEBUG. If you experiment enough, I am sure you will get the desired result. In our case we had set, for example, I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1, which solved similar problems if I remember correctly.

Best Regards,
Chrysovalantis Paschoulas

On 10/20/2014 06:46 PM, Tingyang Xu wrote:

To whom it may concern,

Hello. I am new to Slurm. I am facing a problem using Slurm with InfiniBand: when I run MPI jobs on a freshly rebooted node, I get fabric errors. For example, I tried a simple "hello world" via Intel MPI.
I did something like this:

    $ salloc -N1 -n12 -w cn117    # cn117 is the node just rebooted
    salloc: Granted job allocation 1201
    $ module list
    Currently Loaded Modulefiles:
      1) modules   2) null   3) intelics/2013.1.039
    $ export I_MPI_PMI_LIBRARY=/gpfs/slurm/lib/libpmi.so
    $ srun ./hello
    [3] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [4] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [5] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [6] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [7] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [8] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [10] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [11] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [9] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    [2] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
    srun: error: cn117: tasks 0-11: Exited with exit code 254
    srun: Terminating job step 1201.0

However, as soon as I manually restart slurmd on cn117, the problem is fixed.
For example:

    $ ssh root@cn117
    cn117# service slurm restart
    stopping slurmd:    [ OK ]
    slurmd is stopped
    starting slurmd:    [ OK ]
    # exit
    $ salloc -N1 -n12 -w cn117
    salloc: Granted job allocation 1203
    $ export I_MPI_PMI_LIBRARY=/gpfs/slurm/lib/libpmi.so
    $ srun ./hello
    This is Process 9 out of 12 running on host cn117
    This is Process 3 out of 12 running on host cn117
    This is Process 2 out of 12 running on host cn117
    This is Process 7 out of 12 running on host cn117
    This is Process 6 out of 12 running on host cn117
    This is Process 0 out of 12 running on host cn117
    This is Process 5 out of 12 running on host cn117
    This is Process 1 out of 12 running on host cn117
    This is Process 4 out of 12 running on host cn117
    This is Process 10 out of 12 running on host cn117
    This is Process 8 out of 12 running on host cn117
    This is Process 11 out of 12 running on host cn117

Although I can do this manually, I would like the system to handle it automatically. I tried adding "sleep 10s; /etc/init.d/slurm restart" at the end of rc.local, but the issue is still there. Can anyone help me with that?

Sincerely,
Tingyang Xu
HPC Administrator
University of Connecticut

PS: some information about the InfiniBand setup:

    $ slurmd -V
    slurm 14.03.0
    cn117$ ofed_info | head -n1
    MLNX_OFED_LINUX-2.2-1.0.1 (OFED-2.2-1.0.0):
    cn117$ ibv_devinfo
    hca_id: mlx4_0
        transport:        InfiniBand (0)
        fw_ver:           2.11.550
        node_guid:
        sys_image_guid:   ##
        vendor_id:        ##
        vendor_part_id:
        hw_ver:           0x0
        board_id:
        phys_port_cnt:    2
        port: 1
            state:        PORT_ACTIVE (4)
            max_mtu:      4096 (5)
            active_mtu:   4096 (5)
            sm_lid:       1
            port_lid:     131
            port_lmc:     0x00
            link_layer:   InfiniBand
        port: 2
            state:        PORT_DOWN (1)
            max_mtu:      4096 (5)
            active_mtu:   4096 (5)
            sm_lid:       0
            port_lid:     0
            port_lmc:     0x00
            link_layer:   InfiniBand
    cn117$ cat /etc/redhat-release
    Red Hat Enterprise Linux Workstation release 6.5 (Santiago)
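For reference, the environment Chrysovalantis suggests exporting might look like the sketch below. This is a hedged example: the fabric choice and the provider string are site-specific assumptions, and the provider must match an uncommented entry in your /etc/dat.conf.

```shell
# Hypothetical Intel MPI settings along the lines of the reply above.
# Values are site-specific; check /etc/dat.conf for valid DAPL providers.
export I_MPI_FABRICS=shm:dapl                     # shared memory intra-node, DAPL inter-node
export I_MPI_FALLBACK=0                           # fail loudly instead of silently using TCP
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1   # the provider that worked in the reply
export I_MPI_DEBUG=5                              # verbose startup output for diagnosis
echo "provider=$I_MPI_DAPL_PROVIDER_LIST fallback=$I_MPI_FALLBACK"
```

With I_MPI_DEBUG set, the MPI startup lines will show which fabric and provider were actually selected, which makes misconfiguration much easier to spot.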
[slurm-dev] Re: slurm cannot work with Infiniband after rebooting
Thank you very much, Chrysovalantis. I just created a topic on the Intel forum, since your suggestion did not fix our issue. I will also update this topic if I find the solution, in case other Slurm users hit a similar issue.

Thanks,
Tingyang Xu

From: Chrysovalantis Paschoulas
Sent: Monday, October 27, 2014 10:45 AM
To: slurm-dev
Subject: [slurm-dev] Re: slurm cannot work with Infiniband after rebooting
[slurm-dev] logrotate causing job authentication failure
Had two jobs die yesterday morning with a slurm_load_jobs error (Protocol authentication error) from inside DRMAA, and this interesting message in the log:

    If munged is up, restart with --num-threads=10
    error: Munge encode failed: Unable to access /var/run/munge/munge.socket.2: No such file or directory
    error: authentication: Munged communication error

The slurmctld log has this error at about the same time:

    slurm_receive_msg: Zero Bytes were transmitted or received

Digging deeper, it appears that the jobs' states were changing in slurmctld just as the munge daemon was restarted for a logrotate. I changed logrotate to rotate munge.log based on size instead of daily, which may fix the problem, but it feels more like a workaround. Any other suggestions? It would be nice to have some sort of retry in the code, but I'm not really sure whether it would belong in slurmctld or in the DRMAA code.
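The size-based rotation described above might look roughly like this as a logrotate stanza. This is a sketch: the path, size threshold, and restart command are assumptions for a typical munge installation, not the poster's actual config. The restart in postrotate still briefly interrupts munged; rotating on size just makes it happen far less often than daily.

```
/var/log/munge/munge.log {
    size 100M          # rotate on size rather than daily
    rotate 4
    compress
    missingok
    notifempty
    postrotate
        /sbin/service munge restart > /dev/null 2>&1 || true
    endscript
}
```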
[slurm-dev] Re: logrotate causing job authentication failure
Slurm already has connect retry logic (10 retries with 0.1 seconds between them). DRMAA should need no changes unless it accesses munge directly. Has anyone else seen this problem?

Quoting E V eliven...@gmail.com: [...]

--
Morris "Moe" Jette
CTO, SchedMD LLC
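The retry behavior Moe describes can be sketched roughly as follows. This is a hedged illustration, not Slurm's actual implementation; try_connect is a hypothetical stand-in for a single connection attempt (here it always fails, so the loop exhausts all 10 attempts).

```shell
# Sketch of a bounded retry loop: up to 10 attempts, 0.1 s apart,
# mirroring the numbers quoted in the reply above.
try_connect() { false; }   # hypothetical stand-in; always fails here

attempts=0
ok=no
for i in 1 2 3 4 5 6 7 8 9 10; do
    attempts=$((attempts + 1))
    if try_connect; then ok=yes; break; fi
    sleep 0.1
done
echo "attempts=$attempts ok=$ok"
```

With these numbers the total retry window is about a second, which explains why a munged restart during logrotate can still outlast it and surface as an authentication failure.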
[slurm-dev] Re: recommended software stack for development?
Hi Manuel,

The first rule is: keep it simple! I suggest you start by viewing this as two problems:

1. Learning how to work with Slurm
2. Learning how to work with clusters

For learning how to work with Slurm, cloning a copy of the repo is a good start. In the Developer's notes in the documentation you'll find instructions for running Slurm on a single node, which makes testing and debugging MUCH easier than running on multiple nodes. Once you've got a simple test version running, you can start thinking about writing new code.

As for Puppet, Jenkins, et al., start with something easy, perhaps just ensuring that you can set up two nodes and ssh into them. Once you're comfortable with Slurm, you can add it to your virtual environment.

Hope this helps!
Andy

On 10/27/2014 12:59 PM, Manuel Rodríguez Pascual wrote:

Hi all,

I intend to work on Slurm, modifying it to satisfy my needs and (hopefully) adding some new functionality. I am, however, rather new to this kind of software development, so I am writing to look for advice. My question is: can you recommend any tools for developing Slurm?

As a first layer, my idea is to use plain virtual machines and employ Puppet to configure them and then install MPICH and BLCR. Then Jenkins would install and configure a Slurm-based cluster and run a set of tests. I am, however, new to both tools as well as to developing Slurm, so I am kind of lost right now. So, before starting to build and configure all this, I would really appreciate some suggestions from more experienced developers.

I plan to clone the Slurm GitHub repo into my own GitHub account and then employ Jenkins for continuous integration. I have some doubts about how exactly to do that, in particular regarding the contextualization of the compilation process and the integration of the included regression tests with Jenkins. Have you got any suggestions on this?
Again, any feedback on the best tools for working with Slurm would be welcome. Thanks for your help.

Best regards,
Manuel
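As a concrete starting point for Andy's single-node suggestion, a development build might look roughly like the transcript below. This is a sketch under assumptions: the flags shown are standard configure options (--enable-multiple-slurmd lets several slurmd daemons run on one machine to emulate a small cluster), but check the Developer's notes in the documentation for the current recommendations.

```
$ git clone https://github.com/SchedMD/slurm.git
$ cd slurm
$ ./configure --prefix=$HOME/slurm-test --sysconfdir=$HOME/slurm-test/etc \
              --enable-debug --enable-multiple-slurmd
$ make -j4 && make install
```

Installing into $HOME keeps the test installation fully separate from any system Slurm, so it can be wiped and rebuilt freely while experimenting.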
[slurm-dev] Re: recommended software stack for development?
I wouldn't count what I've done as production-ready, but I have a Puppet module for BLCR [1] and one for SLURM [2]. There is also one for managing SLURM QOS and clusters using native Puppet types [3]. They likely won't aid in development, as the two SLURM-related modules both assume you have built RPMs and placed them in a repository accessible to your hosts to install from. If those modules aren't exactly what you're looking for, they may offer ideas on how to get started with your own. The SLURM module was originally a fork from CERNOps but has since been completely rewritten.

The SLURM module [2] uses beaker to provision 4 VMs: one is the controller, two are compute nodes, and one is a client (in my environment that is the login nodes, web server, etc.). Those automated tests assume you pass a URL to a yum repo containing RPMs. The module relies on exported resources, so the provisioning of those 4 VMs is painfully long due to having to also set up PostgreSQL and PuppetDB.

- Trey

[1]: https://forge.puppetlabs.com/treydock/blcr
[2]: https://github.com/treydock/puppet-slurm
[3]: https://github.com/treydock/puppet-slurm_providers

=
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu

On Mon, Oct 27, 2014 at 12:00 PM, Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com wrote: [...]
[slurm-dev] Re: recommended software stack for development?
Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com writes: [...]

Hi Manuel,

I agree with Andy that it's best to view this as two separate tasks (cluster setup/management + Slurm development). For your cluster setup, you could use Qlustar, which will allow you to easily set up a ready-to-run virtual demo cluster, including a functioning Slurm and OpenMPI, in about 30 minutes (no exaggeration; just follow https://www.qlustar.com/book/docs/install-guide and https://www.qlustar.com/book/docs/first-steps). The Qlustar Basic Edition is free for academic usage and has everything needed for your use case. Once it is set up, you have all the tools of Ubuntu or Debian at your fingertips to jump into development.

Good luck,
Roland

---
http://www.q-leap.com / http://qlustar.com
--- HPC / Storage / Cloud Linux Cluster OS ---
[slurm-dev] Re: Understanding Fairshare and effect on background/backfill type partitions
Trey,

I'm not sure why your jobs aren't starting; someone else will have to answer that question.

You can model an organizational hierarchy much better in 14.11 due to changes in Fairshare=parent for accounts. If you only want fairshare to matter at the research-group and user levels but want to maintain an account structure that reflects your organization, set everything above the research group to Fairshare=parent. That makes those accounts disappear for fairshare calculation purposes (but not for limits, accounting, etc.).

As for fairshare, precision loss can be a real issue, and I'm guessing you're affected. I won't rehash our Slurm UG presentation here, but we spent some time discussing precision loss issues. What normalized shares values do you see? Try plugging them into 2^(-EffectvUsage / SharesNorm) to see how small the number is. That number then has to be multiplied by PriorityWeightFairshare, which I see you sized properly.

I would suggest looking at the Fair Tree fairshare algorithm once 14.11 is released. In case you want more information: http://slurm.schedmd.com/SUG14/fair_tree.pdf and https://fsl.byu.edu/documentation/slurm/fair_tree.php. The first link also discusses Fairshare=parent in slides 82-91.

Ryan

Disclaimer: I have some personal interest in both of the suggestions, since we developed them.

On 10/24/2014 10:49 AM, Trey Dockendorf wrote:

In our setup we use a background partition that can be preempted but has access to the entire cluster. The idea is that when stakeholder partitions are not fully utilized, users can opportunistically make use of the cluster when the system is not 100% utilized. Recently I submitted a batch of jobs, ~60, to our background partition. All nodes were idle, but half my jobs ended up pending with a reason of Priority. I checked sshare and my FairShare value was at 0.00.
Would my fairshare dropping to 0 cause my jobs to be queued even when resources were idle and no other jobs were queued in that partition besides my own? I'm also wondering what method is used to come up with sane fairshare values. We have a (likely unnecessarily) complex account structure in slurmdbd that mimics the organizational structure of the departments / colleges / research groups using the cluster. I'd be interested in how other groups have configured fairshare and the multifactor priority.

For completeness, here are the relevant config items I'm working with:

    AccountingStorageEnforce=limits,qos
    PreemptMode=SUSPEND,GANG
    PreemptType=preempt/partition_prio
    PriorityCalcPeriod=5
    PriorityDecayHalfLife=7-0
    PriorityFavorSmall=YES
    PriorityFlags=SMALL_RELATIVE_TO_TIME
    PriorityMaxAge=7-0
    PriorityType=priority/multifactor
    PriorityUsageResetPeriod=NONE
    PriorityWeightAge=2000        # 20%
    PriorityWeightFairshare=4000  # 40%
    PriorityWeightJobSize=3000    # 30%
    PriorityWeightPartition=0     # 0%
    PriorityWeightQOS=1000        # 10%
    SchedulerParameters=assume_swap  # An option for an in-house patch
    SchedulerTimeSlice=30
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK

Example of a stakeholder partition and background:

    PartitionName=hepx Nodes=c0[101-116,120-132,227,416,530-532,933-936] Priority=100 AllowQOS=hepx MaxNodes=1 MaxTime=120:00:00 State=UP
    PartitionName=background Priority=10 AllowQOS=background MaxNodes=1 MaxTime=96:00:00 State=UP

Thanks,
- Trey

=
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu
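Ryan's precision-loss point can be checked with a quick back-of-the-envelope calculation. This is a sketch: the usage and share values below are made up for illustration, while PriorityWeightFairshare=4000 comes from Trey's config above.

```shell
# Hypothetical numbers: EffectvUsage=0.30 against a normalized share of 0.02.
# The fairshare factor 2^(-usage/share) collapses toward zero, so even a
# large PriorityWeightFairshare contributes almost no priority points.
awk 'BEGIN {
    f = 2 ^ (-0.30 / 0.02)                 # 2^-15, about 0.0000305
    printf "factor=%.7f points=%d\n", f, int(4000 * f)
}'
```

A factor that small multiplied by 4000 truncates to 0 priority points, which is consistent with a FairShare of 0.00 in sshare leaving jobs pending on Priority even against an otherwise idle partition.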