Re: [slurm-users] slurm sinfo format memory
Looks like Slurm itself only supports that format (in MB). The output format of the Slurm commands is not very user friendly to me. It would be nice if Slurm added some easy shortcut options for output like the sinfo example in this email thread, e.g. something like "sinfo -ABC".

For the desired format in GB, one workaround may be to prepare a wrapper shell script that reads the sinfo output, converts the MB values to GB and prints the result to the screen.

On Thu, Jul 20, 2023 at 12:28 PM Arsene Marian Alain wrote:
>
> Dear slurm users,
>
> I would like to see the following information of my nodes: hostname, total
> mem, free mem and cpus. So I used 'sinfo -o "%8n %8m %8e %C"', but in the
> output it shows me the memory in MB, like "190560", and I need it in GB
> (without decimals if possible), like "190GB". Any ideas or suggestions on
> how I can do that?
>
> current output:
>
> HOSTNAME MEMORY FREE_MEM CPUS(A/I/O/T)
> node01   190560 125249   60/4/0/64
> node02   190560 171944   40/24/0/64
> node05   93280  91584    0/40/0/40
> node06   513120 509448   0/96/0/96
> node07   513120 512086   0/96/0/96
> node08   513120 512328   0/96/0/96
> node09   513120 512304   0/96/0/96
>
> desired output:
>
> HOSTNAME MEMORY FREE_MEM CPUS(A/I/O/T)
> node01   190GB  125GB    60/4/0/64
> node02   190GB  171GB    40/24/0/64
> node05   93GB   91GB     0/40/0/40
> node06   512GB  500GB    0/96/0/96
> node07   512GB  512GB    0/96/0/96
> node08   512GB  512GB    0/96/0/96
> node09   512GB  512GB    0/96/0/96
>
> I would appreciate any help.
>
> Thank you.
>
> Best Regards,
>
> Alain
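For anyone who wants to go the wrapper route mentioned above, here is a minimal sketch (purely illustrative, not a Slurm feature): it assumes the same format string as in the thread and truncates to whole GB using 1024 MB per GB (change the divisor to 1000 for decimal units).

#!/bin/bash
# Illustrative wrapper around sinfo: convert the MEMORY and FREE_MEM
# columns from MB to whole GB.
sinfo -o "%8n %8m %8e %C" | awk '
NR == 1 { print; next }                      # pass the header line through
{
    $2 = int($2 / 1024) "GB";                # MEMORY:   MB -> GB, truncated
    $3 = int($3 / 1024) "GB";                # FREE_MEM: MB -> GB, truncated
    printf "%-8s %-8s %-8s %s\n", $1, $2, $3, $4
}'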
[slurm-users] slurm sinfo format memory
Dear slurm users,

I would like to see the following information of my nodes: hostname, total mem, free mem and cpus. So I used 'sinfo -o "%8n %8m %8e %C"', but in the output it shows me the memory in MB, like "190560", and I need it in GB (without decimals if possible), like "190GB". Any ideas or suggestions on how I can do that?

current output:

HOSTNAME MEMORY FREE_MEM CPUS(A/I/O/T)
node01   190560 125249   60/4/0/64
node02   190560 171944   40/24/0/64
node05   93280  91584    0/40/0/40
node06   513120 509448   0/96/0/96
node07   513120 512086   0/96/0/96
node08   513120 512328   0/96/0/96
node09   513120 512304   0/96/0/96

desired output:

HOSTNAME MEMORY FREE_MEM CPUS(A/I/O/T)
node01   190GB  125GB    60/4/0/64
node02   190GB  171GB    40/24/0/64
node05   93GB   91GB     0/40/0/40
node06   512GB  500GB    0/96/0/96
node07   512GB  512GB    0/96/0/96
node08   512GB  512GB    0/96/0/96
node09   512GB  512GB    0/96/0/96

I would appreciate any help.

Thank you.

Best Regards,

Alain
Re: [slurm-users] GRES and GPUs
Hey everyone,

I am answering my own question: it wasn't working because I also needed to restart slurmd on the machine. So the full "test GPU management without a GPU" workflow is:

1. Start your Slurm cluster.

2. Add a GPU to an instance of your choice in slurm.conf. For example:

   DebugFlags=GRES        # consider this for the initial setup
   SelectType=select/cons_tres
   GresTypes=gpu
   NodeName=master SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 GRES=gpu:1 State=UNKNOWN

3. Register it in gres.conf and give it some file:

   NodeName=master Name=gpu File=/dev/tty0 Count=1   # Count seems to be optional

4. Restart slurmctld (on the master) and slurmd (on the GPU node):

   sudo systemctl restart slurmctld
   sudo systemctl restart slurmd

I haven't tested this solution thoroughly yet, but at least commands like

   sudo systemctl restart slurmd    # on the master

run without any issues afterwards.

Thank you for all your help!

Best regards,
Xaver

On 19.07.23 17:05, Xaver Stiensmeier wrote:

Hi Hermann,

Count doesn't make a difference, but I noticed that when I reconfigure Slurm and do reloads afterwards, the error "gpu count lower than configured" no longer appears - so maybe a reconfigure is simply needed after reloading slurmctld, or maybe the error no longer shows because the node is still invalid? However, I still get the error:

   error: _slurm_rpc_node_registration node=NName: Invalid argument

If I understand correctly, this is telling me that there's something wrong with my slurm.conf. I know that all pre-existing parameters are correct, so I assume it must be the GPU entry, but I don't see where it's wrong:

   NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 Gres=gpu:1 State=CLOUD # bibiserv

Thanks for all the help,
Xaver

On 19.07.23 15:04, Hermann Schwärzler wrote:

Hi Xaver,

I think you are missing the "Count=..." part in gres.conf. It should read

   NodeName=NName Name=gpu File=/dev/tty0 Count=1

in your case.

Regards,
Hermann

On 7/19/23 14:19, Xaver Stiensmeier wrote:

Okay, thanks to S. Zhang I was able to figure out why nothing changed. While I did restart slurmctld at the beginning of my tests, I didn't do so later, because I felt it was unnecessary - but it is right there in the fourth line of the log that this is needed. Somehow I misread it and thought slurmctld was restarted automatically.

Given the setup:

slurm.conf:
   ...
   GresTypes=gpu
   NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000 GRES=gpu:1 State=UNKNOWN
   ...

gres.conf:
   NodeName=NName Name=gpu File=/dev/tty0

When restarting, I get the following error:

   error: Setting node NName state to INVAL with reason: gres/gpu count reported lower than configured (0 < 1)

So it is still not working, but at least I get a more helpful log message. Because I know that this /dev/tty trick works, I am still unsure where the current error lies, but I will try to investigate it further. I am thankful for any ideas in that regard.

Best regards,
Xaver

On 19.07.23 10:23, Xaver Stiensmeier wrote:

Alright, I tried a few more things, but I still wasn't able to get past:

   srun: error: Unable to allocate resources: Invalid generic resource (gres) specification

I should mention that the node I am trying to test GPUs with doesn't really have a GPU, but Rob was so kind to find out that you do not need a GPU as long as you just point to a file in /dev/ in gres.conf. As mentioned: this is just for testing purposes - in the end we will run this on a node with a GPU, but it is not available at the moment.
The error isn't changing: if I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same error.

Debug info: I added the GPU debug flag and logged the following:

   [2023-07-18T14:59:45.026] restoring original state of nodes
   [2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 2 partitions
   [2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu ignored
   [2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change GresPlugins
   [2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not specified
   [2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu ignored
   [2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change GresPlugins
   [2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
   [2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 2 partitions
   [2023-07-18T14:59:45.027] No parameter for mcs plugin, default values set
   [2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
   [2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller: completed usec=5898
   [2023-07-18T14:59:45.952] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2

I am a bit unsure wh
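Going back to the working configuration described at the top of this thread, two quick sanity checks that may help confirm the fake GPU is registered (standard Slurm commands; the exact output will of course depend on your setup):

   scontrol show node master | grep -i gres    # the node should now report Gres=gpu:1
   srun --gres=gpu:1 hostname                  # should allocate the fake GPU and run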
Re: [slurm-users] Notify users about job submit plugin actions
Hi Lorenzo,

On 7/20/23 12:16, Lorenzo Bosio wrote:
> One more thing I'd like to point out is that I need to monitor jobs going
> from pending to running state (after waiting in the jobs queue). I
> currently have a separate pthread to achieve this, but I think at this
> point the job_submit()/job_modify() function has already exited.
> I do get the output of the slurm_kill_job() function when called, but
> that's not useful for the user and I couldn't find a way to append custom
> messages.

Maybe it would be useful to have e-mail notifications sent to your users when the job changes state? According to the sbatch man page, users can specify themselves which mail alerts they would like:

   --mail-type=<type>
          Notify user by email when certain event types occur. Valid type
          values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
          BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT),
          INVALID_DEPEND (dependency never satisfied), STAGE_OUT (burst
          buffer stage out and teardown completed), TIME_LIMIT,
          TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80
          (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50
          percent of time limit) and ARRAY_TASKS (send emails for each
          array task).

/Ole
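For illustration, a user could request such notifications with something like the following (the job script name and e-mail address are placeholders):

   sbatch --mail-type=BEGIN,END,FAIL --mail-user=user@example.com job.sh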
Re: [slurm-users] MIG-Slice: Unavailable GRES
Hi Rob,

thank you very much for that hint. I tried setting the MIG slices manually in the gres.conf and it works now.

Thank you very much.

Best regards,
Timon

--
Timon Vogt
Arbeitsgruppe "Computing"
Nationales Hochleistungsrechnen (NHR)
Scientific Employee NHR
Tel.: +49 551 39-30146, E-Mail: timon.v...@gwdg.de
-
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Burckhardtweg 4, 37077 Göttingen, URL: https://gwdg.de
Support: Tel.: +49 551 39-3, URL: https://gwdg.de/support
Sekretariat: Tel.: +49 551 39-30001, E-Mail: g...@gwdg.de
Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Christian Griesinger
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598
Zertifiziert nach ISO 9001 und ISO 27001
-

Am 19.07.23 um 21:21 schrieb Groner, Rob:

At some point when we were experimenting with MIG, I was being entirely frustrated in getting it to work until I finally removed the autodetect from gres.conf and explicitly listed the stuff instead. THEN it worked. I think you can find the list of files that are the device files using nvidia-smi. Here is the entry we use in our gres.conf for one of the nodes:

   NodeName=p-gc-3037 Name=gpu Type=1g.5gb File=/dev/nvidia-caps/nvidia-cap[66,75,84,102,111,120,129,201,210,219,228,237,246,255]

Something to TRY anyway.

Odd that 3g.20gb works. You might try reconfiguring the node for that instead and see if it works then. We've used 3g.20gb and 1g.5gb on our nodes and it works fine; we never tried 2g.10gb.

Rob

From: slurm-users on behalf of Vogt, Timon
Sent: Wednesday, July 19, 2023 3:08 PM
To: slurm-us...@schedmd.com
Subject: [slurm-users] MIG-Slice: Unavailable GRES

Dear Slurm Mailing List,

I am experiencing a problem which affects our cluster and for which I am completely out of ideas by now, so I would like to ask the community for hints or ideas.

We run a partition on our cluster containing multiple nodes with Nvidia A100 GPUs (40GB), which we have sliced up using Nvidia Multi-Instance GPU (MIG) into one 3g.20gb slice and two 2g.10gb slices per GPU.

Now, when submitting a job to it and requesting the 3g.20gb slice (like with "srun -p mig-partition -G 3g.20gb:1 hostname"), the job runs fine, but when a job requests one of the 2g.10gb slices instead (like with "srun -p mig-partition -G 2g.10gb:1 hostname"), the job does not get scheduled and the controller repeatedly outputs the error:

   slurmctld[28945]: error: _set_job_bits1: job 4780824 failed to find any available GRES on node 1471
   slurmctld[28945]: error: gres_select_filter_select_and_set job 4780824 failed to satisfy gres-per-job counter

Our cluster uses the AutoDetect=nvml feature for the nodes in the gres.conf, and both slice types are defined in "AccountingStorageTRES" and in the GRES parameter of the node definition. The slurmd on the node also finds both types of slices and reports the correct amounts. They are also visible in the "Gres=" section of "scontrol show node", again in correct amounts.

I have also ensured that the nodes are not used otherwise by creating a reservation on them accessible only to me, and I have restarted all slurmd's and the slurmctld.

By now, I am out of ideas. Does someone here have a suggestion on what else I can try? Has someone already seen this error and knows more about it?
Thank you very much in advance and best regards,
Timon
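Following up on Rob's hint about finding the device files with nvidia-smi, a minimal, hedged sketch of how one might enumerate the MIG instances before listing them explicitly in gres.conf (the /proc path is an assumption about recent NVIDIA drivers, and the gres.conf line is only a hypothetical template modelled on Rob's 1g.5gb example; verify both on your nodes):

   # List GPUs and the MIG instances the driver currently exposes
   nvidia-smi -L

   # Assumption: recent NVIDIA drivers list the minor numbers of the MIG
   # capability device files (/dev/nvidia-caps/nvidia-cap<N>) here
   cat /proc/driver/nvidia-caps/mig-minors

   # Hypothetical explicit gres.conf entry for the 2g.10gb slices
   # (node name and cap numbers must match what you find above):
   NodeName=yournode Name=gpu Type=2g.10gb File=/dev/nvidia-caps/nvidia-cap[...]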
Re: [slurm-users] Notify users about job submit plugin actions
Hello everyone,

thanks for all the answers. To elaborate further: I'm developing in C, but that's not a problem since I can find an equivalent to the Lua approach, as Jeffrey T Frey said.

One more thing I'd like to point out is that I need to monitor jobs going from pending to running state (after waiting in the jobs queue). I currently have a separate pthread to achieve this, but I think at this point the job_submit()/job_modify() function has already exited. I do get the output of the slurm_kill_job() function when called, but that's not useful for the user and I couldn't find a way to append custom messages.

Again, thanks to everyone who helped.

Regards,
Lorenzo

On 19/07/23 16:00, Jeffrey T Frey wrote:

In case you're developing the plugin in C and not Lua: behind the scenes, the Lua mechanism concatenates all log_user() strings into a single variable (user_msg). When the Lua code completes, the C code sets the *err_msg argument of the job_submit()/job_modify() function to that string, then NULLs out user_msg. (There's a mutex around all of that code so slurmctld never executes Lua job submit/modify scripts concurrently.) The slurmctld then communicates that returned string back to sbatch/salloc/srun for display to the user.

Your C plugin would do likewise (set *err_msg before returning from job_submit()/job_modify()) and needn't be mutex'ed if the code is reentrant.

On Jul 19, 2023, at 08:37, Angel de Vicente wrote:

Hello Lorenzo,

Lorenzo Bosio writes:
> I'm developing a job submit plugin to check if some conditions are met
> before a job runs. I'd need a way to notify the user about the plugin
> actions (i.e. why its job was killed and what to do), but after a lot of
> research I could only write to logs and not to the user's shell.
> The user gets the output of slurm_kill_job but I can't find a way to add
> a custom note.
> Can anyone point me to the right API/function in the code?

In our "job_submit.lua" script we have the following for that purpose:

   slurm.log_user("%s: WARNING: [...]", log_prefix)

--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
Web.: http://research.iac.es/proyecto/polmag/
GPG: 0x8BDC390B69033F52

--
Dott. Mag. Lorenzo Bosio
Tecnico di Ricerca
Dipartimento di Informatica
Università degli Studi di Torino
Corso Svizzera, 185 - 10149 Torino
tel. +39 011 670 6836
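To make Jeffrey's point concrete, here is a minimal, hedged sketch of a C job_submit plugin that sets *err_msg. The prototype and the xstrdup() helper are taken from the Slurm source tree as I understand it, the plugin name/type are invented for this example, and the time-limit check is purely illustrative; check src/plugins/job_submit/ in your Slurm version before relying on any of it.

/*
 * Hypothetical fragment of a C job_submit plugin. The string assigned to
 * *err_msg is handed back to slurmctld and displayed to the user by
 * sbatch/salloc/srun, which is the same path the Lua log_user() mechanism uses.
 */
#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>
#include <slurm/slurm_version.h>

#include "src/common/xstring.h"   /* xstrdup(), from the Slurm source tree */

const char plugin_name[]      = "Job submit notify example";
const char plugin_type[]      = "job_submit/notify_example";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid,
                      char **err_msg)
{
        /* Illustrative policy check: require an explicit time limit. */
        if (job_desc->time_limit == NO_VAL) {
                if (err_msg)
                        *err_msg = xstrdup("job rejected: please request a "
                                           "time limit with --time");
                /* Any ESLURM_* error code can be returned here. */
                return ESLURM_INVALID_TIME_LIMIT;
        }
        return SLURM_SUCCESS;
}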