Durai Arasan writes:
> Is there a way of improving this situation? E.g. by not blocking IDLE nodes
> with jobs that only use a fraction of the 8 GPUs? Why are single GPU jobs
> not scheduled to fill already MIXED nodes before using IDLE ones?
>
> What parameters/configuration need to be adjusted for this to be enforced?
There are two SchedulerParameters you could experiment with (from man
slurm.conf):
bf_busy_nodes
When selecting resources for pending jobs to reserve for future
execution (i.e. the job can not be started immediately), then
preferentially select nodes that are in use. This will tend to
leave currently idle resources available for backfilling longer
running jobs, but may result in allocations having less than
optimal network topology. This option is currently only supported
by the select/cons_res and select/cons_tres plugins (or
select/cray_aries with SelectTypeParameters set to "OTHER_CONS_RES"
or "OTHER_CONS_TRES", which layers the select/cray_aries plugin
over the select/cons_res or select/cons_tres plugin respectively).
pack_serial_at_end
If used with the select/cons_res or select/cons_tres plugin, then put
serial jobs at the end of the available nodes rather than
using a best fit algorithm. This may reduce resource fragmentation
for some workloads.
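For example, a minimal slurm.conf fragment enabling both options
(a sketch only -- it assumes your cluster already runs
select/cons_tres, and if you already have a SchedulerParameters
line, merge these values into it rather than adding a second line):

```
# slurm.conf -- example fragment, adapt to your existing config
SelectType=select/cons_tres
SchedulerParameters=bf_busy_nodes,pack_serial_at_end
```

Changes to SchedulerParameters take effect after
"scontrol reconfigure"; changing SelectType itself requires a
slurmctld restart.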
--
B/H