Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-13 Thread Loris Bennett
Chris Samuel writes:

> On 12/9/23 9:22 am, Stephan Roth wrote:
>
>> Thanks Noam, this looks promising!
>
> I would suggest that as well as the "magnetic" flag you may want the
> "flex" flag on the reservation too, in order to let jobs that match it
> run on GPUs outside of the reservation.

We have a dedicated partition for some GPU nodes which belong to an
individual PI, and only members of the PI's group can use the partition.
However, the nodes are also members of a 'scavenger' partition, which
can be used by anyone, albeit with certain restrictions, such as a
shorter maximum run-time (roughly as sketched below).
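
For reference, a minimal slurm.conf sketch of that layout (node, group,
and limit values are made up for illustration, and partition-based
preemption assumes PreemptType=preempt/partition_prio cluster-wide):

  # dedicated partition, PI group only, higher scheduling tier
  PartitionName=pi_gpu Nodes=gpu[01-04] AllowGroups=pi_group PriorityTier=2 MaxTime=7-00:00:00
  # scavenger partition on the same nodes, open to all, shorter run-time, preemptable
  PartitionName=scavenger Nodes=gpu[01-04] PriorityTier=1 MaxTime=12:00:00 PreemptMode=REQUEUE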

What are the pros and cons of the reservation approach compared with the
above partition-based approach?

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin



Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-13 Thread Stephan Roth

Markus, thanks for the heads-up.

I intend to either reserve specific nodes with GPUs or use features.

Best,
Stephan

On 13.09.23 09:08, Markus Kötter wrote:

Hi,


Currently, reservations do not work for GRES.

https://bugs.schedmd.com/show_bug.cgi?id=5771

23.11 might change this.


Best regards




Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-13 Thread Markus Kötter

Hi,


Currently, reservations do not work for GRES.

https://bugs.schedmd.com/show_bug.cgi?id=5771

23.11 might change this.


Best regards
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security




Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-13 Thread Stephan Roth

Thanks Chris, this completes what I was looking for.

I should have taken a closer look at the scontrol man page.

Best,
Stephan

On 13.09.23 02:24, Chris Samuel wrote:

On 12/9/23 9:22 am, Stephan Roth wrote:


Thanks Noam, this looks promising!


I would suggest that as well as the "magnetic" flag you may want the
"flex" flag on the reservation too, in order to let jobs that match it
run on GPUs outside of the reservation.


All the best,
Chris




Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-12 Thread Chris Samuel

On 12/9/23 9:22 am, Stephan Roth wrote:


Thanks Noam, this looks promising!


I would suggest that as well as the "magnetic" flag you may want the
"flex" flag on the reservation too, in order to let jobs that match it
run on GPUs outside of the reservation.
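
For example, such a reservation might be created like this (account and
node names are placeholders; check the scontrol man page for your
version):

  scontrol create reservation ReservationName=pi_gpus \
      Accounts=pi_account Nodes=gpu[01-02] \
      StartTime=now Duration=7-00:00:00 \
      Flags=MAGNETIC,FLEX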


All the best,
Chris



Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-12 Thread Stephan Roth

Thanks Noam, this looks promising!

I'll have to test whether a job allowed to use such a reservation will
run outside of it when the reservation's resources are all occupied, or
whether it will queue up waiting to run in the reservation.



On 12.09.23 16:28, Bernstein, Noam CIV USN NRL (6393) Washington DC 
(USA) wrote:

Is this what you want?

Magnetic Reservations

The default behavior for reservations is that jobs must request a
reservation in order to run in it. The MAGNETIC flag allows you to
create a reservation that will allow jobs to run in it without
requiring that they specify the name of the reservation. The
reservation will only "attract" jobs that meet the access control
requirements.


(from https://slurm.schedmd.com/reservations.html)


On Sep 12, 2023, at 10:14 AM, Stephan Roth wrote:


Dear Slurm users,

I'm looking to fulfill the requirement of guaranteeing availability of 
GPU resources to a Slurm account, while allowing this account to use 
other available GPU resources as well.





Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-12 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Is this what you want?
Magnetic Reservations

The default behavior for reservations is that jobs must request a reservation 
in order to run in it. The MAGNETIC flag allows you to create a reservation 
that will allow jobs to run in it without requiring that they specify the name 
of the reservation. The reservation will only "attract" jobs that meet the 
access control requirements.

(from https://slurm.schedmd.com/reservations.html)
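
As a sketch (hypothetical names, and untested): an admin creates the
reservation once, and matching jobs then need no --reservation flag at
all:

  # create a magnetic reservation for the account (admin, one-off)
  scontrol create reservation ReservationName=gpu_guarantee \
      Accounts=pi_account Nodes=gpu01 \
      StartTime=now Duration=30-00:00:00 Flags=MAGNETIC

  # a job from pi_account is "attracted" without naming the reservation
  sbatch --account=pi_account --gres=gpu:1 job.sh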

On Sep 12, 2023, at 10:14 AM, Stephan Roth <stephan.r...@ee.ethz.ch> wrote:

Dear Slurm users,

I'm looking to fulfill the requirement of guaranteeing availability of GPU 
resources to a Slurm account, while allowing this account to use other 
available GPU resources as well.











Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628 F +1 202 404 7546
https://www.nrl.navy.mil



[slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-12 Thread Stephan Roth

Dear Slurm users,

I'm looking to fulfill the requirement of guaranteeing availability of 
GPU resources to a Slurm account, while allowing this account to use 
other available GPU resources as well.


The guaranteed GPU resources should be of at least 1 type, optionally up 
to 3 types, as in:

Gres=gpu:type_1:N,gpu:type_2:P,gpu:type_3:Q
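
For illustration, with hypothetical type names, that corresponds to node
definitions and job requests like:

  # slurm.conf (types and counts are made up)
  NodeName=gpu01 Gres=gpu:rtx3090:8
  NodeName=gpu02 Gres=gpu:a100:4

  # request one GPU of a specific type
  sbatch --gres=gpu:a100:1 job.sh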

The version of Slurm I'm using is 20.11.9.


Ideas I came up with so far:

Placing a reservation seems like the simplest solution, but it forces
users of the account to decide whether to submit their jobs inside or
outside the reservation, based on a manual check of the GPU resources
currently available in the cluster.


Changing the partition setup by moving nodes into a new partition for
the account's exclusive use is overhead I'd like to avoid, as this is a
time-limited scenario, even though it looks like a workable solution
when combined with a job_submit.lua extension that prioritizes
partitions for users of said account (a sketch follows below).
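
A minimal job_submit.lua sketch of that idea (partition and account
names are made up, and this is untested):

  -- job_submit.lua: steer jobs of one account to its dedicated
  -- partition first, with the shared GPU partition as fallback
  function slurm_job_submit(job_desc, part_list, submit_uid)
     if job_desc.account == "pi_account" and job_desc.partition == nil then
        -- Slurm tries the listed partitions and runs where possible
        job_desc.partition = "pi_gpu,gpu"
     end
     return slurm.SUCCESS
  end

  function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
     return slurm.SUCCESS
  end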



I haven't looked at QOS yet, hoping for a shortcut from anyone who
already has a working solution to my problem.


If you have such a solution, would you mind sharing it?

Thanks,
Stephan