[slurm-users] Re: How to exclude master from computing? Set to DRAINED?

2024-06-24 Thread Xaver Stiensmeier via slurm-users

Thanks Steffen,

that makes a lot of sense. I will just not start slurmd in the master
Ansible role when the master is not to be used for computing.

Best regards,
Xaver

On 24.06.24 14:23, Steffen Grunewald via slurm-users wrote:

On Mon, 2024-06-24 at 13:54:43 +0200, Slurm users wrote:

Dear Slurm users,

in our project we exclude the master from computing before starting
Slurmctld. We used to exclude the master from computing by simply not
mentioning it in the configuration, i.e. just not having:

     PartitionName=SomePartition Nodes=master

or something similar. Apparently, this is not the way to do this as it
is now a fatal error

fatal: Unable to determine this slurmd's NodeName

You're attempting to start the slurmd - which isn't required on this
machine, as you say. Disable it. Keep slurmctld enabled (and declared
in the config).
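
A minimal sketch of what that could look like with systemd-managed
services (the unit names are the usual defaults and may differ on your
installation):

```
# on the master: keep the controller, don't run the compute daemon
sudo systemctl enable --now slurmctld
sudo systemctl disable --now slurmd
```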


therefore, my *question:*

What is the best practice for excluding the master node from work?

Not defining it as a worker node.


I personally primarily see the option to set the node into DOWN, DRAINED
or RESERVED.

These states are slurmd states, and therefore meaningless for a machine
that doesn't have a running slurmd. (It's the nodes that are defined in
the config that are supposed to be able to run slurmd.)


So is *DRAINED* the correct setting in such a case?

Since this only applies to a node that has been defined in the config,
and you (correctly) didn't do so, there's no need (and no means) to
"drain" it.

Best
  Steffen



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] How to exclude master from computing? Set to DRAINED?

2024-06-24 Thread Xaver Stiensmeier via slurm-users

Dear Slurm users,

in our project we exclude the master from computing before starting
Slurmctld. We used to exclude the master from computing by simply not
mentioning it in the configuration, i.e. just not having:

    PartitionName=SomePartition Nodes=master

or something similar. Apparently, this is not the way to do this as it
is now a fatal error

   fatal: Unable to determine this slurmd's NodeName

therefore, my *question:*

   What is the best practice for excluding the master node from work?

I personally primarily see the option to set the node into DOWN, DRAINED
or RESERVED. Since we use ReturnToService=2, I guess DOWN is not the way
to go. RESERVED fits because of its second part, "The node is in an advanced
reservation and *not generally available*", and DRAINED ("The node is
unavailable for use per system administrator request.") fits completely.
So is *DRAINED* the correct setting in such a case?

Best regards,
Xaver

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Slurm.conf and workers

2024-04-15 Thread Xaver Stiensmeier via slurm-users

Dear slurm-user list,

as far as I understood it, the slurm.conf needs to be present on the
master and on the workers at the default path (unless another path is set via
SLURM_CONF). However, I noticed that when adding a partition only in the
master's slurm.conf, all workers were able to "correctly" show the added
partition when calling sinfo on them.

Is the stored slurm.conf on every instance just a fallback for when the
connection is down, or what is its purpose? The documentation only says:
"This file should be consistent across all nodes in the cluster."

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Elastic Computing: Is it possible to incentivize grouping power_up calls?

2024-04-09 Thread Xaver Stiensmeier via slurm-users

Thank you Brian,

while ResumeRate might be able to keep the CPU usage within an
acceptable margin, it's not really a fix, but a workaround. I would
prefer a solution that groups resume requests and therefore makes use of
a single Ansible playbook run per second instead of <=ResumeRate.

As we completely destroy our instances when powering down, we need to
set them up anew using Ansible. Running Ansible on the worker nodes
would be possible, but that comes with additional steps in order to save
all log files on the master in case the startup fails and you want to
investigate. For now I feel like using the master to set up workers is
the better structure.
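
Below is a rough sketch of one way to coalesce resume calls into a single
Ansible run (untested; the paths, the window length and the playbook name
are placeholders):

```bash
#!/bin/bash
# Hypothetical debouncing ResumeProgram wrapper (sketch):
# queue the requested nodes, wait a short window, then let whichever
# instance still finds a non-empty queue run Ansible once for all of them.
QUEUE=/var/spool/slurm/resume_queue     # placeholder path
LOCK="$QUEUE.lock"
WINDOW=2                                # seconds to collect further calls

scontrol show hostnames "$1" > "/tmp/resume.$$"   # expand e.g. worker[1-3]
( flock -x 9; cat "/tmp/resume.$$" >> "$QUEUE"; rm -f "/tmp/resume.$$" ) 9>"$LOCK"

sleep "$WINDOW"

( flock -x 9; [ -s "$QUEUE" ] && mv "$QUEUE" "$QUEUE.run.$$" ) 9>"$LOCK"

if [ -f "$QUEUE.run.$$" ]; then
    sort -u "$QUEUE.run.$$" > "$QUEUE.hosts.$$"
    # one playbook run for every node collected in the window
    ansible-playbook -i "$QUEUE.hosts.$$" setup-worker.yml   # playbook name is a placeholder
    rm -f "$QUEUE.run.$$" "$QUEUE.hosts.$$"
fi
```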

Best regards,
Xaver

On 08.04.24 18:18, Brian Andrus via slurm-users wrote:


Xaver,

You may want to look at the ResumeRate option in slurm.conf:

ResumeRate
The rate at which nodes in power save mode are returned to normal
operation by ResumeProgram. The value is a number of nodes per
minute and it can be used to prevent power surges if a large
number of nodes in power save mode are assigned work at the same
time (e.g. a large job starts). A value of zero results in no
limits being imposed. The default value is 300 nodes per minute.
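
For reference, this is set in slurm.conf, e.g. (the value is only an
example):

```
ResumeRate=10   # hand at most 10 nodes per minute to ResumeProgram
```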

I have all our nodes in the cloud and they power down/deallocate when
idle for a bit. I do not use ansible to start them and use the cli
interface directly, so the only cpu usage is by that command. I do
plan on having ansible run from the node to do any hot-fix/updates
from the base image or changes. By running it from the node, it would
alleviate any cpu spikes on the slurm head node.

Just a possible path to look at.

Brian Andrus

On 4/8/2024 6:10 AM, Xaver Stiensmeier via slurm-users wrote:

Dear slurm user list,

we make use of elastic cloud computing i.e. node instances are created
on demand and are destroyed when they are not used for a certain amount
of time. Created instances are set up via Ansible. If more than one
instance is requested at the exact same time, Slurm will pass those into
the resume script together and one Ansible call will handle all those
instances.

However, more often than not workflows will request multiple instances
within the same second, but not at the exact same time. This leads to
multiple resume script calls and therefore to multiple Ansible calls.
This will lead to less clear log files, greater CPU consumption by the
multiple running Ansible calls and so on.

What I am looking for is an option to force Slurm to wait a certain
amount and then perform a single resume call for all instances within
that time frame (let's say 1 second).

Is this somehow possible?

Best regards,
Xaver





-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Elastic Computing: Is it possible to incentivize grouping power_up calls?

2024-04-08 Thread Xaver Stiensmeier via slurm-users

Dear slurm user list,

we make use of elastic cloud computing i.e. node instances are created
on demand and are destroyed when they are not used for a certain amount
of time. Created instances are set up via Ansible. If more than one
instance is requested at the exact same time, Slurm will pass those into
the resume script together and one Ansible call will handle all those
instances.

However, more often than not workflows will request multiple instances
within the same second, but not at the exact same time. This leads to
multiple resume script calls and therefore to multiple Ansible calls.
This will lead to less clear log files, greater CPU consumption by the
multiple running Ansible calls and so on.

What I am looking for is an option to force Slurm to wait a certain
amount and then perform a single resume call for all instances within
that time frame (let's say 1 second).

Is this somehow possible?

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Can't schedule on cloud node: State=IDLE+CLOUD+POWERED_DOWN+NOT_RESPONDING

2024-02-29 Thread Xaver Stiensmeier via slurm-users

I am wondering why my question (below) didn't catch anyone's attention.
Just as feedback for me: is it unclear where my problem lies, or is it
clear but no solution is known? I looked through the documentation and
now searched the Slurm repository, but am still unable to clearly
identify how to handle "NOT_RESPONDING".

I would really like to improve my question if necessary.

Best regards,
Xaver

On 23.02.24 18:55, Xaver Stiensmeier wrote:

Dear slurm-user list,

I have a cloud node that is powered up and down on demand. Rarely it
can happen that slurm's resumeTimeout is reached and the node is
therefore powered down. We have set ReturnToService=2 in order to
avoid the node being marked down, because the instance behind that
node is created on demand, and therefore after a failure nothing stops
the system from starting the node again, as it will be a different instance.

I thought this would be enough, but apparently the node is still
marked with "NOT_RESPONDING" which leads to slurm not trying to
schedule on it.

After a while NOT_RESPONDING is removed, but I would like to remove it
directly from within my fail script if possible so that the node can
return to service immediately and not be blocked by "NOT_RESPONDING".

Best regards,
Xaver



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Can't schedule on cloud node: State=IDLE+CLOUD+POWERED_DOWN+NOT_RESPONDING

2024-02-23 Thread Xaver Stiensmeier via slurm-users

Dear slurm-user list,

I have a cloud node that is powered up and down on demand. Rarely it can
happen that slurm's resumeTimeout is reached and the node is therefore
powered down. We have set ReturnToService=2 in order to avoid the node
being marked down, because the instance behind that node is created on
demand, and therefore after a failure nothing stops the system from starting
the node again, as it will be a different instance.

I thought this would be enough, but apparently the node is still marked
with "NOT_RESPONDING" which leads to slurm not trying to schedule on it.

After a while NOT_RESPONDING is removed, but I would like to remove it
directly from within my fail script if possible so that the node can
return to service immediately and not be blocked by "NOT_RESPONDING".
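
A heavily hedged sketch of what the fail script might attempt (whether
this actually clears NOT_RESPONDING is exactly the open question; the
state names are standard scontrol states):

```bash
# after destroying the cloud instance behind "$1":
scontrol update NodeName="$1" State=POWER_DOWN_FORCE
scontrol update NodeName="$1" State=RESUME
```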

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Slurm Power Saving Guide: Why doesnt slurm mark as failed when resumeProgram returns =/= 0

2024-02-19 Thread Xaver Stiensmeier via slurm-users

Dear slurm-user list,

I had cases where our resumeProgram failed due to temporary cloud
timeouts. In that case the resumeProgram returns a value =/= 0. Why does
Slurm still wait until resumeTimeout instead of just accepting the
startup as failed, which should then lead to the job being rescheduled?

Is there some way to achieve the described effect, i.e. tell Slurm "You
can stop waiting, the node won't come alive", or am I missing the
correct way this should be handled in Slurm?
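
One workaround sketch, assuming it is acceptable to mark the node
explicitly from inside the resume script's error path (create_instance is
a placeholder for whatever actually provisions the machine):

```bash
# inside ResumeProgram (sketch)
if ! create_instance "$NODE"; then
    # tell Slurm immediately instead of letting it wait for ResumeTimeout
    scontrol update NodeName="$NODE" State=DOWN Reason="cloud startup failed"
    exit 1
fi
```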

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Errors upgrading to 23.11.0 -- jwt-secret.key

2024-02-08 Thread Xaver Stiensmeier via slurm-users

Thank you for your response.

I have found out why there was no error in the log: I've been
looking at the wrong log. The error didn't occur on the master, but on
our vpn-gateway (it is a hybrid cloud setup) - but you can think of it as
just another worker in the same network. The error I get there is:

`
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 slurmctld[32014]:
slurmctld: fatal: auth/jwt: cannot stat '/etc/slurm/jwt-secret.key': No
such file or directory
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 systemd[1]:
slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 systemd[1]:
slurmctld.service: Failed with result 'exit-code'.
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 systemd[1]: Failed to
start Slurm controller daemon.
`

In the past we have created the `jwt-secret.key` on the master at
`/etc/slurm` and that was enough. I must admit that I am not
completely familiar with it, but I will now look into it more closely and also
double-check whether such a key is stored there in the old Slurm version.
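
For reference, a sketch along the lines of the Slurm JWT documentation,
using the path from the error message above (the target host name is a
placeholder):

```bash
# generate the key once and distribute it to every host running slurmctld/slurmdbd
dd if=/dev/random of=/etc/slurm/jwt-secret.key bs=32 count=1
chown slurm:slurm /etc/slurm/jwt-secret.key
chmod 0600 /etc/slurm/jwt-secret.key
scp /etc/slurm/jwt-secret.key cluster-vpngtw:/etc/slurm/jwt-secret.key   # placeholder host
```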

Best regards,
Xaver

On 08.02.24 11:07, Luke Sudbery via slurm-users wrote:

Your systemctl output shows that slurmctld is running OK, but that doesn't 
match with your first entry, so it's hard to tell what's going on.

But if slurmctld won't start under systemd and it's not clear why, the first
step would be to enable something like `SlurmctldDebug = debug` and check the
full logs in journalctl, or just run slurmctld in the foreground with:

/usr/sbin/slurmctld -D -vvv

Make sure the system service is properly stopped and there aren't any rogue
slurmctld processes anywhere.

Many thanks,

Luke



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Errors upgrading to 23.11.0

2024-02-07 Thread Xaver Stiensmeier via slurm-users

Dear slurm-user list,

I got this error:

Unable to start service slurmctld: Job for slurmctld.service failed
because the control process exited with error code.\nSee \"systemctl
status slurmctld.service\" and \"journalctl -xeu slurmctld.service\" for
details.

but in slurmctld.service I see nothing suspicious:

slurmctld.service - Slurm controller daemon
 Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled;
vendor preset: enabled)
    Drop-In: /etc/systemd/system/slurmctld.service.d
 └─override.conf
 Active: active (running) since Wed 2024-02-07 15:50:56 UTC; 19min ago
   Main PID: 51552 (slurmctld)
  Tasks: 21 (limit: 9363)
 Memory: 10.4M
    CPU: 1min 16.088s
 CGroup: /system.slice/slurmctld.service
 ├─51552 /usr/sbin/slurmctld --systemd
 └─51553 "slurmctld: slurmscriptd" "" "" "" "" "" ""

Feb 07 15:58:21 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: sched: _slurm_rpc_allocate_resources JobId=3 NodeList=(null)
usec=959
Feb 07 15:58:23 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _job_complete: JobId=3 WTERMSIG 2
Feb 07 15:58:23 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _job_complete: JobId=3 cancelled by interactive user
Feb 07 15:58:23 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _job_complete: JobId=3 done
Feb 07 15:58:23 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _slurm_rpc_complete_job_allocation: JobId=3 error Job/step
already completing or completed
Feb 07 15:58:42 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: sched: _slurm_rpc_allocate_resources JobId=4
NodeList=cluster-master-2vt2bqh7ahec04c,cluster-worker-2vt2bqh7ahec04c-2
usec=512
Feb 07 16:06:04 cluster-master-2vt2bqh7ahec04c slurmctld[51553]:
slurmctld: error: _run_script: JobId=0 resumeprog exit status 1:0
Feb 07 16:09:33 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _job_complete: JobId=4 WTERMSIG 2
Feb 07 16:09:33 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _job_complete: JobId=4 done
Feb 07 16:09:33 cluster-master-2vt2bqh7ahec04c slurmctld[51552]:
slurmctld: _slurm_rpc_complete_job_allocation: JobId=4 error Job/step
already completing or completed

I am unsure how to debug this further. It might be coming from a
previous problem I tried to fix (basically a few deprecated keys in the
configuration).

I will try to restart the entire cluster with the added changes to rule
out any follow up errors, but maybe it's something obvious a fellow list
user can see.

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


Re: [slurm-users] SlurmdSpoolDir full

2023-12-10 Thread Xaver Stiensmeier

Hello Brian Andrus,

we ran 'df -h' to determine the amount of free space I mentioned below.
I also should add that at the time we inspected the node, there was
still around 38 GB of space left - however, we were unable to watch the
remaining space while the error occurred so maybe the large file(s) got
removed immediately.

I will take a look at /var/log. That's a good idea. I don't think that
there will be anything unusual, but it's something I haven't thought
about yet (the reason of the error being somewhere else).

Best regards
Xaver

On 10.12.23 00:41, Brian Andrus wrote:

Xaver,

It is likely your /var or /var/spool mount.
That may be a separate partition or part of your root partition. It is
the partition that is full, not the directory itself. So the cause
could very well be log files in /var/log. I would check to see what
(if any) partitions are getting filled on the node. You can run 'df
-h' and see some info that would get you started.
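
A small sketch of the kind of check that could help (the path assumes the
default SlurmdSpoolDir; adjust to whatever slurm.conf actually sets):

```bash
df -h /var/spool/slurmd   # which partition holds it, and how full is it?
du -xh --max-depth=2 /var 2>/dev/null | sort -h | tail -20   # largest directories on that filesystem
```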

Brian Andrus

On 12/8/2023 7:00 AM, Xaver Stiensmeier wrote:

Dear slurm-user list,

during a larger cluster run (the same one I mentioned earlier, 242 nodes), I
got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a
directory on the workers that is used for job state information
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). However,
I was unable to find more precise information on that directory. We
compute all data on another volume so SlurmdSpoolDir has roughly 38 GB
of free space where nothing is intentionally put during the run. This
error only occurred on very few nodes.

I would like to understand what Slurmd is placing in this dir that fills
up the space. Do you have any ideas? Due to the workflow used, we have a
hard time reconstructing the exact scenario that caused this error. I
guess the "fix" is to just pick a slightly larger disk, but I am unsure
whether Slurm behaves normally here.

Best regards
Xaver Stiensmeier








[slurm-users] SlurmdSpoolDir full

2023-12-08 Thread Xaver Stiensmeier

Dear slurm-user list,

during a larger cluster run (the same one I mentioned earlier, 242 nodes), I
got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a
directory on the workers that is used for job state information
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). However,
I was unable to find more precise information on that directory. We
compute all data on another volume so SlurmdSpoolDir has roughly 38 GB
of free space where nothing is intentionally put during the run. This
error only occurred on very few nodes.

I would like to understand what Slurmd is placing in this dir that fills
up the space. Do you have any ideas? Due to the workflow used, we have a
hard time reconstructing the exact scenario that caused this error. I
guess the "fix" is to just pick a slightly larger disk, but I am unsure
whether Slurm behaves normally here.

Best regards
Xaver Stiensmeier




Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier

Hi Ole,

for multiple reasons we build it ourselves. I am not really involved
in that process, but I will contact the person who is. Thanks for the
recommendation! We should probably implement a regular check for whether
there is a new slurm version. I am not 100% sure whether this will fix our
issues or not, but it's worth a try.

Best regards
Xaver

On 06.12.23 12:03, Ole Holm Nielsen wrote:

On 12/6/23 11:51, Xaver Stiensmeier wrote:

Good idea. Here's our current version:

```
sinfo -V
slurm 22.05.7
```

Quick googling told me that the latest version is 23.11. Does the
upgrade change anything in that regard? I will keep reading.


There are nice bug fixes in 23.02 mentioned in my SLUG'23 talk "Saving
Power with Slurm" at https://slurm.schedmd.com/publications.html

For reasons of security and functionality it is recommended to follow
Slurm's releases (maybe not the first few minor versions of new major
releases like 23.11).  FYI, I've collected information about upgrading
Slurm in the Wiki page
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slurm

/Ole





Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier

Hi Ole,

Good idea. Here's our current version:

```
sinfo -V
slurm 22.05.7
```

Quick googling told me that the latest version is 23.11. Does the
upgrade change anything in that regard? I will keep reading.

Xaver

On 06.12.23 11:09, Ole Holm Nielsen wrote:

Hi Xaver,

Your version of Slurm may matter for your power saving experience.  Do
you run an updated version?

/Ole

On 12/6/23 10:54, Xaver Stiensmeier wrote:

Hi Ole,

I will double check, but I am very sure that giving a reason is possible
as it has been done at least 20 other times without error during that
exact run. It might be ignored though. You can also give a reason when
defining the states POWER_UP and POWER_DOWN. Slurm's documentation is
not always giving all information. We run our solution for about a year
now so I don't think there's a general problem (as in something that
necessarily occurs) with the command. But I will take a closer look. I
really feel like it has to be something more conditional though as
otherwise the error would've occurred more often (i.e. every time when
handling a fail and the command is execute).

IHTH,
Ole








Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier

Hi Ole,

I will double check, but I am very sure that giving a reason is possible
as it has been done at least 20 other times without error during that
exact run. It might be ignored though. You can also give a reason when
defining the states POWER_UP and POWER_DOWN. Slurm's documentation is
not always giving all information. We run our solution for about a year
now so I don't think there's a general problem (as in something that
necessarily occurs) with the command. But I will take a closer look. I
really feel like it has to be something more conditional though as
otherwise the error would've occurred more often (i.e. every time when
handling a fail and the command is execute).

Your repository would've been really helpful for me when we started
implementing the cloud scheduling, but I feel like we have implemented
most things you mention there already. But I will take a look at
`DebugFlags=Power`. `PrivateData=cloud` was an annoying thing to find
out; SLURM plans/planned to change that in the future (cloud key behaves
different than any other key in PrivateData). Of course our setup
differs a little in the details.

Best regards
Xaver

On 06.12.23 10:30, Ole Holm Nielsen wrote:

Hi Xavier,

On 12/6/23 09:28, Xaver Stiensmeier wrote:

using https://slurm.schedmd.com/power_save.html we had one case out
of many (>242) node starts that resulted in

|slurm_update error: Invalid node state specified|

when we called:

|scontrol update NodeName="$1" state=RESUME reason=FailedStartup|

in the Fail script. We run this to make 100% sure that the instances
- that are created on demand - are again `~idle` after being removed
by the fail program. They are set to RESUME before the actual
instance gets destroyed. I remember that I had this case manually
before, but I don't remember when it occurs.

Maybe someone has a great idea how to tackle this problem.


Probably you can't assign a "reason" when you update a node with
state=RESUME.  The scontrol manual page says:

Reason= Identify the reason the node is in a "DOWN",
"DRAINED", "DRAINING", "FAILING" or "FAIL" state.

Maybe you will find some useful hints in my Wiki page
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving

and in my power saving tools at
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/power_save

IHTH,
Ole






[slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier

Dear Slurm User list,

using https://slurm.schedmd.com/power_save.html we had one case out of
many (>242) node starts that resulted in

|slurm_update error: Invalid node state specified|

when we called:

|scontrol update NodeName="$1" state=RESUME reason=FailedStartup|

in the Fail script. We run this to make 100% sure that the instances -
that are created on demand - are again `~idle` after being removed by
the fail program. They are set to RESUME before the actual instance gets
destroyed. I remember that I had this case manually before, but I don't
remember when it occurs.

Maybe someone has a great idea how to tackle this problem.

Best regards
Xaver Stiensmeier


Re: [slurm-users] GRES and GPUs

2023-07-20 Thread Xaver Stiensmeier

Hey everyone,

I am answering my own question:
It wasn't working because I need to *reload slurmd* on the machine, too.
So the full "test gpu management without gpu" workflow is:

1. Start your slurm cluster.
2. Add a gpu to an instance of your choice in the *slurm.conf*

For example:

   *DebugFlags=GRES* # consider this for initial setup.
   *SelectType=select/cons_tres*
   *GresTypes=gpu*
   NodeName=master SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000
   *GRES=gpu:1* State=UNKNOWN

3. Register it in *gres.conf* and give it *some file*

   NodeName=master Name=gpu File=/dev/tty0 Count=1 # count seems to be
   optional

4. Reload slurmctld (on the master) and slurmd (on the gpu node)

   *sudo systemctl restart slurmctld*
   *sudo systemctl restart slurmd*

I haven't tested this solution thoroughly yet, but at least commands like:

   *sudo systemctl restart slurmd*
   # master

run without any issues afterwards.
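
A quick way to double-check (sketch) is to look at what the controller now
reports for the node and to request the GRES explicitly:

```
scontrol show node master | grep -i gres
srun --gres=gpu:1 hostname
```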

Thank you for all your help!

Best regards,
Xaver

On 19.07.23 17:05, Xaver Stiensmeier wrote:


Hi Hermann,

count doesn't make a difference, but I noticed that when I reconfigure
slurm and do reloads afterwards, the error "gpu count lower than
configured" no longer appears - so maybe it is just because a
reconfigure is needed after reloading slurmctld - or maybe it doesn't
show the error anymore, because the node is still invalid? However, I
still get the error:

    error: _slurm_rpc_node_registration node=NName: Invalid argument

If I understand correctly, this is telling me that there's something
wrong with my slurm.conf. I know that all pre-existing parameters are
correct, so I assume it must be the gpus entry, but I don't see where
it's wrong:

NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000
Gres=gpu:1 State=CLOUD # bibiserv

Thanks for all the help,
Xaver

On 19.07.23 15:04, Hermann Schwärzler wrote:

Hi Xaver,

I think you are missing the "Count=..." part in gres.conf

It should read

NodeName=NName Name=gpu File=/dev/tty0 Count=1

in your case.

Regards,
Hermann

On 7/19/23 14:19, Xaver Stiensmeier wrote:

Okay,

thanks to S. Zhang I was able to figure out why nothing changed.
While I did restart slurmctld at the beginning of my tests, I
didn't do so later, because I felt like it was unnecessary, but it
is right there in the fourth line of the log that this is needed.
Somehow I misread it and thought it automatically restarted slurmctld.

Given the setup:

slurm.conf
...
GresTypes=gpu
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000
GRES=gpu:1 State=UNKNOWN
...

gres.conf
NodeName=NName Name=gpu File=/dev/tty0

When restarting, I get the following error:

error: Setting node NName state to INVAL with reason:gres/gpu count
reported lower than configured (0 < 1)

So it is still not working, but at least I get a more helpful log
message. Because I know that this /dev/tty trick works, I am still
unsure where the current error lies, but I will try to investigate
it further. I am thankful for any ideas in that regard.

Best regards,
Xaver

On 19.07.23 10:23, Xaver Stiensmeier wrote:


Alright,

I tried a few more things, but I still wasn't able to get past:
srun: error: Unable to allocate resources: Invalid generic resource
(gres) specification.

I should mention that the node I am trying to test GPU with
doesn't really have a gpu, but Rob was so kind as to point out that you
do not need a gpu as long as you just link to a file in /dev/ in
the gres.conf. As mentioned: This is just for testing purposes - in
the end we will run this on a node with a gpu, but it is not
available at the moment.

*The error isn't changing*

If I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same
error.

*Debug Info*

I added the gpu debug flag and logged the following:

[2023-07-18T14:59:45.026] restoring original state of nodes
[2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to
gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to
change GresPlugins
[2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not
specified
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to
gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to
change GresPlugins
[2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure:
select/cons_tres: reconfigure
[2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.027] No parameter for mcs plugin, default
values set
[2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
[2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller:
completed usec=5898
[2023-07-18T14:59:45.952]
SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_ma

Re: [slurm-users] GRES and GPUs

2023-07-19 Thread Xaver Stiensmeier

Hi Hermann,

count doesn't make a difference, but I noticed that when I reconfigure
slurm and do reloads afterwards, the error "gpu count lower than
configured" no longer appears - so maybe it is just because a
reconfigure is needed after reloading slurmctld - or maybe it doesn't
show the error anymore, because the node is still invalid? However, I
still get the error:

    error: _slurm_rpc_node_registration node=NName: Invalid argument

If I understand correctly, this is telling me that there's something
wrong with my slurm.conf. I know that all pre-existing parameters are
correct, so I assume it must be the gpus entry, but I don't see where
it's wrong:

   NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000
   Gres=gpu:1 State=CLOUD # bibiserv

Thanks for all the help,
Xaver

On 19.07.23 15:04, Hermann Schwärzler wrote:

Hi Xaver,

I think you are missing the "Count=..." part in gres.conf

It should read

NodeName=NName Name=gpu File=/dev/tty0 Count=1

in your case.

Regards,
Hermann

On 7/19/23 14:19, Xaver Stiensmeier wrote:

Okay,

thanks to S. Zhang I was able to figure out why nothing changed.
While I did restart slurmctld at the beginning of my tests, I didn't
do so later, because I felt like it was unnecessary, but it is right
there in the fourth line of the log that this is needed. Somehow I
misread it and thought it automatically restarted slurmctld.

Given the setup:

slurm.conf
...
GresTypes=gpu
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000
GRES=gpu:1 State=UNKNOWN
...

gres.conf
NodeName=NName Name=gpu File=/dev/tty0

When restarting, I get the following error:

error: Setting node NName state to INVAL with reason:gres/gpu count
reported lower than configured (0 < 1)

So it is still not working, but at least I get a more helpful log
message. Because I know that this /dev/tty trick works, I am still
unsure where the current error lies, but I will try to investigate it
further. I am thankful for any ideas in that regard.

Best regards,
Xaver

On 19.07.23 10:23, Xaver Stiensmeier wrote:


Alright,

I tried a few more things, but I still wasn't able to get past:
srun: error: Unable to allocate resources: Invalid generic resource
(gres) specification.

I should mention that the node I am trying to test GPU with doesn't
really have a gpu, but Rob was so kind as to point out that you do not
need a gpu as long as you just link to a file in /dev/ in the
gres.conf. As mentioned: This is just for testing purposes - in the
end we will run this on a node with a gpu, but it is not available
at the moment.

*The error isn't changing*

If I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same
error.

*Debug Info*

I added the gpu debug flag and logged the following:

[2023-07-18T14:59:45.026] restoring original state of nodes
[2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to
gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to
change GresPlugins
[2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not
specified
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to
gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to
change GresPlugins
[2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure:
select/cons_tres: reconfigure
[2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.027] No parameter for mcs plugin, default
values set
[2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
[2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller:
completed usec=5898
[2023-07-18T14:59:45.952]
SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2

I am a bit unsure what to do next to further investigate this issue.

Best regards,
Xaver

On 17.07.23 15:57, Groner, Rob wrote:

That would certainly do it.  If you look at the slurmctld log when
it comes up, it will say that it's marking that node as invalid
because it has fewer (0) gres resources than you say it should
have.  That's because slurmd on that node will come up and say
"What gres resources??"

For testing purposes,  you can just create a dummy file on the
node, then in gres.conf, point to that file as the "graphics file"
interface.  As long as you don't try to actually use it as a
graphics file, that should be enough for that node to think it has
gres/gpu resources. That's what I do in my vagrant slurm cluster.

Rob

--------

*From:* slurm-users  on
behalf of Xaver Stiensmeier 
*Sent:* Monday, July 17, 2023 9:43 AM
*To:* slurm-users@lists.schedmd.com 
*Subject:* Re: [slurm-users] GRES and GPUs
Hi Hermann,

G

Re: [slurm-users] GRES and GPUs

2023-07-19 Thread Xaver Stiensmeier

Okay,

thanks to S. Zhang I was able to figure out why nothing changed. While I
did restart slurmctld at the beginning of my tests, I didn't do so
later, because I felt like it was unnecessary, but it is right there in
the fourth line of the log that this is needed. Somehow I misread it and
thought it automatically restarted slurmctld.

Given the setup:

slurm.conf
...
GresTypes=gpu
NodeName=NName SocketsPerBoard=8 CoresPerSocket=1 RealMemory=8000
GRES=gpu:1 State=UNKNOWN
...

gres.conf
NodeName=NName Name=gpu File=/dev/tty0

When restarting, I get the following error:

error: Setting node NName state to INVAL with reason:gres/gpu count
reported lower than configured (0 < 1)

So it is still not working, but at least I get a more helpful log
message. Because I know that this /dev/tty trick works, I am still
unsure where the current error lies, but I will try to investigate it
further. I am thankful for any ideas in that regard.

Best regards,
Xaver

On 19.07.23 10:23, Xaver Stiensmeier wrote:


Alright,

I tried a few more things, but I still wasn't able to get past: srun:
error: Unable to allocate resources: Invalid generic resource (gres)
specification.

I should mention that the node I am trying to test GPU with doesn't
really have a gpu, but Rob was so kind as to point out that you do not
need a gpu as long as you just link to a file in /dev/ in the
gres.conf. As mentioned: This is just for testing purposes - in the
end we will run this on a node with a gpu, but it is not available at
the moment.

*The error isn't changing*

If I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same error.

*Debug Info*

I added the gpu debug flag and logged the following:

[2023-07-18T14:59:45.026] restoring original state of nodes
[2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to
gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to
change GresPlugins
[2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not specified
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to
gpu ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to
change GresPlugins
[2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure:
select/cons_tres: reconfigure
[2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.027] No parameter for mcs plugin, default values set
[2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
[2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller: completed
usec=5898
[2023-07-18T14:59:45.952]
SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2

I am a bit unsure what to do next to further investigate this issue.

Best regards,
Xaver

On 17.07.23 15:57, Groner, Rob wrote:

That would certainly do it.  If you look at the slurmctld log when it
comes up, it will say that it's marking that node as invalid because
it has fewer (0) gres resources than you say it should have.  That's
because slurmd on that node will come up and say "What gres resources??"

For testing purposes,  you can just create a dummy file on the node,
then in gres.conf, point to that file as the "graphics file"
interface.  As long as you don't try to actually use it as a graphics
file, that should be enough for that node to think it has gres/gpu
resources. That's what I do in my vagrant slurm cluster.

Rob

----
*From:* slurm-users  on behalf
of Xaver Stiensmeier 
*Sent:* Monday, July 17, 2023 9:43 AM
*To:* slurm-users@lists.schedmd.com 
*Subject:* Re: [slurm-users] GRES and GPUs
Hi Hermann,

Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.

To be honest, I thought I could just claim that a node has a gpu even if
it doesn't have one - just for testing purposes. Could this be the issue?

Best regards,
Xaver Stiensmeier

On 17.07.23 14:11, Hermann Schwärzler wrote:
> Hi Xaver,
>
> what kind of SelectType are you using in your slurm.conf?
>
> Per https://slurm.schedmd.com/gres.html you have to consider:
> "As for the --gpu* option, these options are only supported by Slurm's
> select/cons_tres plugin."
>
> So you

Re: [slurm-users] GRES and GPUs

2023-07-19 Thread Xaver Stiensmeier

Alright,

I tried a few more things, but I still wasn't able to get past: srun:
error: Unable to allocate resources: Invalid generic resource (gres)
specification.

I should mention that the node I am trying to test GPU with doesn't
really have a gpu, but Rob was so kind as to point out that you do not need
a gpu as long as you just link to a file in /dev/ in the gres.conf. As
mentioned: This is just for testing purposes - in the end we will run
this on a node with a gpu, but it is not available at the moment.

*The error isn't changing*

If I omit "GresTypes=gpu" and "Gres=gpu:1", I still get the same error.

*Debug Info*

I added the gpu debug flag and logged the following:

[2023-07-18T14:59:45.026] restoring original state of nodes
[2023-07-18T14:59:45.026] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu
ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change
GresPlugins
[2023-07-18T14:59:45.026] read_slurm_conf: backup_controller not specified
[2023-07-18T14:59:45.026] error: GresPlugins changed from (null) to gpu
ignored
[2023-07-18T14:59:45.026] error: Restart the slurmctld daemon to change
GresPlugins
[2023-07-18T14:59:45.026] select/cons_tres: select_p_reconfigure:
select/cons_tres: reconfigure
[2023-07-18T14:59:45.027] select/cons_tres: part_data_create_array:
select/cons_tres: preparing for 2 partitions
[2023-07-18T14:59:45.027] No parameter for mcs plugin, default values set
[2023-07-18T14:59:45.027] mcs: MCSParameters = (null). ondemand set.
[2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller: completed
usec=5898
[2023-07-18T14:59:45.952]
SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2

I am a bit unsure what to do next to further investigate this issue.

Best regards,
Xaver

On 17.07.23 15:57, Groner, Rob wrote:

That would certainly do it.  If you look at the slurmctld log when it
comes up, it will say that it's marking that node as invalid because
it has fewer (0) gres resources than you say it should have.  That's
because slurmd on that node will come up and say "What gres resources??"

For testing purposes,  you can just create a dummy file on the node,
then in gres.conf, point to that file as the "graphics file"
interface.  As long as you don't try to actually use it as a graphics
file, that should be enough for that node to think it has gres/gpu
resources.  That's what I do in my vagrant slurm cluster.

Rob

----
*From:* slurm-users  on behalf
of Xaver Stiensmeier 
*Sent:* Monday, July 17, 2023 9:43 AM
*To:* slurm-users@lists.schedmd.com 
*Subject:* Re: [slurm-users] GRES and GPUs
Hi Hermann,

Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.

To be honest, I thought I could just claim that a node has a gpu even if
it doesn't have one - just for testing purposes. Could this be the issue?

Best regards,
Xaver Stiensmeier

On 17.07.23 14:11, Hermann Schwärzler wrote:
> Hi Xaver,
>
> what kind of SelectType are you using in your slurm.conf?
>
> Per https://slurm.schedmd.com/gres.html you have to consider:
> "As for the --gpu* option, these options are only supported by Slurm's
> select/cons_tres plugin."
>
> So you can use "--gpus ..." only when you state
> SelectType  = select/cons_tres
> in your slurm.conf.
>
> But "--gres=gpu:1" should work always.
>
> Regards
> Hermann
>
>
> On 7/17/23 13:43, Xaver Stiensmeier wrote:
>> Hey,
>>
>> I am currently trying to understand how I can schedule a job that
>> needs a GPU.
>>
>> I read about GRES https://slurm.schedmd.com/gres.html and tried to use:
>>
>> GresTypes=gpu
>> NodeName=test Gres=gpu:1
>>
>> But calling - after a 'sud

Re: [slurm-users] GRES and GPUs

2023-07-17 Thread Xaver Stiensmeier

Hi Hermann,

Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.

To be honest, I thought I could just claim that a node has a gpu even if
it doesn't have one - just for testing purposes. Could this be the issue?

Best regards,
Xaver Stiensmeier

On 17.07.23 14:11, Hermann Schwärzler wrote:

Hi Xaver,

what kind of SelectType are you using in your slurm.conf?

Per https://slurm.schedmd.com/gres.html you have to consider:
"As for the --gpu* option, these options are only supported by Slurm's
select/cons_tres plugin."

So you can use "--gpus ..." only when you state
SelectType  = select/cons_tres
in your slurm.conf.

But "--gres=gpu:1" should work always.

Regards
Hermann


On 7/17/23 13:43, Xaver Stiensmeier wrote:

Hey,

I am currently trying to understand how I can schedule a job that
needs a GPU.

I read about GRES https://slurm.schedmd.com/gres.html and tried to use:

GresTypes=gpu
NodeName=test Gres=gpu:1

But calling - after a 'sudo scontrol reconfigure':

srun --gpus 1 hostname

didn't work:

srun: error: Unable to allocate resources: Invalid generic resource
(gres) specification

so I read more https://slurm.schedmd.com/gres.conf.html but that
didn't really help me.


I am rather confused. GRES claims to be generic resources but then it
comes with three defined resources (GPU, MPS, MIG) and using one of
those didn't work in my case.

Obviously, I am misunderstanding something, but I am unsure where to
look.


Best regards,
Xaver Stiensmeier







[slurm-users] GRES and GPUs

2023-07-17 Thread Xaver Stiensmeier

Hey,

I am currently trying to understand how I can schedule a job that needs
a GPU.

I read about GRES https://slurm.schedmd.com/gres.html and tried to use:

GresTypes=gpu
NodeName=test Gres=gpu:1

But calling - after a 'sudo scontrol reconfigure':

srun --gpus 1 hostname

didn't work:

srun: error: Unable to allocate resources: Invalid generic resource (gres) 
specification

so I read more https://slurm.schedmd.com/gres.conf.html but that didn't
really help me.


I am rather confused. GRES claims to be generic resources but then it
comes with three defined resources (GPU, MPS, MIG) and using one of
those didn't work in my case.

Obviously, I am misunderstanding something, but I am unsure where to look.


Best regards,
Xaver Stiensmeier


[slurm-users] Prevent CLOUD node from being shutdown after startup

2023-05-12 Thread Xaver Stiensmeier

Dear slurm-users,

I am currently looking into options for how I can deactivate suspending of
nodes. I am interested in both the general case:

Allowing all nodes to be powered up, but with automatic suspending disabled
for all nodes, except when power down is triggered manually.

And the special case:

Allowing all nodes to be powered up, but with automatic suspending disabled
for some nodes, except when power down is triggered manually.

---

I tried using negative times for SuspendTime, but that didn't seem to
work as no nodes are powered up then.
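
A hedged sketch of the usual way to exempt specific nodes while keeping
power saving on (the node names are placeholders; manual power-down should
still be possible via scontrol):

```
# slurm.conf (sketch)
SuspendTime=300
SuspendExcNodes=worker-special-[1-2]   # never suspended automatically
# manual power down stays possible:
#   scontrol update NodeName=worker-special-1 State=POWER_DOWN
```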

Best regards,
Xaver Stiensmeier




[slurm-users] Submit sbatch to multiple partitions

2023-04-17 Thread Xaver Stiensmeier

Dear slurm-users list,

let's say I want to submit a large batch job that should run on 8 nodes.
I have two partitions, each holding 4 nodes. Slurm will now tell me that
"Requested node configuration is not available". However, my desired
output would be that slurm makes use of both partitions and allocates
all 8 nodes.
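
For reference, a sketch of the multi-partition submission syntax; as far
as I know a single job is still placed entirely within one of the listed
partitions, so this alone does not give an 8-node allocation spanning both
(partition and job names are placeholders):

```bash
sbatch --nodes=8 --partition=partition1,partition2 job.sh
```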

Best regards,
Xaver Stiensmeier




Re: [slurm-users] Multiple default partitions

2023-04-17 Thread Xaver Stiensmeier

I found a solution that works for me, but it doesn't really answer the
question:

It's the option
https://slurm.schedmd.com/slurm.conf.html#OPT_all_partitions for
JobSubmitPlugins. It works for me because all partitions are default in
my case, but it doesn't /really/ answer my question, which asks
how to have multiple default partitions while possibly having
others that are not default.
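
For reference, the slurm.conf line this refers to (sketch):

```
JobSubmitPlugins=all_partitions   # jobs without an explicit partition target all partitions
```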

Best regards,
Xaver Stiensmeier

On 17.04.23 11:12, Xaver Stiensmeier wrote:

Dear slurm-users list,

is it possible to somehow have two default partitions? In the best case
in a way that slurm schedules to partition1 on default and only to
partition2 when partition1 can't handle the job right now.

Best regards,
Xaver Stiensmeier




[slurm-users] Multiple default partitions

2023-04-17 Thread Xaver Stiensmeier

Dear slurm-users list,

is it possible to somehow have two default partitions? In the best case
in a way that slurm schedules to partition1 on default and only to
partition2 when partition1 can't handle the job right now.

Best regards,
Xaver Stiensmeier




[slurm-users] Evaluation: How collect data regarding slurms cloud scheduling performance?

2023-02-28 Thread Xaver Stiensmeier

Dear slurm-user list,

I am currently investigating ways of evaluating Slurm's cloud
scheduling performance. As we are all aware, there are many knobs to turn
when it comes to cloud scheduling.

We can change the regular scheduling (prioritizing, ...), powerup and
powerdown times. There's probably a lot more.

However, my question today is not about improving cloud scheduling
performance, but how we collect data like:

When were nodes powered up [down]? To what degree were the powered-up
machines used? Were the "right" instances started for the given jobs, or
were larger instances started than needed? ...
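
One possible starting point (sketch) is the accounting database, which
records when jobs ran and on which nodes; this can be correlated with
power-up/-down events (the dates below are placeholders):

```bash
sacct --allusers --starttime=2023-02-01 --endtime=2023-02-28 --parsable2 \
      --format=JobID,Partition,NodeList,AllocCPUS,Start,End,Elapsed
```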

I know that this question is currently very open, but I am still trying
to narrow down where I have to look. The final goal is of course to use
this evaluation to pick better timeout values and improve cloud scheduling.

Best regards,
Xaver Stiensmeier




[slurm-users] Request nodes with a custom resource?

2023-02-05 Thread Xaver Stiensmeier

Dear slurm-user list,

how would you implement a custom resource requirement? For example you
have a group of nodes that has direct access to a large database so it
would be best to run jobs regarding that database on those nodes.

How would you schedule a job (let's say using srun) to work on these
nodes? Of course this would be interesting in a dynamic case, too
(assuming that the database is downloaded to nodes during job
execution), but for now I would be happy with a static solution (where
it is okay to store a value beforehand that says something like
"hasDatabase=1" on the nodes).

So I am basically looking for custom requirements.
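
A hedged sketch of the static variant using node features and
--constraint (node and script names are placeholders):

```
# slurm.conf (sketch): tag the nodes that can reach the database
NodeName=dbnode[1-4] Features=hasDatabase   # plus the usual hardware definition

# job submission:
srun --constraint=hasDatabase ./query_database.sh
```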

Best regards,
Xaver Stiensmeier




[slurm-users] How to set default partition in slurm configuration

2023-01-25 Thread Xaver Stiensmeier

Dear slurm-user list,

I am aware that this question sounds very simple and it should be
resolved by just taking one look at the documentation, but for some
reason I am unable to find where I can set the default partition in the
slurm configuration.

For me it feels like the last partition added simply becomes
the default automatically, but I might be wrong in this
assessment of the situation. However, I would prefer just setting a key
instead of fiddling around with the positioning of partition definitions.

I only found reference to the "default partition" in `JobSubmitPlugins`
and this might be the solution. However, I think this is something so
basic that it probably shouldn't need a plugin so I am unsure.

Can anyone point me towards how setting the default partition is done?
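
For reference, a sketch of the explicit way (partition and node names are
placeholders):

```
# slurm.conf: mark the default partition explicitly instead of relying on ordering
PartitionName=main Nodes=worker[1-4] Default=YES
PartitionName=extra Nodes=worker[5-8]
```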

Best regards,
Xaver Stiensmeier




[slurm-users] Slurm: Handling nodes that fail to POWER_UP in a cloud scheduling system

2022-11-23 Thread Xaver Stiensmeier

Hello slurm-users,
The question can be found in a similar fashion here:
https://stackoverflow.com/questions/74529491/slurm-handling-nodes-that-fail-to-power-up-in-a-cloud-scheduling-system


 Issue


   Current behavior and problem description

When a node fails to |POWER_UP|, it is marked |DOWN|. While this is a
great idea in general, this is not useful when working with |CLOUD|
nodes, because said |CLOUD| node is likely to be started on a different
machine and therefore to |POWER_UP| without issues. But since the node
is marked as down, that cloud resource is no longer used and never
started again until freed manually.


   Wanted behavior

Ideally slurm would not mark the node as |DOWN|, but just attempt to
start another. If that's not possible, automatically resuming |DOWN|
nodes would also be an option.


   Question

How can I prevent slurm from marking nodes that fail to |POWER_UP| as
|DOWN| or make slurm restore |DOWN| nodes automatically to prevent slurm
from forgetting cloud resources?


 Attempts and Thoughts


   ReturnToService

I tried solving this using |ReturnToService|
<https://slurm.schedmd.com/slurm.conf.html#OPT_ReturnToService> but that
didn't seem to solve my issue, since, if I understand it correctly, it
only applies to slurm nodes starting up by themselves or manually, and does
not take them into consideration when scheduling jobs until they've been
started.


   SlurmctldParameters=idle_on_node_suspend

While this is great and definitely helpful, it doesn't solve the issue
at hand, since a node that failed during power-up is not suspended.


   ResumeFailProgram

I considered using |ResumeFailProgram|
<https://slurm.schedmd.com/slurm.conf.html#OPT_ResumeFailProgram>, but
it sounds odd that you have to write your own script for returning
your nodes to service if they fail on startup. This case sounds too
common not to be handled by slurm itself. However, this will be my next
attempt: Implement a script that calls for every given node

   sudo scontrol update NodeName=$NODE_NAME state=RESUME
   reason=FailedShutdown
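
A sketch of such a script, assuming ResumeFailProgram passes the failed
nodes as a hostlist expression in the first argument (mirroring the
command above):

```bash
#!/bin/bash
# expand e.g. worker[1-3] into individual names and resume each one
for NODE in $(scontrol show hostnames "$1"); do
    scontrol update NodeName="$NODE" state=RESUME reason=FailedShutdown
done
```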


 Additional Information

In the |POWER_UP| script I am terminating the server if the setup fails
for any reason and return a non-zero exit code.

In our Cloud Scheduling
<https://slurm.schedmd.com/elastic_computing.html> instances are created
once they are needed and deleted once they are no longer needed. This
means that slurm stores that a node is |DOWN| while no real instance
behind it exists anymore. If that node weren't marked |DOWN| and a
job were scheduled to it at a later time, it would simply start
an instance and run on that new instance. I am just stating this to be
maximally explicit.

Best regards,
Xaver Stiensmeier

PS: This is the first time I use the slurm-user list and I hope I am not
violating any rules with this question. Please let me know if I do.