Re: [slurm-users] slurmrestd service broken by 22.05.07 update

2022-12-29 Thread Chris Samuel

On 29/12/22 11:31 am, Timo Rothenpieler wrote:

> Having service files in top level dirs like /run or /var/lib is bound to
> cause issues like this.


You can use local systemd overrides for things like this. In this case I 
suspect you can create this directory:


/etc/systemd/system/slurmrestd.service.d/

and drop files into it via the Configuration Management System Of Your 
Choice to override/augment the vendor supplied configuration.


https://www.freedesktop.org/software/systemd/man/systemd.unit.html

> Along with a unit file foo.service, a "drop-in" directory
> foo.service.d/ may exist. All files with the suffix ".conf"
> from this directory will be merged in the alphanumeric order
> and parsed after the main unit file itself has been parsed.
> This is useful to alter or add configuration settings for a
> unit, without having to modify unit files. Each drop-in file
> must contain appropriate section headers. For instantiated
> units, this logic will first look for the instance ".d/"
> subdirectory (e.g. "foo@bar.service.d/") and read its ".conf"
> files, followed by the template ".d/" subdirectory
> (e.g. "foo@.service.d/") and the ".conf" files there.
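An untested sketch of the drop-in approach: the shell function below writes an override that replaces the packaged ExecStart= with the socket-free line Brian posted elsewhere in this thread. The function name write_slurmrestd_dropin and the file name override.conf are made up for illustration. The empty ExecStart= line is there because systemd only accepts a new ExecStart= for this kind of service once the inherited one has been cleared.

```shell
# Hypothetical helper: write a drop-in for slurmrestd.service into the
# directory passed as $1.
write_slurmrestd_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # The quoted 'EOF' keeps $SLURMRESTD_OPTIONS literal, so that systemd
    # (not this shell) expands it from the environment file.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# Clear the ExecStart= inherited from the packaged unit ...
ExecStart=
# ... then start slurmrestd without the unix socket argument.
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
EOF
}

# Usage (as root):
#   write_slurmrestd_dropin /etc/systemd/system/slurmrestd.service.d
#   systemctl daemon-reload
#   systemctl restart slurmrestd
```

With no listen address left on the command line, slurmrestd needs one from somewhere else, for example SLURMRESTD_LISTEN in the environment file.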

Caveat: written whilst travelling and without testing or even having 
access to a system where I can test, but we do use this method for other 
services already.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] slurmrestd service broken by 22.05.07 update

2022-12-29 Thread Timo Rothenpieler
Ideally, the systemd service would specify the User/Group already, and 
then also specify RuntimeDirectory=slurmrestd.
systemd then pre-creates a slurmrestd directory in /run, owned by that 
user, for the service to put its runtime files (like sockets) into, 
avoiding any permission issues.


Having service files in top level dirs like /run or /var/lib is bound to 
cause issues like this.
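A rough, untested sketch of the User/Group plus RuntimeDirectory= idea, written as a local drop-in rather than a change to the shipped unit. The account name "slurmrestd", the file name runtime-dir.conf, the socket path under /run/slurmrestd and the function name are all assumptions for illustration. The /usr/sbin/slurmrestd path and $SLURMRESTD_OPTIONS are taken from the ExecStart line quoted elsewhere in this thread.

```shell
# Hypothetical helper: write a drop-in that runs slurmrestd as a dedicated
# user with its socket under /run/slurmrestd. $1 is the drop-in directory,
# $2 the (assumed) service account, defaulting to "slurmrestd".
write_runtime_dir_dropin() {
    dropin_dir=$1
    svc_user=${2:-slurmrestd}
    mkdir -p "$dropin_dir" || return 1
    # Unquoted EOF so that ${svc_user} is filled in by the shell; the
    # backslash keeps the systemd variable literal.
    cat > "$dropin_dir/runtime-dir.conf" <<EOF
[Service]
User=${svc_user}
Group=${svc_user}
# systemd creates /run/slurmrestd, owned by the user and group above,
# when the service starts and removes it again when the service stops.
RuntimeDirectory=slurmrestd
ExecStart=
ExecStart=/usr/sbin/slurmrestd \$SLURMRESTD_OPTIONS unix:/run/slurmrestd/slurmrestd.socket
EOF
}

# Usage (as root):
#   write_runtime_dir_dropin /etc/systemd/system/slurmrestd.service.d
#   systemctl daemon-reload
#   systemctl restart slurmrestd
```

Because the directory belongs to the service account, the daemon can bind and unlink its socket there without ever touching a root-owned directory.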








Re: [slurm-users] slurmrestd service broken by 22.05.07 update

2022-12-29 Thread Chris Stackpole

Thanks Brian!

I also discovered that I can edit the service file to remove the unix 
socket. That doesn't seem to impact the things I'm working with anyway. 
But it still seems strange to me that this design requires editing the 
service file. It seems like the socket should be a configurable item, 
like the user information, at the very least. But again, I've not found 
any official documentation on how the devs expect us to configure this.


Thanks!






Re: [slurm-users] slurmrestd service broken by 22.05.07 update

2022-12-29 Thread Brian Andrus
I dug up my old stuff for getting it started and see that I just 
disabled the unix socket completely. I was never able to get it to work 
for the reasons you are seeing, so I enabled it in listening mode. There 
are comments in the service file about it, but to do so, I changed the 
'ExecStart' line in the systemd service file to be:


ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS

Then I created /etc/default/slurmrestd and added:

   SLURM_JWT=daemon
   SLURMRESTD_LISTEN=0.0.0.0:8081
   SLURMRESTD_DEBUG=4
   SLURMRESTD_OPTIONS="-f /etc/slurm/slurm.conf"

You can change those as needed. This made it listen on port 8081 only 
(no unix socket, and not port 6820).


I was then able to just use curl on port 8081 to test things.
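For anyone wanting to repeat the curl test, a minimal sketch follows. Only port 8081 comes from the settings above. The /slurm/v0.0.38/ping path, the X-SLURM-USER-NAME and X-SLURM-USER-TOKEN headers, and the use of "scontrol token" to obtain a JWT are assumptions based on the slurmrestd documentation for 22.05, so check the API version your build actually provides. Both function names are made up.

```shell
# Hypothetical helper: build the URL of the slurmrestd ping endpoint.
# Host, port and API version are assumptions and can be overridden.
slurmrestd_ping_url() {
    host=${1:-localhost}
    port=${2:-8081}
    version=${3:-v0.0.38}
    printf 'http://%s:%s/slurm/%s/ping\n' "$host" "$port" "$version"
}

# Hypothetical helper: request a token and query the ping endpoint.
# "scontrol token" prints a line of the form SLURM_JWT=<token>, which is
# exported into the environment before curl is called.
slurmrestd_ping() {
    token_line=$(scontrol token) || return 1
    export "$token_line"
    curl -s \
        -H "X-SLURM-USER-NAME: $(id -un)" \
        -H "X-SLURM-USER-TOKEN: ${SLURM_JWT}" \
        "$(slurmrestd_ping_url "$@")"
}

# Usage:
#   slurmrestd_ping                  # localhost:8081
#   slurmrestd_ping restnode 8081    # another host, same port
```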

Hope that helps.

Brian Andrus



Re: [slurm-users] slurmrestd service broken by 22.05.07 update

2022-12-29 Thread Chris Stackpole

Greetings,

Thanks for responding!

On 12/28/22 20:35, Brian Andrus wrote:
> I suspect if you delete /var/lib/slurmrestd.socket and then start
> slurmrestd, it will create it as the user you need it to be.
>
> Or just change the owner of it to the slurmrestd owner.



No go on that, because creating /var/lib/slurmrestd.socket requires 
root: /var/lib is only writable by root. That is what I meant by "has 
to write into a root-only directory to create the unix socket".

Here, I'll show what happens with me.
Spun up a virtual machine with nothing changed on a fresh compile of 
22.05.07.


# rm -rf /var/lib/slurmrestd.socket
# systemctl start slurmrestd
# systemctl status slurmrestd

Active: failed (Result: exit-code) since Thu 2022-12-29 08:39:45 CST; 54s ago



# journalctl -xe

Dec 29 08:39:45 testslurmvm.cluster slurmrestd[114317]: fatal: _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX socket: Permission denied
Dec 29 08:39:45 testslurmvm.cluster systemd[1]: slurmrestd.service: Main process exited, code=exited, status=1/FAILURE


Now what about giving ownership to the user?

# touch /var/lib/slurmrestd.socket
# systemctl start slurmrestd
# systemctl status slurmrestd

Active: failed (Result: exit-code) since Thu 2022-12-29 08:45:37 CST; 1min 2s ago


# journalctl -xe

Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: error: Error unlink(/var/lib/slurmrestd.socket): Permission denied
Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: fatal: _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX socket: Address already in use


Again, it doesn't have permissions to modify those files nor create 
files inside that directory.
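To spell out the reasoning: binding a unix socket creates a new directory entry and unlinking a stale one removes an entry, so both need write permission on /var/lib itself. Ownership of the socket file does not change that, and a file already sitting at the path makes the bind fail with "Address already in use". A small sketch of a check for this, where the function name can_manage_socket_in is made up and the result is only meaningful when run as the account slurmrestd runs under:

```shell
# Hypothetical helper: succeed only if the current user could create and
# remove a socket file inside the directory given as $1.
can_manage_socket_in() {
    dir=$1
    # Creating or unlinking an entry needs write and search (execute)
    # permission on the directory itself.
    [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]
}

# Usage, run as the slurmrestd account:
#   can_manage_socket_in /var/lib && echo "socket can be created here"
```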


On 12/28/22 20:35, Brian Andrus wrote:
> I have been running slurmrestd as a separate user for some time.

Under 22.05.07? Because that's what broke things for me. And I think 
that it's this change:


| -- slurmrestd - switch users earlier on startup to avoid sockets being
| made as root.

I'm not saying it's a bad change either - but I don't see any 
documentation on the proper way to handle it and I don't feel like 
editing the service file is the proper way to handle it.


Thanks!