Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Lennart Poettering
On Fr, 05.04.19 12:42, Harald Dunkel (harald.dun...@aixigo.de) wrote:

> On 4/5/19 12:21 PM, Lennart Poettering wrote:
> > On Fr, 05.04.19 11:53, Harald Dunkel (harald.dun...@aixigo.de) wrote:
> >
> > >
> > > This is a VNC session, started via crontab @reboot.
> >
> > IIRC debian/ubuntu do not have pam-systemd in their PAM configuration
> > for cron, which means these services are not tracked by
> > logind/systemd, and hence only killed when crond likes to do that.
> >
> > It's a configuration bug in debian/ubuntu.
> >
>
> No, it was just a sample. Surely systemd is sufficiently stable
> to recover from some lost processes?

Sure, it is. I mean, your system did shut down in the end, didn't it?
After the timeouts are hit it will go down anyway, ignoring those
left-over processes.

> The point is that rpcbind (and maybe others) are stopped before
> the NFS umounts come up. Hopefully you agree that this is
> unrelated to some cron jobs?

IIRC rpcbind is not needed for NFS to work after the mount is
established. If rpcbind is necessary for NFS mounts, it should be
ordered before remote-fs.target, so that NFS mounts are shut down
before rpcbind goes down. But all of that is really something for the
distro to figure out; systemd upstream doesn't really care about NFS,
we just provide the hooks so that distros can order their service
files into the right places.
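For illustration, the ordering described above could be expressed as a distro-side drop-in. This is only a sketch: per systemd.special(7), services that must be running before remote mounts are set up (and stay up until they are unmounted again) hook into remote-fs-pre.target; the drop-in path is an assumption about the local setup.

```ini
# /etc/systemd/system/rpcbind.service.d/nfs-ordering.conf
[Unit]
# Started before remote mounts are established, therefore
# stopped only after they have been unmounted at shutdown.
Before=remote-fs-pre.target
Wants=remote-fs-pre.target
```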

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Harald Dunkel

On 4/5/19 12:21 PM, Lennart Poettering wrote:
> On Fr, 05.04.19 11:53, Harald Dunkel (harald.dun...@aixigo.de) wrote:
> >
> > This is a VNC session, started via crontab @reboot.
>
> IIRC debian/ubuntu do not have pam-systemd in their PAM configuration
> for cron, which means these services are not tracked by
> logind/systemd, and hence only killed when crond likes to do that.
>
> It's a configuration bug in debian/ubuntu.


No, it was just a sample. Surely systemd is sufficiently stable
to recover from some lost processes?

The point is that rpcbind (and maybe others) are stopped before
the NFS umounts come up. Hopefully you agree that this is
unrelated to some cron jobs?


Regards
Harri

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Mantas Mikulėnas
On Fri, Apr 5, 2019 at 12:53 PM Harald Dunkel wrote:

> Hi Lennart,
>
> On 4/5/19 10:28 AM, Lennart Poettering wrote:
> >
> > For some reason a number of X session processes stick around to the
> > very end and thus keep your /home busy.
> >
> > [82021.052357] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
> > [82021.101976] systemd-shutdown[1]: Sending SIGKILL to PID 2513 (gpg-agent).
> > [82021.130507] systemd-shutdown[1]: Sending SIGKILL to PID 2886 (xstartup).
> > [82021.158510] systemd-shutdown[1]: Sending SIGKILL to PID 2896 (xstartup).
> > [82021.186052] systemd-shutdown[1]: Sending SIGKILL to PID 2959 (xterm).
> > [82021.213129] systemd-shutdown[1]: Sending SIGKILL to PID 2960 (xterm).
> > [82021.239971] systemd-shutdown[1]: Sending SIGKILL to PID 2961 (xterm).
> > [82021.266285] systemd-shutdown[1]: Sending SIGKILL to PID 2966 (bash).
> > [82021.292234] systemd-shutdown[1]: Sending SIGKILL to PID 2967 (bash).
> > [82021.318061] systemd-shutdown[1]: Sending SIGKILL to PID 9146 (utempter).
> > [82021.343331] systemd-shutdown[1]: Sending SIGKILL to PID 9147 (utempter).
> >
> > The question is how though. How do you start your X session? gdm?
> > startx from the console?
> >
>
> This is a VNC session, started via crontab @reboot.
>

Well, that's a pretty significant omission... If (as I'm guessing is the
case) the processes are started without going through the regular PAM
modules a user login would, then they won't be tracked *as* a user session –
they just remain part of the 'cron' service, and systemd has no
knowledge that they need to be killed before stopping NFS. This is likely
the problem if you see all the X11 processes in `systemctl status cron`
but no session entry in `loginctl`.

(While cron has its own PAM stack, due to being a non-interactive tool it
uses a different module configuration – IIRC it's also special-cased in
pam_systemd, but some distros don't even include pam_systemd in its config
to avoid other issues.)

So if your VNC server doesn't call pam_open_session() on startup, you
should write a wrapper that does, similar to how the existing display
managers (xdm/gdm/sddm) work. (It doesn't need to call any of the
auth/account functions from PAM.)

Alternatively you could convert this into a system service that has User=
and PAMName= settings; I've seen people do that to automatically start Xorg
sessions and it should work all the same with VNC. (Generally, what's the
point of using @reboot if you can write a .service?)
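To sketch that alternative (all names here are hypothetical: the unit name, the PAM service chosen for PAMName=, and the vncserver command line would need to be adapted to the actual VNC server in use):

```ini
# /etc/systemd/system/vncsession@.service  (hypothetical example)
[Unit]
Description=VNC session for user %i
# Ordered after remote-fs.target so that on shutdown this unit
# is stopped before the NFS /home is unmounted.
After=network.target remote-fs.target

[Service]
User=%i
# Run through a PAM session so logind tracks and cleans it up.
PAMName=login
# Placeholder: replace with the real VNC server binary and options
# (it must stay in the foreground for the default Type=).
ExecStart=/usr/bin/vncserver :1 -fg

[Install]
WantedBy=multi-user.target
```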

You can also order the entire cron.service after remote-fs.target in order
to at least avoid the race.
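That last workaround would be a one-line drop-in; on Debian the unit is called cron.service (a sketch, not tested on the reporter's systems):

```ini
# /etc/systemd/system/cron.service.d/order-after-nfs.conf
[Unit]
# Start cron only after remote filesystems are mounted, so that at
# shutdown cron is stopped before /home is unmounted. Note this only
# helps if the @reboot children are killed along with cron.service.
After=remote-fs.target
```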

-- 
Mantas Mikulėnas

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Lennart Poettering
On Fr, 05.04.19 11:53, Harald Dunkel (harald.dun...@aixigo.de) wrote:

> Hi Lennart,
>
> On 4/5/19 10:28 AM, Lennart Poettering wrote:
> >
> > For some reason a number of X session processes stick around to the
> > very end and thus keep your /home busy.
> >
> > [82021.052357] systemd-shutdown[1]: Sending SIGKILL to remaining 
> > processes...
> > [82021.101976] systemd-shutdown[1]: Sending SIGKILL to PID 2513 (gpg-agent).
> > [82021.130507] systemd-shutdown[1]: Sending SIGKILL to PID 2886 (xstartup).
> > [82021.158510] systemd-shutdown[1]: Sending SIGKILL to PID 2896 (xstartup).
> > [82021.186052] systemd-shutdown[1]: Sending SIGKILL to PID 2959 (xterm).
> > [82021.213129] systemd-shutdown[1]: Sending SIGKILL to PID 2960 (xterm).
> > [82021.239971] systemd-shutdown[1]: Sending SIGKILL to PID 2961 (xterm).
> > [82021.266285] systemd-shutdown[1]: Sending SIGKILL to PID 2966 (bash).
> > [82021.292234] systemd-shutdown[1]: Sending SIGKILL to PID 2967 (bash).
> > [82021.318061] systemd-shutdown[1]: Sending SIGKILL to PID 9146 (utempter).
> > [82021.343331] systemd-shutdown[1]: Sending SIGKILL to PID 9147 (utempter).
> >
> > The question is how though. How do you start your X session? gdm?
> > startx from the console?
> >
>
> This is a VNC session, started via crontab @reboot.

IIRC debian/ubuntu do not have pam-systemd in their PAM configuration
for cron, which means these services are not tracked by
logind/systemd, and hence only killed when crond likes to do that.

It's a configuration bug in debian/ubuntu.

Lennart

--
Lennart Poettering, Berlin

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Harald Dunkel

Hi Lennart,

On 4/5/19 10:28 AM, Lennart Poettering wrote:
> For some reason a number of X session processes stick around to the
> very end and thus keep your /home busy.
>
> [82021.052357] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
> [82021.101976] systemd-shutdown[1]: Sending SIGKILL to PID 2513 (gpg-agent).
> [82021.130507] systemd-shutdown[1]: Sending SIGKILL to PID 2886 (xstartup).
> [82021.158510] systemd-shutdown[1]: Sending SIGKILL to PID 2896 (xstartup).
> [82021.186052] systemd-shutdown[1]: Sending SIGKILL to PID 2959 (xterm).
> [82021.213129] systemd-shutdown[1]: Sending SIGKILL to PID 2960 (xterm).
> [82021.239971] systemd-shutdown[1]: Sending SIGKILL to PID 2961 (xterm).
> [82021.266285] systemd-shutdown[1]: Sending SIGKILL to PID 2966 (bash).
> [82021.292234] systemd-shutdown[1]: Sending SIGKILL to PID 2967 (bash).
> [82021.318061] systemd-shutdown[1]: Sending SIGKILL to PID 9146 (utempter).
> [82021.343331] systemd-shutdown[1]: Sending SIGKILL to PID 9147 (utempter).
>
> The question is how though. How do you start your X session? gdm?
> startx from the console?



This is a VNC session, started via crontab @reboot.

Regards
Harri

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Harald Dunkel

On 4/5/19 8:45 AM, Mantas Mikulėnas wrote:
> Normally I'd expect user sessions (user-*.slice, session-*.scope,
> user@*.service) to be killed before mount units are stopped; I wonder how
> random gpg-agent processes have managed to escape that. (Actually, doesn't
> Debian now manage gpg-agent via user@.service? That *really* should be
> cleaning up everything properly...)

Probably a remote login.

@Michael, libpam-systemd is installed. UsePAM is enabled, too.

> You might also try to enable [Mount] LazyUnmount= for home.mount so that
> umounts appear to succeed immediately and the kernel cleans them up when it
> can. It mostly just hides the problem though.



Looking at the log file I have the impression that rpcbind has
been stopped even before the first umount attempt of /home. Can
you confirm this?


Regards
Harri

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Lennart Poettering
On Fr, 05.04.19 08:28, Harald Dunkel (harald.dun...@aixigo.de) wrote:
> Hi folks,
>
> I've got a device-busy-problem with /home, mounted via NFS.
> Shutdown of the host takes more than 180 secs. See attached
> log file.
>
> Apparently the umount of /home at 81925.154995 failed, (device
> busy, in my case it was a lost gpg-agent). This error was
> ignored, the NFS framework was shut down, the network was
> stopped, and then it was too late to properly handle the /home
> mount point.
>
> AFAIK the mount units are generated from /etc/fstab, so I wonder
> if this could be improved?
>
> The hosts (about 50 developer PCs) are running Debian 9, systemd
> 232-25+deb9u9. Unfortunately we are bound to this platform at
> least for another year.
>
>
> Every helpful hint is highly appreciated.

For some reason a number of X session processes stick around to the
very end and thus keep your /home busy.

[82021.052357] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[82021.101976] systemd-shutdown[1]: Sending SIGKILL to PID 2513 (gpg-agent).
[82021.130507] systemd-shutdown[1]: Sending SIGKILL to PID 2886 (xstartup).
[82021.158510] systemd-shutdown[1]: Sending SIGKILL to PID 2896 (xstartup).
[82021.186052] systemd-shutdown[1]: Sending SIGKILL to PID 2959 (xterm).
[82021.213129] systemd-shutdown[1]: Sending SIGKILL to PID 2960 (xterm).
[82021.239971] systemd-shutdown[1]: Sending SIGKILL to PID 2961 (xterm).
[82021.266285] systemd-shutdown[1]: Sending SIGKILL to PID 2966 (bash).
[82021.292234] systemd-shutdown[1]: Sending SIGKILL to PID 2967 (bash).
[82021.318061] systemd-shutdown[1]: Sending SIGKILL to PID 9146 (utempter).
[82021.343331] systemd-shutdown[1]: Sending SIGKILL to PID 9147 (utempter).

The question is how though. How do you start your X session? gdm?
startx from the console?

Lennart

--
Lennart Poettering, Berlin

Re: [systemd-devel] umount NFS problem

2019-04-04 Thread Michael Biebl
On Fri, 5 Apr 2019 at 08:45, Mantas Mikulėnas wrote:

> The job order (home.mount vs nfs-client.target) already looks correct, so 
> fstab options probably won't help much; I'd try to ensure that the umount 
> doesn't fail in the first place.
>
> Normally I'd expect user sessions (user-*.slice, session-*.scope, 
> user@*.service) to be killed before mount units are stopped; I wonder how 
> random gpg-agent processes have managed to escape that. (Actually, doesn't 
> Debian now manage gpg-agent via user@.service? That *really* should be 
> cleaning up everything properly...)

I would check if libpam-systemd is installed and enabled.
Do you have remote logins via SSH? Does sshd_config have "UsePAM yes"?

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Re: [systemd-devel] umount NFS problem

2019-04-04 Thread Mantas Mikulėnas
On Fri, Apr 5, 2019 at 9:28 AM Harald Dunkel wrote:

> Hi folks,
>
> I've got a device-busy-problem with /home, mounted via NFS.
> Shutdown of the host takes more than 180 secs. See attached
> log file.
>
> Apparently the umount of /home at 81925.154995 failed, (device
> busy, in my case it was a lost gpg-agent). This error was
> ignored, the NFS framework was shut down, the network was
> stopped, and then it was too late to properly handle the /home
> mount point.
>
> AFAIK the mount units are generated from /etc/fstab, so I wonder
> if this could be improved?
>

The job order (home.mount vs nfs-client.target) already looks correct, so
fstab options probably won't help much; I'd try to ensure that the umount
doesn't fail in the first place.

Normally I'd expect user sessions (user-*.slice, session-*.scope,
user@*.service)
to be killed before mount units are stopped; I wonder how random gpg-agent
processes have managed to escape that. (Actually, doesn't Debian now manage
gpg-agent via user@.service? That *really* should be cleaning up everything
properly...)

You might also try to enable [Mount] LazyUnmount= for home.mount so that
umounts appear to succeed immediately and the kernel cleans them up when it
can. It mostly just hides the problem though.
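As a sketch, that option can be set via a drop-in on the generated unit (LazyUnmount= exists since systemd v232, so it should be available on the reporter's Debian 9 hosts; path and file name are illustrative):

```ini
# /etc/systemd/system/home.mount.d/lazy-unmount.conf
[Mount]
# Detach the mount lazily (like umount -l): the unmount returns
# immediately and the kernel cleans up once /home is no longer busy.
LazyUnmount=yes
```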

-- 
Mantas Mikulėnas

[systemd-devel] umount NFS problem

2019-04-04 Thread Harald Dunkel

Hi folks,

I've got a device-busy problem with /home, mounted via NFS.
Shutdown of the host takes more than 180 secs. See attached
log file.

Apparently the umount of /home at 81925.154995 failed (device
busy; in my case it was a lost gpg-agent). This error was
ignored, the NFS framework was shut down, the network was
stopped, and then it was too late to properly handle the /home
mount point.

AFAIK the mount units are generated from /etc/fstab, so I wonder
if this could be improved?
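For reference, the generated home.mount here presumably comes from an fstab line along these lines (server name, export path, and options are illustrative, not taken from the actual setup):

```
nfsserver:/export/home  /home  nfs  defaults,_netdev  0  0
```

systemd recognizes nfs as a network filesystem anyway, so the generated unit is ordered against remote-fs.target even without _netdev.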

The hosts (about 50 developer PCs) are running Debian 9, systemd
232-25+deb9u9. Unfortunately we are bound to this platform at
least for another year.


Every helpful hint is highly appreciated.

Harri
--
aixigo AG, Karl-Friedrich-Strasse 68, 52072 Aachen, Germany
phone: +49 241 559709-79, fax: +49 241 559709-99
eMail: harald.dun...@aixigo.de, web: http://www.aixigo.de
Amtsgericht Aachen - HRB 8057, Vorstand: Erich Borsch, Christian Friedrich, 
Tobias Haustein, Vors. des Aufsichtsrates: Prof. Dr. Ruediger von Nitzsch


shutdown-log.txt.gz
Description: application/gzip