Re: [systemd-devel] systemd and NFS "bg" mounts.
On Mon, May 29 2017, Lennart Poettering wrote: > On Fri, 26.05.17 12:46, NeilBrown (ne...@suse.com) wrote: > >> >> Hi all, >> it appears that systemd doesn't play well with NFS "bg" mounts. >> I can see a few options for how to address this and wonder if anyone >> has any opinions. > > Yeah, this has come up before. Long story short: "bg" is simply not > compatible with systemd. We assume that the /bin/mount's effect is > visible in /proc/self/mountinfo, and everything else is considered a > bug, i.e. /bin/mount lying to us. And I think that's a pretty rational > assumption and requirement to make. > > I am not particularly convinced the "bg" usecase is really such a > great idea, since it is necessarily racy: you never know whether mount > is actually in effect or not, even though /bin/mount claims so. I am > pretty sure other options (such as autofs mounts, which are dead-easy > to use in system: just replace the "bg" mount option in fstab by > "x-systemd.automount") are much better approaches to the problem at > hand: they also make your local system less dependent on remote > network access, but they do give proper guarantees about their > runtime: when the autofs mount is established the path is available. > > Hence I am tempted to treat the issue as a documentation and warning > issue: accept that "bg" is not supported, but document this better. In > addition, we should probably log about "bg" being broken in the > fstab-generator. I file a bug about that now: > > https://github.com/systemd/systemd/issues/6046 There is a weird distorted sort of justice here. When NFS first appeared, it broke various long-standing Unix practices, such as O_EXCL|O_CREAT for lock files. Now systemd appears and breaks a long-standing NFS practice: bg mounts. I hoped we could find a way to make them work, but I won't cry over their demise. I much prefer automount I think all NFS mounts should be automounts. I see this is already documented in systemd.mount: The NFS mount option bg for NFS background mounts as documented in nfs(5) is not supported in /etc/fstab entries. I wonder how many people actually read that. We should probably add symmetric documentation to nfs(5) Both should give clear pointers to x-systemd.automount. > >> This is better, but the background mount.nfs can persist for a long >> time. I don't think it persists forever, but at least half an hour I >> think. >> >> When the foo.mount unit is stopped, the mount.nfs process isn't >> killed. > > Hmm, if you see this, then this would be a bug: mount units that are > stopped should not leave processes around. > >> I don't think this is a major problem, but it is unfortunate and could >> be confusing. During testing I've had multiple mount.nfs background >> processes all attached to the one .mount unit. > > Humpf, could you file a bug? https://github.com/systemd/systemd/issues/6048 > > While I think the "bg" concept is broken, as discussed above, having > FUSE mounts with processes in the background is actually supported, > and we should clean them up properly when the mount unit is stopped. > > Hmm, maybe mount.nfs isn't properly killable? i.e. systemd tries to > kill it, but it refuses to be killed? mount.nfs responds cleanly to SIGTERM. > >> What should we do about bg NFS mounts with systemd? >> Some options: >>- declare "bg" to be not-supported. If you don't need the filesystem >> to be present for boot, then use x-systemd.automount, or some other >> automount scheme. >> If we did this, we probably need to make it very obvious that "bg" >> mounts aren't supported - maybe a log message that appears when >> you do "systemctl status ..." ?? > > I am all for this, as suggested above. I'd only log from > fstab-generator though. (That said, if we want something stronger, we > could also add the fact that we stripped "bg" from the mount optoins > to the DEscription= of the generated mount unit.) That last bit sounds like a very good idea. Stripping "bg" could be seen as a "surprising" thing for fstab-generator to do. Making it as obvious as possible to the sys-admin would be a good thing (and would probably make support personnel happy too). > >>- decide that "bg" is really just a lame attempt at automounting, and >> that now we have real automounting, "bg" can be translated to that. >> So systemd-fstab-generator would treat "bg" like >> "x-systemd.automount" and otherwise strip it from the list of >> options. > > I am a bit afraid of this I must say. The semantics are different > enough to likely cause more problems then we'd solve with this. Not > supporting this at all sounds like the much better approach here: > let's strip "bg" when specified. > >>- do our best to really support "bg". That means, at least, applying >> a much larger timeout to "bg" mounts, and preferably killing any >> background processes when a mount unit is stopped. Below is a
Re: [systemd-devel] systemd and NFS "bg" mounts.
On Fri, 26.05.17 12:46, NeilBrown (ne...@suse.com) wrote: > > Hi all, > it appears that systemd doesn't play well with NFS "bg" mounts. > I can see a few options for how to address this and wonder if anyone > has any opinions. Yeah, this has come up before. Long story short: "bg" is simply not compatible with systemd. We assume that the /bin/mount's effect is visible in /proc/self/mountinfo, and everything else is considered a bug, i.e. /bin/mount lying to us. And I think that's a pretty rational assumption and requirement to make. I am not particularly convinced the "bg" usecase is really such a great idea, since it is necessarily racy: you never know whether mount is actually in effect or not, even though /bin/mount claims so. I am pretty sure other options (such as autofs mounts, which are dead-easy to use in system: just replace the "bg" mount option in fstab by "x-systemd.automount") are much better approaches to the problem at hand: they also make your local system less dependent on remote network access, but they do give proper guarantees about their runtime: when the autofs mount is established the path is available. Hence I am tempted to treat the issue as a documentation and warning issue: accept that "bg" is not supported, but document this better. In addition, we should probably log about "bg" being broken in the fstab-generator. I file a bug about that now: https://github.com/systemd/systemd/issues/6046 > This is better, but the background mount.nfs can persist for a long > time. I don't think it persists forever, but at least half an hour I > think. > > When the foo.mount unit is stopped, the mount.nfs process isn't > killed. Hmm, if you see this, then this would be a bug: mount units that are stopped should not leave processes around. > I don't think this is a major problem, but it is unfortunate and could > be confusing. During testing I've had multiple mount.nfs background > processes all attached to the one .mount unit. Humpf, could you file a bug? While I think the "bg" concept is broken, as discussed above, having FUSE mounts with processes in the background is actually supported, and we should clean them up properly when the mount unit is stopped. Hmm, maybe mount.nfs isn't properly killable? i.e. systemd tries to kill it, but it refuses to be killed? > What should we do about bg NFS mounts with systemd? > Some options: >- declare "bg" to be not-supported. If you don't need the filesystem > to be present for boot, then use x-systemd.automount, or some other > automount scheme. > If we did this, we probably need to make it very obvious that "bg" > mounts aren't supported - maybe a log message that appears when > you do "systemctl status ..." ?? I am all for this, as suggested above. I'd only log from fstab-generator though. (That said, if we want something stronger, we could also add the fact that we stripped "bg" from the mount optoins to the DEscription= of the generated mount unit.) >- decide that "bg" is really just a lame attempt at automounting, and > that now we have real automounting, "bg" can be translated to that. > So systemd-fstab-generator would treat "bg" like > "x-systemd.automount" and otherwise strip it from the list of > options. I am a bit afraid of this I must say. The semantics are different enough to likely cause more problems then we'd solve with this. Not supporting this at all sounds like the much better approach here: let's strip "bg" when specified. >- do our best to really support "bg". That means, at least, applying > a much larger timeout to "bg" mounts, and preferably killing any > background processes when a mount unit is stopped. Below is a > little patch which does this last bit, but I'm not sure it is generally > safe. As mentioned I think this would just trade one race for a couple of new ones, and that appears to be a bad idea to me. > A side question is: should this knowledge about NFS be encoded in > systemd, or should nfs-utils add the necessary knowledge? I am pretty sure we should keep special understanding of NFS at a minimum in PID 1, but I think we can be less strict in fstab-generator, as its primary job is compat with UNIX anyway. > > i.e. we could add an nfs-fstab-generator to nfs-utils which creates > drop-ins to modify the behaviour of the drop-ins provided by > systemd-fstab-generator. > Adding the TimeoutSec= would be easy. Stripping the "bg" would be > possible. > Changing the remote-fs.target.requires/foo.mount symlink to be > remote-fs.target.requires/foo.automount would be problematic > though. Well, I'd be fine with letting NFS do its own handling of the NFS /etc/fstab entries, but I think the special casing of "bg" is fine to simply add to the existing generator in systemd. > Could we teach systemd-fstab-generator to ignore $TYPE filesystems > if TYPE-fstab-generator existed? Well, generators can
[systemd-devel] systemd and NFS "bg" mounts.
Hi all, it appears that systemd doesn't play well with NFS "bg" mounts. I can see a few options for how to address this and wonder if anyone has any opinions. "bg" mounts will try to mount the filesystem just like normal, but if the server cannot be contacted before a "major timeout" (4.5 minutes by default for TCP), mount.nfs will fork and continue in the background. Meanwhile the original mount process reports success (even though the filesystem wasn't mounted). This allows the boot process to continue and succeed. Currently if you specify the "bg" option in /etc/fstab and are using systemd, the "bg" has no useful effect. systemd imposes its own timeout of 90 seconds (which is less than 4.5 minutes). After 90 seconds, systemd will kill the mount process and decide that the mount failed. This will lead to remote-fs.target not being reached, and boot not completing. If you set TimeoutSec=0 (aka "infinity") for the mount unit, either by hacking fstab-generator or adding "x-systemd.mount-timeout=infinity" if you have systemd 233 or later, then systemd won't kill the mount process, and after 4.5 minutes it will exit. (to quote a comment in systemd/src/core/mount.c " /bin/mount lies to us and is broken" :-) This is better, but the background mount.nfs can persist for a long time. I don't think it persists forever, but at least half an hour I think. When the foo.mount unit is stopped, the mount.nfs process isn't killed. I don't think this is a major problem, but it is unfortunate and could be confusing. During testing I've had multiple mount.nfs background processes all attached to the one .mount unit. What should we do about bg NFS mounts with systemd? Some options: - declare "bg" to be not-supported. If you don't need the filesystem to be present for boot, then use x-systemd.automount, or some other automount scheme. If we did this, we probably need to make it very obvious that "bg" mounts aren't supported - maybe a log message that appears when you do "systemctl status ..." ?? - decide that "bg" is really just a lame attempt at automounting, and that now we have real automounting, "bg" can be translated to that. So systemd-fstab-generator would treat "bg" like "x-systemd.automount" and otherwise strip it from the list of options. - do our best to really support "bg". That means, at least, applying a much larger timeout to "bg" mounts, and preferably killing any background processes when a mount unit is stopped. Below is a little patch which does this last bit, but I'm not sure it is generally safe. A side question is: should this knowledge about NFS be encoded in systemd, or should nfs-utils add the necessary knowledge? i.e. we could add an nfs-fstab-generator to nfs-utils which creates drop-ins to modify the behaviour of the drop-ins provided by systemd-fstab-generator. Adding the TimeoutSec= would be easy. Stripping the "bg" would be possible. Changing the remote-fs.target.requires/foo.mount symlink to be remote-fs.target.requires/foo.automount would be problematic though. Could we teach systemd-fstab-generator to ignore $TYPE filesystems if TYPE-fstab-generator existed? Or should we just build all this filesystem-specific knowledge into systemd? Thanks for your thoughts, NeilBrown hackish patch to kill backgrounded mount.nfs processes: diff --git a/src/core/mount.c b/src/core/mount.c index ca0c4b0d5eed..91939b48d11a 100644 --- a/src/core/mount.c +++ b/src/core/mount.c @@ -883,6 +883,18 @@ static void mount_enter_unmounting(Mount *m) { MOUNT_UNMOUNTING_SIGKILL)) m->n_retry_umount = 0; +if (m->result == MOUNT_SUCCESS && +!m->from_proc_self_mountinfo) { +/* There is no mountpoint, but mount seemed to succeed. + * Could be a bg mount.nfs. + * In any case, kill any processes that might be hanging + * around, they cannot be doing anything useful. + */ +sd_bus_error error = SD_BUS_ERROR_NULL; +unit_kill_common(UNIT(m), KILL_ALL, SIGTERM, -1, -1, &error); +} + + m->control_command_id = MOUNT_EXEC_UNMOUNT; m->control_command = m->exec_command + MOUNT_EXEC_UNMOUNT; signature.asc Description: PGP signature ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel