Re: procfs difference between NetBSD and Linux
O_EXCL) == 0) + ndp->ni_cnd.cn_flags |= NONEXCLHACK; } else { ndp->ni_cnd.cn_nameiop = LOOKUP; ndp->ni_cnd.cn_flags |= LOCKLEAF; @@ -183,7 +185,12 @@ error = veriexec_openchk(l, ndp->ni_vp, pathstring, fmode); if (error) { /* We have to release the locks ourselves */ - if (fmode & O_CREAT) { + /* +* 20210604 dholland passing NONEXCLHACK means we can +* get ni_dvp == NULL back if ni_vp exists, and we should +* treat that like the non-O_CREAT case. +*/ + if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) { if (vp == NULL) { vput(ndp->ni_dvp); } else { @@ -202,7 +209,10 @@ } #endif /* NVERIEXEC > 0 */ - if (fmode & O_CREAT) { + /* +* 20210604 dholland ditto +*/ + if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) { if (ndp->ni_vp == NULL) { vattr_null(); va.va_type = VREG; Index: sys/sys/namei.src === RCS file: /cvsroot/src/sys/sys/namei.src,v retrieving revision 1.58 diff -u -r1.58 namei.src --- sys/sys/namei.src 30 May 2020 20:16:14 - 1.58 +++ sys/sys/namei.src 4 Jun 2021 20:04:58 - @@ -154,7 +154,8 @@ in ni_erootdir */ NAMEIFLLOCKSHARED 0x0100 /* want shared locks if possible */ NAMEIFLNOCHROOT0x0100 /* no chroot on abs path lookups */ -NAMEIFLMODMASK 0x010001fc /* mask of operational modifiers */ +NAMEIFLNONEXCLHACK 0x0200 /* open wwith O_CREAT but not O_EXCL */ +NAMEIFLMODMASK 0x030001fc /* mask of operational modifiers */ /* * Namei parameter descriptors. */ -- David A. Holland dholl...@netbsd.org
Re: procfs difference between NetBSD and Linux
Date:Fri, 04 Jun 2021 10:32:24 +1000 From:Simon Burge Message-ID: <20210604003224.cc9b44e...@thoreau.thistledown.com.au> | https://pubs.opengroup.org/onlinepubs/007908799/xsh/open.html doesn't | mention anything about what filesystem types back the path being opened. No, but there are lots of other things also not mentioned that also affect what posix requires.Eg: and somewhat bizarrely, if the process in question was started with one (or more) of fds 0 1 or 2 closed, then what happens would also be unspecified. There the underlying issue is that the open might (in fact, would) return the lowest of the closed ones of those which isn't what applications expect, and so bizarre things happen. But POSIX has no notion of types of bizarre, there is just unspecified (something happens, but implementations get to decide), undefined (anything goes, reasonable or not) and specified. As soon as you move out of the POSIX defined environment, everything becomes unspecified or undefined. Obviously, it wouldn't be useful to take liberties, we wouldn't want open("/dev/null", 0) to call abort() just because the system has a procfs mounted, and particularly if the application isn't using it. But POSIX doesn't say we cannot. This is simply outside the standard. | It does say that O_CREAT without O_EXCL should have no effect if the | files exists. Yes, and obviously, wherever possible, that's what should happen, it is just that you cannot say "required for POSIX conformance" when the file is on a filesystem that doesn't conform with POSIX. | That this particular instance is related to procfs | shouldn't make a difference, right? I'm not aware of any discussions related to procfs type filesystems related to POSIX (doesn't mean there have never been any) but this type of issue comes up from time to time related to NFS, which also has slightly different semantics than "normal" filesystems - and I believe the answer has always been that as soon as you step away from a POSIX environment, the requirements no longer apply. Files and operations on an NFS filesystem aren't required to behave the same way as files on a normal filesystem (which is good, as they don't). kre ps: if we were to be overly cynical, we could also say that to conform to POSIX all that is required (of the implementation, leaving aside for now all the paperwork etc required of the implementors) is that the system pass the POSIX conformance tests. Those have no procfs (or NFS) because those things are not POSIX. Hence testing O_CREAT on a /proc/$$/fd/N type file name will never be done (or it could be, but it would just be a regular file in a regular directory, and irrelevant here), and so cannot cause a system to fail, whatever it does.
Re: procfs difference between NetBSD and Linux
Date:Thu, 3 Jun 2021 18:45:53 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | procfs will anser EOPNOTSUPP on VOP_CREATE. But it never comes that | far. No, it doesn't. What I was suggesting doesn't come close to fitting the way things actually work, I should have considered it more before sending. | On the other hand, the logic in namei() might not be correct. I'm not sure it is that simple (that's what I though a half hour or so ago). | It looks like a check to prevent CREATE operations on a mountpoint, | but that's neither necessary nor compatible when the object | already exists. The issue (which is easier to see in much older versions of namei() than the current one) is that a parent vnode pointer is required for CREATE (and DELETE and RENAME) vnode ops, but across a mount point that makes no sense (or does it? Could we simply return the previous vnode in the path regardless of the filesys - or would that wreck the locking somewhere?) If the CREATE is for a mkdir() or link() (or mknod() mkfifo() ...) then all of this makes sense, the EEXIST is correct, and simply returning the existing vnode as it is might not be. But open(path, O_CREAT|..., ...) is different, it is only a CREATE if the path doesn't exist, otherwise it is simply an open. It could do 2 lookups, one to discover if the path exists (returning if it does), and then a second CREATE lookup if it doesn't - but that would be full of races or locking nightmares. kre
Re: procfs difference between NetBSD and Linux
Replying to myself... | I think I am going to experiment with simply removing that error case | and see if anything breaks. but that cannot work, the issue is that the operations in question return the parent vnode, which, when a mount point has been crossed, isn't possible. Simply returning success in that case won't work at all for all the other uses of the CREAT vnop, which expect that parent vnode. I considered dealing with EEXIST in open() (where it makes no sense, unless O_EXCL is set) but that is unlikely to work, as namei() when it returns an error isn't also going to be properly returning the target vnode. My guess at the minute is that to fix this we need a new vnop, OCREATE (optional create) (or something), which works identically to the CREATE operation, except that it doesn't fail if it cannot return the parent vnode - and then callers (which would probably only be open()) using this new op would need to deal with that when it happens. But that's beyond my pay grade here, someone who has worked a lot on namei() and the vnops needs to consider all this a lot more. kre
Re: procfs difference between NetBSD and Linux
Date:Fri, 4 Jun 2021 14:29:51 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | We need to understand why namei() does this check and how it can be | corrected. Yes, I was wondering about that, it seems to make no sense to me. A mountpoint, by definition, must exist, so the O_CREAT flag (without O_EXCL) will never be creating anything, so if we hit a mountpoint boundary, just at the resolution of the name, the result cannot be affected by O_CREAT (alone - O_CREAT|O_EXCL is always going to fail, mountpoint or not, if the target name exists). Simply removing whatever the test is should (hypothetically anyway) make no difference to anything, so discovering why the check was added would be useful. I've been taking a bit of a look at the history, and while the error wasn't always EEXIST (that's only from 2011) the test has been there for ages. At the minute I'm thinking it might be a deficit in the design of the vnops ... the error comes from a "create" operation on a mount point, which obviously is going to fail (as do delete and rename). The problem is that O_CREAT isn't always a create op, it can simply be a lookup, it only turns into an actual create operation if the target doesn't exist. Perhaps that means the way the create vnop works needs to be altered, or perhaps this test doesn't really need to be there, as if it is really intended to be a create (as in mkdir() or link()) it should simply fail when it detects the target exists (mount point or not) and if it is an "optional create" (as on O_CREAT on open) then if the target exists it isn't really a create at all. I think I am going to experiment with simply removing that error case and see if anything breaks. kre
Re: procfs difference between NetBSD and Linux
On Fri, Jun 04, 2021 at 02:29:51PM -, Michael van Elst wrote: > mar...@duskware.de (Martin Husemann) writes: > > >And any software requiring a procfs mount to work correctly is not portable > >(for some sense of portable). > > The bad behaviour is not restricted to procfs. Yes, but Samba will require a mounted /proc procfs even after the kernel fix. (I was kinda considering the original problem solved "soonish") Martin
Re: procfs difference between NetBSD and Linux
mar...@duskware.de (Martin Husemann) writes: >And any software requiring a procfs mount to work correctly is not portable >(for some sense of portable). The bad behaviour is not restricted to procfs. % pwd /tmp/x % ls -l total 4 -rw-r--r-- 1 mlelstv wheel 6 Jun 4 16:20 mountpoint2 % echo Hello > ordinaryfile % echo Hello > mountpoint2 mountpoint2: File exists. % ls -l total 12 -rw-r--r-- 1 mlelstv wheel 6 Jun 4 16:20 mountpoint2 -rw-r--r-- 1 mlelstv wheel 6 Jun 4 16:24 ordinaryfile % mount -v | grep overlay fud:/d/1/overlay on /tmp/x/mountpoint2 type nfs (fsid: 0xb03/0x70b, reads: sync 0 async 0, writes: sync 0 async 0) We need to understand why namei() does this check and how it can be corrected.
Re: procfs difference between NetBSD and Linux
On Fri, 4 Jun 2021 at 06:27, Martin Husemann wrote: > > On Fri, Jun 04, 2021 at 12:47:09AM +0700, Robert Elz wrote: > > ps: But I'm not sure this is a POSIX problem, POSIX has no procfs, > > and so anything that uses one is outside the bounds of what POSIX > > specifies, and into the great vastness of beyond all knowledge - > > ie: for POSIX, anything on a procfs is an unspecified operation. > > And any software requiring a procfs mount to work correctly is not portable > (for some sense of portable). > > We should document this requirement in the samba pkg, but I guess MESSAGE > is not a good place for that. Maybe make the rc scripts check for it? The plot thickens a bit - according to the original post and the test program supplied, one should have the same problem with FreeBSD (13 at least) and OpenBSD. I haven't gone yet through the test on OpenBSD (the Samba version supplied by its ports system is 4.9 still), but I bootstrapped pkgsrc on a clean FreeBSD 13.0 system and built, among the other stuff, Samba 4.14.4 (4.13.x is available using the new pkg system). It works as expected, so it would appear the problem is NetBSD-specific. > > Martin I guess it is interesting enough to bootstrap pkgsrc on OpenBSD and test it there as well. Chavdar --