Re: procfs difference between NetBSD and Linux

2021-06-04 Thread David Holland
 O_EXCL) == 0)
+   ndp->ni_cnd.cn_flags |= NONEXCLHACK;
} else {
ndp->ni_cnd.cn_nameiop = LOOKUP;
ndp->ni_cnd.cn_flags |= LOCKLEAF;
@@ -183,7 +185,12 @@
error = veriexec_openchk(l, ndp->ni_vp, pathstring, fmode);
if (error) {
    /* We have to release the locks ourselves */
-   if (fmode & O_CREAT) {
+   /*
+* 20210604 dholland passing NONEXCLHACK means we can
+* get ni_dvp == NULL back if ni_vp exists, and we should
+* treat that like the non-O_CREAT case.
+*/
+   if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) {
if (vp == NULL) {
vput(ndp->ni_dvp);
} else {
@@ -202,7 +209,10 @@
}
 #endif /* NVERIEXEC > 0 */
 
-   if (fmode & O_CREAT) {
+   /*
+* 20210604 dholland ditto
+*/
+   if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) {
if (ndp->ni_vp == NULL) {
vattr_null();
va.va_type = VREG;
Index: sys/sys/namei.src
===
RCS file: /cvsroot/src/sys/sys/namei.src,v
retrieving revision 1.58
diff -u -r1.58 namei.src
--- sys/sys/namei.src   30 May 2020 20:16:14 -  1.58
+++ sys/sys/namei.src   4 Jun 2021 20:04:58 -
@@ -154,7 +154,8 @@
   in ni_erootdir */
 NAMEIFLLOCKSHARED  0x0100  /* want shared locks if 
possible */
 NAMEIFLNOCHROOT0x0100  /* no chroot on abs path 
lookups */
-NAMEIFLMODMASK 0x010001fc  /* mask of operational 
modifiers */
+NAMEIFLNONEXCLHACK 0x0200  /* open wwith O_CREAT but not 
O_EXCL */
+NAMEIFLMODMASK 0x030001fc  /* mask of operational 
modifiers */
 /*
  * Namei parameter descriptors.
  */

-- 
David A. Holland
dholl...@netbsd.org


Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Robert Elz
Date:Fri, 04 Jun 2021 10:32:24 +1000
From:Simon Burge 
Message-ID:  <20210604003224.cc9b44e...@thoreau.thistledown.com.au>

  | https://pubs.opengroup.org/onlinepubs/007908799/xsh/open.html doesn't
  | mention anything about what filesystem types back the path being opened.

No, but there are lots of other things also not mentioned that also
affect what posix requires.Eg: and somewhat bizarrely, if the process
in question was started with one (or more) of fds 0 1 or 2 closed, then
what happens would also be unspecified.   There the underlying issue is
that the open might (in fact, would) return the lowest of the closed ones
of those which isn't what applications expect, and so bizarre things happen.

But POSIX has no notion of types of bizarre, there is just unspecified
(something happens, but implementations get to decide), undefined (anything
goes, reasonable or not) and specified.   As soon as you move out of the
POSIX defined environment, everything becomes unspecified or undefined.

Obviously, it wouldn't be useful to take liberties, we wouldn't want
open("/dev/null", 0) to call abort() just because the system has a
procfs mounted, and particularly if  the application isn't using it.
But POSIX doesn't say we cannot.   This is simply outside the standard.

  | It does say that O_CREAT without O_EXCL should have no effect if the
  | files exists.

Yes, and obviously, wherever possible, that's what should happen, it is
just that you cannot say "required for POSIX conformance" when the file
is on a filesystem that doesn't conform with POSIX.

  | That this particular instance is related to procfs
  | shouldn't make a difference, right?

I'm not aware of any discussions related to procfs type filesystems
related to POSIX (doesn't mean there have never been any) but this
type of issue comes up from time to time related to NFS, which also
has slightly different semantics than "normal" filesystems - and I
believe the answer has always been that as soon as you step away from
a POSIX environment, the requirements no longer apply.

Files and operations on an NFS filesystem aren't required to behave the
same way as files on a normal filesystem (which is good, as they don't).

kre

ps: if we were to be overly cynical, we could also say that to conform
to POSIX all that is required (of the implementation, leaving aside for
now all the paperwork etc required of the implementors) is that the system
pass the POSIX conformance tests.   Those have no procfs (or NFS) because
those things are not POSIX.   Hence testing O_CREAT on a /proc/$$/fd/N
type file name will never be done (or it could be, but it would just be
a regular file in a regular directory, and irrelevant here), and so cannot
cause a system to fail, whatever it does.




Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Robert Elz
Date:Thu, 3 Jun 2021 18:45:53 - (UTC)
From:mlel...@serpens.de (Michael van Elst)
Message-ID:  

  | procfs will anser EOPNOTSUPP on VOP_CREATE. But it never comes that
  | far.

No, it doesn't.

What I was suggesting doesn't come close to fitting the way things
actually work, I should have considered it more before sending.

  | On the other hand, the logic in namei() might not be correct.

I'm not sure it is that simple (that's what I though a half hour or so ago).

  | It looks like a check to prevent CREATE operations on a mountpoint,
  | but that's neither necessary nor compatible when the object
  | already exists.

The issue (which is easier to see in much older versions of namei() than
the current one) is that a parent vnode pointer is required for
CREATE (and DELETE and RENAME) vnode ops, but across a mount point that
makes no sense (or does it?   Could we simply return the previous vnode
in the path regardless of the filesys - or would that wreck the locking
somewhere?)

If the CREATE is for a mkdir() or link() (or mknod() mkfifo() ...) then
all of this makes sense, the EEXIST is correct, and simply returning the
existing vnode as it is might not be.

But open(path, O_CREAT|..., ...) is different, it is only a CREATE if the
path doesn't exist, otherwise it is simply an open.   It could do 2 lookups,
one to discover if the path exists (returning if it does), and then a
second CREATE lookup if it doesn't - but that would be full of races
or locking nightmares.

kre




Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Robert Elz
Replying to myself...

  | I think I am going to experiment with simply removing that error case
  | and see if anything breaks.

but that cannot work, the issue is that the operations in question return
the parent vnode, which, when a mount point has been crossed, isn't possible.
Simply returning success in that case won't work at all for all the other
uses of the CREAT vnop, which expect that parent vnode.

I considered dealing with EEXIST in open() (where it makes no sense,
unless O_EXCL is set) but that is unlikely to work, as namei()
when it returns an error isn't also going to be properly returning the
target vnode.

My guess at the minute is that to fix this we need a new vnop, OCREATE
(optional create) (or something), which works identically to the CREATE
operation, except that it doesn't fail if it cannot return the parent
vnode - and then callers (which would probably only be open()) using
this new op would need to deal with that when it happens.

But that's beyond my pay grade here, someone who has worked a lot on
namei() and the vnops needs to consider all this a lot more.

kre



Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Robert Elz
Date:Fri, 4 Jun 2021 14:29:51 - (UTC)
From:mlel...@serpens.de (Michael van Elst)
Message-ID:  

  | We need to understand why namei() does this check and how it can be
  | corrected.

Yes, I was wondering about that, it seems to make no sense to me.

A mountpoint, by definition, must exist, so the O_CREAT flag (without
O_EXCL) will never be creating anything, so if we hit a mountpoint
boundary, just at the resolution of the name, the result cannot be
affected by O_CREAT (alone - O_CREAT|O_EXCL is always going to fail,
mountpoint or not, if the target name exists).

Simply removing whatever the test is should (hypothetically anyway)
make no difference to anything, so discovering why the check was added
would be useful.

I've been taking a bit of a look at the history, and while the error
wasn't always EEXIST (that's only from 2011) the test has been there
for ages.   At the minute I'm thinking it might be a deficit in the
design of the vnops ... the error comes from a "create" operation on
a mount point, which obviously is going to fail (as do delete and rename).
The problem is that O_CREAT isn't always a create op, it can simply be
a lookup, it only turns into an actual create operation if the target
doesn't exist.

Perhaps that means the way the create vnop works needs to be altered, or
perhaps this test doesn't really need to be there, as if it is really
intended to be a create (as in mkdir() or link()) it should simply fail
when it detects the target exists (mount point or not) and if it is an
"optional create" (as on O_CREAT on open) then if the target exists
it isn't really a create at all.

I think I am going to experiment with simply removing that error case
and see if anything breaks.

kre



Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Martin Husemann
On Fri, Jun 04, 2021 at 02:29:51PM -, Michael van Elst wrote:
> mar...@duskware.de (Martin Husemann) writes:
> 
> >And any software requiring a procfs mount to work correctly is not portable
> >(for some sense of portable).
> 
> The bad behaviour is not restricted to procfs.

Yes, but Samba will require a mounted /proc procfs even after the
kernel fix. (I was kinda considering the original problem solved "soonish")

Martin


Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Michael van Elst
mar...@duskware.de (Martin Husemann) writes:

>And any software requiring a procfs mount to work correctly is not portable
>(for some sense of portable).

The bad behaviour is not restricted to procfs.

% pwd
/tmp/x
% ls -l
total 4
-rw-r--r--  1 mlelstv  wheel  6 Jun  4 16:20 mountpoint2
% echo Hello > ordinaryfile
% echo Hello > mountpoint2
mountpoint2: File exists.
% ls -l
total 12
-rw-r--r--  1 mlelstv  wheel  6 Jun  4 16:20 mountpoint2
-rw-r--r--  1 mlelstv  wheel  6 Jun  4 16:24 ordinaryfile

% mount -v | grep overlay
fud:/d/1/overlay on /tmp/x/mountpoint2 type nfs (fsid: 0xb03/0x70b, reads: sync 
0 async 0, writes: sync 0 async 0)

We need to understand why namei() does this check and how it can be
corrected.



Re: procfs difference between NetBSD and Linux

2021-06-04 Thread Chavdar Ivanov
On Fri, 4 Jun 2021 at 06:27, Martin Husemann  wrote:
>
> On Fri, Jun 04, 2021 at 12:47:09AM +0700, Robert Elz wrote:
> > ps: But I'm not sure this is a POSIX problem, POSIX has no procfs,
> > and so anything that uses one is outside the bounds of what POSIX
> > specifies, and into the great vastness of beyond all knowledge -
> > ie: for POSIX, anything on a procfs is an unspecified operation.
>
> And any software requiring a procfs mount to work correctly is not portable
> (for some sense of portable).
>
> We should document this requirement in the samba pkg, but I guess MESSAGE
> is not a good place for that. Maybe make the rc scripts check for it?

The plot thickens a bit - according to the original post and the test
program supplied, one should have the same problem with FreeBSD (13 at
least) and OpenBSD. I haven't gone yet through the test on OpenBSD
(the Samba version supplied by its ports system is 4.9 still), but I
bootstrapped pkgsrc on a clean FreeBSD 13.0 system and built, among
the other stuff, Samba 4.14.4 (4.13.x is available using the new pkg
system). It works as expected, so it would appear the problem is
NetBSD-specific.

>
> Martin

I guess it is interesting enough to bootstrap pkgsrc on OpenBSD and
test it there as well.

Chavdar




--