Re: fcntl(F_GETPATH) support

2019-09-15 Thread Mateusz Guzik
On 9/15/19, Jason Thorpe  wrote:
>
>
>> On Sep 15, 2019, at 11:03 AM, Christos Zoulas 
>> wrote:
>>
>> I think it is quite reliable because all the file descriptors would be
>> recently
>> opened and therefore be in the cache.  One would need to DoS the cache
>> cause eviction. If that turns out to be false, we can make the namecache
>> reliable, or withdraw it.
>
> If we had a way to ask the file system for the / a name+parent in the event
> we don't find an entry in the name cache, then we don't have to make the
> name cache "reliable".

See my other e-mail about turning realpath into a syscall. Then you have
everything to work with and the code should be trivial unless you want
to handle hardlinks. In the most simplistic implementation you do the
lookup and then do the reverse lookup of what you found. Things still
can get evicted in the time window in-between but that's probably
tolerable.

>
> FWIW, I don't think making the name cache "reliable" is necessarily a good
> idea -- it could be abused to consume kernel memory by creating lots of
> long-named files / directories.
>

I don't think it's a material change in terms of memory use. You can
already force this scenario by creating files and stating them.
Getting rid of namecache entries is not going to save you much compared
to all the vnodes which need to be dropped/recycled (which will also
remove the nc entries).

> In the event that the name really can't be determined (perhaps it's an
> open-unlinked file?), then I think it's perfectly reasonable to return
> ENOENT for F_GETPATH; callers need to be capable of handling that.
>

Corner case of the sort is probably not accounted for either and not
expected to be ran into -- if you remove the files from under the compiler
you are tampering with the build and it is free to fail.

On the other hand name eviction is not a problem you expect to run
into.

I can't easily test right now for sure, but historically BSDs have not
been adding newly created files to the namecache and perhaps
NetBSD is not doing it either. In which case F_GETPATH on newly
created files is guaranteed to fail unless someone looked them up
separately. Whether this should be modified is imho a separate
discussion, but is relevant to points made earlier about memory
use and DoSability. Imho it should be fine to change the kernel
to always add these.

-- 
Mateusz Guzik 



Re: CVS commit: src/sys/kern

2019-09-15 Thread Michael van Elst
On Mon, Sep 16, 2019 at 03:39:35AM +0200, Emmanuel Dreyfus wrote:
> Thor Lancelot Simon  wrote:
> 
> > I've never been a fan of the wedge: syntax (as was noted at the time, it
> > conflicts with the syntax for specifying network filesystem servers,
> > which actually broke automation I had in place); I would not mind seeing
> > it go.
> 
> IMO, both wedge: and NAME= are ugly, but beside your point I would be in
> favor of NAME= because that is what I used in x86 bootloaders, and
> lazyness commands not to redo it. :-)

A naming scheme like PROTOCOL ":" SPECIFIER isn't that bad. I chose
the NAME= prefix instead for several reasons. Compatibility with
NFS mount paths (that we get told are deprecated whenever we use
them), lookalike with other OSes, and also to not further reduce
path lengths as the string sometimes needs to be stored in small
buffers.


> When you talk about letting it go, you means removing it, or just
> keeping it as is undocumented? Removing could break some setups badly.

We need to keep it for compatibility for some time, and that's how
it should be documented.


> About documentation, wedge: is present in
> share/man/man8/man8.x86/boot.8
> share/man/man9/cpu_rootconf.9
> sys/arch/evbarm/conf/IYONIX
> sys/arch/evbarm/fdt/fdt_machdep.c
> sys/arch/iyonix/conf/GENERIC
> usr.bin/config/config.5

sys/arch/evbarm/fdt/fdt_machdep.c is a different beast. The boot
device selection in it should go completely as is.

There could be a MI version that just allows to feed a bootspec to
the kernel for systems that support FDT but cannot specify boot_args.

We could also adopt its selection of network interfaces per MAC
address for MI code. Just need to come up with a proper encoding
as bootspec.



Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: fcntl(F_GETPATH) support

2019-09-15 Thread Kamil Rytarowski
On 15.09.2019 20:03, Christos Zoulas wrote:
> I think it is quite reliable because all the file descriptors would be 
> recently
> opened and therefore be in the cache.  One would need to DoS the cache
> cause eviction. If that turns out to be false, we can make the namecache
> reliable, or withdraw it.

As discussed with Mateusz Guzik offlist, the dentry approach looks
reliable and as a way to go.

It changes fd -> vnode -> inode to fd -> dentry -> vnode.

We could switch from catching program name on exec to proper pathname
resolution with KERN_PROC_PATHNAME.

Certain programs require always correct path to be resolved from
hardlinks, not the last one from the cache. This used to affect LLVM.

There is also a hole in the current namecache implementation as it
misses entry for newly created files (as informed by Mateusz). Example:

#include 
#include 
#include 
#include 
#include 

int
main(void)
{
char buf[1024];
int fd;

fd = open("/tmp/test", O_RDWR|O_CREAT|O_EXCL,0600);
if (fd == -1)
err(1, "open");
if (fcntl(fd, F_GETPATH, buf) < 0)
err(1, "fcntl");
printf("[%s]\n", buf);
}

For the time being, the current code is still an improvement over the
past state.



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/kern

2019-09-15 Thread Emmanuel Dreyfus
Thor Lancelot Simon  wrote:

> I've never been a fan of the wedge: syntax (as was noted at the time, it
> conflicts with the syntax for specifying network filesystem servers,
> which actually broke automation I had in place); I would not mind seeing
> it go.

IMO, both wedge: and NAME= are ugly, but beside your point I would be in
favor of NAME= because that is what I used in x86 bootloaders, and
lazyness commands not to redo it. :-)

When you talk about letting it go, you means removing it, or just
keeping it as is undocumented? Removing could break some setups badly.

About documentation, wedge: is present in
share/man/man8/man8.x86/boot.8
share/man/man9/cpu_rootconf.9
sys/arch/evbarm/conf/IYONIX
sys/arch/evbarm/fdt/fdt_machdep.c
sys/arch/iyonix/conf/GENERIC
usr.bin/config/config.5

I suspect  the wedge: references there could now just be replaced by
NAME= after  src/sys/kern/kern_subr.c 1.226-1.227 was committed.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: fcntl(F_GETPATH) support

2019-09-15 Thread Jason Thorpe



> On Sep 15, 2019, at 1:17 PM, Mouse  wrote:
> 
> [top-posting damage repaired manually]
>>> If you can get a file desriptor to a symlink, it will work; I don't
>>> think that we have a way to do this now.
>> FYI - XNU has O_SYMLINK for this.
> 
> What operations does it support on fds open onto symlinks?

fstat(), fchflags(), etc.  I don't recall if read() is allowed, but I'm pretty 
sure write() is not.

-- thorpej



Re: fcntl(F_GETPATH) support

2019-09-15 Thread Mouse
>>> If you can get a file desriptor to a symlink, [...]
> There should be no way to get a fd for a symlink; if you can,
> something bad has happened.

At present, perhaps.

> Because of the way short symlinks are handled in ffs, I wouldn't be
> surprised if managing to get one let you corrupt the fs.

Depends on what operations are defined on such a descriptor, and how
they are implemented.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: fcntl(F_GETPATH) support

2019-09-15 Thread Mouse
[top-posting damage repaired manually]
>> If you can get a file desriptor to a symlink, it will work; I don't
>> think that we have a way to do this now.
> FYI - XNU has O_SYMLINK for this.

What operations does it support on fds open onto symlinks?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: fcntl(F_GETPATH) support

2019-09-15 Thread David Holland
On Sun, Sep 15, 2019 at 09:53:41AM -0700, Jason Thorpe wrote:
 > FYI - XNU has O_SYMLINK for this.  Not saying we should necessarily add it 
 > to NetBSD, just noting the precedent.
 > 
 > -- thorpej
 > Sent from my iPhone.
 > 
 > > On Sep 14, 2019, at 5:06 PM, Christos Zoulas  wrote:
 > > 
 > > If you can get a file desriptor to a symlink, it will work; I don't think
 > > that we have a way to do this now.

There should be no way to get a fd for a symlink; if you can,
something bad has happened.

Because of the way short symlinks are handled in ffs, I wouldn't be
surprised if managing to get one let you corrupt the fs.

-- 
A. No
Q. Should I include quotations after my reply?



Re: NCHNAMLEN vnode cache limitation removal

2019-09-15 Thread David Holland
On Sat, Sep 14, 2019 at 09:51:12AM -0700, Jason Thorpe wrote:
 > >> Given the list of smart people I've seen discussing this, I'm
 > >> presumably just missing something, but what?
 > > 
 > > Loonix.
 > 
 > Linux isn't the only system that has such a capability.

No, but I'm pretty sure it was the first to commit to it (was quite a
ways back now) and it's where the assumption you can do this gets
baked into dodgy software.

-- 
David A. Holland
dholl...@netbsd.org


Re: fcntl(F_GETPATH) support

2019-09-15 Thread Jason Thorpe



> On Sep 15, 2019, at 11:03 AM, Christos Zoulas  wrote:
> 
> I think it is quite reliable because all the file descriptors would be 
> recently
> opened and therefore be in the cache.  One would need to DoS the cache
> cause eviction. If that turns out to be false, we can make the namecache
> reliable, or withdraw it.

If we had a way to ask the file system for the / a name+parent in the event we 
don't find an entry in the name cache, then we don't have to make the name 
cache "reliable".

FWIW, I don't think making the name cache "reliable" is necessarily a good idea 
-- it could be abused to consume kernel memory by creating lots of long-named 
files / directories.

In the event that the name really can't be determined (perhaps it's an 
open-unlinked file?), then I think it's perfectly reasonable to return ENOENT 
for F_GETPATH; callers need to be capable of handling that.

-- thorpej



Re: fcntl(F_GETPATH) support

2019-09-15 Thread Mateusz Guzik
On 9/15/19, Mateusz Guzik  wrote:
> On 9/14/19, Christos Zoulas  wrote:
>>
>> Comments?
>>
>> +error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc);
>
> What motivates this change?
>
> I think it is a little problematic in that namecache resolution is not
> guaranteed to work. If the consumer does not check for failure or fallback
> to something else the user is going to experience "once in the blue moon"
> failures.
>
> For example clang likes to get full paths to names. On Linux it can
> trivially do it thanks to dentry cache. On other systems there is a
> compilation check for F_GETPATH. If none of this is present it goes for
> realpath, which is slover but basically works. If F_GETPATH is present,
> it is always used and there is no fallback if it fails. Then you risk
> setting yourself up for a weird compilation failure, which may not even
> repeat itself after you retry.
>
> All in all I don't think this should be added unless namecache becomes
> reliable. The above reasoning is why I did not add it to FreeBSD.
>

Now that I wrote this I suspect a kernel-based realpath could do the
trick here. Will have to take a closer look.

-- 
Mateusz Guzik 



Re: NCHNAMLEN vnode cache limitation removal

2019-09-15 Thread Mateusz Guzik
On 9/14/19, Mouse  wrote:
>> [...], because fexecve is causing rumbles about doing significantly
>> more reverse lookups.
>
> Why is everyone so concerned about finding "the" name of an inode, or
> indeed any name for an inode?  As far as I can tell there has never
> been any promise that any given inode has any names pointing to it, at
> least not unless it's got no other references to it (in which case it
> would be freed if it had no names).
>
> Given the list of smart people I've seen discussing this, I'm
> presumably just missing something, but what?
>

I think an always working name resolution is nicer to users when it comes
to diagnosing issues in userspace. In particular if someone spawns a new
program, that program performs a lot of ops in given fd, you can check
what file it is if name resolution works (or you have a facility similar
to linux's /proc//fd). Otherwise you are left with dirty hacks.

There is a very different angle to this though and it has to do with
performance, especially in face of concurrent access. When doing SMP
you want to avoid writes to shared areas as they induce caches misses
on next access by other CPUs. If everyone accesses read-only they can
keep doing it without interfering with others. If everyone writes to
it, everyone else (minus 1) stalls waiting for their turn.

Vast majority of all lookups only care about the last path component.
Currently the kernel leapfrogs through all of them, e.g. consider the
lookup of /foo/bar/baz/qux.

Along the way it refs foo, locks foo, finds bar, references bar, locks
bar, unlocks and unrefs foo... repeat that with s/bar/baz/
and s/foo/bar/. Note that's very simplified (e.g. i skipped the
beginning of the lookup and other work like permission checking).
Crossing filesystems is another hurdle. This is only to highlight
the point below.

That's a lot of work single-threaded which can be partially avoided in
the common case, provided all the info is structured for it. But most
importantly that's a lot of atomic ops on shared areas which severely
limit your performance in case of concurrent access (especially on
multi-socket systems). It's a long way to get to a point where you can
do this in a write-free manner and FreeBSD is very early in there.

If everything is cached and you can roll forward in the common case
without refing/locking anything but the last component you win big
time. (Of course you can't "just" do it, the leapfrogging is there for
a reason but it can be worked out with tracking the state and having
safe memory reclamation.)

TL;DR it's minor quality of life improvement for users and a de facto
mandatory part of VFS if it is to scale on big systems.

-- 
Mateusz Guzik 



Re: fcntl(F_GETPATH) support

2019-09-15 Thread Mateusz Guzik
On 9/14/19, Christos Zoulas  wrote:
>
> Comments?
>
> + error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc);

What motivates this change?

I think it is a little problematic in that namecache resolution is not
guaranteed to work. If the consumer does not check for failure or fallback
to something else the user is going to experience "once in the blue moon"
failures.

For example clang likes to get full paths to names. On Linux it can
trivially do it thanks to dentry cache. On other systems there is a
compilation check for F_GETPATH. If none of this is present it goes for
realpath, which is slover but basically works. If F_GETPATH is present,
it is always used and there is no fallback if it fails. Then you risk
setting yourself up for a weird compilation failure, which may not even
repeat itself after you retry.

All in all I don't think this should be added unless namecache becomes
reliable. The above reasoning is why I did not add it to FreeBSD.

-- 
Mateusz Guzik 



Re: fcntl(F_GETPATH) support

2019-09-15 Thread Christos Zoulas



> On Sep 15, 2019, at 12:43 PM, Mateusz Guzik  wrote:
> 
> On 9/14/19, Christos Zoulas  wrote:
>> 
>> Comments?
>> 
>> +error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc);
> 
> What motivates this change?
> 
> I think it is a little problematic in that namecache resolution is not
> guaranteed to work. If the consumer does not check for failure or fallback
> to something else the user is going to experience "once in the blue moon"
> failures.
> 
> For example clang likes to get full paths to names. On Linux it can
> trivially do it thanks to dentry cache. On other systems there is a
> compilation check for F_GETPATH. If none of this is present it goes for
> realpath, which is slover but basically works. If F_GETPATH is present,
> it is always used and there is no fallback if it fails. Then you risk
> setting yourself up for a weird compilation failure, which may not even
> repeat itself after you retry.
> 
> All in all I don't think this should be added unless namecache becomes
> reliable. The above reasoning is why I did not add it to FreeBSD.
> 

I think it is quite reliable because all the file descriptors would be recently
opened and therefore be in the cache.  One would need to DoS the cache
cause eviction. If that turns out to be false, we can make the namecache
reliable, or withdraw it.

christos



Re: fcntl(F_GETPATH) support

2019-09-15 Thread Jason Thorpe
FYI - XNU has O_SYMLINK for this.  Not saying we should necessarily add it to 
NetBSD, just noting the precedent.

-- thorpej
Sent from my iPhone.

> On Sep 14, 2019, at 5:06 PM, Christos Zoulas  wrote:
> 
> If you can get a file desriptor to a symlink, it will work; I don't think
> that we have a way to do this now.



Re: more fexecve questions

2019-09-15 Thread Christos Zoulas



> On Sep 15, 2019, at 1:40 AM, m...@netbsd.org  wrote:
> 
> Please add a comment explaining that this is for pathnames we can't
> resolve in chroots.

Done. It is not for chroots though It is for when we can't resolve vnode to 
name.

> 
> segvguard and veriexec ony use the name for printing error messages.
> I don't think it's necessary to give up on them because of it.
> 
> segvguard even handles a NULL name already.
> 

Fixed.

thanks,

christos

Re: CVS commit: src/sys/kern

2019-09-15 Thread Thor Lancelot Simon
On Sun, Sep 15, 2019 at 08:34:18AM +0200, Michael van Elst wrote:
> 
> We have to decide if we really want to switch things to the NAME=
> syntax and allow wedge: only for compatibility. Then we should also
> adjust what dkwedge_print_wnames() prints, currently it uses the wedge:
> prefix. It is called in getdisk() when you enter something invalid
> for a root device.

I've never been a fan of the wedge: syntax (as was noted at the time, it
conflicts with the syntax for specifying network filesystem servers,
which actually broke automation I had in place); I would not mind seeing
it go.



Re: CVS commit: src/sys/kern

2019-09-15 Thread Michael van Elst
m...@netbsd.org (Emmanuel Dreyfus) writes:

>Michael van Elst  wrote:

>> We have to decide if we really want to switch things to the NAME=
>> syntax and allow wedge: only for compatibility. 

>I went for NAME= here because x86 bootloader now uses that everywhere,
>and my concern was multiboot command line root options passed as is from
>bootloader to kernel:

>menu=netbsd-8.1/amd64 XEN3_DOM0:load NAME=ffs:/netbsd81-XEN3_DOM0.gz
>console=com0 root=NAME=ffs;multiboot NAME=ffs:/xen-4.11.1 dom0_mem=256M
>console=com1 com1=115200,8n1


That's nice if it works, but device or even partition naming of
bootloaders was always different from what the booted system uses.
Think about BIOS device numbers (mapped to fdX: hdX: and cdX:).

The same syntax for slightly different things is at least confusing.

The load and multiboot argument is interpreted by the bootloader
as a GPT partition label, but possibly also as a BIOS device name.

The root argument is interpreted by the kernel as a wedge name.

Only if all the world consists of GPT formatted disks, then both
are the same.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: CVS commit: src/sys/kern

2019-09-15 Thread Emmanuel Dreyfus
Michael van Elst  wrote:

> We have to decide if we really want to switch things to the NAME=
> syntax and allow wedge: only for compatibility. 

I went for NAME= here because x86 bootloader now uses that everywhere,
and my concern was multiboot command line root options passed as is from
bootloader to kernel:

menu=netbsd-8.1/amd64 XEN3_DOM0:load NAME=ffs:/netbsd81-XEN3_DOM0.gz
console=com0 root=NAME=ffs;multiboot NAME=ffs:/xen-4.11.1 dom0_mem=256M
console=com1 com1=115200,8n1



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org