Re: fcntl(F_GETPATH) support
On 9/15/19, Jason Thorpe wrote: > > >> On Sep 15, 2019, at 11:03 AM, Christos Zoulas >> wrote: >> >> I think it is quite reliable because all the file descriptors would be >> recently >> opened and therefore be in the cache. One would need to DoS the cache >> cause eviction. If that turns out to be false, we can make the namecache >> reliable, or withdraw it. > > If we had a way to ask the file system for the / a name+parent in the event > we don't find an entry in the name cache, then we don't have to make the > name cache "reliable". See my other e-mail about turning realpath into a syscall. Then you have everything to work with and the code should be trivial unless you want to handle hardlinks. In the most simplistic implementation you do the lookup and then do the reverse lookup of what you found. Things still can get evicted in the time window in-between but that's probably tolerable. > > FWIW, I don't think making the name cache "reliable" is necessarily a good > idea -- it could be abused to consume kernel memory by creating lots of > long-named files / directories. > I don't think it's a material change in terms of memory use. You can already force this scenario by creating files and stating them. Getting rid of namecache entries is not going to save you much compared to all the vnodes which need to be dropped/recycled (which will also remove the nc entries). > In the event that the name really can't be determined (perhaps it's an > open-unlinked file?), then I think it's perfectly reasonable to return > ENOENT for F_GETPATH; callers need to be capable of handling that. > Corner case of the sort is probably not accounted for either and not expected to be ran into -- if you remove the files from under the compiler you are tampering with the build and it is free to fail. On the other hand name eviction is not a problem you expect to run into. I can't easily test right now for sure, but historically BSDs have not been adding newly created files to the namecache and perhaps NetBSD is not doing it either. In which case F_GETPATH on newly created files is guaranteed to fail unless someone looked them up separately. Whether this should be modified is imho a separate discussion, but is relevant to points made earlier about memory use and DoSability. Imho it should be fine to change the kernel to always add these. -- Mateusz Guzik
Re: CVS commit: src/sys/kern
On Mon, Sep 16, 2019 at 03:39:35AM +0200, Emmanuel Dreyfus wrote: > Thor Lancelot Simon wrote: > > > I've never been a fan of the wedge: syntax (as was noted at the time, it > > conflicts with the syntax for specifying network filesystem servers, > > which actually broke automation I had in place); I would not mind seeing > > it go. > > IMO, both wedge: and NAME= are ugly, but beside your point I would be in > favor of NAME= because that is what I used in x86 bootloaders, and > lazyness commands not to redo it. :-) A naming scheme like PROTOCOL ":" SPECIFIER isn't that bad. I chose the NAME= prefix instead for several reasons. Compatibility with NFS mount paths (that we get told are deprecated whenever we use them), lookalike with other OSes, and also to not further reduce path lengths as the string sometimes needs to be stored in small buffers. > When you talk about letting it go, you means removing it, or just > keeping it as is undocumented? Removing could break some setups badly. We need to keep it for compatibility for some time, and that's how it should be documented. > About documentation, wedge: is present in > share/man/man8/man8.x86/boot.8 > share/man/man9/cpu_rootconf.9 > sys/arch/evbarm/conf/IYONIX > sys/arch/evbarm/fdt/fdt_machdep.c > sys/arch/iyonix/conf/GENERIC > usr.bin/config/config.5 sys/arch/evbarm/fdt/fdt_machdep.c is a different beast. The boot device selection in it should go completely as is. There could be a MI version that just allows to feed a bootspec to the kernel for systems that support FDT but cannot specify boot_args. We could also adopt its selection of network interfaces per MAC address for MI code. Just need to come up with a proper encoding as bootspec. Greetings, -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: fcntl(F_GETPATH) support
On 15.09.2019 20:03, Christos Zoulas wrote: > I think it is quite reliable because all the file descriptors would be > recently > opened and therefore be in the cache. One would need to DoS the cache > cause eviction. If that turns out to be false, we can make the namecache > reliable, or withdraw it. As discussed with Mateusz Guzik offlist, the dentry approach looks reliable and as a way to go. It changes fd -> vnode -> inode to fd -> dentry -> vnode. We could switch from catching program name on exec to proper pathname resolution with KERN_PROC_PATHNAME. Certain programs require always correct path to be resolved from hardlinks, not the last one from the cache. This used to affect LLVM. There is also a hole in the current namecache implementation as it misses entry for newly created files (as informed by Mateusz). Example: #include #include #include #include #include int main(void) { char buf[1024]; int fd; fd = open("/tmp/test", O_RDWR|O_CREAT|O_EXCL,0600); if (fd == -1) err(1, "open"); if (fcntl(fd, F_GETPATH, buf) < 0) err(1, "fcntl"); printf("[%s]\n", buf); } For the time being, the current code is still an improvement over the past state. signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
Thor Lancelot Simon wrote: > I've never been a fan of the wedge: syntax (as was noted at the time, it > conflicts with the syntax for specifying network filesystem servers, > which actually broke automation I had in place); I would not mind seeing > it go. IMO, both wedge: and NAME= are ugly, but beside your point I would be in favor of NAME= because that is what I used in x86 bootloaders, and lazyness commands not to redo it. :-) When you talk about letting it go, you means removing it, or just keeping it as is undocumented? Removing could break some setups badly. About documentation, wedge: is present in share/man/man8/man8.x86/boot.8 share/man/man9/cpu_rootconf.9 sys/arch/evbarm/conf/IYONIX sys/arch/evbarm/fdt/fdt_machdep.c sys/arch/iyonix/conf/GENERIC usr.bin/config/config.5 I suspect the wedge: references there could now just be replaced by NAME= after src/sys/kern/kern_subr.c 1.226-1.227 was committed. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: fcntl(F_GETPATH) support
> On Sep 15, 2019, at 1:17 PM, Mouse wrote: > > [top-posting damage repaired manually] >>> If you can get a file desriptor to a symlink, it will work; I don't >>> think that we have a way to do this now. >> FYI - XNU has O_SYMLINK for this. > > What operations does it support on fds open onto symlinks? fstat(), fchflags(), etc. I don't recall if read() is allowed, but I'm pretty sure write() is not. -- thorpej
Re: fcntl(F_GETPATH) support
>>> If you can get a file desriptor to a symlink, [...] > There should be no way to get a fd for a symlink; if you can, > something bad has happened. At present, perhaps. > Because of the way short symlinks are handled in ffs, I wouldn't be > surprised if managing to get one let you corrupt the fs. Depends on what operations are defined on such a descriptor, and how they are implemented. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: fcntl(F_GETPATH) support
[top-posting damage repaired manually] >> If you can get a file desriptor to a symlink, it will work; I don't >> think that we have a way to do this now. > FYI - XNU has O_SYMLINK for this. What operations does it support on fds open onto symlinks? /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: fcntl(F_GETPATH) support
On Sun, Sep 15, 2019 at 09:53:41AM -0700, Jason Thorpe wrote: > FYI - XNU has O_SYMLINK for this. Not saying we should necessarily add it > to NetBSD, just noting the precedent. > > -- thorpej > Sent from my iPhone. > > > On Sep 14, 2019, at 5:06 PM, Christos Zoulas wrote: > > > > If you can get a file desriptor to a symlink, it will work; I don't think > > that we have a way to do this now. There should be no way to get a fd for a symlink; if you can, something bad has happened. Because of the way short symlinks are handled in ffs, I wouldn't be surprised if managing to get one let you corrupt the fs. -- A. No Q. Should I include quotations after my reply?
Re: NCHNAMLEN vnode cache limitation removal
On Sat, Sep 14, 2019 at 09:51:12AM -0700, Jason Thorpe wrote: > >> Given the list of smart people I've seen discussing this, I'm > >> presumably just missing something, but what? > > > > Loonix. > > Linux isn't the only system that has such a capability. No, but I'm pretty sure it was the first to commit to it (was quite a ways back now) and it's where the assumption you can do this gets baked into dodgy software. -- David A. Holland dholl...@netbsd.org
Re: fcntl(F_GETPATH) support
> On Sep 15, 2019, at 11:03 AM, Christos Zoulas wrote: > > I think it is quite reliable because all the file descriptors would be > recently > opened and therefore be in the cache. One would need to DoS the cache > cause eviction. If that turns out to be false, we can make the namecache > reliable, or withdraw it. If we had a way to ask the file system for the / a name+parent in the event we don't find an entry in the name cache, then we don't have to make the name cache "reliable". FWIW, I don't think making the name cache "reliable" is necessarily a good idea -- it could be abused to consume kernel memory by creating lots of long-named files / directories. In the event that the name really can't be determined (perhaps it's an open-unlinked file?), then I think it's perfectly reasonable to return ENOENT for F_GETPATH; callers need to be capable of handling that. -- thorpej
Re: fcntl(F_GETPATH) support
On 9/15/19, Mateusz Guzik wrote: > On 9/14/19, Christos Zoulas wrote: >> >> Comments? >> >> +error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc); > > What motivates this change? > > I think it is a little problematic in that namecache resolution is not > guaranteed to work. If the consumer does not check for failure or fallback > to something else the user is going to experience "once in the blue moon" > failures. > > For example clang likes to get full paths to names. On Linux it can > trivially do it thanks to dentry cache. On other systems there is a > compilation check for F_GETPATH. If none of this is present it goes for > realpath, which is slover but basically works. If F_GETPATH is present, > it is always used and there is no fallback if it fails. Then you risk > setting yourself up for a weird compilation failure, which may not even > repeat itself after you retry. > > All in all I don't think this should be added unless namecache becomes > reliable. The above reasoning is why I did not add it to FreeBSD. > Now that I wrote this I suspect a kernel-based realpath could do the trick here. Will have to take a closer look. -- Mateusz Guzik
Re: NCHNAMLEN vnode cache limitation removal
On 9/14/19, Mouse wrote: >> [...], because fexecve is causing rumbles about doing significantly >> more reverse lookups. > > Why is everyone so concerned about finding "the" name of an inode, or > indeed any name for an inode? As far as I can tell there has never > been any promise that any given inode has any names pointing to it, at > least not unless it's got no other references to it (in which case it > would be freed if it had no names). > > Given the list of smart people I've seen discussing this, I'm > presumably just missing something, but what? > I think an always working name resolution is nicer to users when it comes to diagnosing issues in userspace. In particular if someone spawns a new program, that program performs a lot of ops in given fd, you can check what file it is if name resolution works (or you have a facility similar to linux's /proc//fd). Otherwise you are left with dirty hacks. There is a very different angle to this though and it has to do with performance, especially in face of concurrent access. When doing SMP you want to avoid writes to shared areas as they induce caches misses on next access by other CPUs. If everyone accesses read-only they can keep doing it without interfering with others. If everyone writes to it, everyone else (minus 1) stalls waiting for their turn. Vast majority of all lookups only care about the last path component. Currently the kernel leapfrogs through all of them, e.g. consider the lookup of /foo/bar/baz/qux. Along the way it refs foo, locks foo, finds bar, references bar, locks bar, unlocks and unrefs foo... repeat that with s/bar/baz/ and s/foo/bar/. Note that's very simplified (e.g. i skipped the beginning of the lookup and other work like permission checking). Crossing filesystems is another hurdle. This is only to highlight the point below. That's a lot of work single-threaded which can be partially avoided in the common case, provided all the info is structured for it. But most importantly that's a lot of atomic ops on shared areas which severely limit your performance in case of concurrent access (especially on multi-socket systems). It's a long way to get to a point where you can do this in a write-free manner and FreeBSD is very early in there. If everything is cached and you can roll forward in the common case without refing/locking anything but the last component you win big time. (Of course you can't "just" do it, the leapfrogging is there for a reason but it can be worked out with tracking the state and having safe memory reclamation.) TL;DR it's minor quality of life improvement for users and a de facto mandatory part of VFS if it is to scale on big systems. -- Mateusz Guzik
Re: fcntl(F_GETPATH) support
On 9/14/19, Christos Zoulas wrote: > > Comments? > > + error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc); What motivates this change? I think it is a little problematic in that namecache resolution is not guaranteed to work. If the consumer does not check for failure or fallback to something else the user is going to experience "once in the blue moon" failures. For example clang likes to get full paths to names. On Linux it can trivially do it thanks to dentry cache. On other systems there is a compilation check for F_GETPATH. If none of this is present it goes for realpath, which is slover but basically works. If F_GETPATH is present, it is always used and there is no fallback if it fails. Then you risk setting yourself up for a weird compilation failure, which may not even repeat itself after you retry. All in all I don't think this should be added unless namecache becomes reliable. The above reasoning is why I did not add it to FreeBSD. -- Mateusz Guzik
Re: fcntl(F_GETPATH) support
> On Sep 15, 2019, at 12:43 PM, Mateusz Guzik wrote: > > On 9/14/19, Christos Zoulas wrote: >> >> Comments? >> >> +error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc); > > What motivates this change? > > I think it is a little problematic in that namecache resolution is not > guaranteed to work. If the consumer does not check for failure or fallback > to something else the user is going to experience "once in the blue moon" > failures. > > For example clang likes to get full paths to names. On Linux it can > trivially do it thanks to dentry cache. On other systems there is a > compilation check for F_GETPATH. If none of this is present it goes for > realpath, which is slover but basically works. If F_GETPATH is present, > it is always used and there is no fallback if it fails. Then you risk > setting yourself up for a weird compilation failure, which may not even > repeat itself after you retry. > > All in all I don't think this should be added unless namecache becomes > reliable. The above reasoning is why I did not add it to FreeBSD. > I think it is quite reliable because all the file descriptors would be recently opened and therefore be in the cache. One would need to DoS the cache cause eviction. If that turns out to be false, we can make the namecache reliable, or withdraw it. christos
Re: fcntl(F_GETPATH) support
FYI - XNU has O_SYMLINK for this. Not saying we should necessarily add it to NetBSD, just noting the precedent. -- thorpej Sent from my iPhone. > On Sep 14, 2019, at 5:06 PM, Christos Zoulas wrote: > > If you can get a file desriptor to a symlink, it will work; I don't think > that we have a way to do this now.
Re: more fexecve questions
> On Sep 15, 2019, at 1:40 AM, m...@netbsd.org wrote: > > Please add a comment explaining that this is for pathnames we can't > resolve in chroots. Done. It is not for chroots though It is for when we can't resolve vnode to name. > > segvguard and veriexec ony use the name for printing error messages. > I don't think it's necessary to give up on them because of it. > > segvguard even handles a NULL name already. > Fixed. thanks, christos
Re: CVS commit: src/sys/kern
On Sun, Sep 15, 2019 at 08:34:18AM +0200, Michael van Elst wrote: > > We have to decide if we really want to switch things to the NAME= > syntax and allow wedge: only for compatibility. Then we should also > adjust what dkwedge_print_wnames() prints, currently it uses the wedge: > prefix. It is called in getdisk() when you enter something invalid > for a root device. I've never been a fan of the wedge: syntax (as was noted at the time, it conflicts with the syntax for specifying network filesystem servers, which actually broke automation I had in place); I would not mind seeing it go.
Re: CVS commit: src/sys/kern
m...@netbsd.org (Emmanuel Dreyfus) writes: >Michael van Elst wrote: >> We have to decide if we really want to switch things to the NAME= >> syntax and allow wedge: only for compatibility. >I went for NAME= here because x86 bootloader now uses that everywhere, >and my concern was multiboot command line root options passed as is from >bootloader to kernel: >menu=netbsd-8.1/amd64 XEN3_DOM0:load NAME=ffs:/netbsd81-XEN3_DOM0.gz >console=com0 root=NAME=ffs;multiboot NAME=ffs:/xen-4.11.1 dom0_mem=256M >console=com1 com1=115200,8n1 That's nice if it works, but device or even partition naming of bootloaders was always different from what the booted system uses. Think about BIOS device numbers (mapped to fdX: hdX: and cdX:). The same syntax for slightly different things is at least confusing. The load and multiboot argument is interpreted by the bootloader as a GPT partition label, but possibly also as a BIOS device name. The root argument is interpreted by the kernel as a wedge name. Only if all the world consists of GPT formatted disks, then both are the same. -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: CVS commit: src/sys/kern
Michael van Elst wrote: > We have to decide if we really want to switch things to the NAME= > syntax and allow wedge: only for compatibility. I went for NAME= here because x86 bootloader now uses that everywhere, and my concern was multiboot command line root options passed as is from bootloader to kernel: menu=netbsd-8.1/amd64 XEN3_DOM0:load NAME=ffs:/netbsd81-XEN3_DOM0.gz console=com0 root=NAME=ffs;multiboot NAME=ffs:/xen-4.11.1 dom0_mem=256M console=com1 com1=115200,8n1 -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org