Slightly extending NOTE_EXTEND
I was looking into the locking issues pointed out in the Linux inotify
emulation I wrote (see
https://mail-index.netbsd.org/source-changes-d/2023/08/25/msg014056.html),
and I think the emulation can be made much simpler, and its locking
problems solved at the same time, if extra context is provided through
kevent.

To do this I propose to

1. add a void * parameter to filterop::f_event (and also to knote(),
   VN_KNOTE(), etc.), mostly to pass in the relevant componentname; and

2. send a NOTE_EXTEND event during a rename (in addition to the
   NOTE_WRITE) in both the source and destination directories if the
   source and destination directories differ.

Note that this is not an original extension; FreeBSD's NOTE_EXTEND does
this.

Does this seem sane and/or do we want the change to NOTE_EXTEND?

Theo(dore)
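To make the two points above concrete, here is a minimal userland sketch
of the proposed shape.  It is not kernel code: the toy_* names, the
struct layout, and the choice to pass the component name as a plain C
string are all assumptions made for illustration.  The NOTE_* values are
copied in so that the sketch compiles without <sys/event.h>.

```c
#include <stddef.h>
#include <string.h>

/* Values as in <sys/event.h>, repeated here so the sketch is portable. */
#define NOTE_WRITE	0x0002
#define NOTE_EXTEND	0x0004

/* Toy stand-in for struct knote: only what the sketch needs. */
struct toy_knote {
	long	fflags;			/* accumulated NOTE_* bits */
	char	last_name[256];		/* name handed in via the context */
};

/*
 * Proposed callback shape: the usual (knote, hint) pair plus a void *
 * context.  Hypothetical: here the context is just the component name.
 */
static int
toy_f_event(struct toy_knote *kn, long hint, void *ctx)
{
	const char *name = ctx;

	kn->fflags |= hint;
	if (name != NULL) {
		strncpy(kn->last_name, name, sizeof(kn->last_name) - 1);
		kn->last_name[sizeof(kn->last_name) - 1] = '\0';
	}
	return 1;			/* the knote is now active */
}

/*
 * Toy rename notification: NOTE_WRITE always, NOTE_EXTEND on both
 * directories only when the source and destination directories differ.
 */
static void
toy_rename_notify(struct toy_knote *src_dir, struct toy_knote *dst_dir,
    const char *from, const char *to)
{
	long hint = NOTE_WRITE;

	if (src_dir != dst_dir)
		hint |= NOTE_EXTEND;

	toy_f_event(src_dir, hint, (void *)from);
	if (src_dir != dst_dir)
		toy_f_event(dst_dir, hint, (void *)to);
}
```

With something like this, an inotify-style filter could read the name
from the context instead of having to look it up itself, which is where
the simplification would come from.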
Re: ulimits and memfd(2)
On 2023-08-11 03:00, Taylor R Campbell wrote:
> So is there a way to limit the memory use of memfd?

You are correct, currently the only limits are via F_SEAL_*.

> Maybe the memfd
> should contribute toward RLIMIT_AS somehow, or something like that?

It could contribute directly to RLIMIT_AS (in the same way that mmap
does it), but then the memory would be double counted when you mmap the
memfd(?), which seems undesirable.

Maybe memfd should respect RLIMIT_FSIZE, since memfds are supposed to
behave like regular files as much as possible.  Also, I think this is
much simpler to add (just an extra check when truncating?).

Theo(dore)
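The "extra check when truncating" could look roughly like the sketch
below.  This is userland C, not the memfd code: the function names are
hypothetical, and returning EFBIG follows what regular files do when a
write or truncate would exceed RLIMIT_FSIZE (signal delivery, i.e.
SIGXFSZ, is left out of the sketch).

```c
#include <errno.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: decide whether a memfd may grow to newsize bytes
 * under the given file size limit.  Returns 0 or an errno value.
 */
static int
toy_memfd_size_check(uint64_t newsize, rlim_t fsize_limit)
{
	if (fsize_limit != RLIM_INFINITY && newsize > (uint64_t)fsize_limit)
		return EFBIG;
	return 0;
}

/* The same check against the calling process's current soft limit. */
static int
toy_memfd_truncate_allowed(uint64_t newsize)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_FSIZE, &rl) == -1)
		return errno;
	return toy_memfd_size_check(newsize, rl.rlim_cur);
}
```

Because the check looks only at the size of the object, mapping the
memfd later does not count the same memory a second time, which is the
concern with RLIMIT_AS above.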
Re: kevent ext member?
On 2023-08-09 04:58, Taylor R Campbell wrote:
>> Date: Tue, 8 Aug 2023 11:47:17 -0400
>> From: Theodore Preduta
>>
>> On 2023-08-08 06:53, Taylor R Campbell wrote:
>>> What is the new struct kevent::ext member about?  I read the new man
>>> page but I'm still unclear on it.
>>> 8. Is this API shared with any other BSDs, or is anyone proposing to
>>>    share it, or is it now a unique NetBSD extension?
>>
>> This comes from FreeBSD, which takes it from MacOS's kevent64_s.
>>
>>> 4. If ext[2]/ext[3] are passed through verbatim, how are they
>>>    different from just a larger udata member?  Why can't the user
>>>    application just use a pointer to a larger buffer here?
>>> 3. Why is ext[0]/ext[1] treated differently from ext[2]/ext[3]?
>>
>> The semantics come from MacOS (and are the same in FreeBSD).
>
> OK, thanks!  What do they use it for?

MacOS uses it for their EVFILT_MACHPORT (which is EVFILT_READ but for
MacOS's IPC primitive) to return a pointer to the received message,
which I doubt can be ported.

I don't think that FreeBSD currently makes use of it outside of Linux
epoll compat.

> Do we have any upcoming uses of
> filters that would use the extra members?

I don't think so.

>>> 7. What's an example of a use from userland?  Why did we need to
>>>    change the kevent syscall?
>>
>> Because epoll makes use of it.  There isn't really a good other place to
>> put the epoll_event::data member, given its semantics.
>
> OK, so is the only current user the kernel side of the epoll code?

Yes.

Theo(dore)
Re: kevent ext member?
On 2023-08-08 06:53, Taylor R Campbell wrote:
> What is the new struct kevent::ext member about?  I read the new man
> page but I'm still unclear on it.
> 8. Is this API shared with any other BSDs, or is anyone proposing to
>    share it, or is it now a unique NetBSD extension?

This comes from FreeBSD, which takes it from MacOS's kevent64_s.

> 4. If ext[2]/ext[3] are passed through verbatim, how are they
>    different from just a larger udata member?  Why can't the user
>    application just use a pointer to a larger buffer here?
> 3. Why is ext[0]/ext[1] treated differently from ext[2]/ext[3]?

The semantics come from MacOS (and are the same in FreeBSD).

> 1. Who owns it?  Does it depend on the filter, the identifier, or
>    both?

Depends on the filter(?).  It's stored on the knote (same as udata).

> 2. Who writes or reads it, inside the kernel?  It doesn't seem to be
>    mentioned in kern_event.c.

It's copied implicitly when the kevent gets added to the knote, i.e.

kern_event.c:1977 n->kn_kevent = *kev;

in kqueue_register and

kern_event.c:2450 *kevp = kn->kn_kevent;

in kqueue_scan.

Although, now that I'm looking at it again, I think there might be a bug
with how it's copied in.  Specifically, udata can be modified with
another call to kevent(), but the same is not true for ext.

> 5. What happens if I fill nonzero ext[0]/ext[1] in a userland call to
>    kevent?  What filters respect it and how do I find out, short of
>    reading the code?
>
> 6. What do I get if I read out of it in a userland call to kevent?
>    What filters fill it and how do I find out, short of reading the
>    code?

My understanding of the description of the ext field is that it also
says "if the filter does not mention the ext field, it does not use it"
(so all of the fields should just be copied unchanged), but I guess that
should be written explicitly.

> 7. What's an example of a use from userland?  Why did we need to
>    change the kevent syscall?

Because epoll makes use of it.  There isn't really a good other place to
put the epoll_event::data member, given its semantics.

Theo(dore)
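The suspected udata/ext asymmetry is easier to see in a small model.
The sketch below is plain userland C that only imitates the behaviour
described in this message; the model_* structures and functions are
hypothetical and are not the code in kern_event.c.

```c
#include <stdint.h>

/* Model of the parts of struct kevent that matter here. */
struct model_kevent {
	uintptr_t	ident;
	void		*udata;
	uint64_t	ext[4];
};

/* Model of a knote: it keeps its own copy of the registered kevent. */
struct model_knote {
	int			in_use;
	struct model_kevent	kn_kevent;
};

/*
 * First registration copies the whole structure, ext included.  A later
 * registration of the same event refreshes only udata, which is the
 * asymmetry suspected above.
 */
static void
model_register(struct model_knote *kn, const struct model_kevent *kev)
{
	if (!kn->in_use) {
		kn->kn_kevent = *kev;
		kn->in_use = 1;
	} else {
		kn->kn_kevent.udata = kev->udata;
	}
}

/* Delivery copies the stored kevent back out unchanged. */
static void
model_scan(const struct model_knote *kn, struct model_kevent *kevp)
{
	*kevp = kn->kn_kevent;
}
```

Registering twice with different udata and ext values and then scanning
returns the second udata but the first ext, which is the surprising part
from a caller's point of view.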
Re: Testing Emulation Syscalls
On 2023-08-01 12:15, Emmanuel Dreyfus wrote:
> On Tue, Aug 01, 2023 at 05:52:57PM +0200, Martin Husemann wrote:
>> Yes we are. But the question is if we can create (tiny) test binaries
>> just for this purpose, without Linux dev tools and libs around,
>> from within a standard build.sh run.
>
> You could craft a NetBSD binary that abuses the ELFNAME2(linux,probe)
> test in src/sys/compat/linux/common/linux_exec_elf32.c so that the kernel
> think it is a Linux binary. Of course if you use a syscall that does
> not have the same number in NetBSD and Linux, you crash, hence I am
> not sure it could be useful.

My understanding of the "without Linux dev tools and libs around" part
is that each test case would be a separate (static Linux) binary that is
just a main() that only directly calls syscalls.

And then I guess integration with the rest of ATF could be done through
something like atf-sh-api(3), with status reporting done via exit codes
(maybe stdout/stderr too? but that would require more tooling).

Theo(dore)
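As a rough idea of what the body of one such test case might contain,
here is a sketch written for Linux.  It is an assumption-laden example:
it goes through libc's syscall() wrapper for brevity, whereas a binary
built without any Linux libraries would need its own syscall stub, and
the function name and the choice of epoll_create1 are purely
illustrative.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * One test case: create an epoll instance and close it again, using
 * raw syscall numbers only.  The return value is meant to become the
 * process exit code: 0 for pass, a small distinct number per failure.
 */
static int
test_epoll_create_close(void)
{
	long fd;

	fd = syscall(SYS_epoll_create1, 0);
	if (fd < 0)
		return 1;
	if (syscall(SYS_close, (int)fd) != 0)
		return 2;
	return 0;
}
```

The main() of the test binary would do nothing except return this value,
so a wrapper written with atf-sh-api(3) only has to check the exit code.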
Re: Testing Emulation Syscalls
On 2023-08-01 07:04, Valery Ushakov wrote:
> On Tue, Aug 01, 2023 at 12:39:46 +0200, Martin Husemann wrote:
>
>> On Tue, Aug 01, 2023 at 01:34:54PM +0300, Valery Ushakov wrote:
>>> As for testing emulated syscalls - can we solve this problem with a
>>> bit of elf branding to convince the kernel to start the binary under
>>> emulation directly? Inventing a whole new backdoor API for that seems
>>> kinda an overkill.
>>
>> That is probably quite easy to do, but we have a toolchain problem then
>> (solvable too).
>>
>> We need build.sh to be able to produce the test binaries (including
>> any needed libs, which don't have to be "native" libs of the emulated
>> system).
>
> For simple cases - can we get away with tiny'ish freestanding test
> programs that invoke the tested syscalls so that we don't have to pull
> a new cross-compilation setup out of thin air just for that?

I don't think so.  If the binary starts under Linux emulation then the
kernel will expect that the syscall arguments follow Linux's calling
convention.  (Which I guess could be done, but, correct me if I'm wrong,
are we not a native Linux binary at that point?)

Theo(dore)
Testing Emulation Syscalls
This comes somewhat as a "part 2" to
https://mail-index.netbsd.org/tech-kern/2023/06/21/msg028926.html

Given the responses to that thread, I decided to add native stubs for
epoll (the fact that epoll is so widespread justifies it on its own, but
it has already had some negative side effects, see:
https://mail-index.netbsd.org/source-changes-d/2023/07/30/msg013999.html).

However, there are other syscalls, namely inotify, where it can't really
be justified, but the code still deserves tests.  As such I'm looking
for a way to test emulation syscalls with ATF.

One idea (mentioned in the original thread) would be to introduce a
syscall along the lines of

int emul_syscall(const char *emul_name, int number, ...)

which executes a single syscall.  The flaw with this idea is that state
may need to be stored across syscalls in struct linux_emuldata, and I
don't know how this interface could accommodate that.

Another idea would be to introduce a syscall along the lines of

int setemul(const char *emul_name)

which would switch the syscall table dynamically so that the test case
could be run under emulation (preserving emuldata state) and then switch
back to report the result.  (And then individual syscalls would be
called via __syscall(2).)

Both of these ideas have security implications, so they should be
limited to root (or perhaps a new kauth capability).

A third idea would be to figure out a way to compile ATF tests directly
as Linux binaries.  That way no new syscall would be needed, but this
will likely cause pain when trying to distribute the tests.

Once again... thoughts?

Theo(dore)
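For the setemul() idea, the following toy model shows the intended
effect of switching syscall tables while keeping per-process emulation
state.  Everything in it is hypothetical (the toy_* names, the
one-entry tables, the long that stands in for struct linux_emuldata),
and it leaves out the root/kauth restriction entirely.

```c
#include <stddef.h>
#include <string.h>

struct toy_proc;
typedef long (*toy_sysent)(struct toy_proc *, long);

/* One emulation: a name plus its syscall table. */
struct toy_emul {
	const char		*name;
	const toy_sysent	*table;
	size_t			nsys;
};

/* Per-process state; emuldata stands in for struct linux_emuldata. */
struct toy_proc {
	const struct toy_emul	*emul;
	long			emuldata;
};

/* Syscall 0 means something different in each table. */
static long
toy_native_echo(struct toy_proc *p, long arg)
{
	(void)p;
	return arg;
}

static long
toy_linux_accumulate(struct toy_proc *p, long arg)
{
	p->emuldata += arg;	/* state that must survive across calls */
	return p->emuldata;
}

static const toy_sysent toy_native_table[] = { toy_native_echo };
static const toy_sysent toy_linux_table[] = { toy_linux_accumulate };

static const struct toy_emul toy_emuls[] = {
	{ "netbsd", toy_native_table, 1 },
	{ "linux", toy_linux_table, 1 },
};

/* Switch the process to the named table; emuldata is left untouched. */
static int
toy_setemul(struct toy_proc *p, const char *emul_name)
{
	size_t i;

	for (i = 0; i < sizeof(toy_emuls) / sizeof(toy_emuls[0]); i++) {
		if (strcmp(toy_emuls[i].name, emul_name) == 0) {
			p->emul = &toy_emuls[i];
			return 0;
		}
	}
	return -1;
}

/* Dispatch through whichever table is currently selected. */
static long
toy_syscall(struct toy_proc *p, size_t number, long arg)
{
	if (number >= p->emul->nsys)
		return -1;
	return p->emul->table[number](p, arg);
}
```

A one-shot emul_syscall() would have to find or create that state on
every call, which is the difficulty noted above; with the table switch
the state simply lives in the process for the duration of the test.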
Re: Anonymous vnodes?
On 2023-06-26 20:03, Taylor R Campbell wrote:
>> Date: Mon, 26 Jun 2023 18:13:17 -0400
>> From: Theodore Preduta
>>
>> Is it possible to create a vnode for a regular file in a file system
>> without linking the vnode to any directory, so that it disappears when
>> all open file descriptors to it are closed?  (As far as I can tell, this
>> isn't possible with any of the vn_* or VOP_* functions?)
>>
>> If this idea is indeed not possible, should/could I implement something
>> like this?  (If so, how?)
>>
>> For context, I'm currently working on implementing memfd_create(2), and
>> thought this might be a shortcut.  Otherwise, I'll have to implement it
>> in terms of uvm operations (which is fine, just more work).
>
> For a syscall, you should implement it in terms of uvm anonymous
> objects:

Is there a preexisting way to resize a uvm_object?  Or do I need to
write a function similar (but not really that similar) to
uvm_vnp_setsize?

Theo(dore)
Anonymous vnodes?
Is it possible to create a vnode for a regular file in a file system without linking the vnode to any directory, so that it disappears when all open file descriptors to it are closed? (As far as I can tell, this isn't possible with any of the vn_* or VOP_* functions?) If this idea is indeed not possible, should/could I implement something like this? (If so, how?) For context, I'm currently working on implementing memfd_create(2), and thought this might be a shortcut. Otherwise, I'll have to implement it in terms of uvm operations (which is fine, just more work). Thanks, Theo(dore)
RFC: Native epoll syscalls
While implementing Linux's epoll syscalls in compat_linux for my GSoC
project, my mentor and I had the idea to also add native NetBSD entry
points for epoll.  After some back and forth, we thought it would be
good to solicit opinions from the rest of the community.

There are two main benefits to adding native epoll syscalls:

1. They can be used to help port Linux software to NetBSD.

2. They can be used to test the epoll code with ATF.

It should be noted that this isn't the first time something like this
has been done.  Specifically, benefit 1 has already been used to justify
the existence of clone(2).

So... thoughts?

Theo(dore)
[GSoC] Proposal RFC: Linux syscalls
Exactly as the subject implies. The proposal can be found at https://www.pta.gg/gsoc.pdf Comments and feedback (read: flaws and criticisms!) is/are much appreciated. (especially regarding section 6: Implementation Plans) Theo(dore)
Re: [GSoC] Emulating missing Linux syscalls project questions
> The Linux Test Project (http://linux-test-project.github.io) would help
> not only with finding missing syscalls, but also with finding bugs /
> missing functionality in the existing Linux emul code.

Yes, this is a great idea!  However, my interpretation of the project
idea is that the expectation is that the binary is functional by the end
of the summer.  I obviously will not be able to implement all missing
syscalls by the end of the summer, so I would have to draw an arbitrary
line as to what I would/would not try to implement.  Which brings me to
my next comment.

> It would be nice to have this running on NetBSD.

In what way exactly does the LTP not function on NetBSD?  I tried it
today and (after a few hours of troubleshooting) seemingly got it to
work.

Some assorted notes about what I did/what it took to get it to work:

- I only looked at system call tests (so for all I know the other types
  of tests could be what you're referring to).

- The actual testcases themselves can be trivially (just add -static)
  statically compiled on any Linux distro and can be run individually
  just fine, but the rest of the testing infrastructure cannot (because
  glibc).  (Most of my time spent on this was dealing with glibc
  versions.)

- Otherwise you can compile everything normally on OpenSUSE 15.4, and
  with suse15_base installed the binaries will *almost* just work.

- The ltp-pan binary does depend on /dev/kmsg (which doesn't currently
  exist in the emul code), but only writes to it, so
  touch /emul/linux/dev/kmsg is sufficient to trick it into working.

- As expected, lots of tests fail, but also lots of tests pass!  I
  haven't looked too hard into the failing tests (yet), but I didn't
  find anything too surprising in the list of failing tests.

Overall, I did enjoy going down this rabbit hole!  It definitely taught
me a few new things about how the emul subsystem behaves.

Theo(dore)
[GSoC] Emulating missing Linux syscalls project questions
As the subject suggests, I think the emulating missing Linux syscalls
project might be fun.  I am just wondering

(1) Is there any documentation on the internals of this subsystem?
    (manpage and wiki seem to just be how to use it)

(2) Is there a better binary-finding strategy than trying Linux binaries
    on NetBSD, and if they fail (have a script) compare the output of
    strace from a Linux run of the program with the table in
    sys/compat/linux/arch/*/linux_syscalls.c?

(3) Do y'all have any suggestions for binaries? :P

(4) Most of the (failing) binaries I've found so far seem to have the
    epoll set of system calls in common.  Is there some technical reason
    why that's not been implemented yet?  (Or is it just a matter of no
    one having done it yet?)

Thanks in advance,

Theo(dore)