Slightly extending NOTE_EXTEND

2023-08-28 Thread Theodore Preduta
I was looking into the locking issues pointed out in the Linux inotify
emulation I wrote (see
https://mail-index.netbsd.org/source-changes-d/2023/08/25/msg014056.html),
and I think it can simultaneously be made much simpler and solve the
locking problems if extra context is provided through kevent.  To do
this I propose to

1. add a void * parameter to filterop::f_event (and also to knote(),
VN_KNOTE(), etc.), mostly to pass in the relevant componentname; and

2. send a NOTE_EXTEND event during a rename (in addition to the
NOTE_WRITE) in both the source and destination directories if the source
and destination directories differ.  Note that this is not an original
extension; FreeBSD's NOTE_EXTEND already does this.
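
A minimal userland sketch of what (1) might look like.  Everything
prefixed mock_/MOCK_ is a hypothetical stand-in, not the real kernel
definition: the point is only that f_event grows a third void *
argument through which a componentname-like context can reach the
filter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct mock_componentname {
	const char *cn_nameptr;
	size_t cn_namelen;
};

struct mock_knote {
	long kn_fflags;		/* accumulated event flags */
	char kn_lastname[64];	/* last name seen by the filter */
};

/* Illustrative flag values for this sketch only. */
#define MOCK_NOTE_WRITE		0x0002L
#define MOCK_NOTE_EXTEND	0x0004L

struct mock_filterops {
	/* The proposed change: an extra void * after the hint. */
	int (*f_event)(struct mock_knote *, long, void *);
};

/* A filter that records the hint and, if given, the component name. */
int
mock_filt_vnode(struct mock_knote *kn, long hint, void *ctx)
{
	const struct mock_componentname *cnp = ctx;

	kn->kn_fflags |= hint;
	if (cnp != NULL) {
		size_t n = cnp->cn_namelen;

		if (n >= sizeof(kn->kn_lastname))
			n = sizeof(kn->kn_lastname) - 1;
		memcpy(kn->kn_lastname, cnp->cn_nameptr, n);
		kn->kn_lastname[n] = '\0';
	}
	return kn->kn_fflags != 0;
}

/* Stand-in for knote(): forwards the context to the filter unchanged. */
int
mock_knote_deliver(const struct mock_filterops *ops, struct mock_knote *kn,
    long hint, void *ctx)
{
	assert(ops != NULL && ops->f_event != NULL);
	return ops->f_event(kn, hint, ctx);
}
```

Callers with nothing extra to say would pass NULL as the context, so
existing filters could simply ignore the new argument.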

Does this seem sane and/or do we want the change to NOTE_EXTEND?

Theo(dore)


Re: ulimits and memfd(2)

2023-08-12 Thread Theodore Preduta
On 2023-08-11 03:00, Taylor R Campbell wrote:
> So is there a way to limit the memory use of memfd?

You are correct, currently the only limits are via F_SEAL_*.

> Maybe the memfd
> should contribute toward RLIMIT_AS somehow, or something like that?
It could contribute directly to RLIMIT_AS (in the same way that mmap
does it), but then the memory would be double counted when you mmap the
memfd(?), which seems undesirable.

Maybe memfd should respect RLIMIT_FSIZE, since memfds are supposed to
behave like regular files as much as possible.  Also, I think this is
much simpler to add (just an extra check when truncating?).
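
A rough userland sketch of that extra check, assuming the limit is
compared against the requested size at truncate time.  The function
names are hypothetical, and the sketch ignores SIGXFSZ, which a real
file truncation past the limit would also generate.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: returns 0 if growing to new_size is allowed
 * under fsize_limit, EFBIG otherwise.
 */
int
memfd_check_fsize(off_t new_size, rlim_t fsize_limit)
{
	assert(new_size >= 0);
	if (fsize_limit != RLIM_INFINITY && (rlim_t)new_size > fsize_limit)
		return EFBIG;
	return 0;
}

/* Same check against the calling process's current soft limit. */
int
memfd_check_fsize_current(off_t new_size)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_FSIZE, &rl) == -1)
		return errno;
	return memfd_check_fsize(new_size, rl.rlim_cur);
}
```

Since the memory is counted once as file size, mapping the memfd later
would not be counted again by this check.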

Theo(dore)



Re: kevent ext member?

2023-08-09 Thread Theodore Preduta
On 2023-08-09 04:58, Taylor R Campbell wrote:
>> Date: Tue, 8 Aug 2023 11:47:17 -0400
>> From: Theodore Preduta 
>>
>> On 2023-08-08 06:53, Taylor R Campbell wrote:
>>> What is the new struct kevent::ext member about?  I read the new man
>>> page but I'm still unclear on it.
>>> 8. Is this API shared with any other BSDs, or is anyone proposing to
>>>    share it, or is it now a unique NetBSD extension?
>>
>> This comes from FreeBSD, which takes it from MacOS's kevent64_s.
>>
>>> 4. If ext[2]/ext[3] are passed through verbatim, how are they
>>>    different from just a larger udata member?  Why can't the user
>>>    application just use a pointer to a larger buffer here?
>>> 3. Why is ext[0]/ext[1] treated differently from ext[2]/ext[3]?
>>
>> The semantics come from MacOS (and are the same in FreeBSD).
> 
> OK, thanks!  What do they use it for?

MacOS uses it for their EVFILT_MACHPORT (which is EVFILT_READ but for
MacOS's IPC primitive) to return a pointer to the received message,
which I doubt can be ported.

I don't think that FreeBSD currently makes use of it outside of Linux
epoll compat.

> Do we have any upcoming uses of
> filters that would use the extra members?

I don't think so.

>>> 7. What's an example of a use from userland?  Why did we need to
>>>    change the kevent syscall?
>>
>> Because epoll makes use of it.  There isn't really a good other place to
>> put the epoll_event::data member, given its semantics.
> 
> OK, so is the only current user the kernel side of the epoll code?

Yes.

Theo(dore)



Re: kevent ext member?

2023-08-08 Thread Theodore Preduta
On 2023-08-08 06:53, Taylor R Campbell wrote:
> What is the new struct kevent::ext member about?  I read the new man
> page but I'm still unclear on it.
> 8. Is this API shared with any other BSDs, or is anyone proposing to
>    share it, or is it now a unique NetBSD extension?

This comes from FreeBSD, which takes it from MacOS's kevent64_s.

> 4. If ext[2]/ext[3] are passed through verbatim, how are they
>    different from just a larger udata member?  Why can't the user
>    application just use a pointer to a larger buffer here?
> 3. Why is ext[0]/ext[1] treated differently from ext[2]/ext[3]?

The semantics come from MacOS (and are the same in FreeBSD).

> 1. Who owns it?  Does it depend on the filter, the identifier, or
>    both?

Depends on the filter(?).  It's stored on the knote (same as udata).

> 2. Who writes or reads it, inside the kernel?  It doesn't seem to be
>    mentioned in kern_event.c.

It's copied implicitly when the kevent gets added to the knote, i.e.

kern_event.c:1977   n->kn_kevent = *kev;

in kqueue_register and

kern_event.c:2450   *kevp = kn->kn_kevent;

in kqueue_scan.

Although, now that I'm looking at it again, I think there might be a bug
with how it's copied in.  Specifically, udata can be modified with
another call to kevent(), but the same is not true for ext.
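
To make the suspected asymmetry concrete, here is a small userland
model.  It is not the kern_event.c code: the mock_ types and both
functions are hypothetical, and the second function simply encodes the
behaviour described above (udata refreshed on a later call, ext left
alone).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for struct kevent and knote. */
struct mock_kevent {
	uintptr_t ident;
	void *udata;
	uint64_t ext[4];
};

struct mock_knote {
	struct mock_kevent kn_kevent;
	int in_use;
};

/* First registration: whole-struct copy, so ext comes along implicitly. */
void
mock_register_new(struct mock_knote *kn, const struct mock_kevent *kev)
{
	kn->kn_kevent = *kev;
	kn->in_use = 1;
}

/* Later call on the same knote: only udata is refreshed in this model. */
void
mock_register_existing(struct mock_knote *kn, const struct mock_kevent *kev)
{
	assert(kn->in_use);
	kn->kn_kevent.udata = kev->udata;
}
```

In this model a second call with different ext values leaves the
stale ones on the knote, which is the inconsistency in question.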

> 5. What happens if I fill nonzero ext[0]/ext[1] in a userland call to
>    kevent?  What filters respect it and how do I find out, short of
>    reading the code?
> 
> 6. What do I get if I read out of it in a userland call to kevent?
>    What filters fill it and how do I find out, short of reading the
>    code?

My understanding of the description of the ext field is that it also
says "if the filter does not mention the ext field, it does not use it"
(so all of the fields should just be copied unchanged), but I guess that
should be written explicitly.

> 7. What's an example of a use from userland?  Why did we need to
>    change the kevent syscall?

Because epoll makes use of it.  There isn't really a good other place to
put the epoll_event::data member, given its semantics.
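
For reference, the semantics in question from the epoll side: data is
an opaque 64-bit union that the caller sets at registration and gets
back unchanged from epoll_wait().  A minimal sketch, assuming a system
that provides <sys/epoll.h>; the function name is made up for the
example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with an opaque 64-bit cookie, make
 * it readable, and report the cookie that epoll_wait() hands back.
 * Returns 0 on success, -1 on any failure.
 */
int
epoll_cookie_roundtrip(uint64_t cookie, uint64_t *out)
{
	struct epoll_event ev, got;
	int fds[2], epfd, rv = -1;

	assert(out != NULL);
	if (pipe(fds) == -1)
		return -1;
	if ((epfd = epoll_create1(0)) == -1)
		goto out_pipe;

	ev.events = EPOLLIN;
	ev.data.u64 = cookie;	/* stored with the registration */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == -1)
		goto out_epfd;
	if (write(fds[1], "x", 1) != 1)
		goto out_epfd;
	if (epoll_wait(epfd, &got, 1, 1000) != 1)
		goto out_epfd;

	*out = got.data.u64;	/* returned as it was registered */
	rv = 0;
out_epfd:
	close(epfd);
out_pipe:
	close(fds[0]);
	close(fds[1]);
	return rv;
}
```

Whatever backs this in the kernel has to carry those 64 bits per
registration, separately from the file descriptor being watched.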

Theo(dore)



Re: Testing Emulation Syscalls

2023-08-01 Thread Theodore Preduta
On 2023-08-01 12:15, Emmanuel Dreyfus wrote:
> On Tue, Aug 01, 2023 at 05:52:57PM +0200, Martin Husemann wrote:
>> Yes we are. But the question is if we can create (tiny) test binaries
>> just for this purpose, without Linux dev tools and libs around,
>> from within a standard build.sh run.
> 
> You could craft a NetBSD binary that abuses the ELFNAME2(linux,probe)
> test in src/sys/compat/linux/common/linux_exec_elf32.c so that the kernel
> think it is a Linux binary. Of course if you use a syscall that does
> not have the same number in NetBSD and Linux, you crash, hence I am 
> not sure it could be useful.

My understanding of the "without Linux dev tools and libs around" part
is that each test case would be a separate (static Linux) binary that is
just a main() that only directly calls syscalls.

And then I guess integration with the rest of ATF could be done through
something like atf-sh-api(3), with status reporting done via exit codes
(maybe stdout/stderr too? but that would require more tooling).
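
As a sketch of the shape such a test case could take: one function
that issues the syscall directly and maps the outcome to an exit
status.  The names getpid_case, CASE_PASS and CASE_FAIL are
hypothetical, and the sketch goes through libc's syscall() wrapper
for brevity, which a truly libc-free binary would have to replace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed convention for this sketch: 0 means pass, nonzero means fail. */
enum { CASE_PASS = 0, CASE_FAIL = 1 };

/* Issue getpid by syscall number and compare against the libc wrapper. */
int
getpid_case(void)
{
	long raw;

	assert(sizeof(pid_t) <= sizeof(long));
	raw = syscall(SYS_getpid);
	if (raw < 0)
		return CASE_FAIL;
	return (raw == (long)getpid()) ? CASE_PASS : CASE_FAIL;
}
```

The test binary's main() would then do nothing but return
getpid_case(), and the shell side would only need to look at the
exit status.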

Theo(dore)



Re: Testing Emulation Syscalls

2023-08-01 Thread Theodore Preduta
On 2023-08-01 07:04, Valery Ushakov wrote:
> On Tue, Aug 01, 2023 at 12:39:46 +0200, Martin Husemann wrote:
> 
>> On Tue, Aug 01, 2023 at 01:34:54PM +0300, Valery Ushakov wrote:
>>> As for testing emulated syscalls - can we solve this problem with a
>>> bit of elf branding to convince the kernel to start the binary under
>>> emulation directly?  Inventing a whole new backdoor API for that seems
>>> kinda an overkill.
>>
>> That is probably quite easy to do, but we have a toolchain problem then
>> (solvable too).
>>
>> We need build.sh to be able to produce the test binaries (including
>> any needed libs, which don't have to be "native" libs of the emulated
>> system).
> 
> For simple cases - can we get away with tiny'ish freestanding test
> programs that invoke the tested syscalls so that we don't have to pull
> a new cross-compilation setup out of thin air just for that?

I don't think so.  If the binary starts under Linux emulation then the
kernel will expect that the syscall arguments follow Linux's calling
convention.  (Which I guess could be done, but, correct me if I'm wrong,
are we not a native Linux binary at that point?)

Theo(dore)



Testing Emulation Syscalls

2023-07-31 Thread Theodore Preduta
This comes somewhat as a "part 2" to
https://mail-index.netbsd.org/tech-kern/2023/06/21/msg028926.html

Given the responses to that thread, I decided to add native stubs
for epoll (the fact epoll is widespread alone justifies it, but it has
already had some negative side effects, see:
https://mail-index.netbsd.org/source-changes-d/2023/07/30/msg013999.html).

However, there are other syscalls, namely inotify, where it can't really
be justified, but the code still deserves tests.  As such I'm looking
for a way to test emulation syscalls with ATF.

One idea (mentioned in the original thread) would be to introduce a
syscall along the lines of

int emul_syscall(const char *emul_name, int number, ...)

which executes a single syscall.  The flaw with this idea is that state
may need to be stored across syscalls in struct linux_emuldata, but I
don't know how this interface could accommodate this.

Another idea would be to introduce a syscall along the lines of

int setemul(const char *emul_name)

which would switch the syscall table dynamically so that the test case
could be run under emulation (preserving emuldata state) and then switch
back to report the result.  (And then individual syscalls would be
called via __syscall(2).)

Both of these ideas have security implications, so they should
be limited to root (or perhaps a new kauth capability).

A third idea would be to figure out a way to compile ATF tests directly
as Linux binaries.  That way no new syscall would be needed, but this
will likely cause pain when trying to distribute the tests.

Once again... thoughts?

Theo(dore)


Re: Anonymous vnodes?

2023-06-27 Thread Theodore Preduta
On 2023-06-26 20:03, Taylor R Campbell wrote:
>> Date: Mon, 26 Jun 2023 18:13:17 -0400
>> From: Theodore Preduta 
>>
>> Is it possible to create a vnode for a regular file in a file system
>> without linking the vnode to any directory, so that it disappears when
>> all open file descriptors to it are closed?  (As far as I can tell, this
>> isn't possible with any of the vn_* or VOP_* functions?)
>>
>> If this idea is indeed not possible, should/could I implement something
>> like this?  (If so, how?)
>>
>> For context, I'm currently working on implementing memfd_create(2), and
>> thought this might be a shortcut.  Otherwise, I'll have to implement it
>> in terms of uvm operations (which is fine, just more work).
> 
> For a syscall, you should implement it in terms of uvm anonymous
> objects:

Is there a preexisting way to resize a uvm_object?  Or do I need to
write a function similar (but not really that similar) to uvm_vnp_setsize?

Theo(dore)



Anonymous vnodes?

2023-06-26 Thread Theodore Preduta
Is it possible to create a vnode for a regular file in a file system
without linking the vnode to any directory, so that it disappears when
all open file descriptors to it are closed?  (As far as I can tell, this
isn't possible with any of the vn_* or VOP_* functions?)

If this idea is indeed not possible, should/could I implement something
like this?  (If so, how?)

For context, I'm currently working on implementing memfd_create(2), and
thought this might be a shortcut.  Otherwise, I'll have to implement it
in terms of uvm operations (which is fine, just more work).

Thanks,

Theo(dore)


RFC: Native epoll syscalls

2023-06-21 Thread Theodore Preduta
While implementing Linux's epoll syscalls in compat_linux for my GSoC
project, my mentor and I had the idea to also add native NetBSD entry
points for epoll.  After some back and forth, we thought it would be
good to solicit opinions from the rest of the community.

There are two main benefits to adding native epoll syscalls:

1. They can be used to help port Linux software to NetBSD.

2. They can be used to test the epoll code with ATF.

It should be noted that this isn't the first time something like this
has been done.  Specifically, benefit 1 has already been used to justify
the existence of clone(2).

So... thoughts?

Theo(dore)


[GSoC] Proposal RFC: Linux syscalls

2023-03-29 Thread Theodore Preduta
Exactly as the subject implies.  The proposal can be found at

https://www.pta.gg/gsoc.pdf

Comments and feedback (read: flaws and criticisms!) is/are much
appreciated (especially regarding section 6: Implementation Plans).

Theo(dore)


Re: [GSoC] Emulating missing Linux syscalls project questions

2023-03-18 Thread Theodore Preduta
> The Linux Test Project (http://linux-test-project.github.io) would help
> not only with finding missing syscalls, but also with finding bugs /
> missing functionality in the existing Linux emul code.

Yes, this is a great idea!  That said, my interpretation of the project
idea is that the chosen binary is expected to be functional by the end
of the summer.  I obviously will not be able to implement all
missing syscalls by the end of the summer, so I would have to draw an
arbitrary line as to what I would/would not try to implement.

Which brings me to my next comment.

> It would be nice to have this running on NetBSD.

In what way exactly does the LTP not function on NetBSD?  I tried it
today and (after a few hours of troubleshooting) seemingly got it to work.

Some assorted notes about what I did/what it took to get it to work:

- I only looked at system call tests (so for all I know the other types
of tests could be what you're referring to).

- The actual testcases themselves can be trivially (just add -static)
statically compiled on any Linux distro and can be run individually just
fine, but the rest of the testing infrastructure cannot (because glibc).
(Most of my time spent on this was dealing with glibc versions.)

- Otherwise you can compile everything normally on OpenSUSE 15.4, and
with suse15_base installed the binaries will *almost* just work.

- The ltp-pan binary does depend on /dev/kmsg (which doesn't currently
exist in the emul code), but only writes to it, so touch
/emul/linux/dev/kmsg is sufficient to trick it into working.

- As expected, lots of tests fail, but also lots of tests pass!  I
haven't looked too hard into the failing tests (yet), but I didn't find
anything too surprising in the list of failing tests.

Overall, I did enjoy going down this rabbit hole!  It definitely taught
me a few new things about how the emul subsystem behaves.

Theo(dore)



[GSoC] Emulating missing Linux syscalls project questions

2023-03-12 Thread Theodore Preduta
As the subject suggests, I think the emulating missing Linux syscalls
project might be fun. I am just wondering:


(1) Is there any documentation on the internals of this subsystem? 
(manpage and wiki seem to just be how to use it)


(2) Is there a better binary-finding strategy than trying Linux binaries 
on NetBSD, and if they fail (have a script) compare the output of strace 
from a Linux run of the program with the table in 
sys/compat/linux/arch/*/linux_syscalls.c?


(3) Do y'all have any suggestions for binaries? :P

(4) Most of the (failing) binaries I've found so far seem to have the
epoll set of system calls in common. Is there some technical reason
why that's not been implemented yet? (Or is it just a matter of no one
having done it yet?)


Thanks in advance,

Theo(dore)