Re: Emulating missing linux syscalls project
On Tue, Mar 28, 2023 at 03:32:35PM -0700, Zophiel wrote:
> - What binary's in the current NetBSD stack does not run compact_test as a
> test case increasing order of missing test cases ?

The only compat_* tests that run regularly are the compat_netbsd32 ones,
i.e. testing a NetBSD/i386 userland on a NetBSD/amd64 machine.

Martin
Re: Emulating missing linux syscalls
On Tue, Apr 19, 2022 at 4:38 AM Joerg Sonnenberger wrote:
>
> On Tue, Apr 19, 2022 at 02:39:44AM +0530, Piyush Sachdeva wrote:
> > On Sat, Apr 16, 2022 at 2:06 AM Joerg Sonnenberger wrote:
> > >
> > > On Wed, Apr 13, 2022 at 09:51:31PM - Christos Zoulas wrote:
> > > > In article , Joerg Sonnenberger wrote:
> > > > >On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:
> > > >
> > > > >splice(2) as a concept is much older than the current Linux
> > > > >implementation.
> > > > >There is no reason why zero-copying for sockets should require a
> > > > >different system call for zero-copying from/to pipes. There are valid
> > > > >reasons for other combinations, too. Consider /bin/cp for example.
> > > >
> > > > You don't need two system calls because the kernel knows the type of
> > > > the file descriptors and can dispatch to different implementations.
> > > > One of the questions is do you provide the means to pass an additional
> > > > header/trailer to the output data like FreeBSD does for its sendfile(2)
> > > > implementation?
> > > >
> > > > int
> > > > splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len,
> > > >     const struct {
> > > >         struct iov *head;
> > > >         size_t headcnt;
> > > >         struct iov *tail;
> > > >         size_t tailcnt;
> > > >     } *ht, int flags);
> > >
> > > There are essentially two use cases here:
> > > (1) I want a simple interface to transfer data from one fd to another
> > > without extra copies.
> > >
> > > (2) I want to avoid copies AND I want to avoid system calls.
> > >
> > > For the former:
> > >     int splice(int dstfd, int srcfd, off_t *len);
> > >
> > > is more than good enough. "Transfer up to [*len] octets from srcfd to
> > > dstfd, updating [len] with the actually transferred amount and returning
> > > the first error if any."
> > >
> > > For the second category, an interface more like the posix_spawn
> > > interface (but without all the extra allocations) would be useful.
> > >
> >
> > Therefore, having the above const struct *ht to support
> > mmap() will be a good option I guess.
>
> It covers a very limited subset of the desired options. Basically, what
> you want in this case is something like:
>
>     int splicev(int dstfd, struct spliceop ops[], size_t *lenops,
>         off_t *outoff);
>
> where struct spliceop is used to specify the supported operations:
> - read from a fd with possible seek
> - read from memory
> - seek output
> and maybe other operations I can't think of right now. lenops provides
> the number of operations in input and the remaining operations on
> return, outoff is the remaining output in the current block. Some
> variant of this might be possible.

Thank you Joerg and Christos for helping me with this.
I have successfully submitted a proposal for this project through the
GSoC portal. Hope to make the cut this time :)

--
Regards,
Piyush
Re: Emulating missing linux syscalls
On Tue, Apr 19, 2022 at 02:39:44AM +0530, Piyush Sachdeva wrote:
> On Sat, Apr 16, 2022 at 2:06 AM Joerg Sonnenberger wrote:
> >
> > On Wed, Apr 13, 2022 at 09:51:31PM - Christos Zoulas wrote:
> > > In article , Joerg Sonnenberger wrote:
> > > >On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:
> > >
> > > >splice(2) as a concept is much older than the current Linux
> > > >implementation.
> > > >There is no reason why zero-copying for sockets should require a
> > > >different system call for zero-copying from/to pipes. There are valid
> > > >reasons for other combinations, too. Consider /bin/cp for example.
> > >
> > > You don't need two system calls because the kernel knows the type of
> > > the file descriptors and can dispatch to different implementations.
> > > One of the questions is do you provide the means to pass an additional
> > > header/trailer to the output data like FreeBSD does for its sendfile(2)
> > > implementation?
> > >
> > > int
> > > splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len,
> > >     const struct {
> > >         struct iov *head;
> > >         size_t headcnt;
> > >         struct iov *tail;
> > >         size_t tailcnt;
> > >     } *ht, int flags);
> >
> > There are essentially two use cases here:
> > (1) I want a simple interface to transfer data from one fd to another
> > without extra copies.
> >
> > (2) I want to avoid copies AND I want to avoid system calls.
> >
> > For the former:
> >     int splice(int dstfd, int srcfd, off_t *len);
> >
> > is more than good enough. "Transfer up to [*len] octets from srcfd to
> > dstfd, updating [len] with the actually transferred amount and returning
> > the first error if any."
> >
> > For the second category, an interface more like the posix_spawn
> > interface (but without all the extra allocations) would be useful.
> >
>
> Therefore, having the above const struct *ht to support
> mmap() will be a good option I guess.

It covers a very limited subset of the desired options. Basically, what
you want in this case is something like:

    int splicev(int dstfd, struct spliceop ops[], size_t *lenops,
        off_t *outoff);

where struct spliceop is used to specify the supported operations:
- read from a fd with possible seek
- read from memory
- seek output
and maybe other operations I can't think of right now. lenops provides
the number of operations in input and the remaining operations on
return, outoff is the remaining output in the current block. Some
variant of this might be possible.

Joerg
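To make the splicev() idea a little more concrete, here is a rough userland
sketch in C. Everything beyond the prototype quoted above is an assumption
made for illustration: the layout of struct spliceop, the SPLICEOP_* names,
the name splicev_emul(), and the reading of lenops/outoff (operations left
over, and bytes left in the operation that failed). It copies through a
temporary buffer, so it shows only the interface shape, not zero-copy.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical operation kinds; the thread only names these three. */
enum spliceop_kind {
    SPLICEOP_READ_FD,   /* read from a fd, with an optional seek first */
    SPLICEOP_READ_MEM,  /* read from memory */
    SPLICEOP_SEEK_OUT   /* seek the output */
};

/* Hypothetical layout; the thread does not define the members. */
struct spliceop {
    enum spliceop_kind kind;
    int fd;             /* READ_FD: source descriptor */
    off_t off;          /* READ_FD: source offset or -1; SEEK_OUT: target */
    const void *mem;    /* READ_MEM: source buffer */
    size_t len;         /* READ_FD, READ_MEM: bytes to transfer */
};

/* Write n bytes, retrying short writes; *written reports progress. */
static int
splicev_write_all(int fd, const char *p, size_t n, size_t *written)
{
    *written = 0;
    while (*written < n) {
        ssize_t nw = write(fd, p + *written, n - *written);
        if (nw < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        *written += (size_t)nw;
    }
    return 0;
}

/*
 * Run the operations in order. On success *lenops and *outoff are 0.
 * On failure *lenops counts the operations not completed (including the
 * failing one) and *outoff the bytes that operation still had to write.
 */
int
splicev_emul(int dstfd, struct spliceop ops[], size_t *lenops, off_t *outoff)
{
    size_t nops = *lenops;
    char buf[8192];     /* arbitrary buffer size for the sketch */

    for (size_t i = 0; i < nops; i++) {
        struct spliceop *op = &ops[i];
        size_t left = op->len, w;
        int error = 0;

        switch (op->kind) {
        case SPLICEOP_SEEK_OUT:
            left = 0;
            if (lseek(dstfd, op->off, SEEK_SET) == (off_t)-1)
                error = errno;
            break;
        case SPLICEOP_READ_MEM:
            error = splicev_write_all(dstfd, op->mem, left, &w);
            left -= w;
            break;
        case SPLICEOP_READ_FD:
            if (op->off != (off_t)-1 &&
                lseek(op->fd, op->off, SEEK_SET) == (off_t)-1) {
                error = errno;
                break;
            }
            while (left > 0 && error == 0) {
                size_t want = left < sizeof(buf) ? left : sizeof(buf);
                ssize_t nr = read(op->fd, buf, want);
                if (nr == 0) {
                    left = 0;   /* source ended early: op is done */
                    break;
                }
                if (nr < 0) {
                    if (errno != EINTR)
                        error = errno;
                    continue;
                }
                error = splicev_write_all(dstfd, buf, (size_t)nr, &w);
                left -= w;
            }
            break;
        }
        if (error != 0) {
            *lenops = nops - i;
            *outoff = (off_t)left;
            return error;
        }
    }
    *lenops = 0;
    *outoff = 0;
    return 0;
}
```

A caller would fill an array of operations, set *lenops to its length, and
on error resume from the first unfinished entry. A real kernel version
would presumably dispatch on the descriptor types instead of copying.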
Re: Emulating missing linux syscalls
On Sat, Apr 16, 2022 at 2:06 AM Joerg Sonnenberger wrote:
>
> On Wed, Apr 13, 2022 at 09:51:31PM - Christos Zoulas wrote:
> > In article , Joerg Sonnenberger wrote:
> > >On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:
> >
> > >splice(2) as a concept is much older than the current Linux implementation.
> > >There is no reason why zero-copying for sockets should require a
> > >different system call for zero-copying from/to pipes. There are valid
> > >reasons for other combinations, too. Consider /bin/cp for example.
> >
> > You don't need two system calls because the kernel knows the type of
> > the file descriptors and can dispatch to different implementations.
> > One of the questions is do you provide the means to pass an additional
> > header/trailer to the output data like FreeBSD does for its sendfile(2)
> > implementation?
> >
> > int
> > splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len,
> >     const struct {
> >         struct iov *head;
> >         size_t headcnt;
> >         struct iov *tail;
> >         size_t tailcnt;
> >     } *ht, int flags);
>
> There are essentially two use cases here:
> (1) I want a simple interface to transfer data from one fd to another
> without extra copies.
>
> (2) I want to avoid copies AND I want to avoid system calls.
>
> For the former:
>     int splice(int dstfd, int srcfd, off_t *len);
>
> is more than good enough. "Transfer up to [*len] octets from srcfd to
> dstfd, updating [len] with the actually transferred amount and returning
> the first error if any."
>
> For the second category, an interface more like the posix_spawn
> interface (but without all the extra allocations) would be useful.
>

Therefore, having the above const struct *ht to support
mmap() will be a good option I guess.

> > >I was saying that the Linux system call can be implemented without a
> > >kernel backend, because I don't consider zero copy a necessary part of
> > >the interface contract. It's a perfectly valid, if a bit slower,
> > >implementation to allocate a kernel buffer and do IO via that.
> >
> > Of course, but how do you make an existing binary use it? LD_PRELOAD
> > a binary to override the symbol in the linux glibc? By that logic you
> > don't need an in kernel linux emulation, you can do it all in userland :-)
>
> You still provide the system call as front end, but internally implement
> it on top of regular read/write to a temporary buffer.
>

Got it, thank you Joerg!

--
Regards,
Piyush
Re: Emulating missing linux syscalls
On Wed, Apr 13, 2022 at 09:51:31PM - Christos Zoulas wrote:
> In article , Joerg Sonnenberger wrote:
> >On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:
>
> >splice(2) as a concept is much older than the current Linux implementation.
> >There is no reason why zero-copying for sockets should require a
> >different system call for zero-copying from/to pipes. There are valid
> >reasons for other combinations, too. Consider /bin/cp for example.
>
> You don't need two system calls because the kernel knows the type of
> the file descriptors and can dispatch to different implementations.
> One of the questions is do you provide the means to pass an additional
> header/trailer to the output data like FreeBSD does for its sendfile(2)
> implementation?
>
> int
> splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len,
>     const struct {
>         struct iov *head;
>         size_t headcnt;
>         struct iov *tail;
>         size_t tailcnt;
>     } *ht, int flags);

There are essentially two use cases here:
(1) I want a simple interface to transfer data from one fd to another
without extra copies.

(2) I want to avoid copies AND I want to avoid system calls.

For the former:
    int splice(int dstfd, int srcfd, off_t *len);

is more than good enough. "Transfer up to [*len] octets from srcfd to
dstfd, updating [len] with the actually transferred amount and returning
the first error if any."

For the second category, an interface more like the posix_spawn
interface (but without all the extra allocations) would be useful.

> >I was saying that the Linux system call can be implemented without a
> >kernel backend, because I don't consider zero copy a necessary part of
> >the interface contract. It's a perfectly valid, if a bit slower,
> >implementation to allocate a kernel buffer and do IO via that.
>
> Of course, but how do you make an existing binary use it? LD_PRELOAD
> a binary to override the symbol in the linux glibc? By that logic you
> don't need an in kernel linux emulation, you can do it all in userland :-)

You still provide the system call as front end, but internally implement
it on top of regular read/write to a temporary buffer.

Joerg
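The "regular read/write to a temporary buffer" fallback for the simple
three-argument interface can be sketched in userland C roughly as follows.
This is only an illustration under assumptions: the name splice_emul() and
the buffer size are made up here, and a compat_linux implementation would
do the equivalent inside the kernel with a kernel buffer.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing NetBSD interface: transfer up to
 * *len octets from srcfd to dstfd through a temporary buffer. On return
 * *len holds the amount actually written. Returns 0 on success (including
 * early end of input) or the first errno value encountered.
 */
int
splice_emul(int dstfd, int srcfd, off_t *len)
{
    char buf[16384];    /* arbitrary size chosen for the sketch */
    off_t done = 0;
    int error = 0;

    while (done < *len && error == 0) {
        size_t want = sizeof(buf);
        if ((off_t)want > *len - done)
            want = (size_t)(*len - done);

        ssize_t nr = read(srcfd, buf, want);
        if (nr == 0)
            break;      /* end of input */
        if (nr < 0) {
            if (errno != EINTR)
                error = errno;
            continue;
        }

        /* write(2) may be short, so loop until the chunk is out. */
        ssize_t off = 0;
        while (off < nr) {
            ssize_t nw = write(dstfd, buf + off, (size_t)(nr - off));
            if (nw < 0) {
                if (errno == EINTR)
                    continue;
                error = errno;
                break;
            }
            off += nw;
        }
        done += off;
    }
    *len = done;
    return error;
}
```

The caller passes the requested length in *len and reads back how much was
moved, which matches the "updating [len] with the actually transferred
amount" wording above.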
Re: Emulating missing linux syscalls
On Thu, Apr 14, 2022 at 3:22 AM Christos Zoulas wrote:
>
> In article , Joerg Sonnenberger wrote:
> >On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:
>
> >splice(2) as a concept is much older than the current Linux implementation.
> >There is no reason why zero-copying for sockets should require a
> >different system call for zero-copying from/to pipes. There are valid
> >reasons for other combinations, too. Consider /bin/cp for example.
>

I was under the assumption that zero-copying would be a preference.
I did go through /bin/cp and the important copy_file() function.
There, mmap() is used and the data is then written to the destination
using write(2) in chunks. Thanks for this, Joerg!

> You don't need two system calls because the kernel knows the type of
> the file descriptors and can dispatch to different implementations.

Yes. Therefore, I am assuming that only one general splice(2) function will
be implemented and, in case it's supplied a socket fd, it will behave like
sendfile(2) (as is also clear from the function definition you have
provided below). Also, add the sendfile(2) functionality and have it invoke
splice(2).

> One of the questions is do you provide the means to pass an additional
> header/trailer to the output data like FreeBSD does for its sendfile(2)
> implementation?
>
> int
> splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len,
>     const struct {
>         struct iov *head;
>         size_t headcnt;
>         struct iov *tail;
>         size_t tailcnt;
>     } *ht, int flags);
>

I will be more than happy to provide the functionality (taking reference
from writev(2) for the struct iovec and the FreeBSD implementation of
sendfile(2)).

> >I was saying that the Linux system call can be implemented without a
> >kernel backend, because I don't consider zero copy a necessary part of
> >the interface contract. It's a perfectly valid, if a bit slower,
> >implementation to allocate a kernel buffer and do IO via that.
>

Right, Joerg! I was initially also hoping to broaden the project by
actually adding the syscall to the NetBSD kernel as well (adds a feature)
and then have the compat_linux layer profit from that call. Unless that is
something you are trying to avoid/steer away from.

Now the final question for me is about the splice() prototype that you
just mentioned above, Christos. Is that for a NetBSD syscall (as I would
hope given the struct iovec parameter), with both splice(2) and
sendfile(2) implemented in the compat_linux layer profiting from this
syscall? Or is it just splice(2) and sendfile(2) (which will call
splice(2)), both in the linux layer only?

> Of course, but how do you make an existing binary use it? LD_PRELOAD
> a binary to override the symbol in the linux glibc? By that logic you
> don't need an in kernel linux emulation, you can do it all in userland :-)
>

Christos, if you can shed some more light on this.
I guess this will make a great proposal and I will send you something by
Monday, I hope, for a first pass.
Hope to hear from you soon.

--
Regards,
Piyush
Re: Emulating missing linux syscalls
In article , Joerg Sonnenberger wrote:
>On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:

>splice(2) as a concept is much older than the current Linux implementation.
>There is no reason why zero-copying for sockets should require a
>different system call for zero-copying from/to pipes. There are valid
>reasons for other combinations, too. Consider /bin/cp for example.

You don't need two system calls because the kernel knows the type of
the file descriptors and can dispatch to different implementations.
One of the questions is do you provide the means to pass an additional
header/trailer to the output data like FreeBSD does for its sendfile(2)
implementation?

int
splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len,
    const struct {
        struct iov *head;
        size_t headcnt;
        struct iov *tail;
        size_t tailcnt;
    } *ht, int flags);

>I was saying that the Linux system call can be implemented without a
>kernel backend, because I don't consider zero copy a necessary part of
>the interface contract. It's a perfectly valid, if a bit slower,
>implementation to allocate a kernel buffer and do IO via that.

Of course, but how do you make an existing binary use it? LD_PRELOAD
a binary to override the symbol in the linux glibc? By that logic you
don't need an in kernel linux emulation, you can do it all in userland :-)

christos
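As an illustration of the header/trailer idea in the prototype above, here
is a small userland sketch in C. It is written under assumptions: the
struct is given the hypothetical name struct splice_ht, it uses struct
iovec from <sys/uio.h> (the type writev(2) takes) where the prototype
writes "struct iov", the function name splice_ht_emul() is made up, and the
offset and flags arguments are left out. Data goes through a temporary
buffer, so this shows only the ordering of header, payload and trailer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical named form of the anonymous struct in the prototype. */
struct splice_ht {
    struct iovec *head;     /* written before the payload */
    size_t headcnt;
    struct iovec *tail;     /* written after the payload */
    size_t tailcnt;
};

/* Write every iovec in full; returns 0 or an errno value. */
static int
ht_write_iov(int fd, const struct iovec *iov, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++) {
        const char *p = iov[i].iov_base;
        size_t left = iov[i].iov_len;

        while (left > 0) {
            ssize_t nw = write(fd, p, left);
            if (nw < 0) {
                if (errno == EINTR)
                    continue;
                return errno;
            }
            p += nw;
            left -= (size_t)nw;
        }
    }
    return 0;
}

/* Buffered stand-in: header, up to len bytes from infd, then trailer. */
int
splice_ht_emul(int infd, int outfd, size_t len, const struct splice_ht *ht)
{
    char buf[8192];         /* arbitrary size chosen for the sketch */
    int error;

    if (ht != NULL &&
        (error = ht_write_iov(outfd, ht->head, ht->headcnt)) != 0)
        return error;

    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t nr = read(infd, buf, want);
        if (nr == 0)
            break;          /* end of input */
        if (nr < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        struct iovec chunk = { .iov_base = buf, .iov_len = (size_t)nr };
        if ((error = ht_write_iov(outfd, &chunk, 1)) != 0)
            return error;
        len -= (size_t)nr;
    }

    if (ht != NULL)
        return ht_write_iov(outfd, ht->tail, ht->tailcnt);
    return 0;
}
```

Passing a NULL ht gives the plain fd-to-fd copy, so one entry point can
cover both the simple case and the header/trailer case.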
Re: Emulating missing linux syscalls
On Tue, Apr 12, 2022 at 04:56:05PM - Christos Zoulas wrote:
> In article , Joerg Sonnenberger wrote:
> >On Tue, Apr 12, 2022 at 12:29:21PM - Christos Zoulas wrote:
> >> In article , Piyush Sachdeva wrote:
> >> >
> >> >Dear Stephen Borrill,
> >> >My name is Piyush, and I was looking into the
> >> >'Emulating missing Linux syscalls' project hoping to contribute
> >> >to this year's GSoC.
> >> >
> >> >I wanted to be sure of a few basic things before I go ahead:
> >> >- linux binaries are found in- src/sys/compat/linux
> >> >- particular implementation in - src/sys/compat/linux/common
> >> >- a few architecture-specific implementations in-
> >> >  src/sys/compat/linux/arch/.
> >> >- The src/sys/compat/linux/arch//linux_syscalls.c file
> >> >  lists of system calls, and states if a particular syscall is present
> >> >  or not.
> >> >
> >> >I was planning to work on the 'sendfile()' syscall, which I believe
> >> >is unimplemented for amd64 and a few other architectures as well.
> >> >
> >> >Considering the above points, I was hoping you could point me in
> >> >the right direction for this project. Hope to hear from you soon.
> >>
> >> I would look into porting the FreeBSD implementation of sendfile to NetBSD.
> >
> >sendfile(2) for Linux compat can be emulated in the kernel without
> >backing. That said, a real splice(2) or even splicev(2) would be really
> >nice to have. But that's a different project and, arguably, a potentially
> >more generally useful one, too.
>
> Yes, splice is more general (as opposed to send a file to a socket), but I
> think splice has limitations too (one of the fds needs to be a pipe).
> Is that true only for linux?

splice(2) as a concept is much older than the current Linux implementation.
There is no reason why zero-copying for sockets should require a
different system call for zero-copying from/to pipes. There are valid
reasons for other combinations, too. Consider /bin/cp for example.

I was saying that the Linux system call can be implemented without a
kernel backend, because I don't consider zero copy a necessary part of
the interface contract. It's a perfectly valid, if a bit slower,
implementation to allocate a kernel buffer and do IO via that.

Joerg
Re: Emulating missing linux syscalls
Thank you Christos and Joerg!

On Tue, Apr 12, 2022 at 10:26 PM Christos Zoulas wrote:
>
> In article , Joerg Sonnenberger wrote:
>
> >>> I would look into porting the FreeBSD implementation of sendfile to
> >>> NetBSD.
>

I had a look at the FreeBSD implementation. What I found was:
- The linux_sendfile() function in
  freebsd-src/sys/compat/linux/linux_socket.c ends up calling
  linux_sendfile_common() (in the same file), which in turn calls
  fo_sendfile(). I am guessing this is supported by the sendfile(2)
  syscall which is present in the FreeBSD kernel.
- Therefore, in case the implementation just needs to be ported, it
  would make for a very simple project.

> >> sendfile(2) for Linux compat can be emulated in the kernel without
> >> backing.

Joerg, would you please explain to me how this will be possible?
As I understand, anything in the compat layer needs backing functions, and
I didn't find anything pertaining to sendfile(2) as a syscall in the NetBSD
kernel. Or maybe you are talking about using in-kernel support functions,
which would have been used to support sendfile(2) had it been present? In
this case, I guess, these functions support zero-copy and it will be great
if you could point me to them.

> >> That said, a real splice(2) or even splicev(2) would be really
> >> nice to have. But that's a different project and, arguably, a potentially
> >> more generally useful one, too.
> >
> > Yes, splice is more general (as opposed to send a file to a socket), but I
> > think splice has limitations too (one of the fds needs to be a pipe).
> > Is that true only for linux?
>

splice(2) for sure is an amazing project, and as Christos said, splice(2)
requires one of the fds to be a pipe. Given that splice is only implemented
in linux (from what I found), we can have a slightly different
implementation in the NetBSD kernel according to requirements (if allowed).

I haven't found a sendfile(2) or splice(2) syscall in the NetBSD kernel.
I did find a reference to sendfile(), but that was for the tftp daemon.
It will make an interesting project to add support for these calls in the
NetBSD kernel first. Later these very syscalls can back functionality for
the linux compat layer as well. What I wish to know is what other zero-copy
functionality is already present in the NetBSD kernel which can support
these two system calls.

I hope this makes some sense and please do correct me where I have made a
wrong assumption.
Hope to hear from you soon.

--
Regards,
Piyush
Re: Emulating missing linux syscalls
In article , Joerg Sonnenberger wrote:
>On Tue, Apr 12, 2022 at 12:29:21PM - Christos Zoulas wrote:
>> In article , Piyush Sachdeva wrote:
>> >
>> >Dear Stephen Borrill,
>> >My name is Piyush, and I was looking into the
>> >'Emulating missing Linux syscalls' project hoping to contribute
>> >to this year's GSoC.
>> >
>> >I wanted to be sure of a few basic things before I go ahead:
>> >- linux binaries are found in- src/sys/compat/linux
>> >- particular implementation in - src/sys/compat/linux/common
>> >- a few architecture-specific implementations in-
>> >  src/sys/compat/linux/arch/.
>> >- The src/sys/compat/linux/arch//linux_syscalls.c file
>> >  lists of system calls, and states if a particular syscall is present
>> >  or not.
>> >
>> >I was planning to work on the 'sendfile()' syscall, which I believe
>> >is unimplemented for amd64 and a few other architectures as well.
>> >
>> >Considering the above points, I was hoping you could point me in
>> >the right direction for this project. Hope to hear from you soon.
>>
>> I would look into porting the FreeBSD implementation of sendfile to NetBSD.
>
>sendfile(2) for Linux compat can be emulated in the kernel without
>backing. That said, a real splice(2) or even splicev(2) would be really
>nice to have. But that's a different project and, arguably, a potentially
>more generally useful one, too.

Yes, splice is more general (as opposed to send a file to a socket), but I
think splice has limitations too (one of the fds needs to be a pipe).
Is that true only for linux?

christos
Re: Emulating missing linux syscalls
On Tue, Apr 12, 2022 at 12:29:21PM - Christos Zoulas wrote:
> In article , Piyush Sachdeva wrote:
> >
> >Dear Stephen Borrill,
> >My name is Piyush, and I was looking into the
> >'Emulating missing Linux syscalls' project hoping to contribute
> >to this year's GSoC.
> >
> >I wanted to be sure of a few basic things before I go ahead:
> >- linux binaries are found in- src/sys/compat/linux
> >- particular implementation in - src/sys/compat/linux/common
> >- a few architecture-specific implementations in-
> >  src/sys/compat/linux/arch/.
> >- The src/sys/compat/linux/arch//linux_syscalls.c file
> >  lists of system calls, and states if a particular syscall is present
> >  or not.
> >
> >I was planning to work on the 'sendfile()' syscall, which I believe
> >is unimplemented for amd64 and a few other architectures as well.
> >
> >Considering the above points, I was hoping you could point me in
> >the right direction for this project. Hope to hear from you soon.
>
> I would look into porting the FreeBSD implementation of sendfile to NetBSD.

sendfile(2) for Linux compat can be emulated in the kernel without
backing. That said, a real splice(2) or even splicev(2) would be really
nice to have. But that's a different project and, arguably, a potentially
more generally useful one, too.

Joerg
Re: Emulating missing linux syscalls
In article , Piyush Sachdeva wrote:
>
>Dear Stephen Borrill,
>My name is Piyush, and I was looking into the
>'Emulating missing Linux syscalls' project hoping to contribute
>to this year's GSoC.
>
>I wanted to be sure of a few basic things before I go ahead:
>- linux binaries are found in- src/sys/compat/linux
>- particular implementation in - src/sys/compat/linux/common
>- a few architecture-specific implementations in-
>  src/sys/compat/linux/arch/.
>- The src/sys/compat/linux/arch//linux_syscalls.c file
>  lists of system calls, and states if a particular syscall is present
>  or not.
>
>I was planning to work on the 'sendfile()' syscall, which I believe
>is unimplemented for amd64 and a few other architectures as well.
>
>Considering the above points, I was hoping you could point me in
>the right direction for this project. Hope to hear from you soon.

I would look into porting the FreeBSD implementation of sendfile to NetBSD.

christos