On Thursday, 7 October 2021 07:23:59 CEST Stefan Hajnoczi wrote:
> On Mon, Oct 04, 2021 at 09:38:00PM +0200, Christian Schoenebeck wrote:
> > At the moment the maximum transfer size with virtio is limited to 4M
> > (1024 * PAGE_SIZE). This series raises this limit to its maximum
> > theoretical possible transfer size of 128M (32k pages) according to the
> > virtio specs:
> >
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
>
> Hi Christian,
> I took a quick look at the code:
>
> - The Linux 9p driver restricts descriptor chains to 128 elements
>   (net/9p/trans_virtio.c:VIRTQUEUE_NUM)

Yes, that's the limitation that I am about to remove (WIP); current kernel
patches:
https://lore.kernel.org/netdev/cover.1632327421.git.linux_...@crudebyte.com/

> - The QEMU 9pfs code passes iovecs directly to preadv(2) and will fail
>   with EINVAL when called with more than IOV_MAX iovecs
>   (hw/9pfs/9p.c:v9fs_read())

Hmm, which makes me wonder why I never encountered this error during
testing.

Most people use the 9p QEMU 'local' fs driver backend in practice, so for
them that v9fs_read() call translates to this implementation on the QEMU
side (hw/9pfs/9p-local.c):

static ssize_t local_preadv(FsContext *ctx, V9fsFidOpenState *fs,
                            const struct iovec *iov,
                            int iovcnt, off_t offset)
{
#ifdef CONFIG_PREADV
    return preadv(fs->fd, iov, iovcnt, offset);
#else
    int err = lseek(fs->fd, offset, SEEK_SET);
    if (err == -1) {
        return err;
    } else {
        return readv(fs->fd, iov, iovcnt);
    }
#endif
}

> Unless I misunderstood the code, neither side can take advantage of the
> new 32k descriptor chain limit?
>
> Thanks,
> Stefan

I need to check that when I have some more time. One possible explanation
might be that preadv() already wraps this into a loop internally to
circumvent a limit like IOV_MAX. It might be another "it works, but is not
portable" issue, but I am not sure.
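To illustrate what I mean by "wrapped into a loop": conceptually something
along the lines of the sketch below, which splits the iovec array into
chunks of at most IOV_MAX entries per preadv() call. This is only a rough
illustration, not actual libc or QEMU code; the function name and the
simplified short-read/error handling are made up for the example:

/*
 * Hypothetical sketch only -- splits an oversized iovec array into chunks
 * of at most IOV_MAX entries per preadv() call. Short reads and errors
 * are handled in a simplified way.
 */
#define _GNU_SOURCE
#include <sys/uio.h>
#include <limits.h>

static ssize_t preadv_chunked(int fd, const struct iovec *iov, int iovcnt,
                              off_t offset)
{
    ssize_t total = 0;

    while (iovcnt > 0) {
        int n = iovcnt > IOV_MAX ? IOV_MAX : iovcnt;
        ssize_t len = preadv(fd, iov, n, offset);

        if (len < 0) {
            /* report bytes read so far, or the error if nothing was read */
            return total ? total : len;
        }
        if (len == 0) {
            break; /* EOF */
        }
        total  += len;
        offset += len;

        /* advance past the iovecs that were completely filled */
        while (n > 0 && (size_t)len >= iov->iov_len) {
            len -= iov->iov_len;
            iov++;
            iovcnt--;
            n--;
        }
        if (n > 0) {
            break; /* short read within this chunk, let the caller decide */
        }
    }
    return total;
}

If it turns out that preadv() does not loop like this internally, the QEMU
fs driver backends would probably need a wrapper along these lines before
requests with more than IOV_MAX iovecs can work.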
There are still a bunch of other issues I have to resolve. If you look at
net/9p/client.c on the kernel side, you'll notice that it basically does
this ATM

    kmalloc(msize);

for every 9p request. So not only does it allocate much more memory for
every request than actually required (i.e. if 9pfs was mounted with
msize=8M, then a 9p request that would actually just need 1k nevertheless
allocates 8M), but it also allocates well above PAGE_SIZE, i.e. physically
contiguous memory, which obviously may fail at any time.

With those kernel patches above and QEMU patched with this series as well,
I can go above 4M msize now, and the test system runs stably as long as
9pfs is mounted with an msize that is not "too high". If I try to mount
9pfs with a very high msize, the kmalloc() issue described above kicks in
and causes an immediate kernel oops when mounting. So that's a
high-priority issue that I still need to resolve.

Best regards,
Christian Schoenebeck