On 5/23/26 17:00, Demi Marie Obenour wrote: > On 5/8/26 06:11, Stefano Garzarella wrote: >> On Fri, May 08, 2026 at 05:58:06AM -0400, Michael S. Tsirkin wrote: >>> On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote: >>>> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote: >>>>> On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote: >>>>>> On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote: >>>>>>> On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote: >>>> >>>> [...] >>>> >>>>>>>> For now, we're already doing something: >>>>>>>> merging the skuffs if they don't have EOM set. >>>>>>> >>>>>>> >>>>>>> Right that's good. You could go further and merge with EOM too >>>>>>> if you stick the info about message boundaries somewhere else. >>>>>> >>>>>> This adds a lot of complexity IMO, but we can try. >>>>>> >>>>>> Do you have something in mind? >>>>> >>>>> BER is clearly overkill but here's a POC that claude made for me, >>>>> just to give u an idea. It's clearly has a ton of issues, >>>>> for example I dislike how GFP_ATOMIC is handled. >>>> >>>> Okay, I somewhat understand, but clearly this isn't net material >>> >>> I doubt we have many other options given reverting the regression was >>> ruled out. >> >> As Eric pointed out, we can't revert it. > > Could there be an option to disable the mitigation in guests, for the > situation where the host is trusted? There are VMMs that implement > AF_VSOCK in userspace with a backing AF_UNIX socket. > >>>> so for now >>>> I think the best thing to do is to merge the fixup I sent (or something >>>> similar): >>>> https://lore.kernel.org/netdev/[email protected]/ >>> >>> I reviewed that one, problem is it's a spec violation/change that we'll >>> have to support forever. >> >> I have a few points to make on this, but let's discuss them there. >> >>> >>>> This is a major change that should be merged with more caution. >>>> Could this have too much of an impact on performance? >>>> >>>> Thanks, >>>> Stefano >>> >>> It's really a POC, real patch is left as an excersise for the reader >>> :). >> >> eheh, I see, but honestly, this overcomplication scares me. I'll try to >> think it over. >> >>> The correct approach IMHO is to only start using this >>> when we wasted a lot of memory on small packets. >>> >>> For example, if sum(truesize) >= buf size. >>> >>> then we'll not see a perf impact unless it's already pathological. >> >> Agree on this, which is similar to what I'm doing in that patch. >> Reducing the advertised buf_alloc only in pathological cases (e.g. >> overhead > buf_alloc). > > This isn't enough to prevent data loss due to race conditions. > > I'm CCing the virtio-comment list and a few others. > > Right now, any application that needs to send massive amount of > data over a vsock is simply broken. This isn't just theoretical. > It's causing real-world problems for users of Waypipe. > Waypipe forwards Wayland protocol messages over AF_VSOCK, > so it can send a large amount of traffic over the socket. > See https://gitlab.freedesktop.org/mstoceckl/waypipe/work_items/165. > > If one is willing to mutate the ring buffer in-place, or to maintain > an auxiliary counter, it's possible to store all messages with bounded > (in practice) overhead. Specifically: > > - If the first byte of a block of data is nonzero, it's a > variable-length length. 1 byte for messages less than 128 bytes. > > - If the first byte of a block of data is zero, the subsequent bytes > are a variable-length counter that stores the number of consecutive > zero-byte messages. > > That adds a lot of complexity, which is very unfortunate for something > that needs to be backported to stable kernels. I also suspect it > requires all access to the ring buffer to take a lock rather than > being lock-free. But it's the only approach that I can think of that > can work with the current spec. > > Could there at least be a normative note stating that drivers and > devices should treat each message as consuming 1024 bytes + the size > of the message itself, and warning that anything that doesn't is > going to be broken in practice? > > I'm CCing Val Packet (of Invisible Things Lab) and Alyssa Ross > (of Spectrum) because both of them are working on systems that rely > critically on vsock.
Update: I see that patches have been upstreamed (with CC: stable) that reset the connection instead of data loss. -- Sincerely, Demi Marie Obenour (she/her/hers)
OpenPGP_0xB288B55FFF9C22C1.asc
Description: OpenPGP public key
OpenPGP_signature.asc
Description: OpenPGP digital signature
