On 5/8/26 06:11, Stefano Garzarella wrote:
> On Fri, May 08, 2026 at 05:58:06AM -0400, Michael S. Tsirkin wrote:
>> On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote:
>>> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote:
>>>> On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
>>>>> On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
>>>>>> On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
>>>
>>> [...]
>>>
>>>>>>> For now, we're already doing something:
>>>>>>> merging the skuffs if they don't have EOM set.
>>>>>>
>>>>>>
>>>>>> Right that's good. You could go further and merge with EOM too
>>>>>> if you stick the info about message boundaries somewhere else.
>>>>>
>>>>> This adds a lot of complexity IMO, but we can try.
>>>>>
>>>>> Do you have something in mind?
>>>>
>>>> BER is clearly overkill but here's a POC that claude made for me,
>>>> just to give u an idea. It's clearly has a ton of issues,
>>>> for example I dislike how GFP_ATOMIC is handled.
>>>
>>> Okay, I somewhat understand, but clearly this isn't net material
>>
>> I doubt we have many other options given reverting the regression was
>> ruled out.
> 
> As Eric pointed out, we can't revert it.

Could there be an option to disable the mitigation in guests, for the
situation where the host is trusted?  There are VMMs that implement
AF_VSOCK in userspace with a backing AF_UNIX socket.

>>> so for now
>>> I think the best thing to do is to merge the fixup I sent (or something
>>> similar):
>>> https://lore.kernel.org/netdev/[email protected]/
>>
>> I reviewed that one, problem is it's a spec violation/change that we'll
>> have to support forever.
> 
> I have a few points to make on this, but let's discuss them there.
> 
>>
>>> This is a major change that should be merged with more caution.
>>> Could this have too much of an impact on performance?
>>>
>>> Thanks,
>>> Stefano
>>
>> It's really a POC, real patch is left as an excersise for the reader 
>> :).
> 
> eheh, I see, but honestly, this overcomplication scares me. I'll try to 
> think it over.
> 
>> The correct approach IMHO is to only start using this
>> when we wasted a lot of memory on small packets.
>>
>> For example, if sum(truesize) >= buf size.
>>
>> then we'll not see a perf impact unless it's already pathological.
> 
> Agree on this, which is similar to what I'm doing in that patch.  
> Reducing the advertised buf_alloc only in pathological cases (e.g.  
> overhead > buf_alloc).

This isn't enough to prevent data loss due to race conditions.

I'm CCing the virtio-comment list and a few others.

Right now, any application that needs to send massive amount of
data over a vsock is simply broken.  This isn't just theoretical.
It's causing real-world problems for users of Waypipe.
Waypipe forwards Wayland protocol messages over AF_VSOCK,
so it can send a large amount of traffic over the socket.
See https://gitlab.freedesktop.org/mstoceckl/waypipe/work_items/165.

If one is willing to mutate the ring buffer in-place, or to maintain
an auxiliary counter, it's possible to store all messages with bounded
(in practice) overhead.  Specifically:

- If the first byte of a block of data is nonzero, it's a
  variable-length length.  1 byte for messages less than 128 bytes.

- If the first byte of a block of data is zero, the subsequent bytes
  are a variable-length counter that stores the number of consecutive
  zero-byte messages.

That adds a lot of complexity, which is very unfortunate for something
that needs to be backported to stable kernels.  I also suspect it
requires all access to the ring buffer to take a lock rather than
being lock-free.  But it's the only approach that I can think of that
can work with the current spec.

Could there at least be a normative note stating that drivers and
devices should treat each message as consuming 1024 bytes + the size
of the message itself, and warning that anything that doesn't is
going to be broken in practice?

I'm CCing Val Packet (of Invisible Things Lab) and Alyssa Ross
(of Spectrum) because both of them are working on systems that rely
critically on vsock.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Attachment: OpenPGP_0xB288B55FFF9C22C1.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

Reply via email to