On Mon, May 11, 2026 at 12:54:44PM +0200, Stefano Garzarella wrote:
> On Fri, 8 May 2026 at 12:38, Stefano Garzarella <[email protected]> wrote:
> >
> > On Fri, May 08, 2026 at 06:33:13AM -0400, Michael S. Tsirkin wrote:
> > >On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
> > >> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
> > >> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> > >> > > From: Stefano Garzarella <[email protected]>
> > >> > >
> > >> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded
> > >> > > skb queue"), virtio_transport_inc_rx_pkt() subtracts per-skb
> > >> > > overhead from buf_alloc when checking whether a new packet fits.
> > >> > > This reduces the effective receive buffer below what the user
> > >> > > configured via SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data
> > >> > > packets to be silently dropped and applications that rely on the
> > >> > > full buffer size to deadlock.
> > >> > >
> > >> > > Also, the reduced space is not communicated to the remote peer, so
> > >> > > its credit calculation grants more credit than the receiver will
> > >> > > actually accept, causing data loss (there is no retransmission).
> > >> > >
> > >> > > This also causes failures in tools/testing/vsock/vsock_test.c.
> > >> > > Test 18 sometimes fails, while test 22 always fails in this way:
> > >> > >
> > >> > >   18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> > >> > >
> > >> > >   22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send
> > >> > >        failed: Resource temporarily unavailable
> > >> > >
> > >> > > Fix this by introducing virtio_transport_rx_buf_size() to
> > >> > > calculate the size of the RX buffer based on the overhead, and use
> > >> > > it in the acceptance check, the advertised buf_alloc, and the
> > >> > > credit update decision.
> > >> > > Use buf_alloc * 2 as the total budget (payload + overhead),
> > >> > > similar to how SO_RCVBUF is doubled to reserve space for sk_buff
> > >> > > metadata.
> > >> > > The function returns buf_alloc as long as the overhead fits within
> > >> > > the reservation, then gradually reduces toward 0 as the overhead
> > >> > > exceeds buf_alloc (e.g. under small-packet flooding), informing
> > >> > > the peer to slow down.
> > >> > >
> > >> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> > >> > > queue")
> > >> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >> >
> > >> > unfortunately, this is a bit of a spec violation and there is no
> > >> > guarantee it helps.
> > >>
> > >> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
> > >>
> > >> >
> > >> > a spec violation because the spec says:
> > >> >     Only payload bytes are counted and header bytes are not
> > >> >     included
> > >> >
> > >> > and the implication is that a side can not reduce its own buf_alloc.
> > >> >
> > >> > no guarantee because the other side is not required to process your
> > >> > packets, so it might not see your buf alloc reduction.
> > >> >
> > >> > as designed in the current spec, you can only increase your buf
> > >> > alloc, not decrease it.
> > >>
> > >> We never enforced this: currently a user can reduce it via
> > >> SO_VM_SOCKETS_BUFFER_SIZE and we haven't blocked that since
> > >> virtio-vsock was introduced. Should we update the spec?
> > >
> > >
> > >it's not that we need to enforce it, it's that all synchronization
> > >assumes this. as in, anyone can use an old copy until they run out
> > >of credits.
> > >
> > >
> > >> >
> > >> > what can be done:
> > >> > - more efficient storage for small packets (poc i posted)
> > >> > - reduce buf alloc ahead of time
> > >>
> > >> That's basically what I'm doing here: I'm using twice the size of
> > >> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and
> > >> telling the other peer just `buf_alloc`.
> > >>
> > >> But then, somehow, we have to let the other peer know that we're
> > >> running out of space. With this patch that only happens when the other
> > >> peer isn't behaving properly, sending so many small packets that the
> > >> overhead exceeds `buf_alloc`.
> > >>
> > >> Stefano
> > >
> > >what is "not proper" here, it is up to the application what to send.
> >
> > Sure, but here we're just slowing down the application by telling it we
> > don't have any more space.
> >
> > Again, without this patch we are just dropping data, which IMO is even
> > worse.
> >
> > So I think we should merge this for now, while we handle the EOM better.
> > If you prefer, I can drop the part where we reduce the buf_alloc
> > advertised to the other peer, but at least we should drop data only
> > after `buf_alloc * 2` IMO.
>
> Okay, I thought it over during the weekend, and I agree that this patch
> still doesn't solve the problem and would still result in packet loss.
> So, until we resolve the issue permanently, and since 059b7dbd20a6 is
> coming to stable, I'd like to rework this patch so that we only start
> dropping packets when the overhead plus the queued bytes exceeds
> `buf_alloc * 2`.
> I would drop the other changes that reduce the advertised buf_alloc, but
> terminate the connection so that both peers are aware that something
> went wrong.
>
> Any objections?
>
> Stefano
Let's try to fix it properly upstream first, please. We can discuss
backporting later.

-- 
MST
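
(To make the accounting in the quoted patch description concrete, below is a
minimal, hypothetical userspace sketch. It is not the actual kernel code: the
struct, field, and helper names are invented for illustration, and the real
virtio_transport implementation differs. The idea it shows: accept a packet
only while payload plus per-skb overhead fits within buf_alloc * 2, and keep
advertising the full buf_alloc until the overhead alone eats into it.)

/*
 * Illustrative sketch of the "buf_alloc * 2" budget discussed above.
 * Hypothetical names; not the virtio_transport implementation.
 */
#include <stdbool.h>
#include <stdint.h>

struct rx_state {
	uint32_t buf_alloc;       /* size set via SO_VM_SOCKETS_BUFFER_SIZE */
	uint32_t queued_bytes;    /* payload bytes currently queued */
	uint32_t queued_overhead; /* per-skb metadata for queued packets */
};

/*
 * Accept a packet only while payload + overhead stays within twice
 * buf_alloc, mirroring how SO_RCVBUF is doubled to reserve room for
 * sk_buff metadata. Beyond that budget the packet is dropped (or, per
 * the follow-up proposal, the connection is terminated).
 */
static bool rx_packet_fits(const struct rx_state *rx,
			   uint32_t len, uint32_t overhead)
{
	uint64_t budget = (uint64_t)rx->buf_alloc * 2;
	uint64_t used = (uint64_t)rx->queued_bytes + rx->queued_overhead;

	return used + len + overhead <= budget;
}

/*
 * buf_alloc to advertise to the peer: the full value while the overhead
 * fits in the reserved half, then shrinking toward 0 once the overhead
 * alone exceeds buf_alloc (e.g. under a flood of tiny packets).
 */
static uint32_t rx_advertised_buf_alloc(const struct rx_state *rx)
{
	uint64_t budget = (uint64_t)rx->buf_alloc * 2;

	if (rx->queued_overhead <= rx->buf_alloc)
		return rx->buf_alloc;
	if (rx->queued_overhead >= budget)
		return 0;
	return (uint32_t)(budget - rx->queued_overhead);
}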

