On Mon, May 11, 2026 at 04:48:41PM -0700, Jakub Kicinski wrote:
On Mon, 11 May 2026 15:17:56 +0200 Stefano Garzarella wrote:
> > Okay, I thought it over over the weekend, and I agree that this patch
> > still doesn't solve the problem and would still result in packet loss.
> > So, until we resolve the issue permanently, and since 059b7dbd20a6 is
> > coming to stable, I'd like to rework this patch so that we only start
> > dropping packets when the overhead plus the queued bytes exceeds
> > `buf_alloc * 2`.
> > Removing the other changes to reduce the buf_alloc advertised, but
> > terminating the connection so that both peers are aware that something
> > went wrong.
> >
> > Any objections?
> >
> > Stefano
>
> Let's try to first fix it upstream properly please. Discuss backporting
> later.

Commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
queue") is already in Linus' tree and will land in stable soon. What
issue do you see with having a patch on top of that to close the
connection instead of losing data and breaking our test suite?

IMO we need that change in any case, because the previous code also
discarded packets without any notification, and breaking the
connection would be better in that case.

> Sorry if I'm speaking out of turn or misunderstanding but, like Michael
> says, let's focus on fixing this the way we want it to be fixed?

IMHO, these are two separate issues (related but still separated):
1. taking overhead into account and setting a limit (which Eric started
doing in 059b7dbd20a6)
2. reducing overhead by also merging SEQPACKET as Michael proposed (we
already do this for STREAM).
Even with option 2, the overhead won’t disappear, and we should maintain
a limit as Eric rightly pointed out, especially since the initial
problem was a flood of EOMs with 0 or 1 bytes, which option 2 doesn’t
solve, since the overhead will eventually explode.
So I agree that we should implement 2 as Michael proposed, but I think
we also need to fix 1 by improving it slightly (as I’ve tried here, and
would like to do in v2).
Since I have v2 ready and tested, I'll send it because I think we should
have it in any case, even with option 2 implemented.
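
As a rough illustration of point 1 (a userspace sketch, not the actual
patch), the limit check could look something like the code below. The
struct, the field names other than buf_alloc, and both helper functions
are made up for illustration; counting the incoming packet against the
limit is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket receive accounting; names are illustrative only. */
struct rx_acct {
	uint32_t buf_alloc;	/* receive buffer size advertised to the peer */
	uint32_t queued_bytes;	/* payload bytes currently queued */
	uint32_t overhead;	/* per-packet bookkeeping bytes currently queued */
};

/*
 * True when queuing one more packet would push payload plus overhead
 * beyond buf_alloc * 2. 64-bit math avoids wrapping on large values.
 */
bool rx_over_limit(const struct rx_acct *a, uint32_t payload_len,
		   uint32_t pkt_overhead)
{
	uint64_t total = (uint64_t)a->queued_bytes + a->overhead +
			 payload_len + pkt_overhead;

	return total > (uint64_t)a->buf_alloc * 2;
}

/*
 * Account for a packet. A false return means the caller should reset
 * the connection so both peers notice, instead of dropping silently.
 */
bool rx_enqueue(struct rx_acct *a, uint32_t payload_len,
		uint32_t pkt_overhead)
{
	if (rx_over_limit(a, payload_len, pkt_overhead))
		return false;

	a->queued_bytes += payload_len;
	a->overhead += pkt_overhead;
	return true;
}
```

With buf_alloc = 1024 and an assumed 256 bytes of overhead per packet, a
flood of 1-byte messages trips this check on the 8th packet, even though
only 7 payload bytes are queued. That is the case merging alone cannot
bound.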

> IIUC you are trying to minimize the size of the fix, please don't worry
> about the LoC in the diff at this stage.

Okay, I'll try to work on it, but this week is a nightmare. I hope to
come up with something by building on Michael's idea.
Thanks,
Stefano