On Mon, May 4, 2026 at 7:53 AM Ankit Jain <[email protected]> wrote:
>
> When an application locks SO_RCVBUF, it expects strict memory bounds and
> disables TCP window auto-tuning. However, recent TCP memory fragmentation
> optimizations still apply dynamic truesize penalties to the `scaling_ratio`
> of these locked sockets.
>
> For workloads processing small, fragmented packets (like Java's Tomcat),
> this penalty drops the scaling_ratio to 1. This shrinks the dynamically
> calculated advertised window, leading to Silly Window Syndrome (SWS)
> deadlocks and 504 Gateway Timeouts.
>
> This patch fixes the issue by bypassing the truesize penalty for sockets
> with `SOCK_RCVBUF_LOCK` set. To ensure the kernel still defends against
> memory exhaustion from large aggregate payloads (e.g., GRO), the penalty
> is still applied if `skb->len` exceeds the advertised MSS.
>
> Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
> Reported-by: Karen Badiryan <[email protected]>
> Signed-off-by: Ankit Jain <[email protected]>
> ---
>  net/ipv4/tcp_input.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index d5c9e65d9760..569299dafa88 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -240,8 +240,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
>                 /* Note: divides are still a bit expensive.
>                  * For the moment, only adjust scaling_ratio
>                  * when we update icsk_ack.rcv_mss.
> +                *
> +                * Protect locked SO_RCVBUF from Silly Window Syndrome
> +                * due to truesize penalties on small packets. Allow
> +                * penalty if aggregate payload (e.g., GRO) exceeds MSS.
>                  */
> -               if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
> +               if (unlikely(len != icsk->icsk_ack.rcv_mss &&
> +                            (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
> +                             skb->len > tcp_sk(sk)->advmss))) {

I don't think testing tp->advmss does what you want.

A remote peer can send GRO packets with tiny segments, regardless of tp->advmss.

If GRO is what you are looking for, why not test (skb->len > len)?
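
To make the difference between the two checks concrete, here is a minimal
userspace sketch, not kernel code. It assumes that `len` in
tcp_measure_rcv_mss() is the per-segment size (gso_size when the skb is
aggregated, otherwise skb->len), which the quoted hunk does not show.
struct pkt_model and both helper names are hypothetical, and only the inner
condition from the hunk is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two skb fields the condition reads. */
struct pkt_model {
	unsigned int skb_len;  /* total payload length, like skb->len */
	unsigned int gso_size; /* per-segment size, 0 when not aggregated */
};

/* Assumption: mirrors how `len` is derived before the quoted hunk. */
static unsigned int seg_len(const struct pkt_model *p)
{
	return p->gso_size ? p->gso_size : p->skb_len;
}

/* Condition as posted in the patch: gate the penalty on advmss. */
static bool update_ratio_posted(const struct pkt_model *p,
				unsigned int rcv_mss,
				bool rcvbuf_locked,
				unsigned int advmss)
{
	unsigned int len = seg_len(p);

	return len != rcv_mss &&
	       (!rcvbuf_locked || p->skb_len > advmss);
}

/* Condition as suggested in review: gate the penalty on aggregation. */
static bool update_ratio_suggested(const struct pkt_model *p,
				   unsigned int rcv_mss,
				   bool rcvbuf_locked)
{
	unsigned int len = seg_len(p);

	/* A segment can never be larger than the packet that carries it. */
	assert(len <= p->skb_len);

	/* skb_len > len holds only for multi-segment (aggregated) packets. */
	return len != rcv_mss &&
	       (!rcvbuf_locked || p->skb_len > len);
}
```

With a locked SO_RCVBUF, an advmss of 1460, and a 1000-byte aggregate built
from 100-byte segments, the posted check skips the ratio update, while the
(skb->len > len) check still applies it. A single non-aggregated 100-byte
packet is exempt under both.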

>                         u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE;
>                         u8 old_ratio = tcp_sk(sk)->scaling_ratio;
>
> --
> 2.53.0
>
