Hi,
this v2 addresses the review on the earlier quantization series.
Simon was right that the original 3/3 only showed the explicit
rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 is replaced with an
OOO-memory-based reproducer that first grows rcv_ssthresh with in-order
data and then drives raw backed free_space below rcv_ssthresh without
advancing rcv_nxt. In the instrumented old-behavior run that shaped this
test, the critical ACK reached free_space=86190, rcv_ssthresh=86286,
and still advertised 87040 (85 << 10), which is 850 bytes more than the
backed free_space. With 2/3 applied, the same ACK stays at 84
(84 << 10 = 86016, which does not exceed free_space).
That follow-up also clarified why the broader 2/3 change is required.
A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up
behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd,
and tcp_select_window() later preserved that extra unit because shrinking
was disallowed. Keeping tp->rcv_wnd representable across the scaled
no-shrink path is what lets later ACKs settle at the correct
wire-visible edge.
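To illustrate why a stored 85 persists, here is a minimal C sketch of the
no-shrink clamp. It is a simplified model, not the kernel's
tcp_select_window(): the name no_shrink_window() and the plain integer
arguments are assumptions made for illustration, and the round-up stands
in for ALIGN() with a power-of-two quantum.

```c
#include <stdint.h>

/*
 * Hypothetical model of the no-shrink step: if the freshly computed
 * window is smaller than the window already offered to the peer, keep
 * the offered window (rounded up to the scale quantum) instead.
 */
static uint32_t no_shrink_window(uint32_t new_win, uint32_t cur_win,
				 unsigned int wscale)
{
	uint32_t quantum = 1U << wscale;

	if (new_win < cur_win)
		new_win = (cur_win + quantum - 1) & ~(quantum - 1);

	return new_win;
}
```

In this model, once 85 << 10 has been stored and rcv_nxt does not
advance, a later computation of 84 << 10 is overridden by the stored
value, so the extra unit stays visible to the sender.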
Problem
=======
In the scaled no-shrink path, __tcp_select_window() rounds free_space up
to the receive-window scale quantum:
	window = ALIGN(free_space, 1 << tp->rx_opt.rcv_wscale);
When raw backed free_space sits just below the next quantum, that can
expose fresh sender-visible credit that is not actually backed by the
current receive-memory state.
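The arithmetic can be checked with a small standalone C sketch. The
helper names round_up_to_quantum() and unbacked_credit() are
hypothetical; the round-up is assumed to behave like ALIGN() for a
power-of-two quantum, and the wscale of 10 is taken from the
"85 << 10" figure quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for ALIGN(x, 1 << wscale); not kernel code. */
static uint32_t round_up_to_quantum(uint32_t free_space, unsigned int wscale)
{
	uint32_t quantum;

	assert(wscale <= 14);	/* largest TCP window scale shift */
	quantum = 1U << wscale;

	return (free_space + quantum - 1) & ~(quantum - 1);
}

/* Bytes advertised beyond what free_space actually backs. */
static uint32_t unbacked_credit(uint32_t free_space, unsigned int wscale)
{
	return round_up_to_quantum(free_space, wscale) - free_space;
}
```

For free_space=86190 and wscale=10, the rounded window is 87040 (85
units) and the unbacked credit is 850 bytes.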
Approach
========
This repost keeps only the part with a clear fail-before/pass-after
story today:
  - relax one unrelated packetdrill test which was pinning an
    incidental advertised window
  - keep tp->rcv_wnd representable in scaled units by rounding larger
    windows down to the scale quantum
  - preserve only the small non-zero case that would otherwise scale
    away to zero; changing that longstanding non-zero-to-zero behavior
    would be a separate change from the bug proven here
  - prove the actual raw-free_space case with a packetdrill sequence
    that reaches free_space < rcv_ssthresh without changing SO_RCVBUF
    after the handshake
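The rounding rule from the second and third bullets can be sketched in
userspace C as follows. This is an illustrative model under stated
assumptions, not the code in 2/3: the function name
select_scaled_window() is hypothetical, and the small non-zero case is
assumed to round up to exactly one quantum.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the rounding rule:
 *   - zero stays zero
 *   - a non-zero value below one quantum is rounded up, so it does not
 *     scale away to zero on the wire
 *   - anything larger is rounded down to a multiple of the quantum
 */
static uint32_t select_scaled_window(uint32_t free_space, unsigned int wscale)
{
	uint32_t granularity;

	assert(wscale <= 14);	/* largest TCP window scale shift */
	granularity = 1U << wscale;

	if (free_space == 0)
		return 0;
	if (free_space < granularity)
		return granularity;

	return free_space & ~(granularity - 1);
}
```

With free_space=86190 and wscale=10 this yields 86016, that is 84
scaled units, while a free_space of 500 still advertises one unit.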
Tests
=====
Local validation:
  - git diff --check
  - checkpatch on the touched diff
  - local vmksft targeted run of
    net/packetdrill:tcp_rcv_quantization_credit.pkt passes with this
    series applied for ipv4, ipv6, and ipv4-mapped-ipv6
  - the same packetdrill fails on HEAD without 2/3 with:
      expected: win 84
      actual: win 85
Changes in v2
=============
  - leave 1/3 unchanged
  - rename gran to granularity in 2/3
  - clarify in 2/3 why representable tp->rcv_wnd state is required across
    later no-shrink transitions
  - clarify in 2/3 that the smaller longstanding non-zero case remains
    intentionally unchanged in this series
  - replace 3/3 with the proven OOO-memory reproducer for the raw
    free_space case
  - drop the IPv4-only restriction in 3/3 after validating the test on
    the default packetdrill protocol set
Series layout
=============
  1/3 selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2/3 tcp: keep scaled no-shrink window representable
  3/3 selftests: packetdrill: cover scaled rwnd quantization slack
Thanks,
Wesley Atwell
---
net/ipv4/tcp_output.c | 16 +++++++++++-----
.../selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt | 8 +++++---
.../packetdrill/tcp_rcv_quantization_credit.pkt | 62 ++++++++++++++++++++++
3 files changed, 78 insertions(+), 8 deletions(-)
--
2.43.0