> -----Original Message----- > From: Paolo Abeni <pab...@redhat.com> > Sent: Tuesday, April 29, 2025 12:14 PM > To: Chia-Yu Chang (Nokia) <chia-yu.ch...@nokia-bell-labs.com>; > ho...@kernel.org; dsah...@kernel.org; kun...@amazon.com; > b...@vger.kernel.org; net...@vger.kernel.org; dave.t...@gmail.com; > j...@mojatatu.com; k...@kernel.org; step...@networkplumber.org; > xiyou.wangc...@gmail.com; j...@resnulli.us; da...@davemloft.net; > eduma...@google.com; andrew+net...@lunn.ch; donald.hun...@gmail.com; > a...@fiberby.net; liuhang...@gmail.com; sh...@kernel.org; > linux-kselftest@vger.kernel.org; i...@kernel.org; ncardw...@google.com; Koen > De Schepper (Nokia) <koen.de_schep...@nokia-bell-labs.com>; g.white > <g.wh...@cablelabs.com>; ingemar.s.johans...@ericsson.com; > mirja.kuehlew...@ericsson.com; chesh...@apple.com; rs.i...@gmx.at; > jason_living...@comcast.com; vidhi_goel <vidhi_g...@apple.com> > Cc: Olivier Tilmans (Nokia) <olivier.tilm...@nokia.com> > Subject: Re: [PATCH v5 net-next 03/15] tcp: AccECN core > > > CAUTION: This is an external email. Please be very careful when clicking > links or opening attachments. See the URL nok.it/ext for additional > information. > > > > On 4/22/25 5:35 PM, chia-yu.ch...@nokia-bell-labs.com wrote: > > @@ -298,6 +298,9 @@ struct tcp_sock { > > u32 snd_up; /* Urgent pointer */ > > u32 delivered; /* Total data packets delivered incl. rexmits > > */ > > u32 delivered_ce; /* Like the above but only ECE marked packets > > */ > > + u32 received_ce; /* Like the above but for rcvd CE marked pkts > > */ > > + u8 received_ce_pending:4, /* Not yet transmit cnt of received_ce > > */ > > + unused2:4; > > AFAICS this uses a 4 bytes hole present prior to this patch after "rcv_wnd", > leaving a 3 bytes hole after 'unused2'. Possibly should be worth mentioning > the hole presence. > > @Eric: would it make sense use this hole for 'noneagle'/'rate_app_limited' > and shrink the 'tcp_sock_write_txrx' group a bit?
Hi Paolo, Thanks for the feedback and sorry for my late response. I can either mention it here or move the places. However, as the following patches will continue change holes, so maybe I mention the hole change per patch make it more understandable. If this is find for you, then I will make revision in the next version. > > [...] > > @@ -5095,7 +5097,7 @@ static void __init tcp_struct_check(void) > > /* 32bit arches with 8byte alignment on u64 fields might need padding > > * before tcp_clock_cache. > > */ > > - CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 92 > > + 4); > > + CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, > > + tcp_sock_write_txrx, 97 + 7); > > Really? I *think* the change here should not move the cacheline end around, > due to holes. Could you please include the relevant pahole > (trimmed) output prior to this patch and after in the commit message? > Here is pahole output before and after this patch. Indeed, it creates 3 bytes hole after 'unused2' so it shall add (5+3)=8 to the original 92 + 4. Finally, it will be 92 + 4 + (5 + 3) = 97 + 7. *BEFORE this patch* __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585 0 */ /* XXX 3 bytes hole, try to pack */ __be32 pred_flags; /* 2588 4 */ u64 tcp_clock_cache; /* 2592 8 */ u64 tcp_mstamp; /* 2600 8 */ u32 rcv_nxt; /* 2608 4 */ u32 snd_nxt; /* 2612 4 */ u32 snd_una; /* 2616 4 */ u32 window_clamp; /* 2620 4 */ /* --- cacheline 41 boundary (2624 bytes) --- */ u32 srtt_us; /* 2624 4 */ u32 packets_out; /* 2628 4 */ u32 snd_up; /* 2632 4 */ u32 delivered; /* 2636 4 */ u32 delivered_ce; /* 2640 4 */ u32 app_limited; /* 2644 4 */ u32 rcv_wnd; /* 2648 4 */ struct tcp_options_received rx_opt; /* 2652 24 */ u8 nonagle:4; /* 2676: 0 1 */ u8 rate_app_limited:1; /* 2676: 4 1 */ /* XXX 3 bits hole, try to pack */ __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 2677 0 */ /* XXX 3 bytes hole, try to pack */ *AFTER this patch* __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585 0 */ /* XXX 3 bytes hole, try to pack */ __be32 pred_flags; /* 2588 4 */ u64 tcp_clock_cache; /* 2592 8 */ u64 tcp_mstamp; /* 2600 8 */ u32 rcv_nxt; /* 2608 4 */ u32 snd_nxt; /* 2612 4 */ u32 snd_una; /* 2616 4 */ u32 window_clamp; /* 2620 4 */ /* --- cacheline 41 boundary (2624 bytes) --- */ u32 srtt_us; /* 2624 4 */ u32 packets_out; /* 2628 4 */ u32 snd_up; /* 2632 4 */ u32 delivered; /* 2636 4 */ u32 delivered_ce; /* 2640 4 */ u32 received_ce; /* 2644 4 */ u8 received_ce_pending:4; /* 2648: 0 1 */ u8 unused2:4; /* 2648: 4 1 */ /* XXX 3 bytes hole, try to pack */ u32 app_limited; /* 2652 4 */ u32 rcv_wnd; /* 2656 4 */ struct tcp_options_received rx_opt; /* 2660 24 */ u8 nonagle:4; /* 2684: 0 1 */ u8 rate_app_limited:1; /* 2684: 4 1 */ /* XXX 3 bits hole, try to pack */ __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 2685 0 */ > [...] > > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const > > struct sk_buff *skb) > > if (tcp_ca_needs_ecn(sk)) > > tcp_ca_event(sk, CA_EVENT_ECN_IS_CE); > > > > - if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { > > + if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) && > > + tcp_ecn_mode_rfc3168(tp)) { > > /* Better not delay acks, sender can have a very low > > cwnd */ > > tcp_enter_quickack_mode(sk, 2); > > tp->ecn_flags |= TCP_ECN_DEMAND_CWR; > > } > > - tp->ecn_flags |= TCP_ECN_SEEN; > > At this point is not entirely clear to me why the removal of the above line > is needed/correct. What I see is to move the place to set this flag from here to tcp_ecn_received_counters(). Also, this function will work when receiving data by calling tcp_event_data_recv(). While tcp_ecn_received_counters() takes effect in more places (e.g., either len <= tcp_header_len or NOT) to ensure ACE counter tracks all segments including pure ACKs. > [...] > > @@ -4056,6 +4118,11 @@ static int tcp_ack(struct sock *sk, const > > struct sk_buff *skb, int flag) > > > > tcp_rack_update_reo_wnd(sk, &rs); > > > > + if (tcp_ecn_mode_accecn(tp)) > > + ecn_count = tcp_accecn_process(sk, skb, > > + tp->delivered - delivered, > > + &flag); > > AFAICS the above could set FLAG_ECE in flags, menaning the previous > tcp_clean_rtx_queue() will run with such flag cleared and the later function > checking such flag will not. I wondering if this inconsistency could cause > problems? This flag set by tcp_accecn_process() will be used by following functions: tcp_in_ack_event(), tcp_fastretrans_alert(). And this shall only impact the AccECN mode. Best regards, Chia-Yu > > /P