> -----Original Message-----
> From: Chia-Yu Chang (Nokia)
> Sent: Monday, May 5, 2025 5:25 PM
> To: Paolo Abeni <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Koen De Schepper (Nokia)
> <[email protected]>; g.white <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; vidhi_goel
> <[email protected]>
> Cc: Olivier Tilmans (Nokia) <[email protected]>
> Subject: RE: [PATCH v5 net-next 03/15] tcp: AccECN core
>
> > -----Original Message-----
> > From: Paolo Abeni <[email protected]>
> > Sent: Tuesday, April 29, 2025 12:14 PM
> > To: Chia-Yu Chang (Nokia) <[email protected]>;
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > Koen De Schepper (Nokia) <[email protected]>;
> > g.white <[email protected]>; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; vidhi_goel <[email protected]>
> > Cc: Olivier Tilmans (Nokia) <[email protected]>
> > Subject: Re: [PATCH v5 net-next 03/15] tcp: AccECN core
> >
> >
> >
> > On 4/22/25 5:35 PM, [email protected] wrote:
> > > @@ -298,6 +298,9 @@ struct tcp_sock {
> > >  	u32	snd_up;		/* Urgent pointer */
> > >  	u32	delivered;	/* Total data packets delivered incl. rexmits */
> > >  	u32	delivered_ce;	/* Like the above but only ECE marked packets */
> > > +	u32	received_ce;	/* Like the above but for rcvd CE marked pkts */
> > > +	u8	received_ce_pending:4,	/* Not yet transmit cnt of received_ce */
> > > +		unused2:4;
> >
> > AFAICS this uses a 4-byte hole present prior to this patch after
> > "rcv_wnd", leaving a 3-byte hole after 'unused2'. It may be worth
> > mentioning the presence of the hole.
> >
> > @Eric: would it make sense to use this hole for 'nonagle'/'rate_app_limited'
> > and shrink the 'tcp_sock_write_txrx' group a bit?
Hi,
By moving nonagle/rate_app_limited to the beginning of this group, I managed
to reuse the 3-byte hole at the start of
__cacheline_group_begin__tcp_sock_write_txrx.
I will include this change in the next version; the pahole results are below:
/* BEFORE this patch */
__u8                       __cacheline_group_end__tcp_sock_write_tx[0];    /* 2585    0 */
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585    0 */

/* XXX 3 bytes hole, try to pack */

__be32                     pred_flags;            /* 2588    4 */
u64                        tcp_clock_cache;       /* 2592    8 */
u64                        tcp_mstamp;            /* 2600    8 */
u32                        rcv_nxt;               /* 2608    4 */
u32                        snd_nxt;               /* 2612    4 */
u32                        snd_una;               /* 2616    4 */
u32                        window_clamp;          /* 2620    4 */
/* --- cacheline 41 boundary (2624 bytes) --- */
u32                        srtt_us;               /* 2624    4 */
u32                        packets_out;           /* 2628    4 */
u32                        snd_up;                /* 2632    4 */
u32                        delivered;             /* 2636    4 */
u32                        delivered_ce;          /* 2640    4 */
u32                        app_limited;           /* 2644    4 */
u32                        rcv_wnd;               /* 2648    4 */
struct tcp_options_received rx_opt;               /* 2652   24 */
u8                         nonagle:4;             /* 2676: 0  1 */
u8                         rate_app_limited:1;    /* 2676: 4  1 */

/* XXX 3 bits hole, try to pack */

__u8                       __cacheline_group_end__tcp_sock_write_txrx[0];  /* 2677    0 */

/* XXX 3 bytes hole, try to pack */

__u8                       __cacheline_group_begin__tcp_sock_write_rx[0] __attribute__((__aligned__(8))); /* 2680    0 */

/* AFTER this patch */
__u8                       __cacheline_group_end__tcp_sock_write_tx[0];    /* 2585    0 */
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585    0 */
u8                         nonagle:4;             /* 2585: 0  1 */
u8                         rate_app_limited:1;    /* 2585: 4  1 */

/* XXX 3 bits hole, try to pack */

/* Force alignment to the next boundary: */
u8                         :0;
u8                         received_ce_pending:4; /* 2586: 0  1 */
u8                         unused2:4;             /* 2586: 4  1 */

/* XXX 1 byte hole, try to pack */

__be32                     pred_flags;            /* 2588    4 */
u64                        tcp_clock_cache;       /* 2592    8 */
u64                        tcp_mstamp;            /* 2600    8 */
u32                        rcv_nxt;               /* 2608    4 */
u32                        snd_nxt;               /* 2612    4 */
u32                        snd_una;               /* 2616    4 */
u32                        window_clamp;          /* 2620    4 */
/* --- cacheline 41 boundary (2624 bytes) --- */
u32                        srtt_us;               /* 2624    4 */
u32                        packets_out;           /* 2628    4 */
u32                        snd_up;                /* 2632    4 */
u32                        delivered;             /* 2636    4 */
u32                        delivered_ce;          /* 2640    4 */
u32                        received_ce;           /* 2644    4 */
u32                        app_limited;           /* 2648    4 */
u32                        rcv_wnd;               /* 2652    4 */
struct tcp_options_received rx_opt;               /* 2656   24 */
__u8                       __cacheline_group_end__tcp_sock_write_txrx[0];  /* 2680    0 */
__u8                       __cacheline_group_begin__tcp_sock_write_rx[0] __attribute__((__aligned__(8))); /* 2680    0 */
>
> Hi Paolo,
>
> Thanks for the feedback, and sorry for my late response.
> I can either mention it here or move the fields.
> However, since the following patches will keep changing the holes, mentioning
> the hole changes per patch may make it more understandable.
> If this is fine with you, I will make the revision in the next version.
>
> >
> > [...]
> > > @@ -5095,7 +5097,7 @@ static void __init tcp_struct_check(void)
> > >  	/* 32bit arches with 8byte alignment on u64 fields might need padding
> > >  	 * before tcp_clock_cache.
> > >  	 */
> > > -	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 92 + 4);
> > > +	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 97 + 7);
> >
> > Really? I *think* the change here should not move the cacheline end
> > around, due to holes. Could you please include the relevant pahole
> > (trimmed) output prior to this patch and after in the commit message?
> >
>
> Here is the pahole output before and after this patch.
> Indeed, it creates a 3-byte hole after 'unused2', so it adds (5 + 3) = 8
> bytes to the original 92 + 4.
> Finally, it becomes 92 + 4 + (5 + 3) = 97 + 7.
>
> *BEFORE this patch*
> __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585 0 */
>
> /* XXX 3 bytes hole, try to pack */
>
> __be32 pred_flags; /* 2588 4 */
> u64 tcp_clock_cache; /* 2592 8 */
> u64 tcp_mstamp; /* 2600 8 */
> u32 rcv_nxt; /* 2608 4 */
> u32 snd_nxt; /* 2612 4 */
> u32 snd_una; /* 2616 4 */
> u32 window_clamp; /* 2620 4 */
> /* --- cacheline 41 boundary (2624 bytes) --- */
> u32 srtt_us; /* 2624 4 */
> u32 packets_out; /* 2628 4 */
> u32 snd_up; /* 2632 4 */
> u32 delivered; /* 2636 4 */
> u32 delivered_ce; /* 2640 4 */
> u32 app_limited; /* 2644 4 */
> u32 rcv_wnd; /* 2648 4 */
> struct tcp_options_received rx_opt; /* 2652 24 */
> u8 nonagle:4; /* 2676: 0 1 */
> u8 rate_app_limited:1; /* 2676: 4 1 */
>
> /* XXX 3 bits hole, try to pack */
>
> __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 2677 0 */
>
> /* XXX 3 bytes hole, try to pack */
>
> *AFTER this patch*
> __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585 0 */
>
> /* XXX 3 bytes hole, try to pack */
>
> __be32 pred_flags; /* 2588 4 */
> u64 tcp_clock_cache; /* 2592 8 */
> u64 tcp_mstamp; /* 2600 8 */
> u32 rcv_nxt; /* 2608 4 */
> u32 snd_nxt; /* 2612 4 */
> u32 snd_una; /* 2616 4 */
> u32 window_clamp; /* 2620 4 */
> /* --- cacheline 41 boundary (2624 bytes) --- */
> u32 srtt_us; /* 2624 4 */
> u32 packets_out; /* 2628 4 */
> u32 snd_up; /* 2632 4 */
> u32 delivered; /* 2636 4 */
> u32 delivered_ce; /* 2640 4 */
> u32 received_ce; /* 2644 4 */
> u8 received_ce_pending:4; /* 2648: 0 1 */
> u8 unused2:4; /* 2648: 4 1 */
>
> /* XXX 3 bytes hole, try to pack */
>
> u32 app_limited; /* 2652 4 */
> u32 rcv_wnd; /* 2656 4 */
> struct tcp_options_received rx_opt; /* 2660 24 */
> u8 nonagle:4; /* 2684: 0 1 */
> u8 rate_app_limited:1; /* 2684: 4 1 */
>
> /* XXX 3 bits hole, try to pack */
>
> __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 2685 0 */
>
>
> > [...]
> > > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb)
> > >  	if (tcp_ca_needs_ecn(sk))
> > >  		tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
> > >
> > > -	if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
> > > +	if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
> > > +	    tcp_ecn_mode_rfc3168(tp)) {
> > >  		/* Better not delay acks, sender can have a very low cwnd */
> > >  		tcp_enter_quickack_mode(sk, 2);
> > >  		tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
> > >  	}
> > > -	tp->ecn_flags |= TCP_ECN_SEEN;
> > > - tp->ecn_flags |= TCP_ECN_SEEN;
> >
> > At this point it is not entirely clear to me why the removal of the above
> > line is needed/correct.
>
> This patch moves the place where this flag is set from here into
> tcp_ecn_received_counters().
>
> tcp_data_ecn_check() only runs when data is received, via
> tcp_event_data_recv(), whereas tcp_ecn_received_counters() takes effect in
> more places (e.g., both when len <= tcp_header_len and when not), which
> ensures the ACE counter tracks all segments, including pure ACKs.
>
> > [...]
> > > @@ -4056,6 +4118,11 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
> > >
> > >  	tcp_rack_update_reo_wnd(sk, &rs);
> > >
> > > +	if (tcp_ecn_mode_accecn(tp))
> > > +		ecn_count = tcp_accecn_process(sk, skb,
> > > +					       tp->delivered - delivered,
> > > +					       &flag);
> >
> > AFAICS the above could set FLAG_ECE in flags, meaning the previous
> > tcp_clean_rtx_queue() will run with such flag cleared and the later
> > function checking such flag will not. I am wondering if this inconsistency
> > could cause problems?
>
> The flag set by tcp_accecn_process() will be used by the following
> functions: tcp_in_ack_event() and tcp_fastretrans_alert().
>
> And this shall only impact AccECN mode.
>
> Best regards,
> Chia-Yu
>
> >
> > /P