RE: [PATCH v5 net-next 03/15] tcp: AccECN core

Chia-Yu Chang (Nokia) Mon, 05 May 2025 08:25:05 -0700

> -----Original Message-----
> From: Paolo Abeni <pab...@redhat.com> 
> Sent: Tuesday, April 29, 2025 12:14 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.ch...@nokia-bell-labs.com>; 
> ho...@kernel.org; dsah...@kernel.org; kun...@amazon.com; 
> b...@vger.kernel.org; net...@vger.kernel.org; dave.t...@gmail.com; 
> j...@mojatatu.com; k...@kernel.org; step...@networkplumber.org; 
> xiyou.wangc...@gmail.com; j...@resnulli.us; da...@davemloft.net; 
> eduma...@google.com; andrew+net...@lunn.ch; donald.hun...@gmail.com; 
> a...@fiberby.net; liuhang...@gmail.com; sh...@kernel.org; 
> linux-kselftest@vger.kernel.org; i...@kernel.org; ncardw...@google.com; Koen 
> De Schepper (Nokia) <koen.de_schep...@nokia-bell-labs.com>; g.white 
> <g.wh...@cablelabs.com>; ingemar.s.johans...@ericsson.com; 
> mirja.kuehlew...@ericsson.com; chesh...@apple.com; rs.i...@gmx.at; 
> jason_living...@comcast.com; vidhi_goel <vidhi_g...@apple.com>
> Cc: Olivier Tilmans (Nokia) <olivier.tilm...@nokia.com>
> Subject: Re: [PATCH v5 net-next 03/15] tcp: AccECN core
> 
> 
> CAUTION: This is an external email. Please be very careful when clicking 
> links or opening attachments. See the URL nok.it/ext for additional 
> information.
> 
> 
> 
> On 4/22/25 5:35 PM, chia-yu.ch...@nokia-bell-labs.com wrote:
> > @@ -298,6 +298,9 @@ struct tcp_sock {
> >       u32     snd_up;         /* Urgent pointer               */
> >       u32     delivered;      /* Total data packets delivered incl. rexmits 
> > */
> >       u32     delivered_ce;   /* Like the above but only ECE marked packets 
> > */
> > +     u32     received_ce;    /* Like the above but for rcvd CE marked pkts 
> > */
> > +     u8      received_ce_pending:4, /* Not yet transmit cnt of received_ce 
> > */
> > +             unused2:4;
> 
> AFAICS this uses a 4 bytes hole present prior to this patch after "rcv_wnd", 
> leaving a 3 bytes hole after 'unused2'. Possibly should be worth mentioning 
> the hole presence.
> 
> @Eric: would it make sense use this hole for 'noneagle'/'rate_app_limited' 
> and shrink the 'tcp_sock_write_txrx' group a bit?


Hi Paolo,

        Thanks for the feedback and sorry for my late response.
        I can either mention it here or move the places.
        However, as the following patches will continue change holes, so maybe 
I mention the hole change per patch make it more understandable.
        If this is find for you, then I will make revision in the next version.

> 
> [...]
> > @@ -5095,7 +5097,7 @@ static void __init tcp_struct_check(void)
> >       /* 32bit arches with 8byte alignment on u64 fields might need padding
> >        * before tcp_clock_cache.
> >        */
> > -     CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 92 
> > + 4);
> > +     CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, 
> > + tcp_sock_write_txrx, 97 + 7);
> 
> Really? I *think* the change here should not move the cacheline end around, 
> due to holes. Could you please include the relevant pahole
> (trimmed) output prior to this patch and after in the commit message?
> 

Here is pahole output before and after this patch.
Indeed, it creates 3 bytes hole after 'unused2' so it shall add (5+3)=8 to the 
original 92 + 4.
Finally, it will be 92 + 4 + (5 + 3) = 97 + 7.
        
*BEFORE this patch*
    __u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; 
/*  2585     0 */

    /* XXX 3 bytes hole, try to pack */

    __be32                     pred_flags;           /*  2588     4 */
    u64                        tcp_clock_cache;      /*  2592     8 */
    u64                        tcp_mstamp;           /*  2600     8 */
    u32                        rcv_nxt;              /*  2608     4 */
    u32                        snd_nxt;              /*  2612     4 */
    u32                        snd_una;              /*  2616     4 */
    u32                        window_clamp;         /*  2620     4 */
    /* --- cacheline 41 boundary (2624 bytes) --- */
    u32                        srtt_us;              /*  2624     4 */
    u32                        packets_out;          /*  2628     4 */
    u32                        snd_up;               /*  2632     4 */
    u32                        delivered;            /*  2636     4 */
    u32                        delivered_ce;         /*  2640     4 */
    u32                        app_limited;          /*  2644     4 */
    u32                        rcv_wnd;              /*  2648     4 */
    struct tcp_options_received rx_opt;              /*  2652    24 */
    u8                         nonagle:4;            /*  2676: 0  1 */
    u8                         rate_app_limited:1;   /*  2676: 4  1 */

    /* XXX 3 bits hole, try to pack */

    __u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; 
/*  2677     0 */

    /* XXX 3 bytes hole, try to pack */

*AFTER this patch*
    __u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; 
/*  2585     0 */

    /* XXX 3 bytes hole, try to pack */              
    
    __be32                     pred_flags;           /*  2588     4 */
    u64                        tcp_clock_cache;      /*  2592     8 */
    u64                        tcp_mstamp;           /*  2600     8 */
    u32                        rcv_nxt;              /*  2608     4 */
    u32                        snd_nxt;              /*  2612     4 */
    u32                        snd_una;              /*  2616     4 */
    u32                        window_clamp;         /*  2620     4 */
    /* --- cacheline 41 boundary (2624 bytes) --- */
    u32                        srtt_us;              /*  2624     4 */
    u32                        packets_out;          /*  2628     4 */
    u32                        snd_up;               /*  2632     4 */
    u32                        delivered;            /*  2636     4 */
    u32                        delivered_ce;         /*  2640     4 */ 
    u32                        received_ce;          /*  2644     4 */
    u8                         received_ce_pending:4; /*  2648: 0  1 */
    u8                         unused2:4;            /*  2648: 4  1 */

    /* XXX 3 bytes hole, try to pack */

    u32                        app_limited;          /*  2652     4 */
    u32                        rcv_wnd;              /*  2656     4 */
    struct tcp_options_received rx_opt;              /*  2660    24 */
    u8                         nonagle:4;            /*  2684: 0  1 */
    u8                         rate_app_limited:1;   /*  2684: 4  1 */

    /* XXX 3 bits hole, try to pack */

    __u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; 
/*  2685     0 */


> [...]
> > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const 
> > struct sk_buff *skb)
> >               if (tcp_ca_needs_ecn(sk))
> >                       tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
> >
> > -             if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
> > +             if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
> > +                 tcp_ecn_mode_rfc3168(tp)) {
> >                       /* Better not delay acks, sender can have a very low 
> > cwnd */
> >                       tcp_enter_quickack_mode(sk, 2);
> >                       tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
> >               }
> > -             tp->ecn_flags |= TCP_ECN_SEEN;
> 
> At this point is not entirely clear to me why the removal of the above line 
> is needed/correct.

What I see is to move the place to set this flag from here to 
tcp_ecn_received_counters().

Also, this function will work when receiving data by calling 
tcp_event_data_recv(). 

While tcp_ecn_received_counters() takes effect in more places (e.g., either len 
<= tcp_header_len or NOT) to ensure ACE counter tracks all segments including 
pure ACKs.

> [...]
> > @@ -4056,6 +4118,11 @@ static int tcp_ack(struct sock *sk, const 
> > struct sk_buff *skb, int flag)
> >
> >       tcp_rack_update_reo_wnd(sk, &rs);
> >
> > +     if (tcp_ecn_mode_accecn(tp))
> > +             ecn_count = tcp_accecn_process(sk, skb,
> > +                                            tp->delivered - delivered,
> > +                                            &flag);
> 
> AFAICS the above could set FLAG_ECE in flags, menaning the previous
> tcp_clean_rtx_queue() will run with such flag cleared and the later function 
> checking such flag will not. I wondering if this inconsistency could cause 
> problems?

This flag set by tcp_accecn_process() will be used by following functions: 
tcp_in_ack_event(), tcp_fastretrans_alert().

And this shall only impact the AccECN mode.

Best regards,
Chia-Yu

> 
> /P

RE: [PATCH v5 net-next 03/15] tcp: AccECN core

Reply via email to