On 17-10-2025 23:36, Eric Dumazet wrote:
On Fri, Oct 17, 2025 at 10:41 AM Aditya Garg
<[email protected]> wrote:

On 08-10-2025 20:58, Aditya Garg wrote:
On 08-10-2025 20:51, Eric Dumazet wrote:
On Wed, Oct 8, 2025 at 8:16 AM Aditya Garg
<[email protected]> wrote:

On 03-10-2025 21:45, Eric Dumazet wrote:
On Fri, Oct 3, 2025 at 8:47 AM Aditya Garg
<[email protected]> wrote:

The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. In rare configurations where MAX_SKB_FRAGS + 2 exceeds this
limit, the driver drops the skb. Add a check in mana_start_xmit() to
detect such cases and linearize the SKB before transmission.

Return NETDEV_TX_BUSY only for -ENOSPC from mana_gd_post_work_request();
send other errors to free_sgl_ptr to free resources and record the tx
drop.

Signed-off-by: Aditya Garg <[email protected]>
Reviewed-by: Dipayaan Roy <[email protected]>
---
    drivers/net/ethernet/microsoft/mana/mana_en.c | 26 +++++++++++++++----
    include/net/mana/gdma.h                       |  8 +++++-
    include/net/mana/mana.h                       |  1 +
    3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index f4fc86f20213..22605753ca84 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -20,6 +20,7 @@

    #include <net/mana/mana.h>
    #include <net/mana/mana_auxiliary.h>
+#include <linux/skbuff.h>

    static DEFINE_IDA(mana_adev_ida);

@@ -289,6 +290,19 @@ netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
           cq = &apc->tx_qp[txq_idx].tx_cq;
           tx_stats = &txq->stats;

+       BUILD_BUG_ON(MAX_TX_WQE_SGL_ENTRIES != MANA_MAX_TX_WQE_SGL_ENTRIES);
+       #if (MAX_SKB_FRAGS + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES)
+               if (skb_shinfo(skb)->nr_frags + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES) {
+                       netdev_info_once(ndev,
+                                        "nr_frags %d exceeds max supported sge limit. Attempting skb_linearize\n",
+                                        skb_shinfo(skb)->nr_frags);
+                       if (skb_linearize(skb)) {

This will fail in many cases.

This sort of check is better done in ndo_features_check()

Most probably this would occur for GSO packets, so can ask a software
segmentation
to avoid this big and risky kmalloc() by all means.

Look at idpf_features_check()  which has something similar.
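
For illustration, a rough sketch of the kind of check Eric is describing,
loosely following the idpf pattern (mana_features_check() is a hypothetical
name; MANA_MAX_TX_WQE_SGL_ENTRIES and the +2 headroom are taken from the
patch above; this relies on the driver's existing includes):

/* Sketch only: clear the GSO feature bits for skbs whose fragment count
 * would exceed the hardware SGE limit, so the stack segments them in
 * software before they ever reach ndo_start_xmit().
 */
static netdev_features_t mana_features_check(struct sk_buff *skb,
					     struct net_device *ndev,
					     netdev_features_t features)
{
	/* +2 accounts for the header SGEs, as in the patch above */
	if (skb_shinfo(skb)->nr_frags + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES)
		features &= ~NETIF_F_GSO_MASK;

	return features;
}

The handler would then be wired up via .ndo_features_check in the driver's
net_device_ops.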

Hi Eric,
Thank you for your review. I understand your concerns regarding the use
of skb_linearize() in the xmit path, as it can fail under memory
pressure and adds overhead on the transmit path. Based on your input, I
will work on a v2 that moves the SGE limit check into
ndo_features_check() and, for GSO skbs exceeding the hardware limit,
clears NETIF_F_GSO_MASK to enforce software segmentation in the kernel
before the call to xmit.
Also, for non-GSO skbs exceeding the hardware SGE limit, should we use
skb_linearize() only in that case, or would you suggest some other
approach here?

I think that for non-GSO, the linearization attempt is fine.

Note that this is extremely unlikely for non-malicious users, and with
the MTU usually being small (9K or less), the allocation will be much
smaller than for a GSO packet.
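
A sketch of what that non-GSO fallback in mana_start_xmit() might look
like (the tx_drop_count label is a placeholder, not an existing label in
the driver):

	/* Non-GSO skb with too many fragments for one WQE. GSO skbs were
	 * already segmented in software via ndo_features_check(), so the
	 * linearize allocation here is bounded by the (small) MTU.
	 */
	if (unlikely(!skb_is_gso(skb) &&
		     skb_shinfo(skb)->nr_frags + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES)) {
		if (skb_linearize(skb))
			goto tx_drop_count;	/* placeholder drop label */
	}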

Okay. Will send a v2
Hi Eric,
I tested the code by disabling GSO in ndo_features_check() when the
number of SGEs exceeds the hardware limit, using iperf for a single TCP
connection with zerocopy enabled. I noticed a significant difference in
throughput compared to when we linearize the skbs.
For reference, throughput is 35.6 Gbits/sec when using skb_linearize(),
but drops to 6.75 Gbits/sec when disabling GSO per skb.

You must be doing something very wrong.

Difference between TSO and non TSO should not be that high.

ethtool -K eth0 tso on
netperf -H tjbp27
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
tjbp27.prod.google.com () port 0 AF_INET6
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

540000 262144 262144    10.00    92766.69


ethtool -K eth0 tso off
netperf -H tjbp27
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
tjbp27.prod.google.com () port 0 AF_INET6
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

540000 262144 262144    10.00    52218.97

Now if I force linearization, you can definitely see the very high
cost of the copies !

ethtool -K eth1 sg off
tjbp26:/home/edumazet# ./netperf -H tjbp27
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
tjbp27.prod.google.com () port 0 AF_INET6
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

540000 262144 262144    10.00    16951.32


Hence, we propose linearizing skbs until the first failure occurs.

Hmm... basically hiding a bug then ?

After that, we switch to a fail-safe mode: disable GSO for skbs whose
SGE count exceeds the hardware limit via the ndo_features_check()
implementation, while continuing to apply skb_linearize() to non-GSO
packets that exceed the limit. This keeps us on the optimal performance
path initially, and only transitions to the fail-safe path after
encountering a failure.
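
Roughly, the switch-over could look like the sketch below (the
linearize_failed flag is an assumed new field, not something that exists
in struct mana_port_context today, and the drop label is a placeholder):

	/* In the features_check handler (apc = netdev_priv(ndev)): only
	 * force software segmentation once a linearize attempt has
	 * already failed.
	 */
	if (READ_ONCE(apc->linearize_failed) &&
	    skb_shinfo(skb)->nr_frags + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES)
		features &= ~NETIF_F_GSO_MASK;

	/* In mana_start_xmit(): stay on the fast path by linearizing
	 * over-limit skbs, and flip the flag on the first failure.
	 */
	if (skb_shinfo(skb)->nr_frags + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES &&
	    skb_linearize(skb)) {
		WRITE_ONCE(apc->linearize_failed, true);
		goto tx_drop_count;	/* placeholder drop label */
	}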

Please post your patch (adding the check in ndo_features_check()),
perhaps one of us is able to help.

Okay Eric, I'll post a v2 as an RFC. Please let me know.

Regards,
Aditya
